**aggp11** · 07-09-2012, 10:24 AM

Hi,

You could try the FastQC package if you haven't already. It accepts FASTQ, BAM, and SAM files and reports most of the important statistics for an NGS run.

**husamia** · 07-09-2012, 10:32 AM

I suggest using native Linux tools such as grep, sed, and awk in a multithreaded environment. A 64-bit build may also help in applications that support it. There is also the option of using CUDA on a GPU to do very fast calculations.

**JackieBadger** · 07-09-2012, 11:17 AM

PRINSEQ and FASTQC

**Richard Finney** · 07-09-2012, 11:36 AM

If you're up for modifying a couple of lines of code for your needs,
this should do the trick ...

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>
unsigned long int sum[5];        // counts for A, C, G, T, N
unsigned long int basecount;     // total of all counted bases
unsigned long int readcount = 0; // number of non-header lines seen
char s[512];                     // line buffer
int main(void)
{
    register int i,j;
    char ch;
    basecount = 0;
    memset(sum,0,sizeof(sum));
    // fgets never writes past the buffer; a line longer than 511 characters
    // arrives in several pieces and is counted as several reads
    while (fgets(s,sizeof(s),stdin))
    {
        if (s[0] == '>') continue; // skip fasta entry header
        readcount++;               // assumes one sequence line per record
        for (i=0;s[i];i++)         // walk the line up to the terminating NUL
        {
            ch = toupper((unsigned char)s[i]);
            if (ch == 'A') { sum[0]++; basecount++; }
            else if (ch == 'C') { sum[1]++; basecount++; }
            else if (ch == 'G') { sum[2]++; basecount++; }
            else if (ch == 'T') { sum[3]++; basecount++; }
            else if (ch == 'N') { sum[4]++; basecount++; }
        }
    }
    for (j=0;j<5;j++)
    {
        if (j == 0) printf("A ");
        else if (j == 1) printf("C ");
        else if (j == 2) printf("G ");
        else if (j == 3) printf("T ");
        else if (j == 4) printf("N ");
        printf("%lu ",sum[j]);
        printf("\n");
    }
    printf("bases = %lu \n",basecount);
    printf("reads = %lu \n",readcount);
    return 0;
}
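
The program above only skips FASTA headers. For FASTQ, one possible adaptation is to count lines and look only at the second line of every four, rather than testing for a leading '@', since a quality string may itself begin with '@'. This is a minimal sketch, assuming strict 4-line records (no wrapped sequences) and lines shorter than the buffer; the names `fastq_stats` and `count_fastq` are made up for illustration and are not from the code above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the totals; not part of the program above. */
struct fastq_stats {
    unsigned long counts[5];   /* A, C, G, T, N */
    unsigned long bases;       /* sum of counts[] */
    unsigned long reads;       /* number of sequence lines seen */
};

/* Count reads and bases in 4-line FASTQ read from `in`.
 * Assumes one sequence line per record and lines that fit in the buffer.
 * Returns 0 if the line count is a multiple of four, -1 otherwise
 * (for example a truncated file or a trailing blank line). */
int count_fastq(FILE *in, struct fastq_stats *out)
{
    static const char alphabet[] = "ACGTN";
    char line[4096];
    unsigned long lineno = 0;

    assert(in != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    while (fgets(line, sizeof(line), in)) {
        unsigned long field = lineno++ % 4;  /* 0: @name, 1: seq, 2: +, 3: qual */
        if (field != 1)
            continue;                        /* only the sequence line is counted */
        out->reads++;
        for (size_t i = 0; line[i]; i++) {
            /* Look the upper-cased character up in "ACGTN"; the newline
             * and any other symbol simply fail the lookup. */
            const char *hit = strchr(alphabet, toupper((unsigned char)line[i]));
            if (hit) {
                out->counts[hit - alphabet]++;
                out->bases++;
            }
        }
    }
    return (lineno % 4 == 0) ? 0 : -1;
}
```

Calling `count_fastq(stdin, &st)` from `main` and printing `st.counts`, `st.bases`, and `st.reads` would give the same kind of report as the FASTA version.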

**maubp** · 07-09-2012, 12:20 PM

If you don't want error checking, Heng Li has a very fast FASTA/FASTQ parser in C that could easily be used for the basic information you requested (read count and total bases):

FASTA/FASTQ Parser in C

http://lh3lh3.users.sourceforge.net/parsefastq.shtml


fastest way to 'parse' fasta or fastq?
