**aggp11** · 07-09-2012, 10:24 AM

Hi,

You could try the FASTQC package if you haven't already. It accepts FASTQ, BAM, and SAM files and reports most of the important statistics for an NGS run.

**husamia** · 07-09-2012, 10:32 AM

I suggest using native Linux tools such as grep, sed, and awk in a multithreaded environment. A 64-bit build may also help in applications that support it. There is also the option of using CUDA with a GPU for very fast calculations.

**JackieBadger** · 07-09-2012, 11:17 AM

PRINSEQ and FASTQC

**Richard Finney** · 07-09-2012, 11:36 AM

If you're up for modifying a couple of lines of code for your needs,
this should do the trick ...

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>
unsigned long int sum[5]; // counts of A, C, G, T, N (in that order)
unsigned long int basecount;
unsigned long int readcount = 0;
char s[512];
int main(void)
{
    int i,j;
    int ch;
    basecount = 0;
    memset(sum,0,sizeof(sum));
    // fgets() replaces gets(), which can overflow s and was removed in C11
    while (fgets(s,sizeof(s),stdin))
    {
        // fasta entry header: one per read, so count it here and skip the line
        // (wrapped sequence lines are then not counted as extra reads)
        if (s[0] == '>') { readcount++; continue; }
        for (i=0;s[i] != '\0';i++) // run to the terminating NUL
        {
            ch = toupper((unsigned char)s[i]);
            if (ch == 'A') { sum[0]++; basecount++; }
            else if (ch == 'C') { sum[1]++; basecount++; }
            else if (ch == 'G') { sum[2]++; basecount++; }
            else if (ch == 'T') { sum[3]++; basecount++; }
            else if (ch == 'N') { sum[4]++; basecount++; }
        }
    }
    for (j=0;j<5;j++)
    {
        if (j == 0) printf("A ");
        else if (j == 1) printf("C ");
        else if (j == 2) printf("G ");
        else if (j == 3) printf("T ");
        else if (j == 4) printf("N ");
        printf("%lu ",sum[j]); // %lu matches unsigned long
        printf("\n");
    }
    printf("bases = %lu \n",basecount);
    printf("reads = %lu \n",readcount);
    return 0;
}
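
The program above only skips FASTA headers, so on FASTQ input it would also scan the header, '+', and quality lines. Below is a minimal sketch of the same read and base count for FASTQ. It assumes every record is exactly four lines (header, sequence, '+', quality) with no wrapped sequences and no blank lines. The names `count_fastq` and `struct fastq_stats` are made up for this example and do not come from any library. It makes no claim to being the fastest approach.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: totals for one FASTQ stream. */
struct fastq_stats {
    unsigned long reads;  /* number of records seen */
    unsigned long bases;  /* characters found on sequence lines */
};

/*
 * Count reads and bases in a 4-line-per-record FASTQ stream.
 * Returns 0 on success, -1 if a header line does not start with '@'
 * or the input ends in the middle of a record.
 */
int count_fastq(FILE *fp, struct fastq_stats *st)
{
    unsigned long lines = 0;  /* completed lines so far */
    int in_line = 0;          /* nonzero once the current line has content */
    int c;

    assert(fp != NULL && st != NULL);
    st->reads = 0;
    st->bases = 0;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {          /* end of line: move to the next one */
            lines++;
            in_line = 0;
            continue;
        }
        if (c == '\r')            /* tolerate Windows line endings */
            continue;
        if (!in_line) {
            in_line = 1;
            if (lines % 4 == 0) { /* first line of a record: the header */
                if (c != '@')
                    return -1;
                st->reads++;
            }
        }
        if (lines % 4 == 1)       /* second line of a record: the sequence */
            st->bases++;
    }
    if (in_line)                  /* last line had no trailing newline */
        lines++;
    return (lines % 4 == 0) ? 0 : -1;
}
```

To run it on standard input, call `count_fastq(stdin, &st)` from `main` and print `st.reads` and `st.bases` with `%lu`. Lines are identified by position within the record, not by their first character, because a quality string may itself begin with '@'.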

**maubp** · 07-09-2012, 12:20 PM

If you don't need error checking, Heng Li has a very fast FASTA/FASTQ parser in C that could easily be used for the basic information you requested (read count and total bases):

FASTA/FASTQ Parser in C

http://lh3lh3.users.sourceforge.net/parsefastq.shtml

fastest way to 'parse' fasta or fastq?
