**mastal** · 12-17-2016, 01:29 PM

Code:

wc -l yourfile.fastq

gives you the number of lines. Since each FASTQ record normally spans four lines, divide that by 4 to get the number of reads.
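A minimal sketch of the line-count approach as a reusable shell function. It assumes every record occupies exactly four lines (wrapped, multi-line FASTQ would break it). The function name `count_reads` and the demo file are hypothetical; `.gz` input is decompressed with `gzip -cd`.

```shell
#!/usr/bin/env bash
# Sketch: reads = lines / 4. Assumes strict four-line FASTQ records
# (header, sequence, '+', quality) with no line wrapping.
set -euo pipefail

# count_reads is a hypothetical helper name, not an existing tool.
count_reads() {
  local f="$1" lines
  case "$f" in
    *.gz) lines=$(gzip -cd -- "$f" | wc -l) ;;  # decompress to stdout, then count lines
    *)    lines=$(wc -l < "$f") ;;              # redirect so wc prints only the number
  esac
  echo $(( lines / 4 ))
}

# Demo on a tiny two-record file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n' > "$tmp/demo.fastq"
gzip -c "$tmp/demo.fastq" > "$tmp/demo.fastq.gz"
count_reads "$tmp/demo.fastq"      # prints 2
count_reads "$tmp/demo.fastq.gz"   # prints 2
```

If the line count is not a multiple of 4, the file is probably wrapped or truncated, and the integer division silently hides that.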

**wdecoster** · 12-17-2016, 02:45 PM

If the file is not compressed:

Code:

grep -c '^@' yourfile.fastq

If the file is gz compressed:

Code:

zcat yourfile.fastq.gz | grep -c '^@'
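Both commands can be folded into one sketch. It relies on `gzip -cdf`: with `-f` and `-c`, gzip copies input it does not recognize as compressed to stdout unchanged, so the same pipeline handles plain and `.gz` files. `count_headers` and the demo file are hypothetical names. This still counts every line that starts with `@`, so it is only exact when no quality line starts with `@`.

```shell
#!/usr/bin/env bash
# Sketch: count FASTQ header lines (lines starting with '@').
set -euo pipefail

# count_headers is a hypothetical helper name.
count_headers() {
  # -c: write to stdout, -d: decompress, -f: pass non-gzip input through as-is.
  # grep -c exits with status 1 when the count is 0, so '|| true' keeps set -e quiet.
  gzip -cdf -- "$1" | grep -c '^@' || true
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n' > "$tmp/demo.fastq"
gzip -c "$tmp/demo.fastq" > "$tmp/demo.fastq.gz"
count_headers "$tmp/demo.fastq"      # prints 2
count_headers "$tmp/demo.fastq.gz"   # prints 2
```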

If the file is depressed:

Code:

talk to it about feelings and give chocolate

**jdk787** · 12-17-2016, 03:08 PM

You can also run it through FastQC before and after processing. This will give you the number of reads and a lot of other useful information.

Babraham Bioinformatics - FastQC: A Quality Control tool for High Throughput Sequence Data

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

**mastal** · 12-17-2016, 04:06 PM

The quality scores may sometimes have a value '@', so you may have some of the base quality lines also beginning with '@'.

**wdecoster** · 12-18-2016, 05:55 AM

Originally posted by mastal

The quality scores may sometimes have a value '@', so you may have some of the base quality lines also beginning with '@'.

Right, good catch
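The overcount is easy to reproduce. In Phred+33 encoding (Sanger / Illumina 1.8+), '@' is ASCII 64, i.e. Q31, so a quality line can legitimately begin with '@'. The sketch below, assuming strict four-line records, compares the naive grep count with an awk count that only inspects line 1 of each record; the helper names and demo data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: naive '^@' counting vs. position-aware counting.
# Assumes strict four-line FASTQ records (no wrapped lines).
set -euo pipefail

naive_count() {   # counts ANY line that starts with '@', quality lines included
  gzip -cdf -- "$1" | grep -c '^@' || true
}

record_count() {  # only checks the header line (line 1 of every 4)
  gzip -cdf -- "$1" | awk 'NR % 4 == 1 && /^@/ { n++ } END { print n + 0 }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# The second record's quality line ('@@@') starts with '@'.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\n@@@\n' > "$tmp/demo.fastq"
naive_count "$tmp/demo.fastq"    # prints 3 (overcount)
record_count "$tmp/demo.fastq"   # prints 2
```

With strict four-line records the `/^@/` test is only a sanity check; the result matches the line count divided by 4.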

**LacquerHead** · 12-18-2016, 03:09 PM

samtools flagstat yourfile.bam

(This works on reads stored as SAM/BAM, not on FASTQ.)

**hamcan** · 12-20-2016, 09:37 AM

Thanks, everyone!
How about finding the number of bases?

**Michael.Ante** · 12-20-2016, 02:07 PM

It should be something like

Code:

awk 'NR%4==2{print}' in.fastq | tr -d '\n' | wc -c

With the awk command, you print the sequence lines (line 2 of every four-line record); tr removes the newline characters, so wc -c counts only the nucleotides.
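The sequence lengths can also be summed directly in awk, which never counts newline characters and reports reads and bases in one pass. A minimal sketch, assuming strict four-line records; `fastq_stats` is a hypothetical name, and `gzip -cdf` lets the same pipeline read plain or `.gz` input.

```shell
#!/usr/bin/env bash
# Sketch: reads and bases in one pass. Assumes strict four-line FASTQ records.
set -euo pipefail

# fastq_stats is a hypothetical helper name.
fastq_stats() {
  gzip -cdf -- "$1" | awk '
    NR % 4 == 2 { bases += length($0) }   # line 2 of each record is the sequence
    END { printf "%d reads\t%d bases\n", NR / 4, bases }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n' > "$tmp/demo.fastq"
gzip -c "$tmp/demo.fastq" > "$tmp/demo.fastq.gz"
fastq_stats "$tmp/demo.fastq"      # prints "2 reads" and "7 bases", tab-separated
fastq_stats "$tmp/demo.fastq.gz"   # same output
```

Files with Windows (CRLF) line endings would have the trailing carriage return counted as a base.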

**Brian Bushnell** · 12-20-2016, 04:20 PM

I like to use BBMap's Reformat:

Code:

reformat.sh in=100x.fq

No output stream specified.  To write to stdout, please specify 'out=stdout.fq' or similar.
Input is being processed as paired
Input:                  	3072634 reads          	463967734 bases
Output:                 	3072634 reads (100.00%) 	463967734 bases (100.00%)

Time:                         	3.317 seconds.
Reads Processed:       3072k 	926.30k reads/sec
Bases Processed:        463m 	139.87m bases/sec

This has the advantage of working on FASTA, FASTQ, or SAM, compressed or uncompressed, as well as various other formats.

Number of Reads
