
**kwatts59** · 07-12-2011, 11:14 AM

Try the Linux command `wc -l <filename>`. It counts the number of lines in the file. If the file is in FASTQ format, divide the number of lines by 4 (each read occupies four lines: header, sequence, '+' separator, and quality) to get the number of reads.
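A minimal sketch of this suggestion as a reusable shell function. It assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence or quality lines). The function name `count_fastq_reads` is hypothetical.

```bash
#!/usr/bin/env bash
# count_fastq_reads: hypothetical helper that prints the number of reads in an
# uncompressed FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    local lines
    # Redirecting the file into wc makes it print only the count, not the filename.
    lines=$(wc -l < "$1")
    # Warn if the line count is not a multiple of 4 (truncated or wrapped file).
    if (( lines % 4 != 0 )); then
        echo "warning: $1 has $lines lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads sample.fastq` (placeholder file name). Because `wc -l` counts newline characters, a file whose last line lacks a trailing newline comes up one line short; the warning above catches that case.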

**ashiq.hussain** · 07-12-2011, 11:28 AM

Why do we use -l in the command?

**upendra_35** · 07-12-2011, 11:45 AM

It's not -1, it's -l (lowercase L): `wc -l` counts the number of lines in the file. The easiest way to find the number of reads in a FASTQ file with one command is `grep "@" -c <filename>`. Hope this helps.

**kmcarr** · 07-12-2011, 11:58 AM

> Originally posted by upendra_35:
>
> It's not -1, it's -l (lowercase L): `wc -l` counts the number of lines in the file. The easiest way to find the number of reads in a FASTQ file with one command is `grep "@" -c <filename>`. Hope this helps.

This will not work! The '@' character can appear in FASTQ quality lines as well as in the seq-id line. See this thread for a discussion of the problems with using grep to count reads in FASTQ files.
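To illustrate the point, the sketch below builds a made-up one-read FASTQ file whose quality string contains '@' (Phred quality 31 in the Phred+33 encoding) and compares an unanchored `grep -c` with the line-count method. The file contents are hypothetical.

```bash
#!/usr/bin/env bash
# Build a one-read FASTQ file; the quality string "II@I" contains '@',
# which encodes Phred quality 31 in the Phred+33 scheme.
demo=$(mktemp)
printf '@read1\nACGT\n+\nII@I\n' > "$demo"

# grep -c counts matching LINES: the header and the quality line both contain '@'.
grep_count=$(grep -c '@' "$demo")

# Line-count method: 4 lines per record.
line_count=$(( $(wc -l < "$demo") / 4 ))

echo "grep -c '@': $grep_count"   # overcounts
echo "wc -l / 4:   $line_count"   # correct for 4-line records
rm -f "$demo"
```

Running it prints 2 for the grep count and 1 for the line-count method, even though the file holds a single read.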

**ashiq.hussain** · 07-12-2011, 12:17 PM

Thank you all.

**upendra_35** · 07-12-2011, 02:52 PM

Sorry, I missed the "^" before the @. It should be `grep -c "^@" <filename>`. However, I agree with kmcarr that '@' can also appear in FASTQ quality lines (sometimes at the start of the line), so even that may not work. The best way is therefore two steps: `wc -l <filename>` followed by `expr <no.of.lines> / 4`.
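A sketch of the failure mode described here, using a made-up record whose quality line begins with '@', followed by the two-step `wc`/`expr` count. The file contents are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical one-read FASTQ whose quality line itself starts with '@'.
demo=$(mktemp)
printf '@read1\nACGT\n+\n@III\n' > "$demo"

# Anchoring the pattern to the line start still matches the quality line.
anchored=$(grep -c '^@' "$demo")

# Two-step method: count lines, then divide by 4 with expr.
# expr needs spaces around the operator; $nlines is left unquoted so any
# leading padding printed by some wc implementations is dropped by word splitting.
nlines=$(wc -l < "$demo")
reads=$(expr $nlines / 4)

echo "grep -c '^@':  $anchored"
echo "expr lines/4: $reads"
rm -f "$demo"
```

Here the anchored grep still reports 2, while the two-step count reports the single read.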

**JahnDavik** · 04-16-2013, 01:07 AM

compressed files

Does it work with .gz files, or do I have to unpack them?

**NicoBxl** · 04-16-2013, 01:51 AM

No, use the power of piping:

```bash
gzip -d -c input.fastq.gz | wc -l
```
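Building on that pipe, a minimal sketch that turns the decompressed line count into a read count for a gzip-compressed FASTQ file, again assuming four lines per record. The helper name `count_gz_reads` is hypothetical.

```bash
#!/usr/bin/env bash
# count_gz_reads: hypothetical helper; decompresses a .fastq.gz to stdout
# (gzip -d -c) and counts lines without writing an uncompressed copy to disk.
count_gz_reads() {
    local lines
    lines=$(gzip -d -c "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

For example, `count_gz_reads input.fastq.gz`. Because the data only streams through the pipe, no extra disk space is needed for the uncompressed file.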

**CGarde** · 08-16-2013, 05:10 AM

I don't mean to revive a closed thread, but just in case someone needs the number of reads in multiple bzip2-compressed FASTQ files:

```bash
for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $((b / 4)); done
```
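A variant of this loop, sketched under the same four-lines-per-record assumption, that accepts both gzip and bzip2 files and prints the file name and read count on one tab-separated line. The function name `report_reads` and the file-extension conventions are assumptions.

```bash
#!/usr/bin/env bash
# report_reads: hypothetical helper that prints "<file><TAB><reads>" for each
# gzip- or bzip2-compressed FASTQ file given as an argument.
report_reads() {
    local file lines
    for file in "$@"; do
        case "$file" in
            *.gz)  lines=$(gzip -d -c "$file" | wc -l) ;;
            *.bz2) lines=$(bzcat "$file" | wc -l) ;;
            *)     echo "skipping $file: unknown extension" >&2; continue ;;
        esac
        printf '%s\t%d\n' "$file" $(( lines / 4 ))
    done
}
```

For example, `shopt -s nullglob; report_reads *.fastq.gz *.fastq.bz2 > read_counts.tsv` (the output file name is a placeholder). `nullglob` keeps a pattern that matches nothing from being passed through as a literal file name.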

**JahnDavik** · 08-16-2013, 06:03 AM

Great. Thanks. Very useful for me.

**anais.barray** · 04-22-2016, 02:31 AM

I didn't intend to revive this thread, but I was searching for a way to automate FASTQ read counting, and this solution may come in handy for others:

```bash
find . -name "*.fastq" | while IFS= read -r i; do echo "$i" >> project_nbread.txt; grep -c "$(head -n 1 "$i" | awk -F '[@:]' '{ print $2 }')" "$i" >> project_nbread.txt; done
```

For a single file, you can use:

```bash
grep -c "$(head -n 1 file.fastq | awk -F '[@:]' '{ print $2 }')" file.fastq
```

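This counts the lines that contain the first field of the first read's header (the text between '@' and the first ':', e.g. the instrument ID in Illumina headers). That equals the number of reads as long as every header starts with that same field and it never appears in a sequence or quality line.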
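To make the header-matching idea concrete, here is a sketch that wraps the one-liner in a hypothetical function `count_by_header_prefix` and runs it on two made-up files: one with Illumina-style headers (`@<instrument>:<run>:...`) and one with SRA-style headers, where the first field includes a per-read index. It uses `grep -c -F` (fixed-string match) rather than a regular expression, a deliberate substitution so the extracted ID is matched literally.

```bash
#!/usr/bin/env bash
# count_by_header_prefix: hypothetical wrapper around the one-liner above.
# Takes the text between '@' and the first ':' of the first header line and
# counts lines containing it as a fixed string (-F).
count_by_header_prefix() {
    local prefix
    prefix=$(head -n 1 "$1" | awk -F '[@:]' '{ print $2 }')
    grep -c -F -- "$prefix" "$1"
}

# Illumina-style headers: every header begins with the same instrument ID.
illumina=$(mktemp)
printf '@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:0:1\nACGT\n+\nIIII\n@EAS139:136:FC706VJ:2:2104:15344:197400 1:N:0:1\nGGCC\n+\nIIII\n' > "$illumina"

# SRA-style headers: the first field contains the read index, so it is unique per read.
sra=$(mktemp)
printf '@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=4\nACGT\n+\nIIII\n@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=4\nGGCC\n+\nIIII\n' > "$sra"

illumina_count=$(count_by_header_prefix "$illumina")  # both headers match
sra_count=$(count_by_header_prefix "$sra")            # only the first header matches
echo "Illumina-style: $illumina_count, SRA-style: $sra_count (true count is 2 for both)"
rm -f "$illumina" "$sra"
```

With these made-up files the function reports 2 for the Illumina-style file but 1 for the SRA-style file, so the result depends on the header layout.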

