SEQanswers

number of read in RNA-seq? (http://seqanswers.com/forums/showthread.php?t=12672)

ashiq.hussain 07-12-2011 11:57 AM

number of read in RNA-seq?
 
Another biologist here.

I got RNA-seq data (in .txt format) back from the sequencing facility. Each file, from 2 lanes of high-throughput sequencing, is nearly 25 GB.

I want to know the number of reads in each file. Which program or code should I use?

Thank you.

kwatts59 07-12-2011 12:14 PM

Try the Linux command "wc -l <filename>". This will count the number of lines in the file. If the file is in fastq format, each read takes four lines, so divide the number of lines by 4 to get the number of reads.
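A minimal sketch of that recipe as a single shell function, assuming a plain, uncompressed fastq file with exactly four lines per read (no wrapped sequence or quality lines). The function name count_fastq_reads is made up for illustration.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
count_fastq_reads() {
    # Feeding the file on stdin makes wc print only the number, not the filename.
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}
```

Calling it as `count_fastq_reads reads.txt` (reads.txt being a placeholder name) prints the number of reads. Integer division silently drops a remainder, so a truncated file will not raise an error.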

ashiq.hussain 07-12-2011 12:28 PM

Why do we use -l in the command?

upendra_35 07-12-2011 12:45 PM

It's not -1, it is -l (lowercase L), and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps.

kmcarr 07-12-2011 12:58 PM

Quote:

Originally Posted by upendra_35 (Post 46270)
It's not -1, it is -l (lowercase L), and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps.

This will not work! The '@' character may appear in fastq quality lines as well as the seq-id line. See this thread for a discussion on the problems of using grep to count reads in fastq files.
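To make the problem concrete, here is a small sketch that builds a made-up two-read fastq file whose first quality line happens to start with '@' (a legal quality character), then compares the two counting methods. The read names and sequences are invented for the demonstration.

```shell
# Build a tiny two-read FASTQ whose first quality line starts with '@'.
demo=$(mktemp)
printf '@r1\nACGT\n+\n@III\n@r2\nTTTT\n+\nIIII\n' > "$demo"

# grep matches the two header lines AND the '@III' quality line.
grep_count=$(grep -c '^@' "$demo")
# Line count divided by 4 counts records regardless of their content.
line_count=$(( $(wc -l < "$demo") / 4 ))

echo "grep '^@': $grep_count"   # prints 3 (overcount)
echo "lines / 4: $line_count"   # prints 2 (correct)
rm -f "$demo"
```

Even with the "^" anchor, grep reports one read too many here, while the line count gives the right answer.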

ashiq.hussain 07-12-2011 01:17 PM

Thank you all.

upendra_35 07-12-2011 03:52 PM

Sorry, I missed the "^" before the @. It should be grep "^@" -c <filename>. However, I agree with kmcarr that '@' is a character that may appear in fastq quality lines as well (sometimes at the start of the line), so even that may not work. The best way to do it is in two steps: wc -l <filename> followed by expr <no.of.lines> / 4

JahnDavik 04-16-2013 02:07 AM

compressed files
 
Does it work with .gz files, or do I have to unpack them?

NicoBxl 04-16-2013 02:51 AM

No need to unpack them, just use the power of piping ;) As before, divide the line count by 4 to get the number of reads:

gzip -d -c input.fastq.gz | wc -l
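The pipe above prints the number of lines, not reads. A minimal sketch that also does the division, assuming a gzip-compressed fastq file with four lines per read; the function name count_gz_fastq_reads is hypothetical.

```shell
# Count reads in a gzip-compressed FASTQ file without unpacking it to disk.
# Assumption: every record is exactly 4 lines.
count_gz_fastq_reads() {
    # gzip -d -c decompresses to stdout; the original file is left untouched.
    lines=$(gzip -d -c "$1" | wc -l)
    echo $((lines / 4))
}
```

For example, `count_gz_fastq_reads input.fastq.gz` prints the read count for the file used in the command above.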

CGarde 08-16-2013 06:10 AM

I don't mean to revive a closed thread, but just in case someone needs to know the number of reads in multiple compressed fastq files...

for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $((b/4)); done

JahnDavik 08-16-2013 07:03 AM

Great. Thanks. Very useful for me.

anais.barray 04-22-2016 03:31 AM

I didn't intend to revive this thread, but I was searching for a way to automate the fastq read counting process, and this solution may come in handy to some:

Quote:

for i in `find . -name "*.fastq"`; do echo "$i" >> project_nbread.txt; egrep -c "`head -n 1 "$i" | awk -F '[@:]' '{ print $2 } '`" "$i" >> project_nbread.txt ; done

In the case of only one file, you can use this:

Quote:

egrep -c "`head -n 1 file.fastq | awk -F '[@:]' '{ print $2 } '`" file.fastq

This counts the lines that contain the ID taken from the first header (the text between '@' and the first ':'), i.e. the number of fastq reads, provided that ID appears only on header lines.
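One caveat: the fastq format allows the '+' line to repeat the read ID, in which case an unanchored match counts every read twice. A hedged variant that only counts lines starting with '@' followed by the ID and a ':' might look like the sketch below. It assumes Illumina-style headers that all share the same prefix before the first ':'; the function name count_reads_by_id is made up.

```shell
# Count header lines that start with the same '@<id>:' prefix as the first read.
# Assumption: all headers in the file share that prefix (Illumina-style names).
count_reads_by_id() {
    # Text between the leading '@' and the first ':' of the first header.
    prefix=$(head -n 1 "$1" | awk -F '[@:]' '{ print $2 }')
    # index(...) == 1 means the line begins with the literal string '@<id>:'.
    awk -v p="@${prefix}:" 'index($0, p) == 1 { n++ } END { print n + 0 }' "$1"
}
```

A quality line could in principle still begin with the same string, so dividing the line count by 4 remains the more robust check.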

