Number of reads in RNA-seq?
Another biologist here.
I got RNA sequencing data (in .txt format) back from the sequencing facility. Each file from the two lanes of high-throughput sequencing is nearly 25 GB. I want to know the number of reads in each file. Which program or code should I use? Thank you.
Try the Linux command "wc -l <filename>". This counts the number of lines in the file. If the file is in fastq format, where each read takes four lines, divide the number of lines by 4 to get the number of reads.
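A minimal sketch of the line-count approach, assuming an uncompressed fastq file in which every record takes exactly four lines (header, sequence, "+" separator, quality), with no line-wrapped sequences. The file name example.fastq and its two reads are made up for illustration. Feeding the file through "<" makes wc -l print only the number, without the file name, so it can go straight into shell arithmetic.

```shell
#!/usr/bin/env bash
# Build a tiny two-read fastq file (hypothetical example data).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > example.fastq

# wc -l counts newline characters, i.e. lines; "< file" keeps the file name out of the output.
lines=$(wc -l < example.fastq)

# Each fastq record is 4 lines, so reads = lines / 4 (integer division).
reads=$(( lines / 4 ))
echo "example.fastq: $reads reads"
```

On a real 25 GB file the same two commands apply; only the file name changes.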
|
Why do we use -l in the command?
|
It's not -1, it is -l (lowercase L), and the command wc -l counts the number of lines in the file. The easiest way to find the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps.
|
Thank you all.
|
Sorry, I missed the "^" before the @: it should be grep "^@" -c <filename>. However, I agree with kmcarr that '@' can also appear in fastq quality lines (sometimes at the start of the line), so even that may not work. The most reliable way is two commands: wc -l <filename>, followed by expr <no.of.lines> / 4
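To see why counting lines starting with '@' can go wrong, here is a small sketch with a made-up file (tricky.fastq) whose second quality line happens to begin with '@', a valid quality character (Q31 in the common Phred+33 encoding). It also folds the two steps (wc -l, then divide by 4) into one line; it uses the shell's built-in $(( )) arithmetic in place of expr, and both do integer division.

```shell
#!/usr/bin/env bash
# Hypothetical two-read file; the last quality line starts with '@'.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > tricky.fastq

# Header-based count: the quality line "@III" also matches, so this overcounts.
grep_count=$(grep -c '^@' tricky.fastq)

# Line-based count: wc -l and the division by 4 in a single expression.
wc_count=$(( $(wc -l < tricky.fastq) / 4 ))

echo "grep -c '^@': $grep_count   wc -l / 4: $wc_count"
```

Here grep reports 3 "reads" while the line count correctly gives 2.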
|
Compressed files
Does it work with .gz files, or do I have to unpack them?
|
No need to unpack them, just use the power of piping ;)
gzip -d -c input.fastq.gz | wc -l
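A sketch of the piped version that goes all the way to a read count, assuming gzip is installed and the fastq uses four lines per record. The example file is generated on the spot and is purely illustrative. gzip -d -c writes the decompressed data to standard output only, so the .gz file stays as it is and no uncompressed copy is written to disk.

```shell
#!/usr/bin/env bash
# Make a small gzip-compressed fastq (hypothetical example data).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > example.fastq.gz

# Decompress to stdout (-d -c), count lines, then divide by 4 to get reads.
lines=$(gzip -d -c example.fastq.gz | wc -l)
reads=$(( lines / 4 ))
echo "example.fastq.gz: $reads reads"
```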
I don't mean to revive a closed thread, but just in case someone needs the number of reads in multiple compressed fastq files, here is a loop for .bz2 files:
for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $(($b/4)); done
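The same loop idea can be wrapped in a small function that handles .gz, .bz2 and plain files and prints one tab-separated line per file. This is only a sketch: the function name count_fastq_reads is hypothetical, and it still assumes four lines per record. bzip2 -dc does the same job as bzcat, and gzip -dc is the short form of gzip -d -c. It also warns when a line count is not a multiple of 4, which usually points to a truncated or differently formatted file.

```shell
#!/usr/bin/env bash
# count_fastq_reads (hypothetical helper): prints "<file><TAB><reads>" for each
# argument, choosing the decompressor from the file extension.
count_fastq_reads() {
  local file lines
  for file in "$@"; do
    case "$file" in
      *.gz)  lines=$(gzip -dc "$file" | wc -l) ;;
      *.bz2) lines=$(bzip2 -dc "$file" | wc -l) ;;
      *)     lines=$(wc -l < "$file") ;;
    esac
    # Strip any padding spaces some wc implementations add.
    lines=$(( lines ))
    # Flag files whose line count is not a multiple of 4 instead of silently rounding.
    if (( lines % 4 != 0 )); then
      printf '%s\tWARNING: %d lines, not a multiple of 4\n' "$file" "$lines" >&2
    fi
    printf '%s\t%d\n' "$file" "$(( lines / 4 ))"
  done
}

# Example run on two small made-up files.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > a.fastq.gz
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > b.fastq
count_fastq_reads a.fastq.gz b.fastq
```

To run it over a whole directory, pass a glob such as count_fastq_reads *.fastq.gz; quoting "$file" inside the loop keeps names with spaces intact.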
Great. Thanks. Very useful for me.
|
I didn't intend to revive this thread, but as I was searching for a way to automate the fastq read counting process, this solution may come in handy to some: