SEQanswers

12-17-2016, 11:26 AM   #1
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19
Number of Reads

Hi there,
I have Illumina HiSeq FASTQ output files. How can I find the number of reads per sample, before and after processing, using Unix?
Are there any commands you would recommend?
I'd greatly appreciate it.
Thanks!
12-17-2016, 12:29 PM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 659

Code:
 wc -l
gives you the number of lines. Divide that by 4 to get the number of reads, since a standard FASTQ record takes four lines (header, sequence, '+' separator, qualities).
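
As a rough sketch of the line-count approach, the division can be wrapped in a small shell function. The name count_reads and the file name in the usage note are made up for illustration, and the function assumes every record occupies exactly four lines (no wrapped sequences) and that a ".gz" suffix means gzip compression:

```shell
# count_reads FILE: print the number of reads in a FASTQ file.
# Assumes four lines per record; ".gz" files are decompressed on the fly.
count_reads() {
    if [ "${1##*.}" = "gz" ]; then
        lines=$(gzip -dc "$1" | wc -l)
    else
        lines=$(wc -l < "$1")
    fi
    # Integer division: 4 lines per read.
    echo $(( ${lines} / 4 ))
}
```

Running something like "count_reads sample_R1.fastq.gz" on the raw file and again on the processed file gives the before and after numbers.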
12-17-2016, 01:45 PM   #3
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 95

If the file is not compressed:
Code:
grep -c '^@' yourfile.fastq
If the file is gz compressed:
Code:
zcat yourfile.fastq.gz | grep -c '^@'
If the file is depressed:
Code:
talk to it about feelings and give chocolate
12-17-2016, 02:08 PM   #4
jdk787
josh kinman
 
Location: Austin

Join Date: Apr 2014
Posts: 57

You can also run the files through FastQC before and after processing. This will give you the number of reads and a lot of other useful information.

http://www.bioinformatics.babraham.a...ojects/fastqc/
__________________
Josh Kinman
12-17-2016, 03:06 PM   #5
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 659

The quality scores may sometimes have a value '@', so you may have some of the base quality lines also beginning with '@'.
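
To illustrate the point: '@' is a valid quality character (Phred 31 in the Phred+33 encoding), so a plain grep for '^@' can overcount. One possible workaround, sketched here with a made-up function name and assuming uncompressed, non-wrapped FASTQ with four lines per record, is to count '@' only on the first line of each record:

```shell
# count_fastq_headers [FILE...]: count lines that start with '@' AND sit in
# the header position (line 1 of every 4-line record).
# With no FILE argument, awk reads from standard input.
count_fastq_headers() {
    awk 'NR % 4 == 1 && /^@/ { n++ } END { print n + 0 }' "$@"
}
```

For a gzipped file, the same function can be fed through a pipe, for example "zcat yourfile.fastq.gz | count_fastq_headers".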
12-18-2016, 04:55 AM   #6
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 95

Quote:
Originally Posted by mastal View Post
The quality scores may sometimes have a value '@', so you may have some of the base quality lines also beginning with '@'.
Right, good catch
12-18-2016, 02:09 PM   #7
LacquerHead
Member
 
Location: New York

Join Date: Nov 2015
Posts: 31

samtools flagstat, if your reads are already aligned (it works on SAM/BAM files, not FASTQ)
12-20-2016, 08:37 AM   #8
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19

Thanks, everyone!
How about finding the number of bases?
12-20-2016, 01:07 PM   #9
Michael.Ante
Senior Member
 
Location: Vienna

Join Date: Oct 2011
Posts: 120

It should be something like
Code:
awk 'NR%4==2{print}' in.fastq | wc
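With the awk command, you print the sequence lines; wc then reports the line, word, and character counts of that output. Note that the character count includes one newline per line, so subtract the line count from it to get the number of bases.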
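
A variant that prints the base count directly is to let awk sum the lengths of the sequence lines itself, which avoids counting newline characters. This is only a sketch: the function name is made up, and it assumes four-line records with Unix line endings:

```shell
# count_bases [FILE...]: sum the lengths of the sequence lines
# (line 2 of every 4-line FASTQ record).
# With no FILE argument, awk reads from standard input.
count_bases() {
    # printf "%.0f" keeps very large totals from being printed in
    # scientific notation by some awk implementations.
    awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }' "$@"
}
```

For compressed input, something like "zcat in.fastq.gz | count_bases" should work the same way.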
12-20-2016, 03:20 PM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

I like to use BBMap's Reformat:
Code:
reformat.sh in=100x.fq

No output stream specified.  To write to stdout, please specify 'out=stdout.fq' or similar.
Input is being processed as paired
Input:                  	3072634 reads          	463967734 bases
Output:                 	3072634 reads (100.00%) 	463967734 bases (100.00%)

Time:                         	3.317 seconds.
Reads Processed:       3072k 	926.30k reads/sec
Bases Processed:        463m 	139.87m bases/sec
That has the advantage of working on fasta, fastq, or sam files, compressed or raw, as well as various other formats.

Tags
fastq, hiseq, illumina, read