SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
How DESeq2 Handles counts TheSeqGeek General 4 04-24-2014 02:32 PM
Preparing text files with counts for DESeq2 KHubbard Bioinformatics 2 10-12-2013 02:42 AM
Not report promiscuous reads in bowtie floydian_slip Bioinformatics 2 07-02-2012 08:47 AM
why does Cufflink does not report strand information for unspliced reads Chirag RNA Sequencing 1 03-01-2012 11:10 PM
Novoalign native report - Paired reads sandhya Bioinformatics 0 09-15-2010 07:33 AM

Reply
 
Thread Tools
Old 11-19-2014, 11:33 AM   #1
Starr_Hazard
Member
 
Location: South Carolina

Join Date: Nov 2010
Posts: 19
Default DESEq2 seems to under report counts for SE reads

Here are the Align summary data from TopHat for single-end reads. These SE Reads are NOT stranded.

cat /PATH/align_summary.txt | grep Mapped
Mapped : 66096436 (97.2% of input)
Mapped : 59491205 (97.4% of input)
Mapped : 58752388 (97.4% of input)
Mapped : 61205947 (97.4% of input)

So an average number of reads of about ~60 million

The default options for htseq-count are ( i.e. Like I do for paired-end reads): htseq-count -f bam -r name accepted_hits_sorted_QN.bam (name sorted bam)

21237878 22854704 23350366 21377177 (sums for four conditions -2 control 2 experimental)


So ~21 million. Roughly one third of the reads are reported.


Just running DESeq2 on the UN-sorted bam (i.e. The original *bam file from TopHat) with default value for s (default is yes assumed stranded).
Gives
23794422

Unsorted bam with s = no " htseq-count -f bam -r name s no accepted_hits.bam" (original bam from TopHat)
47138205
Name Sorted bam with s =no "htseq-count -f bam -r name s no accepted_hits_sorted_QN.bam (name sorted bam)

44074576
The closest I can get to number of reads reported by TopHat is 47/60 million.

Can someone explain why I might be seeing the low number of counts relative to the number of reported reads.

The issue does NOT appear for paired-end reads. For PE reads the counts reported TopHat matches closely the sum of counts reported by htseq-counts.
Starr_Hazard is offline   Reply With Quote
Old 11-19-2014, 12:44 PM   #2
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi,

Just to get more accurate attention, note that the title of the post might better mention htseq-count, which is the software in question and not DESeq2 (though DESeq2 does have an import function for htseq-count files, these are separate software packages).

I think the -r name option is only for use with paired-end files. Otherwise, I would guess that the file should be pos sorted.

Other questions I would have are to check the quality scores for these BAMs, as there is a quality score filter in htseq-count.

Also what is the organism and what is the GTF file you are using?
Michael Love is offline   Reply With Quote
Old 11-19-2014, 01:33 PM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

For what it's worth, it's MUCH faster to just use featureCounts rather than htseq-count.
dpryan is offline   Reply With Quote
Old 11-19-2014, 02:20 PM   #4
Starr_Hazard
Member
 
Location: South Carolina

Join Date: Nov 2010
Posts: 19
Default

Good point about the subject line
Mm_UCSC_Mm10_genome.gtf was used for TopHat/Bowtie2 to make the bam and by htseq-count to count

I believe default TopHat/Bowtie2 bams are coordinate/position sorted already so I am asking htseq-count to examine the original bam with no r option and s also set to no

Thanks for the tip.
Among other careless things, my tests did not include the r option as indicated in the original post.
The best count is still 47 million with "htseq-count -f bam -s no " and the unaltered TopHat bam

I'll check out featureCounts
Starr_Hazard is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:23 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2022, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO