SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
HTSeq-count warning message canhu Bioinformatics 27 02-11-2015 11:02 AM
DESeq estimateDispersions warning messages atkinson Bioinformatics 1 05-15-2013 08:13 AM
Exclude chrM, chrUn* from reference // htseq-count warning on chrM ocs Bioinformatics 10 11-02-2011 10:21 AM
htseq-count with warning for every read to represent all of zero counts in output hibachings2013 RNA Sequencing 10 07-15-2011 10:19 AM
cufflinks warning messages pinki999 Bioinformatics 0 04-15-2011 12:41 AM

Reply
 
Thread Tools
Old 07-01-2013, 11:44 PM   #1
scalefree
Junior Member
 
Location: US

Join Date: Jun 2013
Posts: 3
Default htseq-count warning messages: can they be ignored?

Dear all,

I'm new to RNA-seq analysis. I'm Trying out htseq-count to get the raw counts for downstream analysis by EdgeR or DESeq. I am getting some warning messages from HTSeq and I am not sure if I can ignore it, or what I should do about it.

While running htseq-count:
Code:
python -m HTSeq.scripts.count names_sorted.sam  genes.gtf
I got thousands of warning messages like these:

Warning: Read HWI-ST845:13032020JBACXX:8:1101:6938:66256 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

Warning: Read HWI-ST845:13032020JBACXX:8:1101:7019:18774 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

After looking into those reads that produce warning messages and from other posts on this forum, I think it is because some reads miss their mate.
For example:
Code:
grep "HWI-ST845:130320:D20JBACXX:8:1101:6938:66256" names_sorted.sam
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr1 24615683 2 50M chr2 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40

HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr2 22588080 2 50M = 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:1 NM:i:0 SM:i:2 XQ:i:40 X2:i:40

HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M = 22588080 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0

HWI-ST845:13032020JBACXX:8:1101:6938:66256 145 chrM 6846 2 50M chr2 32387601 0 GAACCAACCTATGTAAAAGTAAAATAAGAAAGGAAGGAATCGAACCCCCT GHEJIHFIHGHDHIJIIHGJJIGJIJJIJHGIJIJGIHFAHHFDFFFC@@ MD:Z:50 NH:i:3 HI:i:3 NM:i:0 SM:i:2 XQ:i:40 X2:i:40

My guess is that these are what HTSeq complained about. However, is this something I need to worry about? Why do I get those unpaired missing mates?Do I simply disregard those warning messages?

--
Some additional information about the alignment I am looking at:

Code:
samtools flagstat mybam.bam
80760824 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
79363333 + 0 mapped (98.27%:nan%)
80760824 + 0 paired in sequencing
40378231 + 0 read1
40382593 + 0 read2
77654682 + 0 properly paired (96.15%:nan%)
78100006 + 0 with itself and mate mapped
1263327 + 0 singletons (1.56%:nan%)
354942 + 0 with mate mapped to a different chr
66138 + 0 with mate mapped to a different chr (mapQ>=5)

Last few lines of the results from HTseq-count:
a 0
l7Rn6 168
no_feature 15754100
ambiguous 109560
too_low_aQual 0
not_aligned 387345
alignment_not_unique 11975435
scalefree is offline   Reply With Quote
Old 07-08-2013, 01:41 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Those warnings occur for the reason you wrote. It's a bit odd, though, for a mapped mate to not be in the SAM file (you'll run into this frequently if only one read of a pair map and the unmapped mate is omitted from the resulting alignment file, as is done by tophat).
dpryan is offline   Reply With Quote
Old 07-09-2013, 07:59 AM   #3
scalefree
Junior Member
 
Location: US

Join Date: Jun 2013
Posts: 3
Default

Many thanks for the reply dpryan!! I really appreciate it.

You said that it's odd for a mapped mate to not be in the sam file - do you think it is possible that they just don't have a mapped mate? It looks like except for the 2nd and 3rd reads, the rest I posted in the above example were mapped to different chromosome.
On a side note, would you suggest doing some post processing on the alignment file ? (some of the reads that htseq-count gave warning messages were aligned to many places and have low MAPQ)

Thanks again!


Quote:
Originally Posted by dpryan View Post
Those warnings occur for the reason you wrote. It's a bit odd, though, for a mapped mate to not be in the SAM file (you'll run into this frequently if only one read of a pair map and the unmapped mate is omitted from the resulting alignment file, as is done by tophat).
scalefree is offline   Reply With Quote
Old 07-09-2013, 08:50 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

In this case, the aligner is just saving space in the output file. It looks like one read of a pair maps uniquely, but the other does not (and does not seem to map to a known splice form, if you're using an aligner that maps first to the transcriptome). In these cases, the reads should be considered multi-mapping anyway, so they should get ignored. You can ensure that this is correct by making a test SAM file with just the header and and something like:

Code:
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr1 24615683 2 50M chr2 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chr1 24615683 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr2 22588080 2 50M = 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:1 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M = 22588080 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 145 chrM 6846 2 50M chr2 32387601 0 GAACCAACCTATGTAAAAGTAAAATAAGAAAGGAAGGAATCGAACCCCCT GHEJIHFIHGHDHIJIIHGJJIGJIJJIJHGIJIJGIHFAHHFDFFFC@@ MD:Z:50 NH:i:3 HI:i:3 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chrM 32387601 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
You'll need to fix the flag field, which I didn't adjust. If you use the resulting file with htseq-count, the correct output would be no counts (assuming that any of those align to a region that you're counting).
dpryan is offline   Reply With Quote
Old 07-09-2013, 09:08 AM   #5
scalefree
Junior Member
 
Location: US

Join Date: Jun 2013
Posts: 3
Default

Thank you dpryan for the suggestion. I"ll use a test sam file to confirm the result.

Quote:
Originally Posted by dpryan View Post
In this case, the aligner is just saving space in the output file. It looks like one read of a pair maps uniquely, but the other does not (and does not seem to map to a known splice form, if you're using an aligner that maps first to the transcriptome). In these cases, the reads should be considered multi-mapping anyway, so they should get ignored. You can ensure that this is correct by making a test SAM file with just the header and and something like:

Code:
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr1 24615683 2 50M chr2 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chr1 24615683 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr2 22588080 2 50M = 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:1 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M = 22588080 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 145 chrM 6846 2 50M chr2 32387601 0 GAACCAACCTATGTAAAAGTAAAATAAGAAAGGAAGGAATCGAACCCCCT GHEJIHFIHGHDHIJIIHGJJIGJIJJIJHGIJIJGIHFAHHFDFFFC@@ MD:Z:50 NH:i:3 HI:i:3 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chrM 32387601 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
You'll need to fix the flag field, which I didn't adjust. If you use the resulting file with htseq-count, the correct output would be no counts (assuming that any of those align to a region that you're counting).
scalefree is offline   Reply With Quote
Reply

Tags
htseq, htseq-count, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:07 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO