**dpryan** · 07-08-2013, 01:41 AM

Those warnings occur for the reason you wrote. It's a bit odd, though, for a mapped mate to be missing from the SAM file (the warning itself is common when only one read of a pair maps and the unmapped mate is omitted from the resulting alignment file, as TopHat does).
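
If you want to see which records are affected, here is a rough sketch. It assumes the situation is a paired record whose mate is flagged as mapped but has no record of its own in the file. The function name `find_missing_mates` is made up for illustration, and the input is expected to be tab-separated SAM lines.

```python
def find_missing_mates(sam_lines):
    """Return (name, chrom, pos) for paired records whose mapped mate has no record."""
    records = []
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        rname, pos = fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        if rnext == "=":  # '=' means the mate is on the same reference
            rnext = rname
        records.append((qname, flag, rname, pos, rnext, pnext))

    # Every (read name, reference, position) that actually has a record.
    present = {(q, rname, pos) for q, _, rname, pos, _, _ in records}

    missing = []
    for q, flag, rname, pos, rnext, pnext in records:
        paired = flag & 0x1         # read is part of a pair
        mate_unmapped = flag & 0x8  # mate is flagged as unmapped
        # Simplification: first/second mate are not distinguished, so two
        # mates at the exact same position would mask each other.
        if paired and not mate_unmapped and (q, rnext, pnext) not in present:
            missing.append((q, rname, pos))
    return missing
```

Running it over the lines of your SAM file gives one entry per record that points at a mate position with no matching record.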

**scalefree** · 07-09-2013, 07:59 AM

Many thanks for the reply, dpryan! I really appreciate it.

You said that it's odd for a mapped mate to not be in the SAM file. Do you think it is possible that these reads just don't have a mapped mate? It looks like, except for the 2nd and 3rd reads, the rest that I posted in the example above were mapped to different chromosomes.
On a side note, would you suggest doing some post-processing on the alignment file? (Some of the reads that htseq-count gave warning messages for were aligned to many places and have low MAPQ.)

Thanks again!

**dpryan** · 07-09-2013, 08:50 AM

In this case, the aligner is just saving space in the output file. It looks like one read of a pair maps uniquely, but the other does not (and does not seem to map to a known splice form, if you're using an aligner that maps first to the transcriptome). In these cases, the pair should be considered multi-mapping anyway, so it should get ignored. You can check that this is what happens by making a test SAM file with just the header and something like:

Code:

HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr1 24615683 2 50M chr2 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chr1 24615683 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 129 chr2 22588080 2 50M = 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:1 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M = 22588080 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:13032020JBACXX:8:1101:6938:66256 145 chrM 6846 2 50M chr2 32387601 0 GAACCAACCTATGTAAAAGTAAAATAAGAAAGGAAGGAATCGAACCCCCT GHEJIHFIHGHDHIJIIHGJJIGJIJJIJHGIJIJGIHFAHHFDFFFC@@ MD:Z:50 NH:i:3 HI:i:3 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:13032020JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M chrM 6846 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0

You'll need to fix the flag field, which I didn't adjust. If you use the resulting file with htseq-count, the correct output would be no counts (assuming that any of those align to a region that you're counting).
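
For what it's worth, here is a minimal sketch of how such a test file could be assembled and checked. It is only an illustration of the rule described above (a pair counts as multi-mapping if either record has NH greater than 1), not htseq-count's own code. The helper names (`to_sam_line`, `nh_tag`, `pair_is_multimapping`, `build_test_sam`) are made up, the read name and the `*` placeholders for SEQ/QUAL are shortened for readability, and the `@SQ` lengths are placeholders. Take the real header from your own alignment file (e.g. with `samtools view -H`).

```python
def to_sam_line(record):
    """Re-join a whitespace-separated record with tabs, as SAM requires."""
    return "\t".join(record.split())


def nh_tag(sam_line):
    """Return the NH:i value (number of reported alignments), or 1 if absent."""
    # Optional tags start at the 12th tab-separated field.
    for field in sam_line.split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[len("NH:i:"):])
    return 1


def pair_is_multimapping(sam_lines):
    """True if any record of the pair reports more than one alignment."""
    return any(nh_tag(line) > 1 for line in sam_lines)


def build_test_sam(header_lines, records):
    """Return the text of a small SAM file: header first, then the records."""
    body = [to_sam_line(record) for record in records]
    return "\n".join(list(header_lines) + body) + "\n"


# Placeholder header: the LN values are NOT real chromosome lengths.
header = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:100000000",
    "@SQ\tSN:chr2\tLN:100000000",
]

# First pair from the example above, shortened (flags still unadjusted).
records = [
    "read1 129 chr1 24615683 2 50M chr2 32387601 0 * * NH:i:3 HI:i:2",
    "read1 65 chr2 32387601 40 50M chr1 24615683 0 * * NH:i:1 HI:i:1",
]

sam_text = build_test_sam(header, records)
pair_lines = [to_sam_line(record) for record in records]
print(pair_is_multimapping(pair_lines))  # True, because one mate has NH:i:3
```

Write `sam_text` to a file and pass that file to htseq-count to compare its output against the expectation of no counts.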

**scalefree** · 07-09-2013, 09:08 AM

Thank you, dpryan, for the suggestion. I'll use a test SAM file to confirm the result.

htseq-count warning messages: can they be ignored?
