SEQanswers

01-18-2011, 03:50 PM   #1
doc2r

CIGAR should have zero elements for unmapped read

Hi Everyone

I am running MarkDuplicates.jar on my paired-end, BWA-mapped BAM file. However, I get a strange error message, and not just once: there are thousands of them (about 0.5M so far).

An example:

Ignoring SAM validation error: ERROR: Record 68, Read name MachineName:1:2:15520:114537#0, CIGAR should have zero elements for unmapped read.

Has anyone experienced this before?

When I pull the reads named above out of the BAM file, I see the following read details:
(Read1)
MachineName:1:2:15520:114537#0 73 chr10 50281 0 69M31S = 50281 0 CTGTGCAATAACTGTGTACAAAAGCCCCAAAGCTTAAATTGTGCAGTTGAGCGCATGTTCTGTTGTTCAGCATTTATGTTGGTTTATAGTGGAAAAGATT
?5<3;2><<62@3A<<<7>@@=B7BCC=BB:,<+:9/)<+0;*'+-'271B@BB2BC@CC=B0B<>BA################################
XC:i:69 XT:A:R NM:i:3 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:8A24A3T31


(Read2)
MachineName:1:2:15520:114537#0 133 chr10 50281 0 88M12S = 50281 0 TTCATTGTTTGGCATAACAGTACTTCAGATTTGAATCATCTAATAACATTGTCATCATAGCATATTCTCCTGGAAGTAACACACAATAACTACTTCAAAA
E/EBEDDFDFEFFF=?@CBA.=>.<.9.::EE=EE33?<7D>BCB-<5<>.:37:@<B/<.8986<:9;->A@A@BADBB@BB@AEE############# XC:i:88


Oddly enough, both reads are reported at the same position, although the sequences are completely different. I BLAT'ed the sequences and found the following for reads one and two, respectively:

SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START  END    SPAN
88     1      100  100    94.0%     10    +       50281  50380  100
86     1      100  100    93.0%     10    -       50520  50619  100


So read two really maps to a different position on chromosome 10, at a distance from read one that is roughly the expected insert size.

Could BWA have mapped the pair incorrectly because the % identity is low?

Last edited by doc2r; 01-18-2011 at 04:53 PM.
01-19-2011, 08:41 AM   #2
swbarnes2

Read two didn't map at all. bwa gives it the mapping coordinates of the read that did map, so that the two will sort together by coordinate. You can tell from the flag: 133 = 128+4+1 = second read of pair + unmapped + paired read.

The flag for read one is 73 = 64+8+1 = first read of a pair + mate didn't map + paired read.
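
If you want to check flag arithmetic like this yourself, here is a minimal Python sketch. The bit values come from the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` are just ones I made up for illustration.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    1: "read paired",
    2: "read mapped in proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "not primary alignment",
    512: "read fails quality checks",
    1024: "PCR or optical duplicate",
    2048: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of every bit that is set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The two flags from this thread.
print(73, decode_flag(73))
print(133, decode_flag(133))
```

Flag 73 comes back as paired, mate unmapped, first in pair; flag 133 as paired, unmapped, second in pair.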

I'm not quite sure what Picard does with that in MarkDuplicates. If you set VALIDATION_STRINGENCY=LENIENT, it will only warn about the CIGAR on an unmapped read instead of failing, but I don't know whether it will mark duplicates properly.
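
As far as I can tell from the message, the condition being flagged is simply "unmapped bit set, but CIGAR is not `*`". A rough Python sketch of that check on tab-separated SAM text follows. This is my reading of the error, not Picard's actual code; the function name is hypothetical, and the two records are shortened stand-ins that only reuse the flags and CIGARs from this thread.

```python
def has_cigar_on_unmapped_read(sam_line):
    """True if the record has the unmapped bit (4) set but a CIGAR other than '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    cigar = fields[5]       # column 6: CIGAR
    return bool(flag & 4) and cigar != "*"


# Shortened stand-in records (hypothetical names, truncated sequence and quality).
read1 = "\t".join(["pair1", "73", "chr10", "50281", "0", "69M31S",
                   "=", "50281", "0", "ACGT", "IIII"])
read2 = "\t".join(["pair1", "133", "chr10", "50281", "0", "88M12S",
                   "=", "50281", "0", "ACGT", "IIII"])

for record in (read1, read2):
    print(record.split("\t")[1], has_cigar_on_unmapped_read(record))
```

Only the second record (flag 133, CIGAR 88M12S) trips the check, which matches the read named in the error message.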

At the end, you can filter your .bam for reads that don't have the 4 bit set in the flag (for example, with samtools view -F 4); that will get rid of all the unmapped reads.

01-19-2011, 08:44 AM   #3
doc2r
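
If you would rather do that filtering on SAM text in a script, something like the sketch below should work. It assumes plain tab-separated SAM lines (for instance from `samtools view -h`); the function names and the toy records are made up for illustration.

```python
def is_unmapped(sam_line):
    """True if bit 4 (read unmapped) is set in the record's FLAG column."""
    fields = sam_line.rstrip("\n").split("\t")
    return bool(int(fields[1]) & 4)


def drop_unmapped(sam_lines):
    """Yield header lines and mapped records; skip records with the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@") or not is_unmapped(line):
            yield line


# Toy input: one header line, one mapped read, one unmapped mate.
toy_sam = [
    "@SQ\tSN:chr10\tLN:135534747",
    "pair1\t73\tchr10\t50281\t0\t69M31S\t=\t50281\t0\tACGT\tIIII",
    "pair1\t133\tchr10\t50281\t0\t88M12S\t=\t50281\t0\tACGT\tIIII",
]

kept = list(drop_unmapped(toy_sam))
for line in kept:
    print(line)
```

Note that this keeps the mapped read whose mate is unmapped (flag 73); dropping those too would mean also testing bit 8.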

Thanks swbarnes2,
I appreciate the response and the lesson.
This was most helpful.
All times are GMT -8.