  • CIGAR should have zero elements for unmapped read

    Hi everyone,

    I am running MarkDuplicates.jar on my paired-end, BWA-mapped BAM file. However, I get a weird error message; in fact, there are thousands of them (0.5M so far).

    An example is

    Ignoring SAM validation error: ERROR: Record 68, Read name MachineName:1:2:15520:114537#0, CIGAR should have zero elements for unmapped read.

    Has anyone experienced this before?

    When I pull these reads out of the BAM file, I see the following records:
    (Read 1)
    MachineName:1:2:15520:114537#0 73 chr10 50281 0 69M31S = 50281 0 CTGTGCAATAACTGTGTACAAAAGCCCCAAAGCTTAAATTGTGCAGTTGAGCGCATGTTCTGTTGTTCAGCATTTATGTTGGTTTATAGTGGAAAAGATT
    ?5<3;2><<62@3A<<<7>@@=B7BCC=BB:,<+:9/)<+0;*'+-'271B@BB2BC@CC=B0B<>BA################################
    XC:i:69 XT:A:R NM:i:3 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:8A24A3T31


    (Read 2)
    MachineName:1:2:15520:114537#0 133 chr10 50281 0 88M12S = 50281 0 TTCATTGTTTGGCATAACAGTACTTCAGATTTGAATCATCTAATAACATTGTCATCATAGCATATTCTCCTGGAAGTAACACACAATAACTACTTCAAAA
    E/EBEDDFDFEFFF=?@CBA.=>.<.9.::EE=EE33?<7D>BCB-<5<>.:37:@<B/<.8986<:9;->A@A@BADBB@BB@AEE#############
    XC:i:88
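    To make the warning concrete, here is a minimal Python sketch that splits a SAM record into its 11 mandatory columns and applies the rule the message names: a read whose FLAG has the 0x4 (unmapped) bit set should carry no CIGAR (`*`). It assumes tab-delimited input, as in a real SAM file (the forum display turns the tabs into spaces); `parse_sam_line` and `unmapped_with_cigar` are hypothetical helper names.

```python
# Minimal sketch: split one tab-delimited SAM record into its 11 mandatory
# columns and check the rule behind Picard's warning ("CIGAR should have zero
# elements for unmapped read"). Field names follow the SAM specification;
# the helper names are hypothetical.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

UNMAPPED = 0x4  # FLAG bit: this read is unmapped


def parse_sam_line(line: str) -> dict:
    """Return the mandatory fields of a SAM line plus its optional tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError(f"expected at least 11 columns, got {len(columns)}")
    record = dict(zip(SAM_FIELDS, columns))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])
    record["TAGS"] = columns[len(SAM_FIELDS):]  # e.g. ["XC:i:88"]
    return record


def unmapped_with_cigar(record: dict) -> bool:
    """True if the read is flagged unmapped but still has a CIGAR string."""
    return bool(record["FLAG"] & UNMAPPED) and record["CIGAR"] != "*"
```

    Applied to the two records above with their tabs restored, `unmapped_with_cigar` is True for read 2 (FLAG 133 includes 0x4, yet its CIGAR is `88M12S`) and False for read 1 (FLAG 73 does not include 0x4).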


    Oddly enough, they map to the same position, although the sequences are completely different. I BLATed the sequences and got these hits for reads one and two, respectively:

    SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
    88 1 100 100 94.0% 10 + 50281 50380 100
    86 1 100 100 93.0% 10 - 50520 50619 100


    So sequence two really maps to a different position on chromosome 10, at a distance that is roughly the expected insert size.

    Could it be that BWA mapped the pair incorrectly because the % identity is low?
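    As a rough check of that distance, here is a minimal Python sketch using the two BLAT hits: read 1 on the + strand at 50281-50380 and read 2 on the - strand at 50520-50619. It assumes the displayed START/END values are 1-based and inclusive; `fragment_span` is a hypothetical helper.

```python
# Minimal sketch: outer span and inner gap implied by a forward/reverse read
# pair, assuming 1-based inclusive coordinates (as BLAT displays them).
def fragment_span(fwd_start: int, fwd_end: int,
                  rev_start: int, rev_end: int) -> tuple[int, int]:
    """Return (outer span, gap between the reads) for a forward/reverse pair."""
    span = rev_end - fwd_start + 1  # leftmost base to rightmost base, inclusive
    gap = rev_start - fwd_end - 1   # bases between the two reads
    return span, gap


print(fragment_span(50281, 50380, 50520, 50619))  # (339, 139)
```

    The 339 bp outer span (100 + 139 + 100) is the number to compare against the library's expected insert size.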
    Last edited by doc2r; 01-18-2011, 05:53 PM.

  • #2
    Read two didn't map at all. bwa gives it the mapping coordinates of the read that did map, so that the two will sort together by coordinate, but it also leaves a CIGAR (88M12S) on the unmapped read, which is what Picard is complaining about. You can tell read two is unmapped from the flag: 133 = 128+4+1 = second read of pair + unmapped + paired read.

    The flag for read one is 73 = 64+8+1 = first read of a pair + mate didn't map + paired read.
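    The flag arithmetic above can be sketched in Python by testing each bit of the FLAG field. The bit values follow the SAM specification; the short names and `decode_flag` are our own, hypothetical labels.

```python
# Minimal sketch: decode a SAM FLAG into the names of the bits that are set.
# Bit values follow the SAM specification; the short names are our own.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in `flag`, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(73))   # ['paired', 'mate unmapped', 'first in pair']
print(decode_flag(133))  # ['paired', 'unmapped', 'second in pair']
```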

    I'm not quite sure what Picard's MarkDuplicates does with such pairs. If you use VALIDATION_STRINGENCY=LENIENT, it won't abort on the CIGAR in the unmapped read, but I don't know if it will mark duplicates properly.

    At the end, you can filter your BAM for reads that don't have the 4 bit set in the flag; that will get rid of all the unmapped reads.
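    A minimal Python sketch of that filter on tab-delimited SAM text (for example, the output of `samtools view -h`), keeping header lines and dropping any record whose FLAG has the 0x4 bit set. `drop_unmapped` is a hypothetical helper; on a BAM file, `samtools view -b -F 4 in.bam > mapped.bam` applies the same bit test directly.

```python
# Minimal sketch: keep only mapped reads from tab-delimited SAM text,
# i.e. drop every record whose FLAG has the 0x4 (unmapped) bit set.
from typing import Iterable, Iterator

UNMAPPED = 0x4


def drop_unmapped(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and mapped records; skip unmapped records."""
    for line in lines:
        if not line.strip():              # ignore blank lines
            continue
        if line.startswith("@"):          # header lines pass through untouched
            yield line
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if not flag & UNMAPPED:
            yield line
```

    Note that the surviving mate (read one here) keeps its "mate unmapped" bit (0x8); only records with 0x4 set are removed.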



    • #3
      Thanks, swbarnes2.
      I appreciate the response and the lesson.
      This was most helpful.

