  • bwa: Read unmapped in proper pair

    Hello,

    I came across a BAM file in which one read has flag 103, which means:
    read paired
    read mapped in proper pair
    read unmapped
    mate reverse strand
    first in pair
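The breakdown of flag 103 above can be reproduced with a few lines of Python. This is only a minimal sketch that assumes the standard bit meanings from the SAM specification; `SAM_FLAG_BITS` and `decode_flag` are illustrative names, not part of bwa or samtools.

```python
# Bit meanings as defined in the SAM specification (FLAG field).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_inconsistent(flag):
    """True when a read is flagged both 'proper pair' (0x2) and 'unmapped' (0x4)."""
    return bool(flag & 0x2) and bool(flag & 0x4)


if __name__ == "__main__":
    for flag in (103, 147):
        print(flag, decode_flag(flag), "inconsistent:", is_inconsistent(flag))
```

Flag 103 sets both 0x2 and 0x4, which is the combination in question; flag 147 of the mate does not.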

    I think something is wrong here, as a read cannot be both "mapped in proper pair" and "unmapped" at the same time. I did not run the alignment myself, but as far as I know it was done with bwa 0.6.1 against hg19. Everything else in the BAM file looks fine (sorted and indexed OK) apart from this read.

    This is the offending read pair:

    Code:
    1:488660:43	103	chr9_gl000201_random	36144	60	36M	chrM	267	0	CGATGGATCACAGGTCTATCACCCTATTAACCACTC	1:BDEFDHHHHABGH@>DHEEHJJJFHIFBFI>DD9	XT:A:U	NM:i:2	SM:i:25	AM:i:25	X0:i:1	X1:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:0G3C31
    1:488660:43	147	chrM	267	60	36M	chr9_gl000201_random	36144	0	TCCACACAGACATCATAACAAAAAATTTCCACCAAA	HCD:<BDFGGE:?AAGFFIGHF>BFCA<A=2?B?@?	XT:A:U	NM:i:0	SM:i:37	AM:i:25	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:36
    I think the problem comes from chr9_gl000201_random being 36148 bp long while the 36 bp read is aligned starting at position 36144, so the alignment runs past the end of the sequence?

    Am I missing something or is this a bug in bwa? Any thoughts appreciated!
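One way to check this is to compute the last reference base covered by the alignment from POS and the CIGAR string, then compare it with the contig length. The following is a rough sketch: the helper names are hypothetical, and the 36148 bp length is the figure quoted in this post rather than a value read from the BAM header.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING_OPS = set("MDN=X")


def reference_end(pos, cigar):
    """Return the 1-based position of the last reference base covered."""
    ref_len = sum(
        int(count)
        for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )
    return pos + ref_len - 1


def overhang(pos, cigar, contig_length):
    """Number of aligned bases that fall beyond the end of the contig."""
    return max(0, reference_end(pos, cigar) - contig_length)


if __name__ == "__main__":
    # First read of the pair: POS 36144, CIGAR 36M, contig length 36148.
    print(reference_end(36144, "36M"))    # 36179
    print(overhang(36144, "36M", 36148))  # 31
```

With the values from the record above, the alignment would end at 36179, which is 31 bases beyond a 36148 bp contig.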

    Dario

  • #2
    I have seen these strange results from BWA mappings as well, so you are not alone!

    It is my understanding that this behaviour originates in the reference indexing step. When indexing a reference file that contains multiple sequences (e.g. more than one chromosome), bwa index concatenates the sequences and indexes the combined sequence. Unless I am mistaken (definitely possible!), you get inconsistent SAM flags when a read pair is mapped as a proper pair on the concatenated reference but, once the coordinates are split back into the individual reference sequences, the pairing turns out to span two references.
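To make the idea concrete, here is a toy illustration of coordinates on a concatenated reference. It is not bwa's actual code: the function names, the reference names (`refA`, `refB`) and their lengths are invented for the example, and it only shows how a hit near the join can cross from one sequence into the next.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(ref_lengths):
    """Return (names, starts): starts[i] is the 0-based offset of sequence i
    in the concatenation, and starts[-1] is the total length."""
    names = [name for name, _ in ref_lengths]
    starts = [0] + list(accumulate(length for _, length in ref_lengths))
    return names, starts


def locate(names, starts, global_pos):
    """Map a 0-based position on the concatenation to (name, local position)."""
    if not 0 <= global_pos < starts[-1]:
        raise ValueError("position outside the concatenated reference")
    i = bisect_right(starts, global_pos) - 1
    return names[i], global_pos - starts[i]


def spans_boundary(names, starts, global_start, length):
    """True when an alignment of `length` bases crosses into another sequence."""
    first = locate(names, starts, global_start)[0]
    last = locate(names, starts, global_start + length - 1)[0]
    return first != last


if __name__ == "__main__":
    # Hypothetical two-sequence reference.
    names, starts = build_offsets([("refA", 100), ("refB", 50)])
    print(locate(names, starts, 95))              # ('refA', 95)
    print(spans_boundary(names, starts, 95, 36))  # True
    print(spans_boundary(names, starts, 10, 36))  # False
```

An aligner working on the concatenation has to detect the `True` case and decide how to report it once coordinates are converted back to per-sequence positions.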

    It should be clear that this phenomenon is expected to occur only very rarely, so I assume that your single offending read pair is part of a much larger dataset?

    • #3
      In fact, looking more closely at your SAM, I see that your reads are mapped very close to the end of one chromosome and very close to the start of another.

      • #4
        Originally posted by TobyH
        In fact, looking more closely at your SAM, I see that your reads are mapped very close to the end of one chromosome and very close to the start of another.
        Thanks very much! Makes sense... Indeed, this is the only read with flag 103 that I found out of a few million in this file.
