  • apratap
    Member
    • Jan 2009
    • 58

    BWA behaviour with Mate Pair data + Multi read mapping

    Hi All

    PS: This message was also posted on the BWA mailing list, but I did not get any response.


    I cannot work out why BWA is (or is not) able to work natively
    with mate pair data. My second question is: what is the best workaround if I
    want to obtain multiple read mappings (if any) for a read? I am also pasting
    the results of aligning mate pair data both before and after reverse
    complementing. The mapping gets better after reverse complementing, but I don't quite
    understand why. I would appreciate your input on both questions.


    A. Before Reverse comp : Mate pair data <----- ------>

    356766 + 0 in total (QC-passed reads + QC-failed reads)
    282236 + 0 mapped (79.11%:nan%)
    356766 + 0 paired in sequencing
    178383 + 0 read1
    178383 + 0 read2
    126094 + 0 properly paired (35.34%:nan%)


    B. After Reverse Comp : Mate pair data ---------------> <----------------------

    356766 + 0 in total (QC-passed reads + QC-failed reads)
    265575 + 0 mapped (74.44%:nan%)
    356766 + 0 paired in sequencing
    178383 + 0 read1
    178383 + 0 read2
    11146 + 0 properly paired (3.12%:nan%)


    It seems the algo takes a major hit during the pairing of reads.

    Thanks!
    -Abhi
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    What was your experimental design? Standard Illumina mate-pair? Read length? Also, how did you reverse complement the reads, and what were the bwa command-line arguments?

    I've done standard Illumina MP preps, reverse complemented the reads with fastx-toolkit, and aligned them with bwa using standard parameters, with good success. See below:

    133724796 in total
    0 QC failure
    45170770 duplicates
    121800898 mapped (91.08%)
    133724796 paired in sequencing
    66862398 read1
    66862398 read2
    102897270 properly paired (76.95%)
    115119322 with itself and mate mapped
    6681576 singletons (5.00%)
    3387642 with mate mapped to a different chr
    2495203 with mate mapped to a different chr (mapQ>=5)


    • apratap
      Member
      • Jan 2009
      • 58

      #3
      Hi Jon

      Thanks for your reply. The protocol is not standard: we are trying to sequence the ends of transcripts using the mate pair technique.

      The data that I get after linker removal has a variable read length of 60 +/- 20 bp. I reverse complement the reads following the basic definition: reverse the read, then complement it, and also reverse the quality string.
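
A minimal sketch of the reverse-complement step described above, assuming plain 4-line FASTQ records and the bases A, C, G, T and N only. The function names are made up for illustration and this is not the script used in the thread:

```python
# Complement table for DNA bases (assumption: only A, C, G, T, N occur).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(seq, qual):
    """Return the reverse-complemented sequence and the reversed quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    # Complement every base, then reverse; qualities are only reversed so
    # that each score stays attached to its base.
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def reverse_complement_fastq(lines):
    """Yield FASTQ lines with each 4-line record reverse complemented."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, plus, qual = records[i:i + 4]
        new_seq, new_qual = reverse_complement_record(seq, qual)
        yield from (header, new_seq, plus, new_qual)
```

Because each record is handled on its own, variable read lengths (such as 60 +/- 20 bp after linker removal) need no special treatment.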

      One thing that could trick BWA is the variable fragment size, as it depends on the length of the transcripts that we are trying to capture.

      As for BWA options, I have pretty much used the standard ones. At this point I am less concerned about the mapping percentage than about why the reads need to be reverse complemented before mapping with BWA, and how BWA handles multi-read mapping.

      Thanks!
      -Abhi


      • Jon_Keats
        Senior Member
        • Mar 2010
        • 279

        #4
        I'm assuming you are trying to get the 5' and 3' ends of each RNA species by circularizing the cDNA? Neat idea. The weird distribution of mate distances when mapped to the genome will definitely give bwa some problems. You might want to try TopHat instead for the alignment.

        For reads that map to multiple locations, bwa will report the other potential sites and will randomly select one, unless the mate dictates the location. Even then it should report the alternative options.
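
A hedged sketch of how the alternative sites could be pulled out of the alignments, assuming they are reported in the `XA` optional tag of the SAM record in the form `chr,pos,CIGAR,NM;` with a signed position, as the BWA manual describes. Whether the tag is written at all depends on the bwa version and options used. The function name `parse_xa_tag` is hypothetical:

```python
def parse_xa_tag(sam_line):
    """Return the alternative hits listed in the XA tag of one SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    hits = []
    for field in fields[11:]:             # optional tags start at column 12
        if not field.startswith("XA:Z:"):
            continue
        for entry in field[5:].split(";"):
            if not entry:
                continue                  # the tag ends with a trailing ';'
            # Assumption: reference names contain no commas.
            chrom, pos, cigar, nm = entry.split(",")
            hits.append({
                "chrom": chrom,
                "strand": pos[0],         # '+' or '-' prefix on the position
                "pos": int(pos[1:]),
                "cigar": cigar,
                "nm": int(nm),            # edit distance of the alternative hit
            })
    return hits
```

A read with no `XA` tag simply yields an empty list, which is one way to tell reads with a single reported placement apart from multi-mapped ones.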


        • apratap
          Member
          • Jan 2009
          • 58

          #5
          Any idea why BWA needs reads to be inward facing (---> <---) for it to map them?

          I guess TopHat will not work because the read lengths are variable and, as far as I understand the version I used, it requires read 1 and read 2 to be of equal length. In our case the read length varies depending on where the linker is identified.

          -Abhi


          • Jon_Keats
            Senior Member
            • Mar 2010
            • 279

            #6
            Maybe try BWA in single-end mode, filter out the reads aligning to multiple locations, then manually pair the reads using Perl or something similar to find the mates that mark the ends of your RNA species.
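
The manual pairing suggested above could look roughly like the sketch below. It is written in Python rather than Perl and rests on several assumptions: read 1 and read 2 were aligned separately in single-end mode into two SAM files, both mates carry the same read name, and "aligning to multiple locations" is approximated by a low MAPQ or the presence of an `XA` tag. The function names and the `min_mapq` cutoff are illustrative only:

```python
def unique_alignments(sam_lines, min_mapq=20):
    """Map read name -> (chrom, pos, strand) for reads that look uniquely placed."""
    kept = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if flag & 0x4:
            continue                      # read is unmapped
        if flag & 0x900:
            continue                      # secondary or supplementary record
        if mapq < min_mapq:
            continue                      # ambiguous placement (assumed cutoff)
        if any(tag.startswith("XA:Z:") for tag in fields[11:]):
            continue                      # alternative hits were reported
        strand = "-" if flag & 0x10 else "+"
        kept[name] = (chrom, pos, strand)
    return kept


def pair_mates(read1_sam, read2_sam, min_mapq=20):
    """Return name -> (read 1 placement, read 2 placement) when both mates pass."""
    first = unique_alignments(read1_sam, min_mapq)
    second = unique_alignments(read2_sam, min_mapq)
    return {name: (first[name], second[name]) for name in first if name in second}
```

The resulting pairs carry chromosome, leftmost position and strand for each mate, so no expected insert size or orientation is imposed at this stage.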
