  • BWA behaviour with Mate Pair data + Multi read mapping

    Hi all,

    PS: This message was also posted on the BWA mailing list, but I did not get any response.


    I do not understand why BWA is or is not able to work natively with
    mate-pair data. My second question: what is the best workaround if I
    want to obtain multiple read mappings (if any) for a read? I am also pasting
    the results of aligning mate-pair data both before and after reverse
    complementing. The mapping gets better after reverse complementing, but I
    don't quite understand why. I would appreciate your input on both questions.


    A. Before Reverse comp : Mate pair data <----- ------>

    356766 + 0 in total (QC-passed reads + QC-failed reads)
    282236 + 0 mapped (79.11%:nan%)
    356766 + 0 paired in sequencing
    178383 + 0 read1
    178383 + 0 read2
    126094 + 0 properly paired (35.34%:nan%)


    B. After Reverse Comp : Mate pair data ---------------> <----------------------

    356766 + 0 in total (QC-passed reads + QC-failed reads)
    265575 + 0 mapped (74.44%:nan%)
    356766 + 0 paired in sequencing
    178383 + 0 read1
    178383 + 0 read2
    11146 + 0 properly paired (3.12%:nan%)


    It seems the algo takes a major hit during the pairing of reads.

    Thanks!
    -Abhi

  • #2
    What was your experimental design? Standard Illumina mate-pair? What read length? Also, how did you reverse complement the reads, and what were the bwa command-line arguments?

    I've done standard Illumina MP preps, reverse complemented the reads with fastx-toolkit, and aligned them with bwa using standard parameters, with good success. See below:

    133724796 in total
    0 QC failure
    45170770 duplicates
    121800898 mapped (91.08%)
    133724796 paired in sequencing
    66862398 read1
    66862398 read2
    102897270 properly paired (76.95%)
    115119322 with itself and mate mapped
    6681576 singletons (5.00%)
    3387642 with mate mapped to a different chr
    2495203 with mate mapped to a different chr (mapQ>=5)



    • #3
      Hi Jon

      Thanks for your reply. The protocol is not standard: we are trying to sequence the ends of transcripts using the mate-pair technique.

      The data that I get after linker removal has a variable read length of 60 +/- 20 bp. I reverse complement the reads following the basic definition: reverse the read, then complement it, and also reverse the quality string.
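For illustration, the reverse-complement step described above can be sketched in Python. This is a minimal sketch, not the poster's actual script: the function names are hypothetical, and it assumes plain four-line FASTQ records containing only A, C, G, T and N.

```python
# Translation table for complementing DNA bases (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, seq, plus, qual):
    """Reverse-complement the sequence and reverse the quality string,
    so that every quality score stays attached to its base."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header, seq.translate(_COMPLEMENT)[::-1], plus, qual[::-1]


def reverse_complement_fastq(lines):
    """Yield reverse-complemented records from an iterable of FASTQ lines
    (assumes exactly four lines per record)."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield reverse_complement_record(header, seq, plus, qual)
```

Because only the sequence and quality lines change, reads of variable length (as in this data set) need no special handling.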

      One thing that could trick BWA is the variable fragment size, as it depends on the length of the transcripts that we are trying to capture.

      As for BWA options, I have pretty much used the standard ones. At this point I am less concerned about the mapping percentage than about why the reads need to be reverse complemented before mapping with BWA, and how BWA handles multi-read mapping.

      Thanks!
      -Abhi



      • #4
        I'm assuming you are trying to get the 5' and 3' ends of each RNA species by circularizing the cDNA? Neat idea. The odd insert-size distribution when mapped to the genome will definitely give bwa some problems. You might want to try Tophat for the alignment instead.

        For reads that map at multiple locations bwa will report the other potential sites and will randomly select one unless the mate/pair read dictates the location, but even then it should report the alternative options.
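As a rough illustration of how those alternative sites can be read back: with `bwa aln` followed by `samse`/`sampe`, the other candidate locations are, to my knowledge, written to the `XA` optional tag in the form `chr,±pos,CIGAR,NM;`. The sketch below parses that tag from one SAM line; the function name is hypothetical, and the tag layout is an assumption that should be checked against the bwa version in use.

```python
def alternative_hits(sam_line):
    """Return a list of (chrom, strand, pos, cigar, nm) tuples parsed
    from the XA:Z: tag of one SAM line, or an empty list if absent.

    Assumes the XA layout 'chr,+pos,CIGAR,NM;' with a signed position.
    """
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start at column 12
        if not tag.startswith("XA:Z:"):
            continue
        hits = []
        for entry in tag[5:].split(";"):
            if not entry:  # skip the empty piece after the trailing ';'
                continue
            # rsplit keeps any commas inside the reference name intact
            chrom, pos, cigar, nm = entry.rsplit(",", 3)
            hits.append((chrom, pos[0], int(pos[1:]), cigar, int(nm)))
        return hits
    return []
```

The primary location stays in the usual RNAME and POS columns; the tuples returned here are only the additional candidates.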



        • #5
          Any idea why BWA needs the reads to be inward-facing (---> <---) in order to pair them?

          I guess Tophat will not work because the read lengths are variable; as far as I understand, the version I used requires read 1 and read 2 to be of equal length. In our case the read length varies depending on where the linker is identified.

          -Abhi



          • #6
            Maybe try BWA in single-end mode, filter out the reads aligning to multiple locations, then manually pair the reads using Perl or something similar to find the mate pairs that mark the ends of your RNA species.
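The suggestion above could be sketched roughly as follows (in Python rather than Perl). This is only a sketch under stated assumptions: read 1 and read 2 were aligned separately in single-end mode, both SAM files use identical read names for mates, and unique hits carry the `XT:A:U` tag that `bwa aln`/`samse` writes (other aligners or `bwa mem` may not emit it). The function names are hypothetical.

```python
def unique_hits(sam_lines):
    """Map read name -> (chrom, pos, strand) for mapped reads tagged
    XT:A:U (assumed to mark unique alignments from bwa aln/samse)."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        if "XT:A:U" not in fields[11:]:  # drop repeats / multi-mappers
            continue
        strand = "-" if flag & 0x10 else "+"  # 0x10 = reverse strand
        hits[fields[0]] = (fields[2], int(fields[3]), strand)
    return hits


def pair_mates(read1_sam, read2_sam):
    """Return {read name: (read1 hit, read2 hit)} for reads whose two
    mates both aligned uniquely."""
    hits1 = unique_hits(read1_sam)
    hits2 = unique_hits(read2_sam)
    return {name: (hits1[name], hits2[name]) for name in hits1 if name in hits2}
```

Pairing by name after the fact sidesteps the aligner's orientation and insert-size expectations, which is the point of this workaround; any distance or strand filtering on the resulting pairs is left to the user.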
