  • Comparing output from Bowtie and BWA

    Hey all,

    I am mapping Solexa reads against a reference genome using Bowtie and BWA.

    With Bowtie I am allowing for 2 mismatches and obtained 1.5M hits.

    Using BWA (aln/samse) with default settings resulted in 1.4M hits.

    I was expecting BWA to produce more hits than Bowtie, since BWA allows indels as well as mismatches. Of course, that comes down to the parameters used, and unfortunately I fail to see how BWA treats mismatches. I am also unsure how BWA reports mismatches: the CIGAR strings only indicate full matches or matches with indels.

    So, what is the deal with BWA and mismatches?



    Martin

  • #2
    Note that bowtie allows 2 mismatches in the seed. It tolerates more mismatches after 28bp. On low-qual single-end reads, bowtie is probably more sensitive.
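The seed rule described above can be sketched in a few lines. This is only an illustration of the idea (at most 2 mismatches in the first 28 bases, no fixed cap afterwards), not Bowtie's implementation: the function names are hypothetical, the comparison is ungapped, and base qualities, which Bowtie also takes into account, are ignored.

```python
def seed_mismatches(read, ref, seed_len=28):
    """Count mismatches between read and reference within the seed only."""
    return sum(1 for r, g in zip(read[:seed_len], ref[:seed_len]) if r != g)


def passes_seed_policy(read, ref, seed_len=28, max_seed_mismatches=2):
    """True if the seed has at most max_seed_mismatches mismatches.

    Mismatches after the seed are deliberately not counted here.
    """
    return seed_mismatches(read, ref, seed_len) <= max_seed_mismatches


def mutate(seq, positions):
    """Hypothetical helper: substitute the base at each 0-based position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


ref = "ACGT" * 10                                # toy 40 bp reference window
tolerated = mutate(ref, [3, 10, 30, 33, 36])     # 2 seed mismatches, 3 in the tail
rejected = mutate(ref, [3, 10, 20])              # 3 seed mismatches

print(passes_seed_policy(tolerated, ref))  # True
print(passes_seed_policy(rejected, ref))   # False
```

The first read carries five mismatches in total but still passes, which is why a "2 mismatches" Bowtie run is not directly comparable to an edit-distance limit over the whole read.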

    Differences between bowtie and bwa on Illumina data (as I understand):

    Bowtie's advantage:

    1) Bowtie is aware of base quality but bwa is not. (However, when base quality is rubbish, using base quality may cause problems.)

    2) Bowtie is probably more tolerant with low-qual bases at the tail and thus more sensitive given such input. (BWA has low-qual base trimming, but this is not the best solution)

    3) The default mode of bowtie is much faster (although arguably less accurate).

    4) Bowtie uses less memory.

    5) Bowtie is fully multithreaded. (In bwa, only "aln" supports multithreading.)

    6) Bowtie is more convenient in some ways. (e.g. single-end alignment done by a single command line)

    BWA's advantage:

    1) BWA counts occurrences and gives mapping quality. (BWA is slower largely due to this)

    2) BWA does gapped alignment. (This is another key reason why it is slower)

    3) BWA (arguably) has higher specificity, at least compared with bowtie's default mode.

    4) BWA possibly has comparable sensitivity given PE data because it rescues the unmapped end with the Smith-Waterman algorithm.

    5) BWA is more convenient in some other ways (e.g. inferring insert size distribution; mapping singletons; reading compressed fasta/fastq input).
    Last edited by lh3; 11-04-2009, 06:49 AM.



    • #3
      Thanks, I understand the differences, and that the two tools are both excellent but not directly comparable. However, where are the BWA mismatches?



      Martin



      • #4
        Originally posted by maasha
        However, where are the BWA mismatches?
        The NM tag.
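
As a minimal sketch of where NM lives: it is one of the optional `TAG:TYPE:VALUE` fields that follow the 11 mandatory SAM columns. The helper name `parse_sam_tags` is hypothetical, and the record is the BWA example quoted further down this thread, rebuilt here with tab separators as in a real SAM file.

```python
def parse_sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:          # columns 1-11 are the mandatory fields
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


record = "\t".join([
    "SRR065026.49351", "147", "chr12", "113379898", "17", "75M",
    "=", "113379584", "-389",
    "TCAATCTGAATTCTGATGTCTTTGGGGCTGACAATTTTAACAACCACTTAAGTCTCACCCTGCACCTTTTATCTG",
    ";6989>(8;;>9?@4;;8(:@@==7?9A=97=@8@>=0<@BBBBB4C@?BA@AB@<3>@CB=5-:B@ABBCBCBB",
    "XT:A:M", "NM:i:2", "SM:i:17", "AM:i:17",
    "XM:i:2", "XO:i:0", "XG:i:0", "MD:Z:6G11A56",
])

tags = parse_sam_tags(record)
print(tags["NM"])  # 2
```

NM is the edit distance to the reference, so for a gapped alignment it includes inserted and deleted bases as well as mismatches.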



        • #5
          Thanks a lot. My bad. It never occurred to me that such a vital piece of information would be in the optional fields.


          M



          • #6
            XM Number of mismatches in the alignment



            • #7
              Which version of bowtie? The recent one allows for gaps. I haven't had an opportunity to compare that one yet.



              • #8
                Regarding BWA, look for the XM and MD tags in the SAM output, if they are present.

                SRR065026.49351 147 chr12 113379898 17 75M = 113379584 -389 TCAATCTGAATTCTGATGTCTTTGGGGCTGACAATTTTAACAACCACTTAAGTCTCACCCTGCACCTTTTATCTG ;6989>(8;;>9?@4;;8(:@@==7?9A=97=@8@>=0<@BBBBB4C@?BA@AB@<3>@CB=5-:B@ABBCBCBB XT:A:M NM:i:2 SM:i:17 AM:i:17 XM:i:2 XO:i:0 XG:i:0 MD:Z:6G11A56
                Look at MD:Z.

                6G11A56 means: 6 matching bases, a mismatch where the reference base is G, 11 matching bases, a mismatch where the reference base is A, then 56 matching bases.

                6+1+11+1+56 = 75, and the CIGAR string is 75M, so the two agree.
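
The same bookkeeping can be done programmatically. The sketch below assumes the standard MD layout (runs of digits for matches, a single letter for a mismatched reference base, `^` followed by letters for deleted reference bases); `parse_md` is a hypothetical helper name, not part of BWA or samtools.

```python
import re


def parse_md(md):
    """Split an MD:Z value into (matching bases, mismatched ref bases, deleted ref bases)."""
    matches = 0
    mismatches = []
    deleted = 0
    for token in re.findall(r"\d+|\^[A-Za-z]+|[A-Za-z]", md):
        if token.isdigit():
            matches += int(token)          # run of bases identical to the reference
        elif token.startswith("^"):
            deleted += len(token) - 1      # reference bases missing from the read
        else:
            mismatches.append(token)       # reference base at a mismatched position
    return matches, mismatches, deleted


matches, mismatches, deleted = parse_md("6G11A56")
print(matches, mismatches, deleted)        # 73 ['G', 'A'] 0
print(matches + len(mismatches))           # 75
```

For this record the total of 75 equals the 75M in the CIGAR string, and the two mismatches agree with NM:i:2. Inserted bases do not appear in MD at all, so for reads with insertions the CIGAR is still needed to account for the full read length.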



                • #9
                  @rskr Haven't you noticed that this is a two-year-old thread? If you would like to see the latest comparison:



                  Novoalign/smalt/bwa are still better when alignment accuracy is required.

                  @alexbmp Please ignore XM. It is mainly for debugging purposes. Sometimes it gives a wrong number.



                  • #10
                    Originally posted by lh3
                    @rskr Haven't you noticed that this is a two-year-old thread? If you would like to see the latest comparison:



                    Novoalign/smalt/bwa are still better when alignment accuracy is required.

                    @alexbmp Please ignore XM. It is mainly for debugging purposes. Sometimes it gives a wrong number.
                    @lh3 Thank you for the reply. I'm surprised to hear (read) that XM can be ignored.

                    This is because some paired-end alignments reported by BWA turned out to differ from the BLAT results for the same reads.
                    The aligned positions from BWA and BLAT didn't match.

                    Interestingly, the "seemingly incorrect" read (with more mismatches reported in XM:i: than I had intended to allow) almost always disagreed with its BLAT result,
                    while its mate almost always agreed.

                    That's why I still cannot understand the reason to ignore XM:i:#.

                    If it is not too much trouble, could you please elaborate on your reply?
                    It would be really helpful if you could give me some examples (e.g. regarding comparison with BLAT results).



                    • #11
                      blat does local alignment and does not use paired-end information. In (semi)repetitive regions, blat and bwa can barely agree.



                      • #12
                        Originally posted by lh3
                        blat does local alignment and does not use paired-end information. In (semi)repetitive regions, blat and bwa can barely agree.
                        Wow! I looked up some of those "strange" reads I mentioned (the ones with more XM:i:# mismatches than I intended). They are all repeat sequences.

                        This is so cool. Thanks a lot!! I'll play with bwa some more and reply if I find something interesting.



                        • #13
                          Does anyone know of a more recent comparison? I am working with ancient (damaged) DNA in color space and I always get a far better alignment with Bowtie 0.12.7, so judging from the results at http://lh3lh3.users.sourceforge.net/alnROC.shtml I was wondering whether all the mappings I am obtaining could be inaccurate, or whether this type of data somehow works better with that software.

                          I see that the post from 2009 says: Bowtie is probably more tolerant with low-qual bases at the tail and thus more sensitive given such input.

                          Is that still true in recent versions?

                          thanks
                          Last edited by pepperoni; 10-22-2012, 08:51 AM.



                          • #14
                            "Better", how are you assessing this?

                            From what I understand, in anthropology there is a model of degradation where certain methylated cytosines degrade to thymine while other cytosines degrade to uracil, among other possible degradations that can all be modeled. In that context you would want an aligner that copes well with a large number of mismatches. One thing you might consider, if you have enough time and CPUs, is running BWA with a setting that permits a larger number of mismatches in its search.

                            Also, you might want to use an aligner that takes a probabilistic model of mismatches into account, which would give you an optimal alignment based on the mismatch probabilities.



                            • #15
                              Thanks @rskr,
                              I have tried lots of parameter combinations for both Bowtie and BWA, particularly the ones reported for Illumina reads from aDNA: allowing more mismatches, varying or eliminating the seed length, more gaps, etc. Depending on the library, most times I get more mapped reads with Bowtie, and with some libraries the difference is not significant. I assess the quality of the alignment basically by visualization: that way I can tell whether I see clear SNPs or DNA damage, or whether the parameters were just too relaxed and the alignment does not look real. There are no reports on SOLiD data for ancient DNA yet. However, there is a paper where different parameters had to be used to get the best alignments for Illumina and Helicos data on ancient DNA.

                              The other problem I have is that for paired SOLiD data the reads come in the opposite direction and BWA cannot handle them. Only Bowtie does, in theory; I haven't managed to map paired color data yet.
