
  • How to allow more mismatches in BWA?

    I have an RNA data set with a read length of 76 bp. I want to allow for more mismatches when aligning with BWA. How many mismatches does BWA allow with default settings, and which parameter(s) should I change if I want to allow, e.g., twice as many mismatches? I have been playing around with aln -n, -l and -M, without any success.

  • #2
    Any comments will be very much appreciated!



    • #3
      When you say "without any success" what do you mean? How do you check?



      • #4
        Sorry for not being clear - I mean that the percentage of aligned reads gets lower instead of higher...
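One way to check the percentage of aligned reads is to count primary records in the SAM output whose "unmapped" flag bit (0x4) is not set. The sketch below assumes plain-text SAM lines as input; the function name and the example records are made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records that are mapped (flag bit 0x4 not set)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        total += 1
        if not flag & 0x4:                # 0x4 = read unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: one mapped read, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
print(mapped_fraction(example))
```

Running the same count on the output of each parameter setting makes it easy to compare runs on equal terms.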



        • #5
          Are you aligning against a transcript database? If not, you might consider using a splice-aware aligner like TopHat or STAR.



          • #6
            Hi volks, I have to use BWA since this aligner allows the two reads in a read pair to be on different chromosomes - my analysis depends on this. I am aligning against a custom-made reference genome.



            • #7
              If you are certain that BWA is your only option, the parameters are pretty clear:

              Options:
              -n NUM  max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
              -o INT  maximum number or fraction of gap opens [1]
              -e INT  maximum number of gap extensions, -1 for disabling long gaps [-1]
              -i INT  do not put an indel within INT bp towards the ends [5]
              -d INT  maximum occurrences for extending a long deletion [10]
              -l INT  seed length [32]
              -k INT  maximum differences in the seed [2]
              -M INT  mismatch penalty [3]
              -O INT  gap open penalty [11]
              -E INT  gap extension penalty [4]
              -L      log-scaled gap penalty for long deletions

              As far as I understand, allowing more mismatches (-n) should not result in fewer reads being aligned.



              • #8
                Thanks volks. Yes, I am almost 100 percent sure that BWA is my only option. However, I am really a newbie to BWA, so I'm not sure that I understand your post. Most of the parameter settings that you list are defaults, right?

                E.g. -n is 0.04 by default, and I thought that this was one of the parameters that I should change to let BWA align with more mismatches? Sorry - but can you explain again which parameters are defaults and which parameters I should change?



                • #9
                  Defaults are given in brackets [].
                  For starters I would disable gapped alignment (-o 0), keep the seed at its default length with two mismatches (-l 32, -k 2) and try various overall mismatch limits (e.g. -n 3 to 6). A higher -n should give you more aligned reads.
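A minimal sketch of that sweep, assuming hypothetical file names (ref.fa, reads.fq, reads.nN.sai) and that bwa is on the PATH. It uses only the flags from the option listing quoted in this thread and relies on bwa aln writing its .sai output to stdout.

```python
import subprocess


def aln_command(ref, reads, max_diff):
    """Build a 'bwa aln' call: no gap opens, default seed, fixed max differences."""
    return ["bwa", "aln",
            "-n", str(max_diff),   # integer = absolute number of differences
            "-o", "0",             # disable gap opens
            "-l", "32",            # seed length (default)
            "-k", "2",             # differences allowed in the seed (default)
            ref, reads]


def run_sweep(ref, reads, values=(3, 4, 5, 6)):
    """Run one alignment per -n value; output file names are hypothetical."""
    for n in values:
        with open(f"reads.n{n}.sai", "wb") as fh:
            subprocess.run(aln_command(ref, reads, n), stdout=fh, check=True)


print(" ".join(aln_command("ref.fa", "reads.fq", 6)))
```

Calling run_sweep("ref.fa", "reads.fq") would produce one .sai file per setting, each of which still has to go through bwa samse or sampe before the mapped fraction can be compared.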



                  • #10
                    Ok, thanks. I will try to use the guidelines that you have given me.

                    So I should concentrate on changing -n (the one that is set to 0.04 by default)? I will try setting it between 3 and 6. How should this parameter be set if I want to allow, e.g., twice as many mismatches per read compared to the default?

                    I have read somewhere that it is a good idea to also disable seeding by setting -l to a large value (10000) when allowing more mismatches - but I don't know if I should do this?



                    • #11
                      If you run it with defaults, it will tell you what the number of mismatches is for various read lengths. Just double that.

                      I don't see why you should turn off seeding, and I am not sure if setting -l 10000 would do that.



                      • #12
                        Disabling seeding will make the run slower, which is fine if speed is not an issue here.



                        • #13
                          Speed is not the biggest issue... Xied75, would you disable seeding if/when allowing for more mismatches? I'm running some tests changing the parameters that volks suggested, but I don't have any results yet.



                          • #14
                            Hi, Karenj,

                            I did some tests.

                            First, if you don't adjust any parameters, the default value for n is printed at the beginning of the output:

                            [bwa_aln] 17bp reads: max_diff = 2
                            [bwa_aln] 38bp reads: max_diff = 3
                            [bwa_aln] 64bp reads: max_diff = 4
                            [bwa_aln] 93bp reads: max_diff = 5
                            [bwa_aln] 124bp reads: max_diff = 6
                            [bwa_aln] 157bp reads: max_diff = 7
                            [bwa_aln] 190bp reads: max_diff = 8
                            [bwa_aln] 225bp reads: max_diff = 9

                            My data is 83 bp, so n = 4 by default. If I run with n = 8 or n = 16, I can see more reads mapped.
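The table above can be reproduced with a small Poisson calculation. This is a sketch based on the help text ("missing prob under 0.02 err rate"): it assumes the number of differences in a read follows a Poisson distribution with mean read_length * 0.02, and picks the smallest k for which the probability of more than k differences drops below the -n threshold (0.04 by default). The function name and the fallback value are my own choices, not taken from the BWA source.

```python
import math

AVG_ERR = 0.02  # per-base error rate quoted in the -n help text


def max_diff(read_len, fnr=0.04, err=AVG_ERR):
    """Smallest k >= 1 with P(X > k) < fnr, where X ~ Poisson(read_len * err)."""
    lam = read_len * err
    term = math.exp(-lam)      # P(X = 0)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k        # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < fnr:
            return k
    return 2                   # assumed fallback; not reached for realistic lengths


for length in (17, 38, 64, 76, 83, 93):
    print(f"{length}bp reads: max_diff = {max_diff(length)}")
```

Under this model a 76 bp read gets max_diff = 4, so allowing twice as many mismatches would mean passing the integer -n 8 rather than changing the 0.04 fraction.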

                            Changing the seed length with -l doesn't seem to help: it runs 100 times slower and maps fewer reads. -k changes the number of mismatches allowed within the seed; giving it a large number doesn't help either.

                            There are many more parameters you can change, e.g. -o, -e, -i, -d, -M, -O, -E, but you do need to understand them.

                            But the point of BWA is to align low-error reads very fast. If you adjust any of those listed above, it might align some hard reads, but the run time is significantly LONGER. You might be better off using BWA for a first round of alignment and then another tool to align the unmapped reads (as many re-aligners do).
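The two-round idea can be sketched as follows: after the first BWA pass, pull the unmapped reads out of the SAM output and write them back to FASTQ for a second aligner. This assumes plain-text SAM input; the function names and example records are hypothetical, and paired-end bookkeeping (keeping mates together) is left out.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq(sam_lines):
    """Yield FASTQ records for primary SAM records flagged as unmapped (0x4)."""
    for line in sam_lines:
        if line.startswith("@"):                  # header line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        flag = int(f[1])
        if not flag & 0x4 or flag & 0x900:        # keep unmapped primaries only
            continue
        seq, qual = f[9], f[10]
        if flag & 0x10:                           # stored reverse-complemented
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{f[0]}\n{seq}\n+\n{qual}\n"


# Hypothetical records: r1 is mapped, r2 is unmapped.
example = [
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIHH",
]
print("".join(unmapped_fastq(example)), end="")
```

The resulting FASTQ can then be given to whichever more permissive aligner is used for the second round.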



                            • #15
                              Hi xied75, thanks for your post. I'm a bit confused about where I see the default value for n. I use BWA on the Galaxy server, perhaps it works a bit differently there?
