
  • How to allow more mismatches in BWA?

    I have an RNA data set with a read length of 76 bp. I want to allow more mismatches when aligning with BWA. How many mismatches does BWA allow with default settings, and which parameter(s) should I change if I want to allow, e.g., twice as many mismatches? I have been playing around with aln -n, -l and -M, without any success.

  • #2
    Any comments will be very much appreciated!

    • #3
      When you say "without any success" what do you mean? How do you check?

      • #4
        Sorry for not being clear - I mean that the percentage of aligned reads gets lower instead of higher...
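
One way to check the percentage of aligned reads is to count records in the SAM output whose "unmapped" FLAG bit (0x4) is clear. The following is a minimal sketch, not part of BWA; the function name and the tiny example records are made up for illustration, and secondary/supplementary records are skipped so each read is counted once.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records that are mapped (FLAG bit 0x4 clear)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) records.
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical example records: two mapped reads, two unmapped reads.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t50\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
print(f"{100 * mapped_fraction(example):.1f}% mapped")
```

Comparing this number between runs with different -n values shows directly whether a setting helped.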

        • #5
          Are you aligning against a transcript database? If not, you might consider using a splice-aware aligner like TopHat or STAR.

          • #6
            Hi volks, I have to use BWA since this aligner allows the two reads in a read pair to be on different chromosomes - my analysis depends on this. I am aligning against a custom-made reference genome.

            • #7
              if you are certain that BWA is your only option ..
              the parameters are pretty clear:

              Options: -n NUM    max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
                       -o INT    maximum number or fraction of gap opens [1]
                       -e INT    maximum number of gap extensions, -1 for disabling long gaps [-1]
                       -i INT    do not put an indel within INT bp towards the ends [5]
                       -d INT    maximum occurrences for extending a long deletion [10]
                       -l INT    seed length [32]
                       -k INT    maximum differences in the seed [2]
                       -M INT    mismatch penalty [3]
                       -O INT    gap open penalty [11]
                       -E INT    gap extension penalty [4]
                       -L        log-scaled gap penalty for long deletions

              As far as I understand, allowing more mismatches (-n) should not result in fewer aligned reads.

              • #8
                Thanks volks. Yes, I am almost 100 percent sure that BWA is my only option. However, I am really a newbie to BWA, so I'm not sure that I understand your post. Most of the parameter settings that you list are defaults, right?

                E.g. -n is 0.04 by default, and I thought that this was one of the parameters I should change to let BWA align with more mismatches? Sorry - but can you explain again which parameters are default and which parameters I should change?

                • #9
                  Defaults are given in brackets [].
                  For starters, I would disable gapped alignment (-o 0), keep the seed at its default length and two mismatches (-l 32, -k 2), and try various overall mismatch limits (e.g. -n 3 to 6). A higher -n should give you more aligned reads.
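
The sweep suggested above could be scripted roughly as follows. This is only a sketch that builds the command lines without running them; the file names (ref.fa, reads.fq, aln_n3.sai, ...) are placeholders, and the -f output option comes from the bwa aln usage text rather than from the excerpt quoted in this thread, so check it against your BWA version.

```python
def bwa_aln_command(reference, reads, max_diff, out_sai):
    """Build a 'bwa aln' argument list with gaps disabled and default seeding."""
    return [
        "bwa", "aln",
        "-o", "0",             # no gap opens, i.e. ungapped alignment
        "-l", "32",            # default seed length
        "-k", "2",             # default max differences in the seed
        "-n", str(max_diff),   # overall max differences (the value being swept)
        "-f", out_sai,         # output file (assumed flag, see note above)
        reference, reads,
    ]


def sweep(reference, reads, values=range(3, 7)):
    """One command per -n value; 3 to 6 by default, as suggested in the post."""
    return [bwa_aln_command(reference, reads, n, f"aln_n{n}.sai") for n in values]


for cmd in sweep("ref.fa", "reads.fq"):
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or passed to subprocess.run as a list if you prefer to launch the runs from Python.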

                  • #10
                    Ok, thanks. I will try to use the guidelines that you have given me.

                    So I should concentrate on changing -n (the one that is set to 0.04 by default)? I will try to set it between 3 and 6. How should this parameter be set if I want to allow e.g. twice as many mismatches per read compared to default?

                    I have read somewhere that it is a good idea to also disable seeding by setting -l (10000) when allowing more mismatches - but I don't know if I should do this?

                    • #11
                      If you run it with defaults, it will tell you what the number of mismatches is for various read lengths. Just double that.

                      I don't see why you should turn off seeding, and I am not sure if setting -l 10000 would do that.

                      • #12
                        Disabling seeding will make the run slower, which is fine only if speed is not an issue here.

                        • #13
                          Speed is not the biggest issue... Xied75, would you disable seeding if/when allowing for more mismatches? I'm running some tests changing the parameters that volks suggested, but I don't have any results yet.

                          • #14
                            Hi, Karenj,

                            I did some tests.

                            First, if you don't adjust any parameters, BWA prints the default values for -n at the beginning of its output:

                            [bwa_aln] 17bp reads: max_diff = 2
                            [bwa_aln] 38bp reads: max_diff = 3
                            [bwa_aln] 64bp reads: max_diff = 4
                            [bwa_aln] 93bp reads: max_diff = 5
                            [bwa_aln] 124bp reads: max_diff = 6
                            [bwa_aln] 157bp reads: max_diff = 7
                            [bwa_aln] 190bp reads: max_diff = 8
                            [bwa_aln] 225bp reads: max_diff = 9

                            My data is 83 bp, so the default is n = 4. If I run with n = 8 or n = 16, I see more reads mapped.
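
The default table above appears to follow from the -n help text ("missing prob under 0.02 err rate"): assuming differences per read are Poisson-distributed with mean 0.02 x read length, the default is the smallest number of differences k such that the probability of a read having more than k differences falls below 0.04. The sketch below is a re-implementation under that assumption, not BWA's own code, and the function name is made up; it does reproduce the values printed above.

```python
import math


def max_diff(read_len, err=0.02, thres=0.04):
    """Smallest k (>= 1) with P(more than k differences) < thres, Poisson model."""
    lam = read_len * err        # expected number of differences per read
    term = math.exp(-lam)       # P(exactly 0 differences)
    cum = term                  # running P(at most k differences)
    for k in range(1, 1000):
        term *= lam / k         # P(exactly k differences)
        cum += term
        if 1.0 - cum < thres:
            return k
    return 2                    # fallback, not expected to be reached


for n in (17, 38, 64, 76, 83, 93):
    print(f"{n}bp reads: max_diff = {max_diff(n)}")
```

For the 76 bp reads in the original question this gives a default of 4, so allowing twice as many mismatches would mean passing -n 8.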

                            Changing the seed length with -l doesn't seem to help: it runs 100 times slower and maps fewer reads. -k changes the number of mismatches allowed within the seed; giving it a large number doesn't help either.

                            There are many more parameters you can change, e.g. -o, -e, -i, -d, -M, -O, -E, but you need to understand what each one does.

                            But the point of BWA is to align low-error reads very fast. If you adjust any of those listed above, it might align some hard reads, but the run time is significantly longer. You might do better to use BWA for a first round of alignment and then use another tool to align the reads left unmapped (as many re-aligners do).
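
The two-round approach needs the unmapped reads pulled out of the first-round SAM file. A minimal sketch, assuming single-end records and using the SAM "unmapped" FLAG bit (0x4), is shown below; the function name and the example records are hypothetical, and for paired-end data you would also need to keep mates together.

```python
def unmapped_to_fastq(sam_lines):
    """Return FASTQ text for primary SAM records flagged as unmapped (0x4)."""
    records = []
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Keep unmapped reads; ignore secondary/supplementary records.
        if flag & 0x4 and not flag & 0x900:
            name, seq, qual = fields[0], fields[9], fields[10]
            records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(records)


# Hypothetical first-round output: r1 mapped, r2 unmapped.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF",
]
print(unmapped_to_fastq(example), end="")
```

The resulting FASTQ text can be written to a file and given to the second aligner.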

                            • #15
                              Hi xied75, thanks for your post. I'm a bit confused about where I can see the default value for n. I use BWA on the Galaxy server; perhaps it works a bit differently there?
