  • bowtie number of mismatches and multiple aligned reads

    Hi
    Should the --no-1mm-upfront parameter be used with bowtie2 to allow exactly 1 vs. 2 mismatches? If so, how should it be used?

    Should a MAPQ cutoff of 1 be used to separate reads that aligned exactly once from reads that aligned more than once?

    Look forward to your reply,

    Carol

  • #2
    --no-1mm-upfront

    Below is an excerpt from the Bowtie 2 manual:



    "By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed."


    I don't think you can tell Bowtie 2 to find alignments with exactly 1 or 2 mismatches;
    I think you can only tell it the maximum number of mismatches to allow.
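
For illustration, here is a minimal sketch of how the options named in the manual excerpt fit together on a command line. It only builds the argument list and does not run bowtie2. The function name `bowtie2_args`, the file names, and the default seed length of 20 are assumptions made up for this example; the check on `-N` reflects that bowtie2 accepts only 0 or 1 for that option.

```python
def bowtie2_args(index, reads, out_sam, seed_mismatches=0, seed_len=20,
                 no_1mm_upfront=True):
    """Build an argv list for a single-end bowtie2 run (nothing is executed).

    Hypothetical helper: the name, parameters and defaults are illustrative.
    """
    # bowtie2 only accepts 0 or 1 mismatches in the seed (-N).
    if seed_mismatches not in (0, 1):
        raise ValueError("-N must be 0 or 1")
    args = [
        "bowtie2",
        "-x", index,               # index basename
        "-U", reads,               # unpaired reads
        "-S", out_sam,             # SAM output
        "-N", str(seed_mismatches),
        "-L", str(seed_len),
    ]
    if no_1mm_upfront:
        # Bare switch: it is not followed by a value.
        args.append("--no-1mm-upfront")
    return args


# Example with placeholder file names.
print(" ".join(bowtie2_args("genome_index", "reads.fq", "out.sam")))
```

The list form can be handed to `subprocess.run` unchanged if you do want to launch the aligner from a script.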


    • #3
      So are you confirming that --no-1mm-upfront should be used as --no-1mm-upfront 1 or --no-1mm-upfront 2? Or should N and L be used?

      Once bowtie reports reads that aligned > 1 time, how can reads that aligned exactly once be separated from those that aligned > 1 time?

      Thanks


      • #4
        It's just "--no-1mm-upfront" (it doesn't take an argument).

        Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
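
As a rough sketch of what filtering by MAPQ looks like in practice, the snippet below keeps SAM header lines and any alignment whose MAPQ (the fifth tab-separated column) meets a threshold. The function name `filter_by_mapq` and the example records are invented for illustration; on real files, `samtools view -q` applies the same kind of cutoff.

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Yield header lines plus alignments with MAPQ >= min_mapq.

    Hypothetical helper for illustration; expects SAM text lines.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line              # keep header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:  # column 5 of a SAM record is MAPQ
            yield line


# Made-up records: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
kept = list(filter_by_mapq(example, min_mapq=5))
print(len(kept))
```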


        • #5
          But --no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?

          I meant reads mapping to repetitive regions, i.e. aligning > 1 time: in the stats report, > 50% of my reads aligned > 1 time. So the value of MAPQ is heuristic. Within a given interval, how should the best value be chosen?


          • #6
            Originally posted by carolW
            but -no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?
            No, --no-1mm-upfront disables bowtie's default behaviour (which is to first look for end-to-end alignments with 0 or 1 mismatches).
            You can set -N 2 if you want to allow up to 2 mismatches in the seed region.


            • #7
              When I set -N 2, I get error message:

              Error: -N was set to 2, but cannot be set greater than 1
              Error: Encountered internal Bowtie 2 exception (#1)

              Is there any other parameter that should be set, too?


              • #8
                Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
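
Since the seed setting does not cap mismatches over the whole read, one option is to filter after alignment. The sketch below assumes the SAM records carry bowtie2's optional `XM:i` field (the number of mismatches in the alignment) and keeps only records at or below a chosen maximum. The function names and the example records are made up for illustration.

```python
def mismatch_count(sam_line):
    """Return the XM:i value of a SAM record, or None if the tag is absent."""
    # Optional fields start at column 12 of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XM:i:"):
            return int(field[len("XM:i:"):])
    return None


def keep_max_mismatches(sam_lines, max_mm):
    """Yield header lines plus records whose XM:i is <= max_mm.

    Records without an XM:i tag (e.g. unaligned reads) are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        xm = mismatch_count(line)
        if xm is not None and xm <= max_mm:
            yield line


# Made-up records for illustration.
example = [
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXM:i:0",
    "r2\t0\tchr1\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-12\tXM:i:2",
    "r3\t0\tchr1\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-18\tXM:i:3",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU",
]
for rec in keep_max_mismatches(example, max_mm=2):
    print(rec.split("\t")[0])
```

Note that this counts mismatches only; gaps are reported separately, so a cap on total edits would need more than this one tag.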


                • #9
                  Originally posted by dpryan
                  It's just "--no-1mm-upfront" (it doesn't take an argument).

                  Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
                  How can we judge whether a threshold is reasonable? Does it depend on the data? All info is welcome.


                  • #10
                    The MAPQ relates to the probability that the alignment is correct, so just pick a value that you're happy with depending on your downstream applications. For RNAseq, I usually use a threshold of 5, since there's enough coverage that a small amount of error won't have any considerable effect. For bisulfite sequencing data, on the other hand, I've found that a MAPQ threshold of 10 is usually the sweet spot, since there's less coverage per site, so one can't accept as much error. For variant calling, many of the callers utilize MAPQ and Phred scores in their call algorithms, so you may either not bother filtering or might just remove the highly unreliable alignments, which for bowtie2 are those with MAPQ of 0 or 1.

                    If you're looking for some objectively perfect filtering algorithm, there is none; it's just a question of how much error your requirements can accept.
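
To make the link between MAPQ and error concrete: the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong. The short sketch below inverts that formula for the thresholds mentioned above. The function name is an assumption for the example, and the numbers are nominal only, since an aligner's MAPQ values are estimates rather than calibrated probabilities.

```python
def mapq_to_error_prob(mapq):
    """Nominal probability that the mapping is wrong: 10 ** (-MAPQ / 10)."""
    return 10 ** (-mapq / 10)


# Thresholds discussed in the thread.
for q in (1, 5, 10, 42):
    print(f"MAPQ {q}: nominal error probability {mapq_to_error_prob(q):.2g}")
```

Read this way, moving a cutoff from 5 to 10 is a choice between tolerating roughly a one-in-three and a one-in-ten nominal chance of a wrong placement for the least reliable alignments kept.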


                    • #11
                      So it seems to be easy with my data, as I only have MAPQ values of 0, 1 and 42. 0 must correspond to reads that aligned 0 times, as there is a u in the strand column. 1 must be ambiguous, i.e. aligned > 1 time, and 42 unambiguous, i.e. aligned exactly once.


                      • #12
                        Yeah, life is easy when you have just 3 values. A value of 42 is given when there's a perfect match and there's no valid next-best alignment. If you played with --score-min then you'd eventually get a larger variety of MAPQ scores, though that'd just overcomplicate your life


                        • #13
                          BTW, there are actually 5 ways in which bowtie2 will yield a MAPQ of 0, only one of which is due to a read not being mapped (it's an unreliable alignment in any case). It's actually possible to have a "unique" alignment with a MAPQ of 0, assuming the definition of "unique" is having only one valid alignment given the --score-min and penalty settings.
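
Because a MAPQ of 0 does not by itself mean "unmapped", a tally of MAPQ values is easier to read if unmapped reads are counted separately using the SAM FLAG bit 0x4. The following is a small sketch under that assumption; the function name `mapq_tally` and the example records are invented for illustration.

```python
from collections import Counter


def mapq_tally(sam_lines):
    """Count alignments per MAPQ value, with unmapped reads in their own bin."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            counts["unmapped"] += 1
        else:
            counts[mapq] += 1
    return counts


# Made-up records: one unmapped read and one aligned read both have MAPQ 0.
example = [
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(dict(mapq_tally(example)))
```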


                          • #14
                            agree with you


                            • #15
                              Originally posted by dpryan
                              Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
                              So, what is the right way to set the overall number of permitted mismatches when mapping to the reference genome index with bowtie2? Looking forward to your answer!
