  • Bowtie 2.1.0 gave a mismatch with the -N 0 option

    Dear all,

    I am using Bowtie 2.1.0 to analyze reads from Illumina machines. The parameters I used are --end-to-end -D 5 -R 1 -N 0 -L 22 -i S,0,2.50.

    One of my reads is TTAAAGGAACCCAGAGAGATATTTCA, and Bowtie gave me
    HWI-ST1225 0 chr2 227981508 24 26M * 0 0 TTAAAGGAACCCAGAGAGATATTTCA BBBFFFFFFFFFFIIFFFIIIIIFII AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:12T13 YT:Z:UU

    Since I set the maximum number of mismatches in the seed alignment to 0 (-N 0) and the seed length to 22 (-L 22), I didn’t expect any mismatch within 22 bases. However, Bowtie gave me MD:Z:12T13. Did I misinterpret the Bowtie options?

    Thank you so much.

  • #2
    Does anybody have any clue?

    Comment


    • #3
      No idea.

      Ignore the following, I just left it in for historical purposes and to remind myself how stupid I can be ... It is interesting that the MD field is different from the CIGAR field. As per the Bowtie2 manual: The MD field ought to match the CIGAR string. Which it obviously does not. '12T13' vs 26M.

      Out of stupidity mode, the rest of my original comment ....

      Out of curiosity, and perhaps to help troubleshooting, what does the reference look like at the match position?
      Last edited by westerman; 12-05-2013, 01:17 PM. Reason: Stupidity

      Comment


      • #4
        26M in CIGAR string means 26 match or mismatch. So CIGAR string is consistent with MD field.
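
For anyone following along, here is a minimal Python sketch of how the two fields relate, using the values from the record in the first post. The helper names (`cigar_reference_span`, `md_mismatches`) are made up for illustration and are not part of any library. It only assumes the usual SAM conventions: `M` covers both matches and mismatches, and the MD tag lists match-run lengths with the reference base at each mismatch.

```python
import re

def cigar_reference_span(cigar):
    """Sum the lengths of CIGAR operations that consume reference bases."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")

def md_mismatches(md):
    """Return (0-based reference offset, reference base) for each mismatch in an MD value."""
    pos = 0
    found = []
    for run, deletion, base in re.findall(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])", md):
        if run:                      # a run of matching bases
            pos += int(run)
        elif deletion:               # bases deleted from the read, e.g. ^AC
            pos += len(deletion) - 1
        else:                        # a single mismatching position
            found.append((pos, base))
            pos += 1
    return found

read = "TTAAAGGAACCCAGAGAGATATTTCA"
print(cigar_reference_span("26M"))   # 26 aligned columns, matches and mismatches alike
print(md_mismatches("12T13"))        # one mismatch at offset 12, reference base T
print(read[12])                      # the read carries A at that offset
```

So `26M` and `12T13` describe the same 26 columns: 12 matches, one mismatch against a reference T, then 13 matches.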

        Comment


        • #5
          Ah, so correct. Must be the end of a long day. I'm getting dangerous in not thinking fast enough. Anyway I am as mystified as you are. If I have time (hah!) I'll try out your command myself and see if 'playing around' reveals anything. Once again thanks for the correction.

          Comment


          • #6
            This is human sequence I take it? I might play around with the bowtie2 source code tomorrow to see why this is happening if no one comes up with the reason beforehand. I imagine that this sort of issue affects more than a few people, especially since even the default settings shouldn't allow this!

            Comment


            • #7
              Yes, it is human sequence, and I used hg19.

              Comment


              • #8
                Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
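
A rough Python sketch of why the reported alignment passes the default threshold but would be rejected with `--score-min C,0,0`. It assumes the defaults described in the Bowtie 2 manual (`--mp 6,2` with quality-aware penalties, and `--score-min L,-0.6,-0.6` in end-to-end mode) and Phred+33 qualities. The function names are hypothetical, and any rounding Bowtie 2 applies internally is not modelled.

```python
import math

def mismatch_penalty(phred, mx=6, mn=2):
    """Quality-aware mismatch penalty: MN + floor((MX - MN) * min(Q, 40) / 40)."""
    return mn + math.floor((mx - mn) * min(phred, 40) / 40)

def min_score(kind, const, coeff, read_len):
    """Evaluate a function spec such as L,-0.6,-0.6 or C,0,0 for a given read length."""
    shapes = {"C": 0.0, "L": float(read_len),
              "S": math.sqrt(read_len), "G": math.log(read_len)}
    return const + coeff * shapes[kind]

def int_tag(sam_line, tag):
    """Return the integer value of an optional SAM tag such as NM or XM, or None."""
    prefix = tag + ":i:"
    for field in sam_line.split():
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None

record = ("HWI-ST1225 0 chr2 227981508 24 26M * 0 0 "
          "TTAAAGGAACCCAGAGAGATATTTCA BBBFFFFFFFFFFIIFFFIIIIIFII "
          "AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:12T13 YT:Z:UU")

qual = record.split()[10]
phred = ord(qual[12]) - 33            # quality of the mismatching base (offset 12)
score = -mismatch_penalty(phred)      # one mismatch, no gaps, no Ns

print(score, int_tag(record, "AS"))
print(score >= min_score("L", -0.6, -0.6, 26))   # default end-to-end threshold
print(score >= min_score("C", 0, 0, 26))         # threshold with --score-min C,0,0

# Post-hoc alternative: keep only records with no mismatches at all.
is_perfect = int_tag(record, "XM") == 0 and int_tag(record, "NM") == 0
print(is_perfect)
```

The single mismatch costs 5 points, which agrees with the AS:i:-5 in the record. That is above the default minimum of about -16.2 for a 26 bp read but below 0, so only the stricter threshold or the XM/NM filter would drop it.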

                Comment


                • #9
                  Can bowtie handle N bases in .fq files and remove them? Since reads with Ns will not match hg19, will they be removed automatically? If so, why trim and remove Ns with another program? Would it be enough to just remove the adaptor sequence, or to just define the sequence length in bowtie, in which case we would not even need trimming?
                  I am totally confused, since I haven't touched this field for a year.
                  Would somebody like to answer?
                  Thanks in advance,
                  jp.

                  Originally posted by gringer
                  Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.

                  Comment


                  • #10
                    Can bowtie handle N bases in .fq files and remove them? Since reads with Ns will not match hg19, will they be removed automatically?
                    This question should really be posted in a new thread, but given that it's marginally related...

                    Bowtie2 can handle Ns in the map index and in the reads, and will happily align any base at that location. They're not removed, but are probably treated in a similar way to a base with a very low Q score. It may also "correct" a read mapping to a non-N position for the read record in the SAM output.

                    [FWIW, Bowtie v1 can't handle Ns. I think it will replace Ns with As when doing indexing and alignment]

                    Comment


                    • #11
                      Originally posted by gringer
                      Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
                      Yeah, but the mismatch is in the seed region.

                      Comment


                      • #12
                        Originally posted by jp.
                        Can bowtie handle N bases in .fq files and remove them? Since reads with Ns will not match hg19, will they be removed automatically? If so, why trim and remove Ns with another program? Would it be enough to just remove the adaptor sequence, or to just define the sequence length in bowtie, in which case we would not even need trimming?
                        I am totally confused, since I haven't touched this field for a year.
                        Would somebody like to answer?
                        Thanks in advance,
                        jp.
                        See the --np and --n-ceil options for how bowtie2 handles Ns. By default, Ns decrease the alignment score and reads with too many Ns will be skipped altogether. If you have Ns at one end of a read, then you might as well trim them off.
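
As a rough illustration, the Python sketch below evaluates the N ceiling for a read and trims Ns from the read ends. It assumes the default given in the Bowtie 2 manual, `--n-ceil L,0,0.15`. The helper names are hypothetical, and the exact rounding and comparison Bowtie 2 uses internally are not modelled here.

```python
def n_ceiling(read_len, const=0.0, coeff=0.15):
    """Linear N ceiling, f(x) = const + coeff * x, as in --n-ceil L,0,0.15."""
    return const + coeff * read_len

def exceeds_n_ceiling(seq):
    """True if the read carries more Ns than the ceiling allows for its length."""
    return seq.upper().count("N") > n_ceiling(len(seq))

def trim_terminal_ns(seq, qual):
    """Strip leading and trailing Ns, keeping the quality string in step."""
    left = len(seq) - len(seq.lstrip("Nn"))
    right = len(seq.rstrip("Nn"))
    if right <= left:                 # the read was nothing but Ns
        return "", ""
    return seq[left:right], qual[left:right]

print(n_ceiling(26))                              # about 3.9 for a 26 bp read
print(exceeds_n_ceiling("NNNNACGTACGTACGTACGT"))  # 4 Ns in 20 bp, ceiling 3.0
print(trim_terminal_ns("NNACGTN", "IIIIIII"))
```

Trimming only helps with Ns at the ends. Internal Ns stay in the read and are penalised by `--np` instead.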

                        Comment


                        • #13
                          Originally posted by dpryan
                          Yeah, but the mismatch is in the seed region.
                          Bowtie2 seeds across the entire read length:

                          Bowtie 2 begins by extracting substrings ("seeds") from the read and its reverse complement and aligning them in an ungapped fashion with the help of the FM Index. This is "multiseed alignment" and it is similar to what Bowtie 1 does, except Bowtie 1 attempts to align the entire read this way.
                          Although now I notice that you've got a 26bp read, and a 22bp seed, so any seed will overlap with the mismatch. Thinking again about jp.'s question, perhaps there is an N (or other ambiguous base) at that position in the reference sequence. Otherwise, yes, very odd.
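
The overlap argument is easy to check with a few lines of Python. This is only a sketch of the geometry: it enumerates every possible 22 bp window in a read, not the specific seed offsets Bowtie 2 actually picks with `-i`. The function names are made up for illustration.

```python
def seed_windows(read_len, seed_len):
    """All half-open [start, end) windows of seed_len that fit inside the read."""
    return [(s, s + seed_len) for s in range(read_len - seed_len + 1)]

def windows_avoiding(read_len, seed_len, pos):
    """Windows that do not cover the 0-based read position pos."""
    return [(s, e) for s, e in seed_windows(read_len, seed_len) if not (s <= pos < e)]

print(seed_windows(26, 22))                # offsets 0 through 4
print(windows_avoiding(26, 22, 12))        # empty: every window covers offset 12
print(len(windows_avoiding(50, 22, 12)))   # a 50 bp read would leave clean windows
```

With a 26 bp read there is no 22 bp seed that avoids offset 12, whereas a longer read would have plenty of mismatch-free seed positions.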

                          Comment


                          • #14
                            Yeah, if the read were long enough that the mismatch could not be in the seed then that would make total sense. There are no Ns in the reference in that area (the sequence there is "ttaaaggaaccctgagagatatttca"). My guess at the moment is that either the scoring matrix that's fed to al.exactSweep() isn't set properly or the output of that (which contains whether a seed maps with 0, 1, or 2 mismatches) just isn't being dealt with properly. I guess it'd be faster to just email Ben Langmead :P
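
A quick Python check of the read from the first post against the reference sequence quoted above. The helper names are hypothetical. It assumes an ungapped, forward-strand alignment, which is what the 26M CIGAR and flag 0 indicate.

```python
def mismatch_positions(read, ref):
    """Return (0-based offset, reference base, read base) for each differing column."""
    if len(read) != len(ref):
        raise ValueError("ungapped comparison needs equal-length sequences")
    pairs = zip(ref.upper(), read.upper())
    return [(i, r, q) for i, (r, q) in enumerate(pairs) if r != q]

def longest_exact_run(read, ref):
    """Length of the longest stretch of consecutive matching columns."""
    best = run = 0
    for r, q in zip(ref.upper(), read.upper()):
        run = run + 1 if r == q else 0
        best = max(best, run)
    return best

read = "TTAAAGGAACCCAGAGAGATATTTCA"
reference = "ttaaaggaaccctgagagatatttca"

print(mismatch_positions(read, reference))   # a single T>A difference at offset 12
print(longest_exact_run(read, reference))    # 13, shorter than the 22 bp seed
```

The longest exactly matching stretch is 13 bases, so no 22 bp seed from this read can align to this location with 0 mismatches.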

                            Comment


                            • #15
                              While I haven't traced things completely through the code, I can't see that bowtie2 reliably follows the -N option. It sets the value internally and does some computation that depends on it, but it does not seem to mark a read as unalignable when -N 0 is used and there are no perfect seeds. Presumably the easiest fix would be to flag a read as unmapped in multiseedSearchWorker if bestmin > 0 and multseedMms == 0. Either way, this is a bug and should get reported (in fact, I've just done so).

                              Comment
