  • bowtie2 changing -k affects overall alignment rate - why?

    Hello, and sorry if this question is in the wrong section of the forum or if the answer is obvious to all but me...

    I am new to bowtie2 (2.2.0) and am aligning ~300,000 single-end 454 cDNA reads to a de novo transcriptome. I am finding that changing the value of -k changes the overall alignment rate, but my understanding is that the -k option should only change the number of alignments reported, not the percentage of reads that have at least one valid alignment.

    With all other options as default, and only changing the value for -k, I find the following:

    without -k specified (default)
    22.69% of reads aligned exactly 1 time
    60.52% of reads aligned > 1 times
    83.20% overall alignment rate

    -k 2
    22.23% of reads aligned exactly 1 time
    61.70% of reads aligned > 1 times
    83.92% overall alignment rate

    -k 10
    22.02% of reads aligned exactly 1 time
    62.25% of reads aligned > 1 times
    84.26% overall alignment rate

    -k 200
    22.02% of reads aligned exactly 1 time
    62.29% of reads aligned > 1 times
    84.31% overall alignment rate


    Granted, this is a small change overall (about one percentage point between the default and -k 200), but it becomes more dramatic when I use the settings recommended for RSEM (--very-sensitive --dpad 0 --gbar 99999999 --mp 1,1 --mp 1 --score-min L,0,-0.1) and vary only -k:

    -k 2
    15.92% of reads aligned exactly 1 time
    24.93% of reads aligned > 1 times
    40.86% overall alignment rate

    -k 10
    15.78% of reads aligned exactly 1 time
    28.06% of reads aligned > 1 times
    43.84% overall alignment rate

    -k 200
    15.89% of reads aligned exactly 1 time
    30.25% of reads aligned > 1 times
    46.14% overall alignment rate

    -k 1000
    15.89% of reads aligned exactly 1 time
    30.27% of reads aligned > 1 times
    46.16% overall alignment rate
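To put the shift in absolute terms, here is a minimal Python sketch that tabulates the percentages quoted above and converts the change in overall alignment rate into an approximate read count. The names `DEFAULT_SETTINGS`, `RSEM_SETTINGS`, `overall_gain`, `approx_reads`, and `is_consistent` are made up for illustration, and the 300,000 total is only the approximate figure given in the post. The run without -k is left out because it has no -k value to key on.

```python
# Figures copied from the post: -k value -> (% aligned exactly 1 time,
# % aligned >1 times, % overall alignment rate).
DEFAULT_SETTINGS = {
    2: (22.23, 61.70, 83.92),
    10: (22.02, 62.25, 84.26),
    200: (22.02, 62.29, 84.31),
}
RSEM_SETTINGS = {
    2: (15.92, 24.93, 40.86),
    10: (15.78, 28.06, 43.84),
    200: (15.89, 30.25, 46.14),
    1000: (15.89, 30.27, 46.16),
}
TOTAL_READS = 300_000  # assumption: the post only says "~300,000"


def overall_gain(table):
    """Percentage-point change in overall rate from the smallest to the largest -k."""
    ks = sorted(table)
    return round(table[ks[-1]][2] - table[ks[0]][2], 2)


def approx_reads(points, total=TOTAL_READS):
    """Convert a percentage-point change into an approximate number of reads."""
    return round(total * points / 100)


def is_consistent(table, tol=0.015):
    """Check that 'exactly 1' + '>1' matches the overall rate, up to rounding."""
    return all(abs(once + multi - overall) <= tol
               for once, multi, overall in table.values())


if __name__ == "__main__":
    for label, table in (("default", DEFAULT_SETTINGS), ("RSEM-style", RSEM_SETTINGS)):
        gain = overall_gain(table)
        print(f"{label}: +{gain} points, roughly {approx_reads(gain)} reads")
```

With these numbers the RSEM-style settings gain 5.3 percentage points between -k 2 and -k 1000, on the order of 15,900 reads, and nearly all of that gain appears in the "> 1 times" category while the "exactly 1 time" category stays almost flat.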

    Why do these values change? From reading the manual I believe that the only change will be in the maximum number of alignments per read that get reported. To my mind, a read still has an alignment regardless of whether you report only 1 alignment for that read, or up to 1000, so that the overall alignment rate should not change.

    Hoping that someone can shed some light on this. Thanks
    Last edited by JasonM; 03-06-2014, 05:13 PM.

  • #2
    Hi,

    I think you may be confusing Bowtie(1) and Bowtie2.

    The Bowtie2 manual says that -k specifies the number of
    alignments bowtie searches for, and it will report all of them,
    but if there are more alignments than what it searches for,
    they won't be found or reported.


    • #3
      Thanks mastal for the reply, but I understand this is bowtie2 and not bowtie. The manual for bowtie2 states:
      -k mode: search for one or more alignments, report each
      In -k mode, Bowtie 2 searches for up to N distinct, valid alignments for each read, where N equals the integer specified with the -k parameter. That is, if -k 2 is specified, Bowtie 2 will search for at most 2 distinct alignments. It reports all alignments found, in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the SAM specification for details.

      Bowtie 2 does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, Bowtie 2 does not guarantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than where exactly it originated.
      So I get that the total number of alignments reported for each read changes, but I don't get why the percentage of reads with at least one valid alignment changes. Whether I set -k to 2 or 2000, it should not change the fact that x% of reads have one or more valid alignments.
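The FLAG bits mentioned in the manual excerpt allow the "at least one valid alignment" figure to be recounted directly from the SAM output, independently of the summary bowtie2 prints. The following is a minimal sketch, assuming single-end reads with unique names and a plain-text SAM stream; `summarize_sam` is a hypothetical helper, not part of bowtie2. Only the two standard SAM FLAG bits 4 (unmapped) and 256 (secondary) are used.

```python
UNMAPPED = 0x4     # SAM FLAG bit 4: the read has no reported alignment
SECONDARY = 0x100  # SAM FLAG bit 256: secondary alignment (set on extra -k hits)


def summarize_sam(lines):
    """Return (total_reads, reads_with_an_alignment, reads_with_secondary_records).

    Assumes single-end data, so every read name corresponds to one read.
    """
    aligned = {}    # read name -> number of aligned records
    secondary = {}  # read name -> number of records flagged as secondary
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        aligned.setdefault(name, 0)
        secondary.setdefault(name, 0)
        if not flag & UNMAPPED:
            aligned[name] += 1
        if flag & SECONDARY:
            secondary[name] += 1
    total = len(aligned)
    with_alignment = sum(1 for n in aligned.values() if n >= 1)
    with_secondary = sum(1 for n in secondary.values() if n >= 1)
    return total, with_alignment, with_secondary


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:  # path to a SAM file from one run
        total, hit, multi = summarize_sam(handle)
    print(f"{hit}/{total} reads have at least one alignment; "
          f"{multi} have secondary records")
```

Running this on the SAM files from two different -k values would show whether the second number really differs between runs, which is the quantity in question here.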


      • #4
        issue does not occur in bowtie-1.0.0

        Sorry to belabor the point, but I wanted to add that I have tested this with bowtie-1.0.0, and there changing -k does not affect the number of reads which have at least one valid alignment. Running bowtie in default reporting mode, with -k (tried values 2, 10, and 1000), or with -a -m 200 always resulted in the same % of reads with at least one valid alignment (78.71% to be precise, or 76.36% aligned and 2.35% suppressed with the -a -m 200 options).
        If it matters for this discussion, I also used -n 2 -l 25 options.
        Thanks


        • #5
          If you look back at your original post, the overall alignment rate increases slightly as you increase -k.

          k is defined slightly differently in bowtie1 and bowtie2.


          • #6
            Originally posted by mastal:
            "If you look back at your original post, the overall alignment rate increases slightly as you increase -k."

            Yes, but his point is that there's no reason for that to happen; it makes no sense.


            • #7
              Is there anyone else who can shed some light on this? Could it possibly be a bug in the program?
              Last edited by JasonM; 03-10-2014, 03:52 AM. Reason: typo


              • #8
                You'll likely have to contact the original authors. There are a number of undocumented areas of bowtie2 that would require someone to go through all of the source code. It's usually faster to just ask Ben Langmead (just post a synopsis of the reply here so others don't have to bug Ben in the future).
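If the authors are contacted, a short list of the reads that only align at a higher -k would make the report concrete. The sketch below is one way to produce it, assuming two single-end SAM outputs from runs that differ only in -k; `aligned_names` and `gained_and_lost` are hypothetical helper names, and the only SAM detail relied on is FLAG bit 4 (unmapped).

```python
def aligned_names(sam_lines):
    """Collect the names of reads that have at least one aligned record."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if not int(fields[1]) & 0x4:  # FLAG bit 4 unset means the record is aligned
            names.add(fields[0])
    return names


def gained_and_lost(low_k_lines, high_k_lines):
    """Return (reads aligned only at high -k, reads aligned only at low -k)."""
    low = aligned_names(low_k_lines)
    high = aligned_names(high_k_lines)
    return sorted(high - low), sorted(low - high)


if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders): python diff_k.py low_k.sam high_k.sam
    with open(sys.argv[1]) as low_f, open(sys.argv[2]) as high_f:
        gained, lost = gained_and_lost(low_f, high_f)
    print(f"{len(gained)} reads align only at the higher -k")
    print(f"{len(lost)} reads align only at the lower -k")
    for name in gained[:10]:
        print(name)
```

A handful of read names from the first list, together with the exact command lines, would give a small reproducible case to send along.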
