  • question about bowtie -e parameter

    Hi,

    My question concerns Bowtie's -e (--maqerr) parameter.

    If I understand correctly, while -n sets the maximum number of mismatches permitted in the "seed", mismatches across the entire read length are limited by the -e parameter. Indeed, increasing this parameter can greatly increase the number of aligned reads. For example, in one sample I tested, increasing -e from 70 (the default) to 140 increased the percentage of aligned reads from 31% to 48%.
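A minimal sketch of how the two limits interact, assuming bowtie 1's -n mode defaults (-n 2 seed mismatches, -l 28 seed length, -e 70) and the manual's description of -e as the maximum total of quality values at all mismatched positions, not just those in the seed. The function name `passes_n_mode` is hypothetical, and the Maq-style rounding of quality values discussed later in the thread is deliberately ignored here.

```python
from __future__ import annotations


def passes_n_mode(mismatch_quals: dict[int, int], n: int = 2,
                  seed_len: int = 28, e: int = 70) -> bool:
    """Hypothetical check of an alignment against -n and -e.

    mismatch_quals maps a 0-based read position to the Phred quality
    of the read base at that mismatched position.
    """
    # -n: count only mismatches that fall inside the seed (first seed_len bases)
    seed_mismatches = sum(1 for pos in mismatch_quals if pos < seed_len)
    # -e: sum the quality values of ALL mismatches, seed or not
    total_quality = sum(mismatch_quals.values())
    return seed_mismatches <= n and total_quality <= e


# Three Q30 mismatches outside a 28 bp seed: rejected at -e 70, accepted at -e 140
print(passes_n_mode({30: 30, 32: 30, 34: 30}))         # False
print(passes_n_mode({30: 30, 32: 30, 34: 30}, e=140))  # True
```

This illustrates why raising -e can rescue reads whose seed is clean but whose tail carries several confident mismatches.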

    Default (-e 70)
    --------------------------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 1395000 (31.53%)
    # reads that failed to align: 3029341 (68.47%)

    -e 140
    --------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 2125127 (48.03%)
    # reads that failed to align: 2299214 (51.97%)
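The summary percentages above can be recomputed from the raw counts; this short sketch only restates the numbers already reported and shows the absolute gain from raising -e.

```python
# Raw counts copied from the two bowtie summaries above
processed = 4424341
aligned = {"-e 70 (default)": 1395000, "-e 140": 2125127}

for label, n in aligned.items():
    pct_aligned = 100 * n / processed
    pct_failed = 100 * (processed - n) / processed
    print(f"{label}: {pct_aligned:.2f}% aligned, {pct_failed:.2f}% failed")

# Extra reads that gained at least one reported alignment
gain = aligned["-e 140"] - aligned["-e 70 (default)"]
print(f"extra reads aligned: {gain}")  # extra reads aligned: 730127
```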

    Is the default value (70) the recommended level for 36 bp reads? Has anyone tested how -e should be increased as read length increases? For example, is there any recommendation on how far -e should be raised for 80 bp reads?

    Many thanks in advance,
    Rani

  • #2
    It all depends on your application, because this is a trade-off between alignment quality and the number of reads aligned. Figure on roughly 30 to 40 of -e per mismatch you want to allow.
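Applying this rule of thumb is simple arithmetic; the sketch below (the helper name `e_range` is hypothetical) turns a number of tolerated mismatches into a range for -e. Under this rule the default of 70 corresponds to about 2 mismatches, and the 140 tried above to about 4.

```python
from __future__ import annotations


def e_range(mismatches: int, per_low: int = 30, per_high: int = 40) -> tuple[int, int]:
    """Hypothetical helper: -e range for a given number of tolerated mismatches,
    using the 30-40 per mismatch rule of thumb from this reply."""
    return mismatches * per_low, mismatches * per_high


for k in (2, 4, 6):
    low, high = e_range(k)
    print(f"{k} mismatches -> -e {low} to {high}")
```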



    • #3
      For RNA-Seq the desired output is usually a read count, so the reads only have to be of sufficient quality to map to the right location. The value of -e in those applications can be 300+, depending on read length, without sacrificing the quality of the results.
      For SNP calling, the quality of the reads is more important than the quantity, so a much lower -e is useful. For longer reads (80 bases) I wouldn't go lower than 100.

      I have generally used this method to decide how to set -e:
      ask how many of the bases outside the seed you would tolerate being wrong, assuming they are high-quality bases, and multiply that number by 30 to get -e. If you don't care about what comes after the seed, multiply the number of non-seed bases by 30.
      Larger values for -e seem to slow bowtie down.
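A minimal sketch of this heuristic. The seed length of 28 is an assumption (bowtie 1's default -l), and `suggest_e` is a hypothetical name; the tolerated count is capped at the number of non-seed bases.

```python
from __future__ import annotations


def suggest_e(read_len: int, seed_len: int = 28,
              tolerated_non_seed: int | None = None,
              per_mismatch: int = 30) -> int:
    """Hypothetical helper for the rule of thumb above.

    If tolerated_non_seed is None ("don't care about what comes after
    the seed"), every non-seed base is allowed to mismatch.
    """
    non_seed = max(read_len - seed_len, 0)
    if tolerated_non_seed is None:
        k = non_seed
    else:
        # Cannot tolerate more mismatches than there are non-seed bases
        k = min(tolerated_non_seed, non_seed)
    return k * per_mismatch


print(suggest_e(36))                        # 240 (8 non-seed bases)
print(suggest_e(80))                        # 1560 (52 non-seed bases)
print(suggest_e(80, tolerated_non_seed=4))  # 120
```

Note that the "don't care" variant grows quickly with read length, which matters given that larger -e values slow bowtie down.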



      • #4
        Quality values are rounded to the nearest 10 and saturate at 30, which means that with the default -e 70 a read will be rejected if it has 3 high-quality mismatches (3 x 30 = 90). If the basecall quality is quite bad, however, you can easily end up with 10 or 15 low-scoring mismatches.
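A sketch of the rounding described here: round each mismatch quality to the nearest 10, cap it at 30, and compare the sum against -e. The helper names are hypothetical, and the tie handling (5 rounds up to 10, 15 to 20, 25 to 30) is an assumption rather than a transcription of bowtie's own code.

```python
def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 (ties up), saturating at 30."""
    return min((q + 5) // 10 * 10, 30)


def within_e(mismatch_quals: list[int], e: int = 70) -> bool:
    """True if the rounded qualities at all mismatched positions sum to at most e."""
    return sum(maq_round(q) for q in mismatch_quals) <= e


print(within_e([40, 40, 40]))  # False: 30 + 30 + 30 = 90 > 70
print(within_e([40, 40]))      # True: 60 <= 70
print(within_e([3] * 15))      # True: very low qualities round to 0
```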

        As adamdeluca already mentioned, there might be applications where it is worth increasing the limit (e.g. many high-quality SNPs if you are sequencing another strain). Increasing -e does, however, increase alignment time considerably.

        It might be worth performing some quality control on the data (e.g. with FastQC) to see whether error rates increase drastically towards later cycles. If so, you might simply trim all sequences to the last cycle whose basecalls you still trust before running bowtie.
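A rough sketch of choosing a trim point from per-cycle mean quality, assuming Phred+33 quality strings and an arbitrary mean-quality threshold of 20; both the threshold and the function name are illustrative assumptions. The resulting number of bases could then be removed from the 3' end, for example with bowtie's -3/--trim3 option.

```python
def last_trusted_cycle(qual_strings: list[str], min_mean_q: float = 20,
                       offset: int = 33) -> int:
    """Number of leading cycles whose mean Phred quality stays >= min_mean_q."""
    if not qual_strings:
        return 0
    # Only look at cycles present in every read
    length = min(len(q) for q in qual_strings)
    for cycle in range(length):
        mean_q = sum(ord(q[cycle]) - offset for q in qual_strings) / len(qual_strings)
        if mean_q < min_mean_q:
            return cycle
    return length


quals = ["IIIII###", "IIIIH$$$"]  # toy Phred+33 strings ('I' = Q40, '#' = Q2)
keep = last_trusted_cycle(quals)
print(f"keep {keep} cycles, trim {len(quals[0]) - keep} bases from the 3' end")
```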
