  • question about bowtie -e parameter

    Hi,

    My question concerns Bowtie's -e (--maqerr) parameter.

    If I understand correctly, while -n sets the maximum number of mismatches permitted in the "seed", mismatches over the entire read length are limited by the -e parameter (the maximum sum of quality values at mismatched positions). Indeed, increasing this parameter can greatly increase the number of aligned reads. For example, in one sample that I tested, increasing -e from 70 (the default) to 140 increased the percentage of aligned reads from 31% to 48%.

    Default (-e 70)
    --------------------------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 1395000 (31.53%)
    # reads that failed to align: 3029341 (68.47%)

    -e 140
    --------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 2125127 (48.03%)
    # reads that failed to align: 2299214 (51.97%)
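To make the -n/-e split concrete, here is a minimal sketch of the acceptance rule for a single candidate alignment, assuming Bowtie 1's documented defaults (-n 2, -l 28 for the seed length, -e 70). The function name `passes_n_mode` is hypothetical, the seed is taken to be the first `seed_len` bases from the 5' end, and the sketch sums raw Phred qualities, leaving out the Maq-style rounding Bowtie applies by default (disabled with --nomaqround).

```python
def passes_n_mode(quals, mismatch_positions, n=2, seed_len=28, e=70):
    """Hypothetical check of one candidate alignment against -n/-l/-e.

    quals: Phred quality of each read base, 5' end first.
    mismatch_positions: 0-based read offsets that differ from the reference.
    Raw qualities are summed; Bowtie's default Maq-style rounding is omitted.
    """
    # -n / -l: count only the mismatches that fall inside the seed.
    seed_mismatches = sum(1 for p in mismatch_positions if p < seed_len)
    if seed_mismatches > n:
        return False
    # -e: sum the qualities at every mismatched position, inside or outside the seed.
    return sum(quals[p] for p in mismatch_positions) <= e


quals = [30] * 36  # a 36 bp read with uniform Q30 base calls
print(passes_n_mode(quals, [29, 32, 35]))         # False: 90 > 70
print(passes_n_mode(quals, [29, 32, 35], e=140))  # True: 90 <= 140
```

The three mismatches all sit outside the 28-base seed, so -n never rejects them; only the -e total decides whether the alignment is reported.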

    Is the default value (70) the recommended level for 36 bp reads? Has anyone tested how -e should be increased as read length increases? For example, is there any recommendation for -e with 80 bp reads?

    Many thanks in advance,
    Rani

  • #2
    It all depends on your application, because this is a trade-off between alignment quality and the number of reads aligned. Figure on roughly 30 to 40 of -e per mismatch you want to allow.
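Read as arithmetic, the 30-to-40-per-mismatch rule works in both directions: from a desired mismatch count to an -e value, and from an -e value to the number of mismatches it admits. A small sketch with hypothetical helper names, assuming every mismatch costs the same amount:

```python
def maqerr_range(mismatches, low=30, high=40):
    """-e values implied by budgeting `low` to `high` per tolerated mismatch."""
    return mismatches * low, mismatches * high


def mismatches_allowed(e, per_mismatch):
    """How many mismatches costing `per_mismatch` each fit under an -e budget."""
    return e // per_mismatch


# The two settings from the original post, at 30 and at 40 per mismatch.
for e in (70, 140):
    print(e, mismatches_allowed(e, 30), mismatches_allowed(e, 40))
```

This prints `70 2 1` and `140 4 3`: at 30 per mismatch, the default -e 70 admits two such mismatches and -e 140 admits four.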



    • #3
      For RNA-Seq the desired output is usually a read count, so the reads only have to be of sufficient quality to map to the right location. The value for -e in those applications can be 300+, depending on read length, without sacrificing the quality of the results.
      For SNP calling, the quality of the reads matters more than the quantity, so a much lower -e is useful. For longer reads (80 bases) I wouldn't go lower than 100.

      I generally use this method to choose -e.
      I ask how many of the bases not covered by the seed I would tolerate being wrong, assuming they are high-quality bases, and multiply that number by 30 to get -e. If you don't care about what comes after the seed, multiply the number of non-seed bases by 30.
      Larger values for -e seem to slow Bowtie down.
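The rule of thumb above can be sketched as a one-line calculation, assuming Bowtie 1's default seed length of 28 (-l 28). The function name is hypothetical, and clamping the tolerated count to the number of non-seed bases is an added assumption, not part of the post.

```python
def suggest_maqerr(read_len, seed_len=28, tolerated=None, per_mismatch=30):
    """Rule of thumb from this thread: tolerated non-seed mismatches x 30."""
    non_seed = max(read_len - seed_len, 0)
    if tolerated is None:
        # "Don't care about what comes after the seed": allow every non-seed base.
        tolerated = non_seed
    # Assumption: never budget for more mismatches than there are non-seed bases.
    tolerated = min(tolerated, non_seed)
    return tolerated * per_mismatch


print(suggest_maqerr(36))               # 240 (8 non-seed bases)
print(suggest_maqerr(80, tolerated=4))  # 120
print(suggest_maqerr(80))               # 1560 (52 non-seed bases)
```

Tolerating 4 high-quality non-seed mismatches in an 80 bp read gives 120, in line with the "no lower than 100" advice for 80-base reads.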



      • #4
        Quality values get rounded to the nearest 10 and saturate at 30, which means a read will be rejected if it has 3 high-quality mismatches (3 × 30 = 90, above the default of 70). If the base-call quality is quite bad, however, you can easily end up with 10 or 15 low-scoring mismatches.
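A minimal sketch of the Maq-style rounding described above, assuming halves round up (a Phred 5 becomes 10). Python's built-in `round()` rounds halves to even, so the sketch uses integer arithmetic instead. The helper names are hypothetical.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 (halves up), capped at 30."""
    return min(30, (q + 5) // 10 * 10)


def mismatch_penalty(mismatch_quals, round_quals=True):
    """Sum of (optionally rounded) qualities at the mismatched positions."""
    if round_quals:
        return sum(maq_round(q) for q in mismatch_quals)
    return sum(mismatch_quals)


print(mismatch_penalty([40, 40, 40]))                     # 90: over -e 70
print(mismatch_penalty([40, 40, 40], round_quals=False))  # 120 without rounding
print(mismatch_penalty([2] * 15))                         # 0: each Q2 rounds to 0
```

The last line shows the flip side: fifteen very low-quality mismatches cost nothing after rounding, while three confident ones exceed the default limit.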

        As adamdeluca already mentioned, there might be applications where it is worth increasing the limit (e.g. many high-quality SNPs if you are sequencing another strain). However, increasing -e does increase alignment time considerably.

        It might be worth performing some quality control on the data (e.g. with FastQC) to see whether error rates increase drastically towards later cycles. If so, you might simply trim all sequences to a cycle where you still trust the base calls before running Bowtie.
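As a rough stand-in for the per-cycle check a tool like FastQC reports, the sketch below computes the mean quality at each cycle and finds the first cycle where it drops below a threshold. It assumes Phred+33 quality strings (`offset=64` for older Illumina data), and both the function names and the threshold of 20 are illustrative choices. The number of cycles to drop could then be passed to Bowtie's `-3`/`--trim3` option.

```python
def mean_quality_per_cycle(qual_strings, offset=33):
    """Mean Phred quality at each cycle (read position) across all reads."""
    length = max(len(q) for q in qual_strings)
    totals = [0] * length
    counts = [0] * length
    for q in qual_strings:
        for i, ch in enumerate(q):
            totals[i] += ord(ch) - offset
            counts[i] += 1
    # Every position is covered by at least the longest read, so counts >= 1.
    return [t / c for t, c in zip(totals, counts)]


def trim_length(means, min_mean=20):
    """Number of leading cycles kept before the mean first drops below min_mean."""
    for i, m in enumerate(means):
        if m < min_mean:
            return i
    return len(means)


means = mean_quality_per_cycle(["IIIII###", "IIII####"])  # 'I' = Q40, '#' = Q2
keep = trim_length(means)
print(keep, len(means) - keep)  # cycles kept, cycles to trim from the 3' end
```

Here this prints `5 3`: the mean is 21 at the fifth cycle and 2 from the sixth onward, so the last three cycles would be trimmed.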
