  • Does bowtie align with > 3 mismatches?

    Hello,

    I run bowtie without specifying a limit on mismatches:

    bowtie -p 16 --chunkmbs 1024 --phred33 --fr --all --maxins 1000 ./mm9 -1 mate1.fastq -2 mate2.fastq fr.bwt

    When I examined the output (fr.bwt), the last field lists more than three substitutions for some reads (e.g. 14:A>C,34:C>A,35:C>A,42:T>C,44:T>A,47:T>C,49:T>G,50:T>A,51:T>G,53:T>A,56:T>C,58:T>C,59:A>T,60:T>A,62:T>C,66:T>A,68:T>A,74:T>A,75:T>G,77:C>T,78:T>A,79:T>G).
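For anyone wanting to tally such fields, here is a minimal Python sketch. It assumes the descriptors follow the `offset:reference-base>read-base` layout of Bowtie's default output, with 0-based offsets counted from the 5' end of the read. The function name `parse_mismatches` and the `seed_length` variable are made up for illustration, and the example string is just the first few descriptors from the read above.

```python
def parse_mismatches(field):
    """Parse a mismatch field such as '14:A>C,34:C>A' into (offset, ref, read) tuples."""
    mismatches = []
    for descriptor in field.split(","):
        if not descriptor:
            continue  # an empty field means no mismatches were reported
        offset, change = descriptor.split(":")
        ref_base, read_base = change.split(">")
        mismatches.append((int(offset), ref_base, read_base))
    return mismatches


seed_length = 28  # assumed seed length for this illustration
example = "14:A>C,34:C>A,35:C>A,42:T>C"
parsed = parse_mismatches(example)
in_seed = [m for m in parsed if m[0] < seed_length]
print(len(parsed), "mismatches in total,", len(in_seed), "inside the seed")
```

Splitting the count into seed and non-seed positions makes it easy to see whether a read with many substitutions still has only a few of them near its 5' end.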

    So does this mean that bowtie handles more than 3 mismatches? The manual (http://bowtie-bio.sourceforge.net/manual.shtml) clearly says up to 3 mismatches in the seed, but doesn't mention a limit for the -v option. Is there really no limit?

    thanks.
    "Let’s start with the three fundamental Rules of Robotics...."

  • #2
    If you don't specify anything Bowtie runs with the default parameters -n 2 and -l 28, so allowing up to 2 mismatches in the first 28 bp (seed). After that it allows more mismatches, until the combined sum of mismatch qualities hits the mismatch ceiling. This can be set with the -e parameter and is 70 by default:

    -e/--maqerr <int> : Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.

    So if your sequence quality at the end is very poor (presumably BBBBB or the Phred 33 equivalent #######), Bowtie can potentially tolerate a large number of mismatches there, since Phred scores of 2 are rounded to 0 and thus do not count towards the mismatch ceiling.
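The arithmetic can be sketched in a few lines of Python. This is an illustration of the rule quoted from the manual, not Bowtie's actual code: the helper names (`phred33`, `maq_round`, `within_maqerr`) are invented, and rounding a value ending in 5 upwards is an assumption.

```python
def phred33(char):
    """Convert a Phred+33 quality character to its integer score."""
    return ord(char) - 33


def maq_round(q):
    """Round a quality to the nearest 10 and saturate at 30 (halves rounded up, an assumption)."""
    return min(30, 10 * ((q + 5) // 10))


def within_maqerr(mismatch_quals, maqerr=70):
    """Return the summed rounded qualities and whether they stay within the ceiling."""
    total = sum(maq_round(q) for q in mismatch_quals)
    return total, total <= maqerr


# Two Q30 mismatches plus twenty mismatches at '#' (Phred 2) positions.
quals = [30, 30] + [phred33("#")] * 20
print(within_maqerr(quals))         # (60, True)
print(within_maqerr([30, 30, 30]))  # (90, False)
```

Twenty-two mismatches pass the default ceiling of 70 because the '#' positions contribute nothing, whereas three high-quality mismatches alone already exceed it.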

    It might help to run your data through a quality-trimming program to get rid of very poor sequences first (e.g. Cutadapt, FastX toolkit etc.).
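To show what such trimming does to a read, here is a deliberately simple Python sketch that removes trailing bases below a quality threshold. It is not how Cutadapt or the FastX toolkit work internally. The function name, the `min_q` threshold of 3, and the example read are all hypothetical.

```python
def trim_low_quality_tail(seq, qual, min_q=3, offset=33):
    """Drop trailing bases whose Phred quality is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' is Phred 40 and '#' is Phred 2 in Phred+33 encoding.
seq, qual = trim_low_quality_tail("ACGTACGTTT", "IIIIIII###")
print(seq, qual)  # ACGTACG IIIIIII
```

For paired-end data the two mate files have to stay in sync after trimming, which is one reason to prefer a dedicated tool over a hand-rolled script.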

    • #3
      Thanks very much, this explains the described behavior exactly, as indeed the reads with many mismatches end with ...########## quality scores.

      However, what makes it necessary to actually trim these low-quality bases? Naively, shouldn't an alignment over the 28-base seed with at most 2 mismatches already be strong evidence that the read belongs at the identified position?
      "Let’s start with the three fundamental Rules of Robotics...."

      • #4
        You are right that it might well be a valid alignment, but it depends a bit on your experimental questions. Such data might be fine for e.g. ChIP-seq but should definitely not be used for SNP calling, bisulfite-seq or the like.
