  • Bowtie to BWA parameters

    I use a pipeline that uses bowtie as the aligner. However, due to genome size, I have had to alter this pipeline to use BWA. I need to run BWA in a way that gives results as close as possible to how I run bowtie, but I am not entirely certain how to do this.

    Here are the arguments we use in running bowtie:

    -S -k 1 -m 1 --chunkmbs 3072 --best --strata -o 4 -e 80 -l 20 -n 0

    Anyone have any suggestions on the best way to run BWA?

  • #2
    assuming you mean bwa 'aln' and not 'mem'...

    -S -k1 -m1 --chunkmbs --best and --strata all correspond to how bwa works by default.

    -l 20 in bowtie is -l 20 in bwa.
    -n 0 in bowtie is -k 0 in bwa
    -e will be impossible to mimic because bwa doesn't care about base qualities. However, you can tinker with -n (in bwa) to adjust the mismatch allowance, either by setting a strict limit (-n INT) or an automatic limit based on read length (0 < -n < 1).
    The -o option is not mirrored in bwa.

    The only other difference is that bowtie does not report gapped alignments and BWA does. I have tried to figure out how to disable this behavior in BWA, and it seems like the -i option should be able to kill them off (-i sets how many bases from the ends of a read indels are disallowed). In most aligners, setting this kind of option to a value equal to or greater than your read length disables gaps.
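    As a rough sketch of the mapping above, the bwa 'aln' options could be assembled like this in Python. The function name, the placeholder file names and the default -n value are made up for illustration, and using -i with the read length to suppress gaps is only the guess described above, not verified behavior.

```python
def bwa_aln_args(read_length, max_edit=2, ref_prefix="ref", reads="reads.fq"):
    """Build a bwa aln argument list approximating the bowtie settings.

    ref_prefix and reads are placeholder names; max_edit is an arbitrary
    stand-in for bowtie's quality-based -e limit, which bwa cannot mimic.
    """
    return [
        "bwa", "aln",
        "-l", "20",              # seed length, same as bowtie -l 20
        "-k", "0",               # no edits allowed in the seed, like bowtie -n 0
        "-n", str(max_edit),     # overall edit-distance limit (integer form)
        "-i", str(read_length),  # guess: forbid indels near ends = whole read
        ref_prefix, reads,
    ]


# Print the command line for 50bp reads (nothing is executed here).
print(" ".join(bwa_aln_args(50)))
```

    The list form can be handed to subprocess.run once the values have been checked against your own data.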

    If you're talking about using bwa 'mem' then things work a little differently and you actually have much less control.

    FYI it sounds like the new bowtie2 release can handle genomes > 4GB. bowtie2 is easier to configure in a way that mimics the behavior of bowtie1. To the best of my knowledge you're going to have to change your mismatch allowance rule no matter what, because bowtie1 is the only aligner I know of that uses that -e setting (where the sum of base quals at mismatched positions is used as a limit).

    To match your options with bowtie2 (and produce un-gapped alignments) you can use the following:

    --gbar <read length or larger> --mp A,B --np 1 --score-min L,0,C -L 20 -N 0

    A, B and C should be replaced with values that you can tailor for mismatch allowance. bowtie2 relies on a minimum alignment score for reporting alignments, so to control mismatches you have to be specific about mismatch penalties. The --mp option sets the mismatch penalties for high-qual and low-qual bases. If you wanted a penalty of 2 for high qual and 1 for low qual you'd use --mp 2,1. The --score-min option sets the minimum score relative to your read length. The way I have it written, the minimum score will be read_length*C, with C given as a negative number. So with --score-min L,0,-0.04 and 100bp reads you're allowing a minimum score of -4, which could be spent on any mix of -2 penalties for high-qual mismatches and -1 penalties for low-qual mismatches (assuming --mp 2,1).
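    The arithmetic above can be checked with a small Python sketch. The function names are hypothetical, and it only considers mismatch penalties (it ignores N and gap penalties), so treat it as a way to pick starting values for A, B and C rather than a model of bowtie2's scoring.

```python
def min_score(read_length, slope, intercept=0.0):
    """Linear minimum score, as in --score-min L,<intercept>,<slope>."""
    return intercept + slope * read_length


def max_mismatches(score_floor, penalty):
    """Most mismatches of one penalty that still stay at or above the floor."""
    budget = round(-score_floor, 9)  # rounding guards against float fuzz
    return int(budget // penalty)


floor = min_score(100, -0.04)       # 100bp reads, --score-min L,0,-0.04
high_only = max_mismatches(floor, 2)  # --mp 2,1: high-qual penalty of 2
low_only = max_mismatches(floor, 1)   # --mp 2,1: low-qual penalty of 1
print(floor, high_only, low_only)
```

    For the example values this works out to a floor of -4, which allows at most two high-qual mismatches or four low-qual ones, or a mix such as one high-qual plus two low-qual.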

    By the way I've found bowtie2 to be very good at reproducing correct alignments in simulations. I'd say it's a great upgrade to the performance of bowtie1.
    Last edited by sdriscoll; 02-15-2014, 12:12 AM. Reason: updated knowledge
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
