  • novoalign - allow for more mismatches

    Hi,

    I want to use novoalign to map reads coming from relatively divergent genomes - allowing up to 8 mismatches for 90bp paired-end reads. Here is the command I used:
    Code:
    novoalign -d <genome> -f <1.1.fq.gz> <1.2.fq.gz> -k 200 -r E 10 -o SAM > <1.sam>
    However, this only aligns reads with up to 2 or 3 mismatches, and produces far fewer alignments than BWA (using the "-n 8" option). Does anyone have an idea which parameters I should change or add?

    Thanks,

  • #2
    Hi,

    Did you mean -t 200 not -k 200? Or -k -t 200?

    -t 200 will allow around 6-7 mismatches in each pair.

    For a divergent genome I suggest not using -t; the default will allow 8 or more mismatches per read.
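    One way to check how many mismatches the alignments actually carry is to tabulate the edit distance per read. The following is a minimal sketch, assuming the SAM records include the standard optional NM:i tag; nm_histogram is a hypothetical helper name, and NM counts indel bases as well as mismatches, so it is only an upper bound on the mismatch count.

```shell
# Histogram of the NM:i (edit distance) tag over SAM records read from stdin.
# Header lines (starting with @) are skipped; records without an NM tag are ignored.
nm_histogram() {
    awk -F'\t' '
        /^@/ { next }
        {
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NM:i:/) { count[substr($i, 6)]++; break }
        }
        END { for (k in count) print k "\t" count[k] }
    ' | sort -k1,1n
}

# Usage (file name taken from the command in the first post):
#   nm_histogram < 1.sam
```

    Comparing the output for runs with and without -t should show whether the tail of the distribution extends towards 8.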

    I'd also add -x 4 to reduce the gap extend penalty. We use -x 4 or -x 6 for almost all Novoalign runs, as the default is a bit too high. The gap open penalty might also be reduced if you expect the divergence to include more gaps; -g 20 might be a suitable value. Setting these too low can cause false positive alignments, so you need to take care. I'd be tempted to run a few tests, and maybe even analyse the frequency of short indels in the alignments, and then set the gap penalties to suit.
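    The indel frequency check could be sketched as below. This is an illustration only: indel_histogram is a hypothetical helper name, and it simply counts the I and D operations in the CIGAR column of a SAM file by length.

```shell
# Tally insertion (I) and deletion (D) lengths from the CIGAR strings
# (column 6) of SAM records read from stdin.
# Header lines (starting with @) and unmapped reads (CIGAR "*") are skipped.
indel_histogram() {
    awk -F'\t' '
        /^@/ || $6 == "*" { next }
        {
            cigar = $6
            # Consume one CIGAR operation (length plus op letter) at a time.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1)
                op  = substr(cigar, RLENGTH, 1)
                if (op == "I" || op == "D") count[op "\t" len]++
                cigar = substr(cigar, RLENGTH + 1)
            }
        }
        END { for (k in count) print k "\t" count[k] }
    ' | sort -k1,1 -k2,2n
}

# Usage (file name taken from the command in the first post):
#   indel_histogram < 1.sam
```

    Each output line is the operation, the indel length and the number of times it was seen, which gives a rough picture of how common short gaps are before the gap penalties are adjusted.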

    Also consider using quality calibration (the -k option). This could be interesting with divergent genomes because it will bring the mismatch penalties down in line with the divergence rate (% of mismatches) and should allow more mismatches. I'd test on 50K reads to see what difference it makes.
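    A 50K-read test set can be cut from the start of each FASTQ file. A minimal sketch, assuming plain four-line FASTQ records and that both mate files list the reads in the same order; first_reads and the test.*.fq names are hypothetical.

```shell
# Write the first N reads (default 50000) of a gzipped FASTQ file to stdout.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
first_reads() {
    fastq_gz=$1
    n_reads=${2:-50000}
    gzip -dc "$fastq_gz" | head -n "$((n_reads * 4))"
}

# Usage (input names taken from the command in the first post):
#   first_reads 1.1.fq.gz > test.1.fq
#   first_reads 1.2.fq.gz > test.2.fq
```

    Taking the same number of reads from the top of both files keeps the mates paired, so the subset can be aligned once with -k and once without to compare.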


    Colin
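
    Putting the suggestions in this reply together, a revised call might look like the sketch below. It uses only the options named in this thread; the values are starting points to test, not recommended settings, and align_divergent with its argument names is a hypothetical wrapper.

```shell
# Sketch only: the suggestions from this thread combined into one call.
# Arguments: index, mate-1 FASTQ, mate-2 FASTQ, output SAM.
align_divergent() {
    genome=$1
    fq1=$2
    fq2=$3
    out_sam=$4
    # No -t, so the default is used.
    # -k: quality calibration.
    # -x 4 and -g 20: lower gap extend and gap open penalties.
    # -r E 10 and -o SAM are kept from the original command.
    novoalign -d "$genome" -f "$fq1" "$fq2" -k -x 4 -g 20 -r E 10 -o SAM > "$out_sam"
}

# Usage (placeholders as in the first post):
#   align_divergent <genome> <1.1.fq.gz> <1.2.fq.gz> <1.sam>
```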
