  • Bowtie2: a specific use where the parameters are not respected

    Hello everyone,
    I'm writing this post because I observed some strange behavior in Bowtie2, which I suspect mostly affects alignments against short reference sequences.

    I'm using Bowtie2 to map short reads (15 to 28 nt) against mature miRNA reference sequences (17 to 28 nt long).
    I don't want any mismatches in my alignments. Luckily, the maximum seed length in Bowtie2 is 28 nt, so I should be able to forbid mismatches with the parameters -L 28 (the seed length), -N 0 (the number of mismatches allowed within a seed) and --no-1mm-upfront (to prevent any one-mismatch end-to-end alignment attempt before the multiseed heuristic is tried).
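    As an aside, the Bowtie 2 manual describes -L and -N as settings for the multiseed (seed alignment) stage, while whether a full alignment is reported depends on its score relative to --score-min, so these options alone may not guarantee an edit distance of zero. Whatever the cause of the behavior described below, one possible workaround is to post-filter the SAM output on the NM:i tag (edit distance), which Bowtie 2 writes for aligned records. This is a minimal sketch using awk; the function name filter_exact and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-filter: keep SAM header lines plus aligned records whose
# NM:i tag (edit distance) is 0, i.e. no mismatches, insertions or deletions.
# Unaligned records carry no NM tag, so they are dropped as well.
filter_exact() {
  awk -F'\t' '
    /^@/ { print; next }            # header lines pass through unchanged
    {
      for (i = 12; i <= NF; i++)    # optional tags start at column 12
        if ($i == "NM:i:0") { print; break }
    }
  ' "$1"
}

# Usage: filter_exact out.sam > out.exact.sam
```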

    Here is the command I'm using:
    Code:
    bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
    99.9% of the aligned reads have no mismatches. But for a few hundred of them, I observe mismatches and insertions! These reads are 22 to 24 nt long, so the seed should cover the whole sequence and no mismatch should be accepted. Here is an IGV screenshot of such reads aligned with an insertion and a mismatch to a 21-nt reference sequence:



    Here is the FASTQ record for the first read with an insertion and a mismatch (all reads showing this pattern look the same):
    Code:
    @NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
    TCACAGTGAACCGGTCTCTTTT
    +
    AAAAAEEEEEEEEEEEEEEEEE
    And here is the FASTA record for the reference sequence:
    Code:
    >hsa-miR-128-3p
    TCACAGTGAACCGGTCTCTTT
    As a keen observer will note, the reference sequence matches the read except for one 'T' missing at the 3' end. So Bowtie2 should not allow this alignment, since we are in end-to-end mode.
    But it appears that Bowtie2 uses a sneaky strategy to satisfy the end-to-end rule: it creates an insertion and a mismatch, thus breaking the NO MISMATCH rule.

    In local mode, this read would have been accepted with a 1-nt soft clip on the right side.

    I observed this behavior with other references, and it's always the same pattern: an insertion and a mismatch are created to fit a longer read onto a shorter reference sequence, even though I specified that 0 mismatches are allowed.
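    For a rough sense of why such an alignment is not rejected by the scoring threshold, the sketch below redoes the end-to-end scoring arithmetic in bash. It assumes the defaults listed in the Bowtie 2 manual (--score-min L,-0.6,-0.6, --mp 6,2, gap open/extend 5,3 for --rdg/--rfg, Phred+33 qualities), none of which are overridden by the command above; the helper function names are hypothetical. Scores are multiplied by 10 so that integer arithmetic is exact. It only checks the score filter and does not explain why the alignment is reported in the first place.

```shell
#!/usr/bin/env bash
# Sketch of Bowtie 2 end-to-end scoring, ASSUMING the manual defaults:
#   --score-min L,-0.6,-0.6  --mp 6,2  --rdg 5,3  --rfg 5,3  --phred33
# All scores are multiplied by 10 so bash integer arithmetic is exact.

# Minimum valid score (x10) for read length $1: -0.6 + (-0.6 * length)
min_score_x10() {
  echo $(( -6 - 6 * $1 ))
}

# Mismatch penalty for Phred+33 quality character $1:
#   MN + floor((MX - MN) * min(Q, 40) / 40), with MX = 6, MN = 2
mismatch_penalty() {
  local mx=6 mn=2 q
  q=$(( $(printf '%d' "'$1") - 33 ))   # character code minus 33 = Phred Q
  if (( q > 40 )); then q=40; fi
  echo $(( mn + (mx - mn) * q / 40 ))  # integer division = floor here
}

# Penalty for a gap of length $1: open (5) + length * extend (3)
gap_penalty() {
  echo $(( 5 + 3 * $1 ))
}

read_len=22     # length of the example read
qual_char=E     # hypothetical: quality character at the mismatched base
penalty=$(( $(gap_penalty 1) + $(mismatch_penalty "$qual_char") ))
score_x10=$(( -10 * penalty ))
min_x10=$(min_score_x10 "$read_len")

if (( score_x10 >= min_x10 )); then verdict=valid; else verdict=rejected; fi
echo "score x10 = $score_x10, minimum x10 = $min_x10 -> $verdict"
```

    Under these assumptions, a 22-nt read with one 1-nt gap (penalty 8) and one mismatch at quality 'E' (penalty 5; quality 'A' also gives 5) scores -13, which is above the minimum of -13.8, so the score filter alone would accept it.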

    So I'm wondering: why is this happening? Am I missing something?
    Any help is very welcome!

    PS: my bowtie2 version is 2.2.4


    ******** UPDATE ********

    I continued my investigation and realized that the weird behavior I described does not occur if the reference file contains only the sequence mentioned above (hsa-miR-128-3p). If you run Bowtie2 with this sequence alone, everything works fine: the read is not mapped, as expected.
    BUT if you add just one other reference sequence, and that sequence starts with the letter 'T', then Bowtie2 maps the read with an insertion and a mismatch.
    I then tried another reference starting with an 'A', and in that case the read is not mapped, which is very puzzling.

    You can try this at home by saving the following reference sequences in a file named mirbase_hsa.fa:
    Code:
    >hsa-miR-128-3p
    TCACAGTGAACCGGTCTCTTT
    >miR-test
    T
    Then save the following read (the same as before) in a file named my_trimmed_reads.fq:
    Code:
    @NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
    TCACAGTGAACCGGTCTCTTTT
    +
    AAAAAEEEEEEEEEEEEEEEEE
    Then you can run the following script:
    Code:
    bowtie2-build mirbase_hsa.fa mirbase_hsa
    bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
    You should see that the read maps when it should not. You can also change the second reference (miR-test) so that it starts with an 'A' instead of a 'T'; you should then see that the read does not map.
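    Both variants of this experiment can be bundled into one script. This is a minimal sketch, assuming bash and that bowtie2 and bowtie2-build are on the PATH (otherwise it only writes the input files); the function names write_inputs and run_case and the per-variant output names out_T.sam / out_A.sam are hypothetical, while the sequences and alignment options are copied from above.

```shell
#!/usr/bin/env bash
# Writes the test inputs from this post and, if bowtie2 is installed, runs
# the reported command once with miR-test = "T" and once with miR-test = "A".

# $1: the single base used as the sequence of the dummy reference miR-test
write_inputs() {
  printf '>hsa-miR-128-3p\nTCACAGTGAACCGGTCTCTTT\n>miR-test\n%s\n' "$1" > mirbase_hsa.fa
  printf '%s\n' \
    '@NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG' \
    'TCACAGTGAACCGGTCTCTTTT' \
    '+' \
    'AAAAAEEEEEEEEEEEEEEEEE' > my_trimmed_reads.fq
}

run_case() {
  write_inputs "$1"
  bowtie2-build -q mirbase_hsa.fa mirbase_hsa
  bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 \
    --norc -x mirbase_hsa -q my_trimmed_reads.fq -S "out_$1.sam"
  # Show RNAME and CIGAR of each record; "*" in both means unmapped
  grep -v '^@' "out_$1.sam" | cut -f 3,6
}

if command -v bowtie2 >/dev/null 2>&1; then
  for base in T A; do
    echo "miR-test = $base:"
    run_case "$base"
  done
else
  echo "bowtie2 not found on PATH; skipping the alignment runs" >&2
fi
```

    Based on the observations reported here, the 'T' run should list an alignment to hsa-miR-128-3p and the 'A' run should list the read as unmapped.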

    I tried this with the latest version of Bowtie2 (2.3.1) and this behavior can still be observed.
    Last edited by FlorianT; 03-30-2017, 12:14 AM. Reason: Update with new informations
