SEQanswers

03-28-2017, 07:10 AM   #1
FlorianT
Junior Member
 
Location: Paris

Join Date: Mar 2017
Posts: 1
Bowtie2: a specific use where the parameters are not respected

Hello everyone,
I'm writing this post because I observed a strange behavior with bowtie2, and I suspect it mostly affects alignments against references made of short sequences.

I'm using Bowtie2 to map short reads (15 to 28 nt) against mature miRNA reference sequences (17 to 28 nt long).
I don't want any mismatch in an alignment. Luckily, bowtie2 accepts a seed length of 28 nt, as long as my longest reads, so I should be able to forbid any mismatch in my alignments with the parameters -L 28 (the seed length), -N 0 (the number of mismatches allowed within the seed), and --no-1mm-upfront (to prevent any 1-mismatch alignment attempt before the multiseed heuristic is tried).

Here is the command I'm using:
Code:
bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
99.9% of aligned reads have no mismatches. But for a few hundred of them, I observe mismatches and insertions! These offending reads are 22 to 24 nt long, so the seed should cover the whole sequence and no mismatch should be accepted. Here is an IGV screenshot of such reads mapping with an insertion and a mismatch on a 21-nt reference sequence:



Here is the FASTQ record for the first read with an insertion and a mismatch (all reads with that pattern look the same):
Code:
@NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
TCACAGTGAACCGGTCTCTTTT
+
AAAAAEEEEEEEEEEEEEEEEE
And here is the FASTA record for the reference sequence:
Code:
>hsa-miR-128-3p
TCACAGTGAACCGGTCTCTTT
As a keen observer would note, the reference sequence matches the read, except for one 'T' missing at the 3' end. So bowtie2 should not allow this alignment, since we are in end-to-end mode.
But it appears that bowtie2 uses a devious strategy to satisfy the end-to-end rule: it creates one insertion and one mismatch, thus breaking the NO MISMATCH rule.

In local mode, this read would have been accepted, with a 1 nt long soft-clip on the right side.
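
To make the length argument concrete, here is a minimal Python sketch (independent of bowtie2; the helper name `levenshtein` is an illustrative choice, not something from the original post). It checks that the read is exactly the reference plus one trailing 'T', and computes the global edit distance between the two, which is the minimum number of edits any end-to-end alignment of this read to this reference would need.

```python
def levenshtein(a: str, b: str) -> int:
    """Global (end-to-end) edit distance; each substitution,
    insertion, or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # drop ca (extra base in a)
                curr[j - 1] + 1,           # drop cb (extra base in b)
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


read = "TCACAGTGAACCGGTCTCTTTT"      # 22 nt, from the FASTQ record above
reference = "TCACAGTGAACCGGTCTCTTT"  # 21 nt, hsa-miR-128-3p

print(len(read), len(reference))      # 22 21
print(read == reference + "T")        # True
print(levenshtein(read, reference))   # 1
```

Since the distance is 1, no end-to-end alignment of this read to this reference can be an exact match; at least one edit is unavoidable.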

I observed this behavior with other references, and it's always the same pattern: an insertion and a mismatch are created to fit a longer read onto a shorter reference sequence, even though I specified 0 mismatches in the parameters.
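
One way to list the affected reads is to scan the SAM output for aligned records that are not exact matches. The following is a hedged sketch, not a bowtie2 feature: the function name `imperfect_alignments` is made up for illustration, and it assumes the standard 11 mandatory SAM columns plus the `NM:i:` edit-distance tag. A record counts as exact only if its CIGAR is a single `M` block spanning the whole read and `NM` is 0.

```python
def imperfect_alignments(sam_lines):
    """Yield (read, reference, cigar, nm) for aligned SAM records
    that are not exact, full-length matches."""
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete SAM record
            continue
        qname, rname = fields[0], fields[2]
        flag, cigar, seq = int(fields[1]), fields[5], fields[9]
        if flag & 0x4:                  # read is unaligned
            continue
        nm = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
        # A missing NM tag is treated as "not provably exact".
        if cigar != f"{len(seq)}M" or nm != 0:
            yield qname, rname, cigar, nm


# Usage, assuming out.sam was produced by the command above:
# with open("out.sam") as handle:
#     for hit in imperfect_alignments(handle):
#         print(*hit, sep="\t")
```

The same check could also serve as a post-filter that keeps only exact matches, whatever the aligner decides to report.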

So I'm wondering: why is this happening? Am I missing something?
Any help is very welcome!

PS: my bowtie2 version is 2.2.4


******** UPDATE ********

I continued my investigation, and I realized that the weird behavior I described does not occur if the reference file contains only the sequence mentioned previously (hsa-miR-128-3p). If you run Bowtie2 with this sequence alone, everything works fine, meaning the read is not mapped (as expected).
BUT if you add just one other reference sequence, and this sequence starts with the letter 'T', then Bowtie2 maps the read with an insertion and a mismatch.
Then I tried another reference starting with an 'A', and in this case the read is not mapped, which is very puzzling.

You can try this at home by saving the following reference in a file named mirbase_hsa.fa:
Code:
>hsa-miR-128-3p
TCACAGTGAACCGGTCTCTTT
>miR-test
T
And copy the following read (the same as before) into a file named my_trimmed_reads.fq:
Code:
@NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG
TCACAGTGAACCGGTCTCTTTT
+
AAAAAEEEEEEEEEEEEEEEEE
Then you can run the following script:
Code:
bowtie2-build mirbase_hsa.fa mirbase_hsa
bowtie2 --end-to-end -a -D 20 -R 3 -N 0 --no-1mm-upfront -L 28 -i S,1,0.5 --norc -x mirbase_hsa -q my_trimmed_reads.fq -S out.sam
And you should see that the read maps when it should not. You can also change the second reference (miR-test) so that it starts with an 'A' instead of a 'T', and you should see that the read no longer maps.
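
For convenience, the two variants of this test can be set up with a small Python helper. This is only a sketch: the function names `write_repro_files` and `repro_commands` are hypothetical, while the file names, sequences, and command-line arguments are copied from the post above. It writes the input files and returns the two commands as argument lists without running them.

```python
from pathlib import Path

READ_FASTQ = (
    "@NS500388:284:HMVTJBGXY:1:11103:12180:4814 1:N:0:GAGTGG\n"
    "TCACAGTGAACCGGTCTCTTTT\n"
    "+\n"
    "AAAAAEEEEEEEEEEEEEEEEE\n"
)


def write_repro_files(directory, second_ref="T"):
    """Write mirbase_hsa.fa and my_trimmed_reads.fq into `directory`.
    `second_ref` is the sequence of the extra miR-test reference."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    fasta = (
        ">hsa-miR-128-3p\nTCACAGTGAACCGGTCTCTTT\n"
        ">miR-test\n" + second_ref + "\n"
    )
    (directory / "mirbase_hsa.fa").write_text(fasta)
    (directory / "my_trimmed_reads.fq").write_text(READ_FASTQ)
    return directory


def repro_commands():
    """Return the bowtie2-build and bowtie2 commands from the post."""
    build = ["bowtie2-build", "mirbase_hsa.fa", "mirbase_hsa"]
    align = [
        "bowtie2", "--end-to-end", "-a", "-D", "20", "-R", "3",
        "-N", "0", "--no-1mm-upfront", "-L", "28", "-i", "S,1,0.5",
        "--norc", "-x", "mirbase_hsa", "-q", "my_trimmed_reads.fq",
        "-S", "out.sam",
    ]
    return build, align
```

Assuming bowtie2 is on the PATH, each command can then be run with `subprocess.run(cmd, cwd=directory, check=True)`, once in a directory created with `second_ref="T"` and once with `second_ref="A"`, to compare the two resulting out.sam files.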

I tried this with the latest version of Bowtie2 (2.3.1) and this behavior can still be observed.

Last edited by FlorianT; 03-30-2017 at 12:14 AM. Reason: Update with new informations

Tags
bowtie2, mismatch, short read alignment, small rna-seq
