07-21-2016, 03:55 AM   #1
Jane M
Senior Member
Location: Paris

Join Date: Aug 2011
Posts: 239
Allowing a high number of mismatches when mapping

Dear all,

I have 53 bp sequences, of which between 23 and 30 bases are of interest (= motifs). For simplicity, I took only the first 17 bases. Each sample has between 5 and 23 million reads.
The reference is composed of 7450 distinct sequences. For simplicity, I also took only the first 17 bases of the reference sequences.
My goal is to map the motifs to the reference.

If there were no sequencing errors, I would find only 7450 distinct motifs in my samples. Most likely there was a problem during sequencing, and 25% of the reads are of poor quality.
When mapping with bowtie
bowtie --best --strata -v 2 -k 1 -m 1 --norc
the mapping rate is ~ 70-82%.
I used -v 3 on two samples, and it increased the mapping rate by only ~1.5%.
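
To see how far a sample is from the ideal of 7450 distinct motifs, the reads can be collapsed into distinct motifs with counts before any mapping. This is a minimal Python sketch, not part of the bowtie run above. The function names `count_motifs` and `exact_match_fraction` are hypothetical, and it assumes the reads are already loaded as plain strings and the reference motifs are held in a set.

```python
from collections import Counter


def count_motifs(reads, motif_len=17):
    """Count each distinct motif (the first motif_len bases of every read).

    Reads shorter than motif_len are skipped.
    """
    return Counter(read[:motif_len] for read in reads if len(read) >= motif_len)


def exact_match_fraction(motif_counts, reference_motifs):
    """Fraction of counted reads whose motif is identical to a reference motif.

    reference_motifs is assumed to be a set of strings of the same length
    as the motifs in motif_counts.
    """
    total = sum(motif_counts.values())
    if total == 0:
        return 0.0
    exact = sum(n for motif, n in motif_counts.items() if motif in reference_motifs)
    return exact / total
```

Collapsing first also means that any later, more expensive comparison only has to be done once per distinct motif rather than once per read.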

Since my reference is small (7450 distinct sequences), I know that fewer than 17 bases (sometimes 6 bases are sufficient) are enough to uniquely identify which of the 7450 references a sequence comes from. Thus, for this specific case, I need to allow a higher number of mismatches (bowtie is limited to 3).
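
One way to picture "more than 3 mismatches with a unique hit" is a brute-force comparison of a motif against every reference, keeping the hit only if exactly one reference is closest. The Python sketch below illustrates that idea under stated assumptions. It is not what bowtie does internally; `hamming`, `assign_read` and the `max_mismatches` default are hypothetical, and the motif and the references are assumed to have the same length.

```python
def hamming(a, b):
    """Number of mismatching positions between two strings.

    Assumes equal lengths; zip() silently ignores any extra trailing bases.
    """
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, references, max_mismatches=5):
    """Return the name of the unique closest reference, or None.

    references: dict mapping reference name -> sequence.
    The read is assigned only if exactly one reference attains the minimum
    Hamming distance and that distance is <= max_mismatches.
    """
    best_name, best_dist, tie = None, None, False
    for name, seq in references.items():
        d = hamming(read, seq)
        if best_dist is None or d < best_dist:
            best_name, best_dist, tie = name, d, False
        elif d == best_dist:
            tie = True  # another reference is equally close
    if best_dist is None or tie or best_dist > max_mismatches:
        return None
    return best_name


# Toy example with two made-up 17-base references.
refs = {"ref1": "ACGTACGTACGTACGTA", "ref2": "TTTTGGGGCCCCAAAAT"}
print(assign_read("ACGTACGTACCTACGAA", refs))  # 2 mismatches to ref1 only
```

In pure Python this is far too slow for 7450 references against millions of reads one by one, so it would only be practical on the set of distinct motifs, or as a specification to check another tool against.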

I intend to try bowtie2 in local mode. I do not know it, but RMAP seems to correspond to my question.

Could you please give me some suggestions/ideas to deal with this particular case?
Thank you very much for your help.