Allowing a high number of mismatches when mapping
Dear all,
I have 53 bp sequences, of which 23 to 30 bases are of interest (the motifs). For simplicity, I kept only the first 17 bases. Each sample has between 5 and 23 million reads. The reference consists of 7450 distinct sequences, and I likewise kept only the first 17 bases of each reference sequence. My goal is to map the motifs to the reference. If there were no sequencing errors, I would find only 7450 distinct motifs in my samples. Most likely there was a problem during sequencing, and 25% of the reads have poor quality. I mapped with bowtie using:
Code:
bowtie --best --strata -v 2 -k 1 -m 1 --norc
I also tried -v 3 on two samples, and it increased the mapping rate by only ~1.5%.
Since my reference is small (7450 distinct sequences), I know that fewer than 17 bases (sometimes 6 bases are sufficient) are enough to uniquely identify which of the 7450 references a sequence comes from. For this specific case I therefore need to allow a higher number of mismatches (bowtie is limited to 3). I intend to try bowtie2 in local mode. I have not used it, but RMAP (https://omictools.com/rmap-tool) seems to fit my question.
Could you please give me some suggestions or ideas for dealing with this particular case? Thank you very much for your help. |
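To make the idea in the question concrete (a small reference, more mismatches allowed, and only unambiguous hits kept), here is a minimal Python sketch. It is not how bowtie or any other aligner works internally; it is a brute-force comparison by Hamming distance, without indels. The function names `hamming`, `assign_read` and `count_assignments`, the reference names and the example sequences are all hypothetical. It assumes reads are already trimmed so that the motif starts at position 0 and contains no N handling.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, references, max_mismatches):
    """Return the name of the single closest reference, or None.

    A read is assigned only if exactly one reference reaches the lowest
    mismatch count and that count is <= max_mismatches (similar in spirit
    to reporting unique best hits only). `references` maps name -> sequence.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, seq in references.items():
        if len(read) < len(seq):
            continue  # read too short to compare against this reference
        d = hamming(read[:len(seq)], seq)
        if d < best_dist:
            best_name, best_dist, tie = name, d, False
        elif d == best_dist:
            tie = True
    if best_name is None or tie:
        return None
    return best_name


def count_assignments(reads, references, max_mismatches=5):
    """Count reads per reference; identical reads are compared only once."""
    counts, unassigned = Counter(), 0
    for read, n in Counter(reads).items():
        name = assign_read(read, references, max_mismatches)
        if name is None:
            unassigned += n
        else:
            counts[name] += n
    return counts, unassigned


# Hypothetical 17-base references and reads, for illustration only.
example_refs = {"sh1": "ACGTACGTACGTACGTA", "sh2": "TTTTGGGGCCCCAAAAT"}
example_reads = [
    "ACGTACGTACGTACGTA",  # exact match to sh1
    "ACGAACGTTCGTACCTG",  # 4 mismatches to sh1, more than bowtie -v allows
    "GGGGGGGGGGGGGGGGG",  # far from both references
]
example_counts, example_unassigned = count_assignments(example_refs and example_reads, example_refs)
```

Collapsing identical reads first matters here: with millions of reads but a limited number of distinct 17-mers, each distinct sequence is compared against the 7450 references only once. Pure Python will still be slow at that scale, so this is meant to illustrate the logic rather than replace a mapper.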
I suggest you try BBMap, which is quite tolerant of low identity; it typically allows mapping down to around 60-70% identity. For very high sensitivity, try this command:
Code:
bbmap.sh in=reads.fq out=mapped.sam vslow minid=0.6 maxindel=5 k=11 |
Thank you Brian for your suggestion.
I am doing some tests with Bowtie on shorter sequences and if it doesn't work, I will try BBMap. The maximum length I can use is 23 bp. Would it be sufficient? |
23 is fine, but more bases will always increase specificity. If your sequences are 53 bp, why are you cutting them down to 23?
|
Thank you for your answer.
I am working on an sh screen. The first 22-30 bases are common to all sequences. The remaining 23 to 31 bases correspond to the sh in each sequence. Since there is a quality problem toward the end of the reads (from the middle, in fact), I use the minimum number of bases (from the left) needed to discriminate the sh. |
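The trade-off discussed above (fewer bases avoid the low-quality tail, more bases give more specificity) can be checked directly on the reference set. The sketch below, with hypothetical function names `min_discriminating_prefix` and `min_pairwise_hamming`, finds the shortest prefix length at which all reference sequences are distinct, and the smallest Hamming distance between any two references at a given prefix length. If that smallest distance is d, then a read with at most (d - 1) // 2 substitutions is still strictly closer to its true reference than to any other one. It assumes at least two references, substitutions only (no indels), and sequences that all start at the same position.

```python
from itertools import combinations


def min_discriminating_prefix(references):
    """Smallest prefix length k at which all sequences are distinct, else None."""
    max_len = min(len(s) for s in references)
    for k in range(1, max_len + 1):
        if len({s[:k] for s in references}) == len(references):
            return k
    return None


def min_pairwise_hamming(references, k):
    """Smallest Hamming distance between any two references cut to k bases.

    Compares all pairs, so the cost grows quadratically with the number
    of references (about 28 million pairs for 7450 sequences).
    """
    prefixes = [s[:k] for s in references]
    return min(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(prefixes, 2)
    )


# Hypothetical toy references, for illustration only.
toy_refs = ["ACGTAA", "ACGTCC", "ACTTAA", "GGGTAA"]
k_min = min_discriminating_prefix(toy_refs)
d_min = min_pairwise_hamming(toy_refs, 6)
tolerated = (d_min - 1) // 2  # substitutions that cannot cause a wrong best hit
```

A prefix that merely makes the references distinct can leave a minimum distance of 1, in which case a single sequencing error may turn one reference into another. Extending the prefix until the minimum distance grows is one way to pick a length that tolerates more mismatches.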