Dear all,
I have 53 bp sequences, of which 23 to 30 bases are of interest (the motifs). For simplicity, I kept only the first 17 bases. Each sample has between 5 and 23 million reads.
The reference consists of 7450 distinct sequences. For simplicity, I also took only the first 17 bases of each reference sequence.
My goal is to map the motifs to the reference.
If there were no sequencing errors, I would find only 7450 distinct motifs in my samples. Most likely there was a problem during sequencing, and 25% of the reads have poor quality.
When mapping with bowtie, the mapping rate is ~70-82%.
Code:
bowtie --best --strata -v 2 -k 1 -m 1 --norc
I tried -v 3 on two samples, and it increased the mapping rate by only ~1.5%.
Since my reference is small (7450 distinct sequences), I know that fewer than 17 bases (sometimes 6 bases are sufficient) are enough to uniquely identify which of the 7450 references a sequence comes from. Thus, for this specific case, I need to allow a higher number of mismatches (bowtie is limited to 3).
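To make the idea above concrete, here is a minimal Python sketch, not a tested pipeline. It assumes the references are distinct and available as a plain list of strings; the function names `shortest_unique_prefix_lengths` and `assign_by_hamming` are hypothetical and do not come from bowtie or any other tool. The first function reports how many leading bases each reference needs to be unique, and the second assigns a read to the closest reference by Hamming distance, with any mismatch limit, only when that closest reference is not tied.

```python
from collections import Counter


def shortest_unique_prefix_lengths(refs):
    """For each reference, find the shortest prefix length shared with no other reference.

    Assumes `refs` holds distinct sequences. A reference that is itself a
    prefix of another reference can never be unique and gets None.
    """
    result = {}
    unresolved = set(refs)
    max_len = max(len(r) for r in refs)
    for k in range(1, max_len + 1):
        # Count how many references share each prefix of length k.
        counts = Counter(r[:k] for r in refs)
        for r in list(unresolved):
            if len(r) >= k and counts[r[:k]] == 1:
                result[r] = k
                unresolved.discard(r)
    for r in unresolved:
        result[r] = None
    return result


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_by_hamming(read, refs, max_mismatches):
    """Return the single closest reference within max_mismatches.

    Returns None when no reference is close enough or when the best
    distance is shared by several references (ambiguous assignment).
    """
    best, best_dist, tied = None, max_mismatches + 1, False
    for ref in refs:
        if len(ref) != len(read):
            continue  # this sketch only compares equal-length sequences
        d = hamming(read, ref)
        if d < best_dist:
            best, best_dist, tied = ref, d, False
        elif d == best_dist and best is not None:
            tied = True
    return None if tied else best
```

This brute-force comparison costs one pass over all references per read, so on millions of reads it would probably be worth collapsing identical motifs first and assigning each distinct motif only once.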
I intend to try bowtie2 in local mode. I am not familiar with RMAP (https://omictools.com/rmap-tool), but it also seems to fit my problem.
Could you please give me some suggestions/ideas to deal with this particular case?
Thank you very much for your help.