Hi, all.
I'm working on a tightly constrained sequence analysis project. The mostly unimportant part: I'm aligning repeat sequences (LINEs) to the mouse genome using Bowtie2, with Ion Torrent reads.
The important part is that I'm trying to run Bowtie2 under some serious constraints (a high mismatch penalty, --mp with MX set to 1000, plus --ignore-quals) while still allowing SNPs at "N" positions in the reference sequence via a loose --np 0 and --n-ceil C,3. No indels are allowed (--rdg 1000,40 and --rfg 1000,40). My minimum-score setting is --score-min G,30,20.
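To make the --score-min and --n-ceil settings concrete, here is a minimal Python sketch of how I understand the Bowtie2 function syntax (C = constant, L = linear, S = square root, G = natural log, each evaluated on the read length). The helper name `bowtie2_func` and the 15-base read length are assumptions for illustration only; this is not Bowtie2's own code.

```python
import math

def bowtie2_func(spec: str, read_len: int) -> float:
    """Evaluate a Bowtie2-style function spec such as 'G,30,20' or 'C,3'.

    Hypothetical helper: the first number is the constant term and the
    second (optional) number is the coefficient applied to f(read_len).
    """
    parts = spec.split(",")
    kind = parts[0].upper()
    const = float(parts[1])
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":   # constant: the coefficient is ignored
        return const
    if kind == "L":   # linear in read length
        return const + coeff * read_len
    if kind == "S":   # square root of read length
        return const + coeff * math.sqrt(read_len)
    if kind == "G":   # natural log of read length
        return const + coeff * math.log(read_len)
    raise ValueError(f"unknown function type: {kind}")

# Minimum score a 15-base read would need under --score-min G,30,20
print(round(bowtie2_func("G,30,20", 15), 2))
# Maximum number of N positions allowed under --n-ceil C,3
print(bowtie2_func("C,3", 15))
```

For a 15-base read the threshold works out to 30 + 20 * ln(15), roughly 84. If I'm reading the manual correctly, the best possible score in end-to-end mode is 0, so it may be worth checking this threshold against the actual read lengths and alignment mode.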
This would all work great, except that it fails to match sequences that are identical to the reference apart from the position where I've put an "N" to mark an ambiguous base. Anything with an ambiguous character isn't getting aligned.
For instance, AATAAGGACTAGGAC will align, but AATAANGACTAGGAC will not.
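The matching rule I'm after can be sketched in a few lines of Python. This only illustrates the intended behaviour (an "N" in the reference accepts any base, every other position must match exactly, no indels); it is not how Bowtie2 works internally, and the function name `matches_with_n` is made up for this example. It assumes the sequence containing the "N" is the reference.

```python
def matches_with_n(read: str, ref: str) -> bool:
    """Return True if read equals ref, treating 'N' in ref as a wildcard.

    Hypothetical helper: ungapped, same-length comparison only.
    """
    if len(read) != len(ref):
        return False
    return all(r == "N" or q == r
               for q, r in zip(read.upper(), ref.upper()))

ref = "AATAANGACTAGGAC"                          # reference with one ambiguous base
print(matches_with_n("AATAAGGACTAGGAC", ref))    # differs only at the N position
print(matches_with_n("AATAAGGACTAGGAT", ref))    # differs at a non-N position
```

The first read differs from the reference only at the "N", so it should be accepted; the second has a real mismatch at the last base and should be rejected.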
Has anyone had success mapping highly similar sequences while allowing at least one ambiguous character in the reference, and if so, which Bowtie2 parameters did you use?