Dear SEQanswers,
I am using the sequence of a particular gene as a reference to build an atlas of the variations induced by an enzyme whose target lies within that gene.
The sequence of the region of interest follows.
Code:
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca
In the sequence, the 4-mer ctca occurs twice, separated by ggtac: ctcaggtacctca.
I also happen to have two short reads obtained by DNA sequencing.
FORWARD:
gatattgatattggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccac
REVERSE:
tggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccaccttgggactca
Aligning these reads to the reference sequence can produce at least 5 alignments, regardless of the alignment software or algorithm used.
Alignment 1
Code:
gatattgatattggtcttaatatgacttgttttcattgttctca---------gccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgttctca---------gccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 2
Code:
gatattgatattggtcttaatatgacttgttttcattgttctc---------agccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgttctc---------agccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 3
Code:
gatattgatattggtcttaatatgacttgttttcattgttct---------cagccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgttct---------cagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 4
Code:
gatattgatattggtcttaatatgacttgttttcattgttc---------tcagccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgttc---------tcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 5
Code:
gatattgatattggtcttaatatgacttgttttcattgtt---------ctcagccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgtt---------ctcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 6
Code:
gatattgatattggtcttaatatgacttgttttcattgt--t-------ctcagccagcatggcagcctctttcccacccac FORWARD
           tggtcttaatatgacttgttttcattgt--t-------ctcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatggcagcctctttcccacccaccttgggactca GENE
(alignments omitted)
I can rule out alignment 6 and the similar omitted alignments because they are more complex than the 5 others: they need more than one gap to explain the same reads.
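To make "too complex" concrete, here is a minimal Python sketch that counts the gap runs in the FORWARD rows of alignments 1 and 6 and scores them with an affine gap penalty. The function names and the penalty values (gap_open=5, gap_extend=1) are my own illustrative assumptions, not the defaults of any particular aligner.

```python
import re

# FORWARD rows copied from Alignment 1 and Alignment 6 above.
ALN1_FWD = "gatattgatattggtcttaatatgacttgttttcattgttctca---------gccagcatggcagcctctttcccacccac"
ALN6_FWD = "gatattgatattggtcttaatatgacttgttttcattgt--t-------ctcagccagcatggcagcctctttcccacccac"


def gap_runs(aligned_row):
    """Return the length of each run of consecutive '-' in an aligned row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def affine_gap_cost(aligned_row, gap_open=5, gap_extend=1):
    """Charge gap_open once per run plus gap_extend per gapped column.

    The default penalties are hypothetical values chosen for illustration.
    """
    return sum(gap_open + gap_extend * n for n in gap_runs(aligned_row))


for name, row in (("Alignment 1", ALN1_FWD), ("Alignment 6", ALN6_FWD)):
    print(name, gap_runs(row), affine_gap_cost(row))
```

Both rows remove 9 reference bases in total, but alignment 6 opens two gaps (2 + 7) where alignment 1 opens one, so any scheme that charges for opening a gap ranks it below alignments 1 to 5.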
Therefore, there is a deletion of 9 nucleotides.
However, there are at least 5 different positions at which the deletion can start.
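The ambiguity can be checked by brute force. The sketch below tries every start position for a single 9-nucleotide deletion and keeps those that turn the reference into the FORWARD read exactly. The helper name equivalent_deletion_starts is mine, and the code assumes the read starts at the first base of the reference and differs from it only by this one deletion.

```python
REF = ("gatattgatattggtcttaatatgacttgttttcattgttctcaggtacctcagccagcatgg"
       "cagcctctttcccacccaccttgggactca")
FORWARD = ("gatattgatattggtcttaatatgacttgttttcattgttctcagccagcatggcagcctct"
           "ttcccacccac")


def equivalent_deletion_starts(ref, read, del_len):
    """Return the 0-based starts of a del_len deletion that reproduce the read.

    Assumes the read begins at ref[0] and carries no other differences.
    """
    window = ref[:len(read) + del_len]  # reference bases spanned by the read
    return [s for s in range(1, len(read))
            if window[:s] + window[s + del_len:] == read]


starts = equivalent_deletion_starts(REF, FORWARD, 9)
for s in starts:
    # Print the 1-based start and the 9 bases that would be deleted.
    print(s + 1, REF[s:s + 9])
```

On these sequences the scan reports 1-based starts 41 to 45, which are alignments 5 down to 1, and also start 46 (deleting gtacctcag), because the base that follows each ctca copy is a g. All of these placements produce exactly the same read.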
Based on sequence information alone, I do not think the true starting position of the deletion can be recovered.
SEQanswers, what do you think?
-Seb