Dear SEQanswers,
I am working on a particular gene, and I use its sequence as a reference to build an atlas of the variants induced by an enzyme whose target site lies within that gene.
The sequence of the region of interest follows, with the two copies of the repeated 4-mer ctca shown in uppercase:
Code:
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca
I also have two short reads from DNA sequencing, both shown in the same orientation as the reference:
FORWARD:
gatattgatattggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccac
REVERSE:
tggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccaccttgggactca
Aligning these reads to the reference can produce at least 5 equally good alignments, regardless of the alignment software or algorithm used.
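One way to see where these alternative alignments come from is to brute-force every possible 9-nt deletion: remove ref[i:i+9] for each start i and test whether the read then occurs exactly in the edited reference. Below is a minimal Python sketch, assuming exact matching (no sequencing errors) and that the REVERSE read is already in reference orientation, as it appears to be here; `deletion_starts` is a hypothetical helper name.

```python
# Reference region and reads copied from the post (REVERSE is assumed to be
# already in the same orientation as the reference).
GENE = ("gatattgatattggtcttaatatgacttgttttcattgtt"
        "ctcaggtacctca"
        "gccagcatggcagcctctttcccacccaccttgggactca")
FORWARD = "gatattgatattggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccac"
REVERSE = "tggtcttaatatgacttgttttcattgttctcagccagcatggcagcctctttcccacccaccttgggactca"


def deletion_starts(ref: str, read: str, length: int) -> list[int]:
    """Return every 0-based start i such that the read occurs exactly
    (no mismatches) in ref once ref[i:i + length] is deleted."""
    starts = []
    for i in range(len(ref) - length + 1):
        edited = ref[:i] + ref[i + length:]
        if read in edited:
            starts.append(i)
    return starts


for name, read in (("FORWARD", FORWARD), ("REVERSE", REVERSE)):
    for i in deletion_starts(GENE, read, 9):
        # 1-based inclusive coordinates of the deleted bases, then the bases
        print(name, f"{i + 1}-{i + 9}", GENE[i:i + 9])
```

For both reads this reports six starts (1-based 41 to 46); the last one deletes gtacctcag, i.e. the gap is placed after ctcag, which is not among the five alignments listed in this post.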
Alignment 1
Code:
gatattgatattggtcttaatatgacttgttttcattgttctca---------gccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgttctca---------gccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 2
Code:
gatattgatattggtcttaatatgacttgttttcattgttctc---------agccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgttctc---------agccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 3
Code:
gatattgatattggtcttaatatgacttgttttcattgttct---------cagccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgttct---------cagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 4
Code:
gatattgatattggtcttaatatgacttgttttcattgttc---------tcagccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgttc---------tcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 5
Code:
gatattgatattggtcttaatatgacttgttttcattgtt---------ctcagccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgtt---------ctcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
Alignment 6
Code:
gatattgatattggtcttaatatgacttgttttcattgt--t-------ctcagccagcatggcagcctctttcccacccac            FORWARD
           tggtcttaatatgacttgttttcattgt--t-------ctcagccagcatggcagcctctttcccacccaccttgggactca REVERSE
gatattgatattggtcttaatatgacttgttttcattgttCTCAggtacCTCAgccagcatggcagcctctttcccacccaccttgggactca GENE
I can rule out alignment 6, and similar alignments not shown here, as too complex compared with the other 5: they split the 9-nt deletion into more than one gap.
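"Too complex" can be made concrete with an affine gap penalty, the scoring model used by common read aligners: each gap pays an opening cost plus a per-base extension cost, so splitting the 9-nt deletion into two gaps (alignment 6) always scores lower than a single 9-nt gap. Below is a minimal Python sketch scoring the FORWARD rows against the gene; the values match +1, mismatch -4, gap open 6, gap extend 1 are assumptions chosen for illustration, not parameters of a specific tool.

```python
GENE = ("gatattgatattggtcttaatatgacttgttttcattgtt"
        "ctcaggtacctca"
        "gccagcatggcagcctctttcccacccaccttgggactca")
PREFIX = "gatattgatattggtcttaatatgacttgttttcattgtt"
TAIL = "gccagcatggcagcctctttcccacccac"  # rest of the FORWARD read

# FORWARD rows of alignments 1-5: one 9-nt gap after 4, 3, 2, 1 or 0 bases of "ctca"
SINGLE_GAP_ROWS = [PREFIX + "ctca"[:k] + "-" * 9 + "ctca"[k:] + TAIL for k in (4, 3, 2, 1, 0)]
# FORWARD row of alignment 6: the same 9 deleted bases split into gaps of 2 and 7
SPLIT_GAP_ROW = PREFIX[:-1] + "--t-------ctca" + TAIL

# Illustrative, assumed scoring values
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, -4, 6, 1


def affine_score(row: str, ref: str) -> int:
    """Affine-gap score of a gapped read row aligned from the start of ref."""
    score, in_gap = 0, False
    for read_base, ref_base in zip(row, ref):
        if read_base == "-":
            # a new gap pays GAP_OPEN once; every gap position pays GAP_EXTEND
            score -= GAP_EXTEND + (0 if in_gap else GAP_OPEN)
            in_gap = True
        else:
            score += MATCH if read_base == ref_base else MISMATCH
            in_gap = False
    return score


print([affine_score(row, GENE) for row in SINGLE_GAP_ROWS])
print(affine_score(SPLIT_GAP_ROW, GENE))
```

With these values every single-gap row scores 58 and alignment 6 scores 52, exactly one gap-open penalty lower. Any positive gap-open cost ranks them the same way, so the choice among the single-gap alignments is left to tie-breaking rather than to the data.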
Therefore, there is a deletion of 9 nucleotides.
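However, the deletion can start at 6 different positions: the 5 shown above and a sixth that deletes gtacctcag (gap placed after ctcag). This is because each copy of the repeated ctca is followed by a g, so the actual repeat unit is the 5-mer ctcag.
Based on the sequence alone, I think there is no way to determine the true starting position of the deletion.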
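Although the reads cannot distinguish the placements, the full set of equivalent placements is easy to compute, and in practice the ambiguity is handled by convention rather than resolved: a deletion of ref[s:e] gives the same sequence when shifted one base left if ref[s-1] == ref[e-1], and one base right if ref[s] == ref[e]. Variant normalization (for example `bcftools norm` with a reference) reports the leftmost placement, whereas HGVS nomenclature uses the most 3' one. Below is a minimal Python sketch of the shifting step, assuming 0-based, half-open coordinates; `equivalent_placements` is a hypothetical helper, not a library function.

```python
GENE = ("gatattgatattggtcttaatatgacttgttttcattgtt"
        "ctcaggtacctca"
        "gccagcatggcagcctctttcccacccaccttgggactca")


def equivalent_placements(ref: str, start: int, length: int) -> list[int]:
    """All 0-based starts whose `length`-base deletion from ref yields the
    same sequence as deleting ref[start:start + length]."""
    # Shift left while the base just before the deletion equals its last base.
    s, e = start, start + length
    while s > 0 and ref[s - 1] == ref[e - 1]:
        s, e = s - 1, e - 1
    leftmost = s
    # Shift right while the first deleted base equals the base just after it.
    s, e = start, start + length
    while e < len(ref) and ref[s] == ref[e]:
        s, e = s + 1, e + 1
    rightmost = s
    return list(range(leftmost, rightmost + 1))


# Start from alignment 1, which deletes ggtacctca (0-based start 44).
starts = equivalent_placements(GENE, 44, 9)
print("1-based starts:", [s + 1 for s in starts])
print("leftmost:", GENE[starts[0]:starts[0] + 9])
print("rightmost:", GENE[starts[-1]:starts[-1] + 9])
```

Starting from any of the single-gap alignments gives the same interval, 1-based starts 41 to 46, from ctcaggtac (leftmost) to gtacctcag (rightmost).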
SEQanswers, what do you think?
-Seb