Hi,
I am trying to convert color space sequences generated by an ABI SOLiD sequencer to actual bases using the following color space "matrix":
AA=0
AC=1
AG=2
AT=3
CC=0
CA=1
CT=2
CG=3
GG=0
GT=1
GA=2
GC=3
TT=0
TG=1
TC=2
TA=3
So, this
>44_35_267_F3
T20220213203000111000122223221121222
gets converted to
>44_35_267_F3
CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG
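For reference, here is a minimal Python sketch of the conversion described above. It assumes the leading T is the known primer base and is not part of the output, and that the read contains only the colors 0-3. The names PAIRS, NEXT_BASE and decode_colorspace are made up for illustration and do not come from any SOLiD tool.

```python
# Dinucleotide -> color table, copied from the matrix in this post.
PAIRS = {
    "AA": 0, "AC": 1, "AG": 2, "AT": 3,
    "CC": 0, "CA": 1, "CT": 2, "CG": 3,
    "GG": 0, "GT": 1, "GA": 2, "GC": 3,
    "TT": 0, "TG": 1, "TC": 2, "TA": 3,
}

# Invert the table: (previous base, color character) -> next base.
NEXT_BASE = {(pair[0], str(color)): pair[1] for pair, color in PAIRS.items()}


def decode_colorspace(read):
    """Decode a color space read such as 'T2022...' into bases.

    Assumption: read[0] is the primer base and is dropped from the result.
    Any character other than 0-3 (for example a missing call) raises KeyError.
    """
    prev = read[0]
    bases = []
    for color in read[1:]:
        # Each base is derived from the previously decoded base and the color.
        prev = NEXT_BASE[(prev, color)]
        bases.append(prev)
    return "".join(bases)


print(decode_colorspace("T20220213203000111000122223221121222"))
# prints CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG
```

In this loop every base is computed from the previously decoded base, so a single wrong color changes every base after it in the read.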
I want to do this so that I can use alignment programs that cannot work with ABI color space data. But I think I am doing something wrong, because my alignment rates are below 5% on published data (mouse genome, allowing up to 2 mismatches).
Any insights would be really appreciated.
I may just go ahead and use MAQ to do this in color space, but I am not sure why it does not work the way I am currently trying to do it. I am very new to SOLiD data, so I may be missing some piece of information here.
Thanks in advance.