SEQanswers

11-07-2008, 02:57 PM   #1
sgupta
Junior Member
Location: Cambridge, MA
Join Date: Nov 2008
Posts: 6

ABI Color Space to Bases

Hi,

I am trying to convert color space sequences generated by the ABI SOLiD sequencer to actual bases using the following color space "matrix":

AA=0
AC=1
AG=2
AT=3
CC=0
CA=1
CT=2
CG=3
GG=0
GT=1
GA=2
GC=3
TT=0
TG=1
TC=2
TA=3

So, this
>44_35_267_F3
T20220213203000111000122223221121222

gets converted to

>44_35_267_F3
CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG
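
For reference, the matrix above is equivalent to a simple XOR rule if the bases are numbered A=0, C=1, G=2, T=3: each color is the XOR of the two neighbouring bases. Below is a minimal Python sketch of the decoding step under that assumption. The function name `cs_to_bases` is made up for illustration, and the sketch assumes a clean read with a leading primer base and only the colors 0-3 (no `.` for missing calls).

```python
BASES = "ACGT"  # index of each base: A=0, C=1, G=2, T=3

def cs_to_bases(read):
    """Decode a color-space read such as 'T2022...' into bases.

    The first character is the known primer base. It anchors the
    decoding and is not part of the returned sequence.
    """
    prev = BASES.index(read[0])
    out = []
    for color in read[1:]:
        prev ^= int(color)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)

print(cs_to_bases("T20220213203000111000122223221121222"))
# CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG
```

This reproduces the conversion shown above, so the translation itself follows the matrix.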

I want to do this to be able to use alignment programs that cannot work with ABI color space data. But so far I think I am doing something wrong, because my alignment rates are less than 5% using published data (allowing up to 2 mismatches, mouse genome).

Any insights would be really appreciated.

I may just go ahead and use MAQ to do this in color space, but I am not sure why this does not work the way I am currently trying to do it. I am very new to SOLiD data, so I may be missing some piece of information here.

Thanks in advance.

11-09-2008, 07:33 AM   #2
lgoff
Member
Location: Cambridge, MA
Join Date: Feb 2008
Posts: 82

SOLiD Alignment

I have found that it is much better to do any analysis that you can in colorspace before you make the transition to DNA space. We are currently using the SHRiMP (U Toronto) alignment algorithm for fast and accurate alignment in colorspace. Even so, 5% seems pretty low for DNA-space alignments of SOLiD data.

11-09-2008, 01:59 PM   #3
lgoff
Member
Location: Cambridge, MA
Join Date: Feb 2008
Posts: 82

Script available

To answer your original question, just send me an email and I will send you a Python script that converts .csfasta to .fasta as needed.

Loyal
lgoff(at)broad.mit.edu

11-09-2008, 02:12 PM   #4
Chipper
Senior Member
Location: Sweden
Join Date: Mar 2008
Posts: 324

Lgoff,
what advantage do you see with SHRiMP compared to the ABI tools? Isn't it said to be very slow?

sgupta,
Direct conversion does not work for reads that contain any sequencing error, since a single color error changes all following bases in base space. Your conversion looks correct, though; it is just very common for a read to have at least one color-space error. SOCS and ZOOM! are supposed to do colorspace alignments, so they may be worth a try.
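
To illustrate the error propagation described above, here is a small Python sketch that miscalls a single color in sgupta's example read and compares the two translations. The helper `cs_to_bases` and the error position `pos` are hypothetical choices for illustration; the decoding uses the XOR form of the posted matrix (A=0, C=1, G=2, T=3).

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3

def cs_to_bases(read):
    """Decode a color-space read (primer base + colors 0-3) into bases."""
    prev = BASES.index(read[0])
    out = []
    for color in read[1:]:
        prev ^= int(color)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)

read = "T20220213203000111000122223221121222"
pos = 10  # hypothetical position of one miscalled color (0-based, after the primer)

# Replace the color at `pos` with a different one to simulate a single error.
colors = list(read[1:])
colors[pos] = "0" if colors[pos] != "0" else "1"
bad_read = read[0] + "".join(colors)

good = cs_to_bases(read)
bad = cs_to_bases(bad_read)
mismatches = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
print(len(mismatches), "of", len(good), "bases differ, starting at index", mismatches[0])
# 25 of 35 bases differ, starting at index 10
```

One wrong color out of 35 leaves the first 10 bases intact and corrupts every base after it, which by itself is enough to exceed a 2-mismatch limit in base space.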

11-09-2008, 06:26 PM   #5
lgoff
Member
Location: Cambridge, MA
Join Date: Feb 2008
Posts: 82

SOLiD

Originally, I was very put off by the SOLiD pipeline. It was initially very closed, and there wasn't much I could do outside of the genome resequencing for which it was originally designed. The matching is relatively fast with SOLiD, but I do like the k-mer + Smith-Waterman approach of SHRiMP. The SOLiD pipeline has since become much more robust, but when we received our original machine, with the original cluster, it was underpowered for anything human. We had to re-develop our own pipeline for the specific applications we were using SOLiD for (smRNAs at the time). So we went with SHRiMP, and I have stuck with it since. Since we are lucky enough to be able to parallelize everything very nicely, speed is not much of an issue for us. I haven't tried the SOLiD pipeline in the past few months. Am I missing any dramatic improvements?

11-13-2008, 08:44 AM   #6
ECO
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358

Hi Loyal,

Can you share your strategy for parallel processing with SHRiMP?

11-09-2009, 03:48 PM   #7
Torst
Senior Member
Location: The University of Melbourne, AUSTRALIA
Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by ECO
Can you share your strategy for parallel processing with SHRiMP?

Parallelizing SHRiMP is as simple as splitting your input fasta/fastq file into smaller ones, running SHRiMP on each, then merging the hits output files.
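
The splitting step could be sketched in Python roughly as follows. This is only an illustration: the function names, the `chunk_<k>.csfasta` naming, and the round-robin distribution are all assumptions, it handles FASTA-style records only (not fastq), and it assumes that lines starting with `#` are csfasta comment lines that can be skipped. Running the aligner on each chunk and concatenating the hit files is left to however you launch jobs on your cluster.

```python
def read_records(lines):
    """Group FASTA/csfasta lines into records (header line + sequence lines)."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        if line.startswith(">") and record:
            yield record  # a new header closes the previous record
            record = []
        record.append(line)
    if record:
        yield record

def split_records(lines, n_chunks):
    """Distribute records round-robin into n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(read_records(lines)):
        chunks[i % n_chunks].append(record)
    return chunks

def write_chunks(in_path, n_chunks, prefix="chunk"):
    """Write each chunk to '<prefix>_<k>.csfasta' and return the paths."""
    with open(in_path) as handle:
        chunks = split_records(handle, n_chunks)
    paths = []
    for k, chunk in enumerate(chunks):
        path = f"{prefix}_{k}.csfasta"
        with open(path, "w") as out:
            for record in chunk:
                out.write("\n".join(record) + "\n")
        paths.append(path)
    return paths
```

Round-robin assignment keeps the chunks within one record of each other in size, so the parallel jobs should finish at about the same time.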

Nesoni (open source) does this automatically for you: http://www.vicbioinformatics.com/software.nesoni.shtml

11-09-2009, 05:30 PM   #8
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

You can also do the conversion directly on our web server:
http://genome.ucla.edu/bfast-server/. Click on the left tab that says CS2NT/NT2CS and enjoy!

I would recommend aligning in color space since one color error will cause all bases after the color error to be translated incorrectly. Many great color space aware mapping tools exist.
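
As a rough illustration of the opposite direction (NT2CS) and of why comparing in color space is more forgiving, here is a Python sketch based on the matrix from the first post, written as XOR over A=0, C=1, G=2, T=3. The names `bases_to_cs` and `color_mismatches` are invented for this example, and the miscalled position is an arbitrary assumption; this is not how any particular mapper implements it.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3

def bases_to_cs(primer, seq):
    """Encode a base sequence into color space, prefixed with the primer base."""
    prev = BASES.index(primer)
    colors = []
    for base in seq:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))  # color = XOR of adjacent bases
        prev = cur
    return primer + "".join(colors)

def color_mismatches(read_a, read_b):
    """Count differing colors between two equal-length color-space reads."""
    return sum(a != b for a, b in zip(read_a[1:], read_b[1:]))

ref_cs = bases_to_cs("T", "CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG")
print(ref_cs)
# T20220213203000111000122223221121222

# A read with one miscalled color (position chosen arbitrarily for the demo).
bad_read = ref_cs[:11] + "0" + ref_cs[12:]
print(color_mismatches(ref_cs, bad_read))
# 1
```

In color space the miscall counts as a single mismatch, whereas after translation to bases it would affect every base downstream of the error.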

Nils

All times are GMT -8.