SEQanswers

08-30-2011, 07:14 PM   #1
brachysclereid
Member
Location: California
Join Date: Feb 2011
Posts: 32
csfasta to fasta?

Is anyone aware of a script that will convert a color space fasta to a fasta file?

08-30-2011, 07:25 PM   #2
BAMseek
Senior Member
Location: St. Louis, MO, USA
Join Date: Apr 2011
Posts: 124

One thing to keep in mind: you could convert the colors into bases, but a single error in a color call would throw off every base after that position. It may be better to align the data in color space (converting the reference sequence into color space) and then find the most parsimonious conversion of the color space reads into base space reads given the reference.
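
To illustrate the error-propagation point above, here is a minimal Python sketch. It assumes the usual SOLiD two-base encoding (A, C, G, T indexed 0 to 3, each color being the XOR of two adjacent base indices) and a csfasta read that starts with a known primer base. The function name `decode_colorspace` and both example reads are invented for illustration; this is not any particular tool's implementation.

```python
BASES = "ACGT"

def decode_colorspace(cs_read):
    """Decode a csfasta read such as 'T0120313' into base space.

    The leading letter is the known primer base (not part of the output).
    Each digit is the color for the transition from the previous base to
    the next one: next_index = previous_index XOR color.
    """
    prev = BASES.index(cs_read[0])
    decoded = []
    for color in cs_read[1:]:
        # A missing call ('.') is not handled here and raises ValueError.
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)

good = "T0120313"
bad = "T0130313"  # same read with one wrong color call (third color)

print(decode_colorspace(good))  # TGAATGC
print(decode_colorspace(bad))   # TGCCGTA
```

The two outputs agree only on the first two bases; every base from the miscalled color onward is different, which is why a naive read-by-read conversion is fragile.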

08-31-2011, 06:09 AM   #3
brachysclereid
Member
Location: California
Join Date: Feb 2011
Posts: 32
csfasta to fasta

I would still like to do this. In theory those problems would result in kmers occurring only once or a few times in the data set and could be removed.


Does such a script exist?
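
The k-mer filtering idea above could be sketched roughly like this in Python. The function names `count_kmers` and `rare_kmers`, the `min_count` threshold, and the toy reads are all assumptions made for illustration; whether such a filter really removes mis-converted reads depends on the data.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def rare_kmers(reads, k, min_count=2):
    """Return the set of k-mers seen fewer than min_count times."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}

# Two identical reads plus one read that differs downstream of position 2.
reads = ["TGAATGC", "TGAATGC", "TGCCGTA"]
print(sorted(rare_kmers(reads, k=4)))  # ['CCGT', 'CGTA', 'GCCG', 'TGCC']
```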

08-31-2011, 06:53 AM   #4
kevleb
Member
Location: Sophia-Antipolis
Join Date: Jun 2009
Posts: 10

You should use the encodeFasta.py script in corona from SOLiD.

08-31-2011, 07:27 AM   #5
pmiguel
Senior Member
Location: Purdue University, West Lafayette, Indiana
Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by brachysclereid
I would still like to do this. In theory those problems would result in kmers occurring only once or a few times in the data set and could be removed.
Alas, no. Without knowing base n-1, there are 4 base space sequences that may derive from a color space sequence starting at base n. One of those base space sequences is correct; the other 3 are incorrect. The result is that an error at any position sends one down one of the incorrect paths. However, these will not create wildly divergent k-mers because there are only 3 of them.
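
A small Python sketch of the "4 possible sequences" point, assuming the usual SOLiD two-base encoding (A, C, G, T indexed 0 to 3, color = XOR of adjacent base indices). The function name `candidate_decodings` and the example colors are made up for illustration.

```python
BASES = "ACGT"

def candidate_decodings(colors):
    """Return the four base-space strings consistent with a run of colors
    when the base just before the run is unknown.

    Keys are the assumed preceding base; values are the decoded bases.
    """
    results = {}
    for start in BASES:
        prev = BASES.index(start)
        decoded = []
        for color in colors:
            prev ^= int(color)  # next base index = previous index XOR color
            decoded.append(BASES[prev])
        results[start] = "".join(decoded)
    return results

for start, seq in candidate_decodings("20313").items():
    print(start, seq)
# A GGCAT
# C TTACG
# G AATGC
# T CCGTA
```

Only one of the four strings matches the true sequence; a color error upstream effectively swaps the decoder onto one of the other three.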

It might be possible to collapse the resulting "homoconvertamers" knowing their derivation. But it would likely be easier to just do all the analysis in color space (and take advantage of the extra error detection capabilities of color space) and do the conversion to base space as a final step.

--
Phillip

08-31-2011, 09:27 AM   #6
westerman
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

I definitely agree with Phillip and BAMseek as to the advisability of doing your work in colorspace instead of in basespace. As for your specific question, there have been several cs-to-bs converters posted on SEQanswers. Gringer posted one recently (which I did not like). Other people have posted theirs. Search the forum for the keywords "basespace colorspace".

All times are GMT -8.