SEQanswers

12-13-2009, 02:57 PM   #1
PRJ
Junior Member
 
Location: MA

Join Date: Jun 2009
Posts: 3
ABI SOLiD data filtering and conversion to base-space

I have previously been using Solexa for small RNA sequencing but am trying out SOLiD. I just received the data back for my first SOLiD smallRNA sequencing run and am having some difficulty with data analysis.

(1) The run produced 25 nt reads, but my small RNAs are expected to be ~21 nt long, so I assume there is primer sequence at the ends of the reads. What is the best way to filter these reads using the primer sequences and .quality files? I know that ABI provides the "small RNA analysis pipeline" for this, but I want to filter reads using the primer sequences/qual files and output them in base-space rather than color space, as some programs I want to use for other analyses require base-space input.

Does anybody have any idea how to do this? Any help would be greatly appreciated.
12-13-2009, 10:02 PM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283

If you use programs that can align with indels in color space (e.g. BFAST, BWA, or SHRiMP), the adaptor bases may be aligned as insertions at the ends of reads. Then you can remove the adaptor sequence post-alignment.
12-14-2009, 09:17 AM   #3
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 126

I didn't think you could even order 25bp chemistry anymore. Was this done on a version 2 machine?

Your best bet is to find someone with Bioscope so you can output these directly into SAM files.
12-15-2009, 03:41 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 1,797

Quote:
Originally Posted by PRJ View Post
I want to filter reads using the primer sequences/qual files and output them in base-space rather than color space, as some programs I want to use for other analyses require base-space input.

Does anybody have any idea how to do this? Any help would be greatly appreciated.
Believe me, you do not want to do this. Converting raw SOLiD color space data directly to base space causes a serious problem. That is, any color space error will not only result in a base space error, but it will switch the "color frame" (for lack of a better term) and result in every base from that point on in the read being converted incorrectly.
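To make the "color frame" problem concrete, here is a minimal Python sketch. It assumes the usual SOLiD encoding (A, C, G, T as 0 to 3, each color being the XOR of the two adjacent base codes) and a read that starts with the primer base. The function name and the example reads are made up for illustration.

```python
# Assumed SOLiD color code: color = XOR of the 2-bit codes of adjacent bases.
BASES = "ACGT"  # A=0, C=1, G=2, T=3

def decode(cs_read):
    """Translate a color-space read (primer base + colors) to base space."""
    prev = BASES.index(cs_read[0])   # leading primer base, e.g. 'T'
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)           # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)

good = "T0011112333"
bad = "T0031112333"                  # one miscalled color (third color, 1 -> 3)

print(decode(good))  # TTGTGTCGCG
print(decode(bad))   # TTACACTATA
```

A single wrong color leaves the first two bases alone and changes every base after it, which is why naive conversion of raw reads is risky.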

If you absolutely must use a program that does not understand color space, you can do a trick called "double encoding". Double encoding leaves the sequence in color space, but uses base letters (a, c, g, t) to indicate color instead of numbers. This allows the use of color space naive programs with one caveat: to inter-convert strands in color space one must reverse, rather than reverse-complement. So forward and reverse strands have to be considered separately. (Assembly programs, for example, would create two contigs -- one top strand, the other bottom strand). For strand-specific data like small RNA data sets, this will be less of an issue.
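A small Python sketch of double encoding and of the strand caveat, again assuming the XOR color code and a 0123 to ACGT letter mapping (the helper names and the test sequence are hypothetical):

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3
COMPLEMENT = str.maketrans("ACGT", "TGCA")
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")

def encode(seq):
    """Colors between adjacent bases of a base-space sequence."""
    codes = [BASES.index(b) for b in seq]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def double_encode(colors):
    """Write colors 0-3 as letters; the result is still color space."""
    return colors.translate(COLOR_TO_LETTER)

seq = "ATGGCA"
revcomp = seq.translate(COMPLEMENT)[::-1]

print(encode(seq))                 # 31031
print(encode(revcomp))             # 13013, the plain reverse of 31031
print(double_encode(encode(seq)))  # TCATC
```

The colors of the reverse-complement strand are just the forward colors read backwards, so a tool that reverse-complements a double-encoded read gets the wrong answer.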

As for clipping adaptor sequence from the end, that would be tricky with 25 base reads. I suppose you could just chop off the last 5 bases.

Your best bet really is to use a color space aware program to map the reads, like the SOLiD™ System Small RNA Analysis Tool or its Bioscope equivalent, then convert the reads that align to your reference to base space, if needed.

--
Phillip
12-15-2009, 04:58 AM   #5
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 865

If you do go the double-encoding route (via the encodeFasta.py program provided within the Corona lite package) then make sure that you differentiate your double-encoded file from normal sequence files. ABI recommends making all double-encoded files begin with 'de_' and to use the '-a' switch in order to add an annotation to the file.

Also be aware that color space, even double-encoded color space, cannot be reverse complemented in the normal fashion.
12-15-2009, 05:55 AM   #6
aguffanti
Member
 
Location: Milano, Italy

Join Date: Dec 2008
Posts: 26
miRNA mapping

Hi. You can spot the adapter (or P2) by a string which, if your sequences are 3' SREK sequences, begins with 3302010.

You could easily map {0,1,2,3} to {A,C,G,T}, trim the P2 with a Smith-Waterman (S&W) procedure (check the SREK protocol manual for the sequence in nucleotides), and then convert {A,C,G,T} back to {0,1,2,3}. REMEMBER not to touch the first T, i.e. reads should look like T0011112333, T2233111000, etc.
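A rough Python sketch of that round trip. Note the substitutions: instead of Smith-Waterman it uses a plain mismatch count against the adapter colors, and it trims directly on the digits, with the letter mapping shown only for tools that need it. The adapter string is the one quoted above; the thresholds, helper names, and example reads are assumptions for illustration only.

```python
ADAPTER_CS = "3302010"   # adapter start in colors, as quoted in this post
TO_LETTERS = str.maketrans("0123", "ACGT")
TO_COLORS = str.maketrans("ACGT", "0123")

def to_letters(cs_read):
    """Map colors to letters, leaving the leading primer base untouched."""
    return cs_read[0] + cs_read[1:].translate(TO_LETTERS)

def to_colors(de_read):
    """Inverse of to_letters; the leading primer base is again untouched."""
    return de_read[0] + de_read[1:].translate(TO_COLORS)

def trim_adapter(cs_read, adapter=ADAPTER_CS, min_overlap=5, max_mismatch=1):
    """Clip a 3' adapter from a color-space read (simple mismatch count,
    not Smith-Waterman). Returns the read unchanged if nothing matches."""
    primer, colors = cs_read[0], cs_read[1:]
    for start in range(len(colors) - min_overlap + 1):
        # zip stops at the shorter string, so partial adapters at the
        # read end are compared over their overlap only
        mismatches = sum(a != b for a, b in zip(colors[start:], adapter))
        if mismatches <= max_mismatch:
            return primer + colors[:start]
    return cs_read

read = "T" + "012301230123012301" + "3302010"   # 18 insert colors + adapter
print(trim_adapter(read))            # T012301230123012301
print(to_letters("T0011112333"))     # TAACCCCGTTT
```

With reads this short the thresholds matter a lot, so treat `min_overlap` and `max_mismatch` as knobs to tune on your own data rather than recommended values.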

OR map with SHRiMP against the reference genome or miRBase => the adapter won't align properly

HTH

Alessandro

