**nilshomer** · 12-13-2009, 11:02 PM

Originally posted by PRJ View Post

I have previously been using Solexa for small RNA sequencing but am trying out SOLiD. I just received the data back for my first SOLiD smallRNA sequencing run and am having some difficulty with data analysis.

(1) The run produced 25nt long reads - my smallRNAs are expected to be ~21nt long. I assume there is primer sequence at the ends of the reads. What is the best way to filter these reads using the primer sequences and .quality files? I know that ABI provides the "small RNA analysis pipeline" for this, but I want to filter reads using primer sequences/qual files and output them in base space, not color space, as some programs I want to use for different analyses require base space.

Does anybody have any idea how to do this? Any help would be greatly appreciated.

If you use programs that can align with indels in color space (e.g. BFAST, BWA, or SHRiMP), the adaptor bases may be aligned as insertions at the ends of reads. You can then remove the adaptor sequence post-alignment.
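As a rough illustration of the post-alignment clean-up, here is a minimal Python sketch that strips insertions (and soft clips) from the ends of an aligned read using its SAM CIGAR string. The function name `clip_terminal_insertions` and the example read are made up for illustration, and whether a given aligner actually reports the adaptor as a terminal insertion is an assumption you would need to check on your own data.

```python
import re

def clip_terminal_insertions(seq, cigar):
    """Remove leading/trailing I or S operations and the read bases they cover."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    # I and S consume read bases but no reference bases, so POS is unaffected.
    while ops and ops[0][1] in "IS":
        n, _ = ops.pop(0)
        seq = seq[n:]
    while ops and ops[-1][1] in "IS":
        n, _ = ops.pop()
        seq = seq[:-n]
    return seq, "".join(f"{n}{op}" for n, op in ops)

# Hypothetical 25-base read whose last 4 bases were aligned as an insertion.
trimmed, new_cigar = clip_terminal_insertions("ACGTACGTACGTACGTACGTACGTA", "21M4I")
print(trimmed, new_cigar)
```

Quality strings would need the same slicing; that is left out here to keep the sketch short.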

**snetmcom** · 12-14-2009, 10:17 AM

I didn't think you could even order 25bp chemistry anymore. Was this done on a version 2 machine?

Your best bet is to find someone with Bioscope so you can output these directly into SAM files.

**pmiguel** · 12-15-2009, 04:41 AM

Originally posted by PRJ View Post

I want to filter reads using primer sequences/qual files and output them in base space, not color space, as some programs I want to use for different analyses require base space.

Does anybody have any idea how to do this? Any help would be greatly appreciated.

Believe me, you do not want to do this. Converting raw SOLiD color space data directly to base space causes a serious problem. That is, any color space error will not only result in a base space error, but it will switch the "color frame" (for lack of a better term) and result in every base from that point on in the read being converted incorrectly.
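To make the "color frame" problem concrete, here is a minimal Python sketch of naive color-to-base decoding. It assumes the usual SOLiD two-base encoding (A=0, C=1, G=2, T=3, with each color being the XOR of the two adjacent base codes); the two reads are invented for illustration.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3

def decode_colors(read):
    """Naively translate a color-space read (primer base + color digits) to bases."""
    prev = BASES.index(read[0])   # known primer base, e.g. 'T'
    decoded = []
    for color in read[1:]:
        prev ^= int(color)        # next base code = previous base code XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)

clean = decode_colors("T0120331")   # hypothetical error-free read
noisy = decode_colors("T0130331")   # same read with the third color miscalled (2 -> 3)

# Position of the first wrong base, and whether every later base is wrong too.
first_diff = next(i for i, (a, b) in enumerate(zip(clean, noisy)) if a != b)
all_wrong_after = all(a != b for a, b in zip(clean[first_diff:], noisy[first_diff:]))
print(clean, noisy, first_diff, all_wrong_after)
```

In this toy case the single miscalled color corrupts every base from that position to the end of the read, which is why mapping in color space first is the safer route.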

If you absolutely must use a program that does not understand color space, you can do a trick called "double encoding". Double encoding leaves the sequence in color space, but uses base letters (a, c, g, t) instead of numbers to indicate color. This allows the use of color-space-naive programs with one caveat: to inter-convert strands in color space one must reverse, rather than reverse-complement. So forward and reverse strands have to be considered separately. (Assembly programs, for example, would create two contigs -- one top strand, the other bottom strand.) For strand-specific data like small RNA data sets, this will be less of an issue.
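A minimal Python sketch of the idea, assuming the standard SOLiD color code (XOR of 2-bit base codes) and a plain 0/1/2/3 to A/C/G/T letter mapping. The helper names are hypothetical and this is not the encodeFasta.py implementation; it only shows why the opposite strand is the reversed color string rather than its reverse complement.

```python
BASES = "ACGT"
TO_LETTERS = str.maketrans("0123", "ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def colors_of(seq):
    """Color calls between adjacent bases (XOR of the 2-bit base codes)."""
    return "".join(str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:]))

def double_encode(colors):
    """Write colors as letters; the result is still color space, not sequence."""
    return colors.translate(TO_LETTERS)

seq = "ATGGCA"                                    # made-up top-strand sequence
top = colors_of(seq)                              # colors of the top strand
bottom = colors_of(seq.translate(COMPLEMENT)[::-1])  # colors of the reverse complement

print(top, bottom, double_encode(top), double_encode(bottom))
# The bottom strand is simply the top-strand colors read backwards.
assert bottom == top[::-1]
```

So a color-space-naive tool that reverse-complements a double-encoded read will produce the wrong string; plain reversal is what corresponds to the other strand.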

As far as clipping adaptor sequence from the end goes, that would be tricky with 25-base reads. I suppose you could just chop off the last 5 bases.

Your best bet really is to use a color-space-aware program, like the SOLiD™ System Small RNA Analysis Tool or its Bioscope equivalent, to map the reads, and then convert the reads that align to your reference to base space, if needed.

--
Phillip

**westerman** · 12-15-2009, 05:58 AM

If you do go the double-encoding route (via the encodeFasta.py program provided within the Corona lite package) then make sure that you differentiate your double-encoded files from normal sequence files. ABI recommends giving all double-encoded files a 'de_' prefix and using the '-a' switch to add an annotation to the file.

Also be aware that color space, even double-encoded color space, cannot be reverse complemented in the normal fashion.

**aguffanti** · 12-15-2009, 06:55 AM

miRNA mapping

Hi. You can recognize the adapter (or P2) by a string which, if your sequences are 3' SREK sequences, begins with 3302010.

You could map {0,1,2,3} to {A,C,G,T} easily, trim the P2 with a Smith-Waterman (S&W) procedure (check the SREK protocol manual for the sequence in nucleotides) and then revert by transforming {A,C,G,T} back to {0,1,2,3}. REMEMBER not to touch the first T, i.e. reads should look like T0011112333, T2233111000 etc.

OR map with SHRiMP against the reference genome or miRBase => the adapter won't align properly.
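For the first option (trimming in color space), here is a rough Python sketch. It swaps the Smith-Waterman step for a much simpler mismatch-tolerant scan, works on the color digits directly, and uses only the 3302010 prefix mentioned above as the adapter; the full P2 color string should be taken from the SREK manual. The function name, the mismatch limit and the minimum overlap are all arbitrary assumptions.

```python
def trim_adapter(read, adapter="3302010", max_mismatch=1, min_overlap=4):
    """Cut a color-space read where the adapter (or its start) first matches."""
    primer, colors = read[0], read[1:]       # never touch the leading primer base
    for start in range(len(colors)):
        window = colors[start:start + len(adapter)]
        if len(window) < min_overlap:        # too little left to call an adapter
            break
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_mismatch:
            return primer + colors[:start]
    return read                              # no adapter found, keep read as is

# Hypothetical 25-color read: 21 insert colors followed by the adapter start.
with_adapter = "T" + "012301230123012301230" + "3302"
without_adapter = "T0123012301230123"
print(trim_adapter(with_adapter))
print(trim_adapter(without_adapter))
```

With only four adapter colors left at the end of a 25-color read, false positives are a real risk, so the thresholds would need tuning on real data.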

HTH

Alessandro


ABI SOLiD data filtering and conversion to base-space