  • ABI SOLiD data filtering and conversion to base-space

    I have previously been using Solexa for small RNA sequencing but am trying out SOLiD. I just received the data back for my first SOLiD small RNA sequencing run and am having some difficulty with data analysis.

    (1) The run produced 25nt reads, but my small RNAs are expected to be ~21nt long, so I assume there is primer sequence at the ends of the reads. What is the best way to filter these reads using the primer sequences and .quality files? I know that ABI provides the "small RNA analysis pipeline" for this, but I want to filter reads using the primer sequences/qual files and output them in base space, not color space, as some programs I want to use for different analyses require base space.

    Does anybody have any idea how to do this? Any help would be greatly appreciated.

  • #2
    If you use programs that can align with indels in color space (e.g., BFAST, BWA, or SHRiMP), the adaptor sequence at the ends of the reads may be aligned as an insertion. You can then remove the adaptor sequence post-alignment.
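A minimal Python sketch of that post-alignment clean-up, assuming the aligner writes SAM records with a base-space SEQ and reports the adaptor as a trailing insertion (or soft clip) in the CIGAR string. The function names are hypothetical and not part of BFAST, BWA, or SHRiMP. Because SAM stores SEQ on the forward reference strand, the read's 3' end is on the left for reverse-strand hits (FLAG 0x10), hence the `reverse` parameter.

```python
import re

# CIGAR operations as (length, op) pairs, e.g. "21M4I" -> [(21, "M"), (4, "I")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume read bases without matching the reference.
CLIPPABLE = {"I", "S"}


def parse_cigar(cigar: str) -> list:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def format_cigar(ops: list) -> str:
    return "".join(f"{n}{op}" for n, op in ops)


def trim_3prime_insertion(cigar: str, seq: str, reverse: bool = False):
    """Remove trailing I/S operations (assumed to be adaptor) at the read's 3' end.

    For reverse-strand hits the read's 3' end is the left end of SEQ and CIGAR,
    so both are flipped, trimmed on the right, and flipped back.
    Returns the new CIGAR and SEQ.
    """
    ops = parse_cigar(cigar)
    if reverse:
        ops, seq = ops[::-1], seq[::-1]
    removed = 0
    while ops and ops[-1][1] in CLIPPABLE:
        removed += ops.pop()[0]
    if removed:
        seq = seq[:-removed]
    if reverse:
        ops, seq = ops[::-1], seq[::-1]
    return format_cigar(ops), seq


print(trim_3prime_insertion("21M4I", "ACGTACGTACGTACGTACGTA" + "CGTA"))
# ('21M', 'ACGTACGTACGTACGTACGTA')
```

The QUAL field would need the same slice as SEQ; it is left out here to keep the sketch short.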


    • #3
      I didn't think you could even order 25bp chemistry anymore. Was this done on a version 2 machine?

      Your best bet is to find someone with Bioscope so you can output these directly into SAM files.


      • #4
        Originally posted by PRJ:
        I want to filter reads using primer sequences/qual files and output them in base space, not color space, as some programs I want to use for different analyses require base space.

        Does anybody have any idea how to do this? Any help would be greatly appreciated.
        Believe me, you do not want to do this. Converting raw SOLiD color space data directly to base space causes a serious problem. That is, any color space error will not only result in a base space error, but it will switch the "color frame" (for lack of a better term) and result in every base from that point on in the read being converted incorrectly.

        If you absolutely must use a program that does not understand color space, you can use a trick called "double encoding". Double encoding leaves the sequence in color space but uses base letters instead of numbers to indicate colors (A, C, G, T for colors 0, 1, 2, 3). This allows the use of color-space-naive programs with one caveat: to inter-convert strands in color space, one must reverse rather than reverse-complement. So forward and reverse strands have to be considered separately. (Assembly programs, for example, would create two contigs: one for the top strand, the other for the bottom strand.) For strand-specific data such as small RNA data sets, this will be less of an issue.
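A minimal Python sketch of why reversing, not reverse-complementing, gives the other strand. It assumes the same XOR-based 2-base encoding as standard SOLiD colors and the 0/1/2/3 to A/C/G/T letter mapping; writing '.' (missing call) as 'N' is an added assumption, and this is not the encodeFasta.py implementation. Complementing both bases of a pair flips each 2-bit code by XOR 3, which leaves their XOR (the color) unchanged, so the opposite strand carries the same colors in reverse order.

```python
# Assumed conventions: colors 0-3 double-encode as A, C, G, T ('.' as N);
# the color of a base pair is the XOR of the 2-bit base codes.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def colors_of(bases: str) -> str:
    """Color string for a base sequence (no primer base)."""
    codes = [BASES.index(b) for b in bases]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def double_encode(colors: str) -> str:
    """Write colors with base letters; the data stay in color space."""
    return colors.translate(DOUBLE_ENCODE)


def reverse_complement(bases: str) -> str:
    return bases.translate(COMPLEMENT)[::-1]


seq = "AACGT"
top = double_encode(colors_of(seq))                          # 'ACTC'
bottom = double_encode(colors_of(reverse_complement(seq)))   # 'CTCA'

print(bottom == top[::-1])                 # True: the other strand is just reversed
print(bottom == reverse_complement(top))   # False: reverse-complementing is wrong
```

A color-space-naive tool that reverse-complements the double-encoded letters (as aligners and assemblers normally do) therefore produces a wrong bottom-strand read, which is why the strands have to be handled separately.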

        As far as clipping adaptor sequence from the end goes, that would be tricky with 25-base reads. I suppose you could just chop off the last 5 bases.
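In color space, "the last 5 bases" means the last 5 colors plus their quality values. A minimal Python sketch of that hard trim, assuming the usual SOLiD layout: '#' comment lines at the top, a ">name" header followed by a single sequence line that starts with the primer base, and a matching .qual record with one space-separated integer per color (the primer base has no quality). The function names are hypothetical, and `n=5` simply follows the suggestion above.

```python
from typing import Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) pairs from .csfasta/.qual-style text.

    Assumes one body line per record and skips '#' comment lines.
    """
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def hard_trim(csfasta_lines: Iterable[str], qual_lines: Iterable[str],
              n: int = 5) -> Iterator[Tuple[str, str, List[int]]]:
    """Drop the last n colors, and their quality values, from every read.

    A read with k colors carries k qualities because the primer base
    (e.g. the leading 'T') has none. The primer base is always kept.
    """
    for (name, csread), (qname, qline) in zip(read_records(csfasta_lines),
                                              read_records(qual_lines)):
        if name != qname:
            raise ValueError(f"files out of sync: {name!r} vs {qname!r}")
        quals = [int(q) for q in qline.split()]
        keep = max(len(csread) - 1 - n, 0)  # number of colors to keep
        yield name, csread[: 1 + keep], quals[:keep]


reads = [">read1\n", "T0123012301\n"]
quals = [">read1\n", "10 20 30 40 30 20 10 5 5 5\n"]
print(list(hard_trim(reads, quals, n=5)))
# [('read1', 'T01230', [10, 20, 30, 40, 30])]
```

With real files (names here are placeholders) the generators can be fed open file handles directly, e.g. `hard_trim(open("reads.csfasta"), open("reads.qual"))`, so the whole run never has to fit in memory.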

        Your best bet really is to map the reads with a color-space-aware program, such as the SOLiD™ System Small RNA Analysis Tool or its Bioscope equivalent, and then convert the reads that align to your reference to base space, if needed.

        --
        Phillip


        • #5
          If you do go the double-encoding route (via the encodeFasta.py program provided with the Corona Lite package), make sure that you distinguish your double-encoded files from normal sequence files. ABI recommends giving all double-encoded file names the prefix 'de_' and using the '-a' switch to add an annotation to the file.

          Also be aware that color space, even double-encoded color space, cannot be reverse-complemented in the normal fashion.


          • #6
            miRNA mapping

            Hi. You can recognize the adapter (or P2) by a color string which, if your sequences are 3' SREK sequences, begins with 3302010.

            You could map {0,1,2,3} to {A,C,G,T}, trim the P2 with a Smith-Waterman (S&W) procedure (check the SREK protocol manual for the sequence in nucleotides), and then convert back from {A,C,G,T} to {0,1,2,3}. REMEMBER not to touch the first T, i.e. reads should look like T0011112333, T2233111000, etc.

            OR map with SHRiMP against the reference genome or miRBase; the adapter won't align properly.
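A minimal Python sketch of the map/trim/map-back idea, under stated assumptions: each read is a single .csfasta line starting with the primer base; the adapter is represented only by the 3302010 prefix given above (the full P2 should come from the SREK manual); and the scoring (match 2, mismatch -1, linear gap -2) and the `min_score` cut-off are arbitrary illustration values. The letter mapping is not needed for the alignment itself (digits would work the same); it mirrors the post and yields double-encoded strings a base-space tool could read.

```python
# Double-encoding tables from the post: colors 0-3 <-> letters A, C, G, T
# ('.' for a missing call <-> 'N' is an added assumption).
TO_LETTERS = str.maketrans("0123.", "ACGTN")
TO_COLORS = str.maketrans("ACGTN", "0123.")

# Only the first 7 colors of P2 are given in the post; take the full adapter
# sequence from the SREK protocol manual.
ADAPTER_PREFIX = "3302010"


def smith_waterman_start(read, adapter, match=2, mismatch=-1, gap=-2):
    """Best local alignment of adapter within read (linear gap penalty).

    Returns (score, start), where start is the read index at which the
    best-scoring aligned region begins.
    """
    rows, cols = len(read) + 1, len(adapter) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == adapter[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the local alignment starts (score 0).
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == adapter[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i


def trim_adapter(csread, adapter=ADAPTER_PREFIX, min_score=8):
    """Cut a color read at the start of its best adapter match.

    The primer base (the leading 'T') is never touched. min_score=8 equals
    four exact color matches with the scores above; it is an arbitrary choice.
    """
    primer, letters = csread[0], csread[1:].translate(TO_LETTERS)
    score, start = smith_waterman_start(letters, adapter.translate(TO_LETTERS))
    if score < min_score:
        return csread
    return primer + letters[:start].translate(TO_COLORS)


print(trim_adapter("T" + "1" * 10 + "3302"))  # T1111111111
```

This local alignment is not anchored to the 3' end, and a short chance match inside a genuine small RNA can also reach the threshold, so `min_score` trades missed adapters against wrongly trimmed reads.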

            HTH

            Alessandro

