  • PRJ
    Junior Member
    • Jun 2009
    • 3

    ABI SOLiD data filtering and conversion to base-space

    I have previously been using Solexa for small RNA sequencing but am trying out SOLiD. I just received the data back from my first SOLiD small RNA sequencing run and am having some difficulty with the data analysis.

    (1) The run produced 25 nt reads, but my small RNAs are expected to be ~21 nt long, so I assume there is primer sequence at the ends of the reads. What is the best way to filter these reads using the primer sequences and the .qual files? I know that ABI provides the "small RNA analysis pipeline" for this, but I want to filter reads using the primer sequences/.qual files and output them in base space, not color space, as some programs I want to use for other analyses require base space.

    Does anybody have any idea how to do this? Any help would be greatly appreciated.
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    If you use programs that can align with indels in color space (e.g. BFAST, BWA, or SHRiMP), the adaptor bases may be aligned as insertions at the ends of the reads. You can then remove the adaptor sequence post-alignment.
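A minimal Python sketch of that post-alignment step, assuming the aligner writes SAM and reports the adaptor as trailing insertion (I) or soft-clip (S) operations in the CIGAR string. The function names are hypothetical and not part of BFAST, BWA, or SHRiMP. Because SAM stores SEQ and CIGAR in reference orientation, the 3' end of a read that maps to the reverse strand (FLAG 0x10) sits at the start of the CIGAR.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def adaptor_length(cigar: str, reverse_strand: bool = False) -> int:
    """Count bases at the read's 3' end that are reported as insertions (I) or soft clips (S)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # CIGAR is written in reference orientation: the 3' end of a forward-strand read
    # is at the end of the string, that of a reverse-strand read is at the start.
    if not reverse_strand:
        ops.reverse()
    count = 0
    for length, op in ops:
        if op == "H":
            continue  # hard-clipped bases are not present in SEQ
        if op in ("I", "S"):
            count += length
        else:
            break
    return count


def strip_adaptor(seq: str, cigar: str, reverse_strand: bool = False) -> str:
    """Remove the putative adaptor bases from SEQ (also stored in reference orientation)."""
    n = adaptor_length(cigar, reverse_strand)
    if n == 0:
        return seq
    return seq[n:] if reverse_strand else seq[:-n]


print(strip_adaptor("ACGT" * 5 + "TTTTT", "20M5I"))  # ACGTACGTACGTACGTACGT
```

Whether a trailing insertion is really adaptor rather than a genuine difference from the reference is a judgment call; requiring a minimum length before trimming is one option.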

    • snetmcom
      Senior Member
      • Oct 2008
      • 159

      #3
      I didn't think you could even order 25 bp chemistry anymore. Was this done on a version 2 machine?

      Your best bet is to find someone with BioScope so you can output these reads directly into SAM files.

      • pmiguel
        Senior Member
        • Aug 2008
        • 2328

        #4
        Originally posted by PRJ
        I want to filter reads using the primer sequences/.qual files and output them in base space, not color space, as some programs I want to use for other analyses require base space.

        Does anybody have any idea how to do this? Any help would be greatly appreciated.
        Believe me, you do not want to do this. Converting raw SOLiD color space data directly to base space causes a serious problem. That is, any color space error will not only result in a base space error, but it will switch the "color frame" (for lack of a better term) and result in every base from that point on in the read being converted incorrectly.
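The frame-shift effect can be reproduced in a few lines of Python. This sketch assumes the standard SOLiD di-base code, in which each color equals the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3); the sequence and the function names are made up for illustration.

```python
# 2-bit base codes; a SOLiD color is the XOR of the codes of two adjacent bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(bases: str) -> str:
    """Encode bases as a color read: the first (primer) base, then one color per transition."""
    codes = [BASE_TO_CODE[b] for b in bases]
    return bases[0] + "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def decode_colors(read: str) -> str:
    """Naive conversion to base space: each color is applied to the previously decoded base."""
    current = BASE_TO_CODE[read[0]]  # primer base; not reported as part of the read
    bases = []
    for color in read[1:]:
        current ^= int(color)
        bases.append(CODE_TO_BASE[current])
    return "".join(bases)


def miscall(read: str, index: int) -> str:
    """Simulate one wrong color call at position `index` of the color string."""
    colors = list(read[1:])
    colors[index] = str((int(colors[index]) + 1) % 4)
    return read[0] + "".join(colors)


read = encode_colors("T" + "ACGTTGCAAGCT")  # hypothetical primer base + insert
good = decode_colors(read)
bad = decode_colors(miscall(read, 4))
mismatches = [i for i, (x, y) in enumerate(zip(good, bad)) if x != y]
print(good, bad)
print(mismatches)  # [4, 5, 6, 7, 8, 9, 10, 11]
```

One miscalled color changes every base from that position to the end of the read. A color-aware aligner avoids this by comparing colors against the color-encoded reference, so the same miscall costs a single mismatch.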

        If you absolutely must use a program that does not understand color space, you can use a trick called "double encoding". Double encoding leaves the sequence in color space but uses base letters (A, C, G, T) instead of numbers to indicate colors. This allows the use of color-space-naive programs with one caveat: to interconvert strands in color space, one must reverse, rather than reverse-complement. So forward and reverse strands have to be considered separately. (Assembly programs, for example, would create two contigs: one for the top strand, the other for the bottom strand.) For strand-specific data like small RNA data sets, this will be less of an issue.
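A sketch of double encoding and strand handling in Python, assuming the common convention 0=A, 1=C, 2=G, 3=T and the XOR color code; the names are illustrative. It also checks the strand point: complementing both bases of a pair leaves its color unchanged, so the other strand's colors are simply the reverse, while a base-space reverse complement of the double-encoded letters gives the wrong answer.

```python
# Assumed double-encoding convention: color 0=A, 1=C, 2=G, 3=T.
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")
LETTER_TO_COLOR = str.maketrans("ACGT", "0123")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def double_encode(colors: str) -> str:
    """'0123' -> 'ACGT'; pass only the colors, without the leading primer base."""
    return colors.translate(COLOR_TO_LETTER)


def double_decode(letters: str) -> str:
    """'ACGT' -> '0123'."""
    return letters.translate(LETTER_TO_COLOR)


def colors_of(bases: str) -> str:
    """Colors of a base sequence (XOR of adjacent 2-bit codes, A=0 C=1 G=2 T=3)."""
    codes = ["ACGT".index(b) for b in bases]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def other_strand(colors: str) -> str:
    """Complementing both bases of a pair keeps its color, so the other strand is just the reverse."""
    return colors[::-1]


seq = "ACGTTGCAAGCT"  # hypothetical sequence
rev_comp = seq.translate(COMPLEMENT)[::-1]
de = double_encode(colors_of(seq))

# Correct: reversing the colors matches the colors of the reverse-complemented bases.
print(other_strand(colors_of(seq)) == colors_of(rev_comp))  # True
# Wrong: a base-space reverse complement of the double-encoded letters.
print(de.translate(COMPLEMENT)[::-1] == double_encode(colors_of(rev_comp)))  # False
```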

        As far as clipping adaptor sequence from the end goes, that would be tricky with 25-base reads. I suppose you could just chop off the last 5 bases.
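If you do chop a fixed number of colors, a minimal Python sketch for .csfasta/.qual records might look like this. It assumes one sequence line per record, '#' comment lines, and one space-separated quality value per color (none for the primer base); the helper names are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_csfasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, read) pairs, skipping '#' comment lines; assumes one sequence line per record."""
    name = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def clip_colors(read: str, n_clip: int = 5) -> str:
    """Drop the last n_clip colors but keep the leading primer base (e.g. the 'T')."""
    keep = max(1, len(read) - n_clip)
    return read[0] + read[1:keep]


def clip_quals(qual_line: str, n_clip: int = 5) -> str:
    """Drop the matching last n_clip quality values (one value per color)."""
    values = qual_line.split()
    return " ".join(values[: max(0, len(values) - n_clip)])


example = io.StringIO("# hypothetical input\n>read_1\nT0123012301230123012301230\n")
for name, read in read_csfasta(example):
    print(name, clip_colors(read))
```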

        Your best bet really is to use a color space aware program to map the reads like the SOLiD™ System Small RNA Analysis Tool or its Bioscope equivalent then convert the reads that align to your reference to base space, if needed.

        --
        Phillip

        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #5
          If you do go the double-encoding route (via the encodeFasta.py program provided within the Corona Lite package), make sure that you distinguish your double-encoded files from normal sequence files. ABI recommends starting the names of all double-encoded files with 'de_' and using the '-a' switch to add an annotation to the file.

          Also be aware that color space, even double-encoded color space, cannot be reverse-complemented in the normal fashion.

          • aguffanti
            Member
            • Dec 2008
            • 29

            #6
            miRNA mapping

            Hi. You can recognize the adapter (or P2) by a color string which, if your sequences are 3' SREK sequences, begins with 3302010.
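A rough Python sketch of the signature search, assuming each read is a primer base followed by color digits and that 3302010 marks the start of P2 as described above. The fallback for a partial signature at the very end of the read (at least `min_overlap` colors) is an added assumption, and whether the color spanning the last insert base and the first adapter base should also be dropped is worth checking against the protocol.

```python
P2_SIGNATURE = "3302010"  # color string at the start of P2 for 3' SREK reads, per the post


def find_adapter_start(read: str, signature: str = P2_SIGNATURE, min_overlap: int = 4) -> int:
    """Index in the color string (primer base excluded) where the adapter begins, or -1."""
    colors = read[1:]
    hit = colors.find(signature)
    if hit != -1:
        return hit
    # Assumption: the read may end partway into the adapter, so accept a signature
    # prefix of at least `min_overlap` colors that runs to the end of the read.
    for k in range(len(signature) - 1, min_overlap - 1, -1):
        if colors.endswith(signature[:k]):
            return len(colors) - k
    return -1


def trim_adapter(read: str) -> str:
    """Keep the primer base and the colors before the adapter signature."""
    start = find_adapter_start(read)
    return read if start == -1 else read[0] + read[1 : 1 + start]


print(trim_adapter("T" + "0123012301" + "3302010" + "12"))  # T0123012301
```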

            You could easily map {0,1,2,3} to {A,C,G,T}, trim the P2 with a Smith-Waterman (S&W) alignment (check the SREK protocol manual for the P2 sequence in nucleotides, which has to be converted the same way before aligning), and then convert back from {A,C,G,T} to {0,1,2,3}. REMEMBER not to touch the first T, i.e. reads should look like T0011112333, T2233111000, etc.
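The double-encode, trim, and convert-back procedure can be sketched in Python as below. The scoring scheme (match +2, mismatch -1, gap -2) and the `min_score` cutoff are arbitrary choices, and because the full P2 sequence has to come from the SREK manual, the demo uses only the double-encoded 3302010 signature (TTAGACA) as a stand-in adapter. The function names are hypothetical.

```python
def smith_waterman_start(read: str, adapter: str, match=2, mismatch=-1, gap=-2):
    """Local alignment of `adapter` against `read`.

    Returns (best_score, read_start): read_start is the 0-based index in `read`
    where the best-scoring local alignment begins (-1 if nothing scores above 0).
    """
    rows, cols = len(read) + 1, len(adapter) + 1
    score = [[0] * cols for _ in range(rows)]
    start = [[0] * cols for _ in range(rows)]  # read index where each cell's alignment began
    best, best_start = 0, -1
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if read[i - 1] == adapter[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # extra read character
            left = score[i][j - 1] + gap  # extra adapter character
            cell = max(0, diag, up, left)
            score[i][j] = cell
            if cell == 0:
                start[i][j] = i
            elif cell == diag:
                # A diagonal step from a zero cell starts a new alignment at read[i - 1].
                start[i][j] = start[i - 1][j - 1] if score[i - 1][j - 1] > 0 else i - 1
            elif cell == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if cell > best:
                best, best_start = cell, start[i][j]
    return best, best_start


def trim_color_read(read: str, de_adapter: str, min_score: int = 8) -> str:
    """Double-encode the colors, cut at the adapter hit, convert back; the primer base is untouched."""
    primer, colors = read[0], read[1:]
    letters = colors.translate(str.maketrans("0123", "ACGT"))
    best, start = smith_waterman_start(letters, de_adapter)
    if best >= min_score:
        letters = letters[:start]
    return primer + letters.translate(str.maketrans("ACGT", "0123"))


print(trim_color_read("T" + "0123012301" + "3302010" + "12", "TTAGACA"))  # T0123012301
```

A local alignment tolerates a few miscalled colors inside the adapter, which an exact string search would miss.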

            Or map with SHRiMP against the reference genome or miRBase: the adapter won't align properly.

            HTH

            Alessandro

