
  • SmallRNA analysis pipeline

    I am a little confused by the small RNA analysis pipeline used for analyzing SOLiD results.

    Here is what I have read everywhere.

    1. Trim the adaptors and convert from csfasta to csfastq/fastq
    2. Align them to the genome
    3. Match them with miRBase to get the number of counts per miRNA sequence.
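For readers new to the csfasta format mentioned in step 1, here is a minimal sketch of what a color-space read encodes. A csfasta read starts with a primer base followed by colors 0-3, and each color describes the transition between two adjacent bases. The function name `decode_colorspace` is made up for illustration and is not part of any SOLiD tool.

```python
# 2-bit base codes; a SOLiD color equals the XOR of the codes of two adjacent bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(cs_read: str) -> str:
    """Decode a csfasta read such as 'T0123' into base space.

    The leading primer base is used only to start the decoding and is
    not part of the returned sequence.
    """
    primer_base, colors = cs_read[0], cs_read[1:]
    prev = BASE_TO_CODE[primer_base]
    bases = []
    for color in colors:
        if color not in "0123":
            # '.' marks a missing color call; all later bases are undetermined.
            bases.append("N" * (len(colors) - len(bases)))
            break
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


print(decode_colorspace("T0123"))  # TGAT
print(decode_colorspace("T0023"))  # TTCG: one changed color alters every later base
```

Because a single color error changes every downstream base, reads are usually kept in color space and aligned with a color-space-aware tool rather than decoded up front.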


    My question is: why do you need to align to the reference genome? Why can't we just find the unique sequence reads and then BLAST them against miRBase to get the counts?

    The reads that do not match could then be BLASTed against the genome to discover any new miRNAs.
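The "unique reads, then count" idea can be sketched as follows. This is a toy illustration only: it assumes base-space reads, exact full-length matches, and reference sequences already written with T rather than U (miRBase distributes RNA sequences). The names `collapse_reads`, `count_known_mirnas` and the `toy-miR-*` entries are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into unique sequences with their counts."""
    return Counter(reads)


def count_known_mirnas(read_counts, mature_mirnas):
    """Assign collapsed reads to known miRNAs by exact sequence identity.

    mature_mirnas maps a miRNA name to its mature sequence.
    Returns (counts per miRNA name, Counter of reads with no exact match).
    """
    names_by_seq = {}
    for name, seq in mature_mirnas.items():
        names_by_seq.setdefault(seq, []).append(name)

    mirna_counts = Counter()
    unmatched = Counter()
    for seq, n in read_counts.items():
        names = names_by_seq.get(seq)
        if names:
            for name in names:
                mirna_counts[name] += n
        else:
            unmatched[seq] = n
    return mirna_counts, unmatched


# Toy data, not real miRBase entries.
toy_mirnas = {
    "toy-miR-1": "TTGACCGTAGGCATTCAGGATC",
    "toy-miR-2": "ACCGGTTAGCATCGATTGCAAG",
}
toy_reads = ["TTGACCGTAGGCATTCAGGATC"] * 3 + ["ACCGGTTAGCATCGATTGCAAG", "GGGGGGGGTTTTTTTTAAAAAA"]
counts, leftover = count_known_mirnas(collapse_reads(toy_reads), toy_mirnas)
print(dict(counts))    # {'toy-miR-1': 3, 'toy-miR-2': 1}
print(dict(leftover))  # {'GGGGGGGGTTTTTTTTAAAAAA': 1}
```

Exact matching like this would miss reads with sequencing errors or length variants, which is one reason a tolerant short-read aligner is normally used instead.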

    I am really new to this and cannot seem to find a reasonable explanation for aligning first to the reference genome.

    If someone could explain this to me or suggest a paper, that would be great!

    Thank you

  • #2
    You actually align to the genome after aligning to miRBase. The steps are: first filter against a known set of tRNAs and rRNAs, then align whatever is not filtered to miRBase. What doesn't match there is aligned to the entire genome for novel discovery. You will get these reads annotated and returned in a GFF-like file. The reason not to use BLAST is that short-read aligners are much better tuned to this type of data, and even if you tuned BLAST to handle it better, it doesn't work in color space.
    Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
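The filter, then miRBase, then genome order described above can be sketched like this. It is only an illustration of the tiering logic: exact substring search on base-space strings stands in for a real color-space aligner, and all function names, reference names and the GFF-like column values are assumptions, not output of any actual pipeline.

```python
def find_hit(read, references):
    """Return (reference name, 0-based start) of the first exact hit, or None."""
    for name, seq in references.items():
        pos = seq.find(read)
        if pos != -1:
            return name, pos
    return None


def classify_reads(reads, filter_refs, mirna_refs, genome_refs):
    """Send each read to the first tier it matches: filter, miRBase, genome."""
    tiers = [
        ("filtered", filter_refs),         # tRNA/rRNA and similar
        ("known_miRNA", mirna_refs),       # miRBase sequences
        ("novel_candidate", genome_refs),  # whole genome, for discovery
    ]
    result = {label: [] for label, _ in tiers}
    result["unmapped"] = []
    for read in reads:
        for label, refs in tiers:
            hit = find_hit(read, refs)
            if hit is not None:
                result[label].append((read, *hit))
                break
        else:
            result["unmapped"].append((read, None, None))
    return result


def to_gff_like(records, source="toy_pipeline", feature="smallRNA_read"):
    """Format hits as tab-separated, GFF-like lines (1-based inclusive coordinates)."""
    lines = []
    for read, ref, pos in records:
        fields = [ref, source, feature, str(pos + 1), str(pos + len(read)),
                  ".", "+", ".", f"seq={read}"]
        lines.append("\t".join(fields))
    return lines


# Toy references, not real sequences.
filter_refs = {"toy_rRNA": "GGGGCCCCAAAATTTT"}
mirna_refs = {"toy-miR-1": "TTGACCGTAGGCATTCAGGATC"}
genome_refs = {"chrToy": "AAAA" + "CCGGTTAACCGGTTAAGGCCTT" + "AAAA"}
reads = ["GGCCCCAA", "TTGACCGTAGGCATTCAGGATC", "CCGGTTAACCGGTTAAGGCCTT", "TTTTTTTTGGGGGGGG"]

result = classify_reads(reads, filter_refs, mirna_refs, genome_refs)
for line in to_gff_like(result["novel_candidate"]):
    print(line)
```

Putting the filter tier first keeps abundant tRNA/rRNA fragments from being reported as miRNA or novel candidates; only reads that fall through both earlier tiers reach the genome.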



    • #3
      Thank you very much for the reply. That is exactly how I thought the pipeline should be. But most of the published papers align to the genome first and then to miRBase. That's what confused me.

      Thanks again.



      • #4
        miRNA * non* forms and hairpin

        I've been working with the SOLiD smRNA pipeline and have been questioning the genome alignment portion.

        I've been using Bioscope to align the smRNA reads directly to the whole genome as 1x36 reads. I don't need to trim adapters because of the seed-and-extend process that Bioscope uses. I've been manually fishing for reads that align to the human genome and searching the flanking +/- 100 bp for a secondary hit with the same read. This would hopefully identify both forms of the miRNA as well as the loop. Novoalign performs this adequately on Illumina reads. Has anyone seen anything similar for SOLiD smRNA reads?
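The manual "fishing" step could be automated along these lines. This is one possible reading of the description, assuming alignments have already been parsed into records with 0-based, end-exclusive coordinates; the `Hit` type and `paired_hits_within_flank` are hypothetical names, and whether the second hit really is the other arm of a hairpin depends on how many mismatches the aligner tolerates.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    chrom: str
    start: int   # 0-based
    end: int     # exclusive
    strand: str


def paired_hits_within_flank(hits, flank=100):
    """Find pairs of non-overlapping hits from the same read that lie on the
    same chromosome with at most `flank` bp between them."""
    by_key = defaultdict(list)
    for h in hits:
        by_key[(h.read_id, h.chrom)].append(h)

    pairs = []
    for group in by_key.values():
        group.sort(key=lambda h: h.start)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                gap = second.start - first.end
                if gap > flank:
                    break          # later hits start even further away
                if gap >= 0:       # a negative gap means the hits overlap
                    pairs.append((first, second))
    return pairs


toy_hits = [
    Hit("r1", "chr1", 1000, 1022, "+"),
    Hit("r1", "chr1", 1060, 1082, "+"),   # 38 bp downstream of the first hit
    Hit("r1", "chr1", 5000, 5022, "+"),   # too far away
    Hit("r2", "chr1", 1000, 1022, "+"),   # different read
    Hit("r1", "chr2", 1030, 1052, "+"),   # different chromosome
]
for first, second in paired_hits_within_flank(toy_hits):
    print(first.read_id, first.chrom, first.start, second.start)
```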



        • #5
          There is a version of Novoalign for SOLiD reads. I imagine you could configure it similarly.



          • #6
            Originally posted by rdeborja
            I've been working with the SOLiD smRNA pipeline and have been questioning the genome alignment portion.

            I've been using Bioscope to align the smRNA reads directly to the whole genome as 1x36 reads. I don't need to trim adapters because of the seed-and-extend process that Bioscope uses. I've been manually fishing for reads that align to the human genome and searching the flanking +/- 100 bp for a secondary hit with the same read. This would hopefully identify both forms of the miRNA as well as the loop. Novoalign performs this adequately on Illumina reads. Has anyone seen anything similar for SOLiD smRNA reads?
            Hi, just checking: are you using the whole transcriptome pipeline in Bioscope, or just the resequencing pipeline?

            I would think that for small RNAs with a length of 21-22 nt you would need to trim the adaptors. (Or are you saying that the seed-and-extend process maps only the small RNA portion and ignores the adaptor sequence?)
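To make the trimming concern concrete: a 21-22 nt insert in a 36-base read leaves roughly 14 bases of adaptor at the 3' end. Below is a minimal sketch of 3' adaptor trimming, assuming base-space reads and exact matching; the adaptor sequence and the function name `trim_3prime_adapter` are made up for illustration, and SOLiD reads would need the adaptor expressed in color space.

```python
def trim_3prime_adapter(read, adapter, min_overlap=6):
    """Remove a 3' adaptor by exact matching.

    First look for the full adaptor anywhere in the read; otherwise look for
    the longest adaptor prefix (at least min_overlap bases) at the read end.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


TOY_ADAPTER = "AGCTTGCAAGGTCCATGA"          # hypothetical adaptor, 18 nt
insert = "TTGACCGTAGGCATTCAGGATC"           # toy 22-nt small RNA
read_36 = insert + TOY_ADAPTER[:14]         # 36-nt read with a partial adaptor

print(len(read_36))                                      # 36
print(trim_3prime_adapter(read_36, TOY_ADAPTER))         # TTGACCGTAGGCATTCAGGATC
```

The `min_overlap` threshold is a trade-off: a lower value removes shorter adaptor remnants but also clips genuine read ends that happen to resemble the start of the adaptor.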

            For WT mapping, one can provide a filter reference that includes adaptors and things like rRNAs and tRNAs.

            I am guessing that spurious hits where the small RNA bridges exon-intron-exon boundaries are unlikely, but I am unsure of this.
            http://kevin-gattaca.blogspot.com/
