  • Alignment of small RNA data

    I was recently at a meeting about RNA-seq in general, and the topic of small RNA-seq came up, something with which I'm quite unfamiliar. The discussions were interesting, but seeing as I didn't know much about sRNA-seq (and I was the "RNA-seq"-guy at the meeting), they didn't get very far. I've since tried to learn a bit about it, and I wanted to ask some questions to clear up things I'm not sure about...

    1) A general pipeline for sRNA-seq. As far as I understand it, the sequencing adapters are proportionally a much larger part of the reads than for normal RNA-seq. This would make adapter trimming more or less mandatory for any sRNA-seq analysis. Is this correct?

    2) Seeing as sRNAs are a lot smaller, would that mean there are more duplicated reads in an sRNA-seq dataset? If so, would you remove them?

    3) As far as alignment goes, I can't work out whether one should use one of the sRNA-specific aligners I find by googling, or one of the normal RNA-seq aligners (STAR, TopHat, etc.). The information I find seems to say that you can use either...

    4) Can you align to the normal human reference genome (such as GRCh38), or do you need to add some sRNA-specific database? I found miRBase, for example, which (as far as I can tell) is a database for miRNA sequences. I assume one could align to that, if one is only interested in miRNA? Or should those sequences be added to e.g. GRCh38 and then aligned to the collated reference?

    Since I'm interested in this purely from a learning and knowledge perspective, I won't actually work with any sRNA-seq dataset. I did, however, download a run from the SRA and put it through my standard alignment pipeline just to see what happened. With a very simple STAR 2-pass alignment to GRCh38, with no sRNA-specific sequences added and no adapter/quality trimming, I got around 80% ambiguously aligned reads and about 10% duplicated reads. Do these numbers make sense for a pipeline that is not optimised for sRNA? What would be required to get a better alignment?
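To make the adapter question concrete: with a 50-nt read and a typical ~22-nt mature miRNA, more than half of each read is adapter, so an untrimmed read cannot align end-to-end. Below is a minimal Python sketch of exact-match 3' adapter trimming followed by a length filter, assuming single-end reads. The adapter sequence (commonly listed as the Illumina TruSeq Small RNA 3' adapter), the 3-nt minimum overlap and the 18 to 30 nt length window are assumptions for illustration; check your library kit. A real tool such as cutadapt also tolerates sequencing errors within the adapter, which this sketch does not.

```python
# Assumption: 3' adapter of the library kit (check your kit's documentation).
DEFAULT_ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(read: str, adapter: str = DEFAULT_ADAPTER,
                        min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only."""
    # Case 1: the full adapter occurs in the read -> cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter -> find the longest
    # read suffix that equals an adapter prefix of at least min_overlap nt.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


def keep_small_rna(insert: str, min_len: int = 18, max_len: int = 30) -> bool:
    """Hypothetical length window; drops adapter dimers and long fragments."""
    return min_len <= len(insert) <= max_len


if __name__ == "__main__":
    insert = "TAGCTTATCAGACTGATGTTGA"            # a 22-nt example insert
    read = insert + DEFAULT_ADAPTER + "ATCTCGT"  # padded to a 50-nt read
    trimmed = trim_3prime_adapter(read)
    print(trimmed, len(trimmed), keep_small_rna(trimmed))
```

Searching for a partial adapter at the read end matters because reads from longer inserts run only a few bases into the adapter; the minimum overlap keeps the trimmer from clipping a genuine terminal base that merely happens to match the adapter's first nucleotide.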

  • #2
    Dear Eric, we analyze sRNA data very often, so I can give you some insight.
    1) Adaptor trimming is really a must. With a minimum sequencing length of 50 nt, which is longer than a typical sRNA, you always have adaptor remnants in the read.

    2) Removing duplicated reads would be a problem. In most cases you sequence the full-length sRNA, so, in contrast to RNA-seq, there is no random shifting of read start positions that would distinguish independent molecules carrying the same sequence. Removing duplicates will most likely leave you with very low and very similar counts for all the miRNAs, no matter how highly or differently they were expressed. You can instead use adaptors containing random nucleotides (e.g. 8 Ns) and then use these 8 Ns in combination with the sRNA sequence to assess the duplication rate.

    3) We use good old Bowtie and it works perfectly fine for us (if there are different opinions on that one, any input is appreciated).


    4) I guess the answer here largely depends on your question.

    hope that helps as a start.
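The random-nucleotide adaptor idea from point 2 can be sketched as follows. Assuming the 8-nt random tag (UMI) has already been split off each read and both it and the trimmed insert are available as strings, the number of distinct (UMI, insert) pairs estimates the number of original molecules, whereas collapsing on the insert sequence alone would give every distinct sRNA a count of one. The function name and input format are hypothetical, and exact matching is assumed, so sequencing errors in the tag would inflate molecule counts here. Note also that 8 Ns give 4^8 = 65,536 possible tags, so for very abundant sRNAs independent molecules will share tags and the molecule count saturates.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def umi_summary(records: Iterable[Tuple[str, str]]):
    """Summarize hypothetical (umi, insert) pairs, one pair per read.

    Returns (reads_per_insert, molecules_per_insert, duplication_rate), where
    a 'molecule' is a distinct (UMI, insert) combination (exact matching).
    """
    reads_per_insert: Counter = Counter()
    umis_per_insert: Dict[str, set] = defaultdict(set)
    total = 0
    for umi, insert in records:
        total += 1
        reads_per_insert[insert] += 1
        umis_per_insert[insert].add(umi)
    molecules = {ins: len(umis) for ins, umis in umis_per_insert.items()}
    # Fraction of reads that repeat an already-seen (UMI, insert) pair.
    dup_rate = 1 - sum(molecules.values()) / total if total else 0.0
    return dict(reads_per_insert), molecules, dup_rate


if __name__ == "__main__":
    # Placeholder inserts; real data would carry trimmed sRNA sequences.
    reads = [
        ("AAAAAAAA", "INSERT_1"), ("AAAAAAAA", "INSERT_1"),  # PCR duplicate pair
        ("CCCCCCCC", "INSERT_1"), ("GGGGGGGG", "INSERT_1"),
        ("AAAAAAAA", "INSERT_2"),
    ]
    reads_per, molecules, dup = umi_summary(reads)
    print(reads_per)      # {'INSERT_1': 4, 'INSERT_2': 1}
    print(molecules)      # {'INSERT_1': 3, 'INSERT_2': 1}
    print(round(dup, 2))  # 0.2
```

In this toy input, sequence-only deduplication would report INSERT_1 once, while the UMI-aware count keeps three molecules and flags only the one true PCR duplicate.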



    • #3
      Hi,
      I do a lot of short RNA-seq and here are some thoughts (but there are other ways of doing things that work well):

      1. Agreed that adapter trimming is a must; otherwise most of your reads will not map. We use cutadapt, which works really well.
      2. No duplicate-read removal is needed, nor should it be done. You'll lose a lot of real reads.

      3. Bowtie works well. I have also used BWA, which also seemed to work well, but I usually default to Bowtie. As far as I understand, STAR wouldn't work for short RNAs, as it was designed for long RNA and specifically paired-end reads (but don't quote me here, I may be wrong). STAR is our go-to aligner for long RNA.

      4. As for aligning: in my opinion, you should always align to the whole genome (GRCh37 or GRCh38, whichever you choose) and afterwards intersect the alignments with miRBase or some other database of interest. Also keep in mind that the vast majority of miRNAs are 'unique' sequences in the genome and should align uniquely. But there are cases in which some miRNAs have duplicate sequences in the genome (e.g. miR-92a-3p, or miR-1302, whose sequence occurs in 11 places in the genome). Mapping to the whole genome also lets you do additional things like novel miRNA discovery. Some people do use miRBase sequences and align to those instead of the whole genome, but I personally think that is a bad idea, and it will give a false sense of what you are looking at. Essentially, you would be 'forcing' many reads to align to those regions when in fact they would align better to other places in the genome, especially when you allow a mismatch.
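The "align to the whole genome, then intersect with miRBase" approach from point 4 can be sketched in Python. The sketch assumes alignments have already been reduced to (read_id, chrom, start, end, strand) tuples in 0-based, half-open coordinates, and that the annotation (e.g. parsed from a miRBase GFF3 file) is a list of (name, chrom, start, end, strand) intervals; the names and coordinates below are made up. Splitting a multi-mapping read's weight evenly across its loci (1/n each) is just one possible policy, chosen here for illustration, and a hit overlapping two features credits both. Reads overlapping no annotated feature are left uncounted, which is where novel-miRNA discovery would start. In practice, tools such as bedtools intersect or featureCounts do this step at scale.

```python
from collections import Counter, defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def count_features(alignments, features):
    """Weighted read counts per annotated feature (hypothetical input format).

    alignments: (read_id, chrom, start, end, strand) tuples, one per alignment.
    features:   (name, chrom, start, end, strand) tuples.
    """
    hits_by_read = defaultdict(list)
    for read_id, chrom, start, end, strand in alignments:
        hits_by_read[read_id].append((chrom, start, end, strand))

    counts = Counter()
    for hits in hits_by_read.values():
        weight = 1.0 / len(hits)  # split a multi-mapping read evenly across loci
        for chrom, start, end, strand in hits:
            for name, f_chrom, f_start, f_end, f_strand in features:
                same_chrom_strand = chrom == f_chrom and strand == f_strand
                if same_chrom_strand and overlaps(start, end, f_start, f_end):
                    counts[name] += weight
    return counts


if __name__ == "__main__":
    features = [("miR-A", "chr1", 1000, 1022, "+"), ("miR-B", "chr2", 5000, 5022, "-")]
    alignments = [
        ("r1", "chr1", 1000, 1022, "+"),
        ("r2", "chr1", 1001, 1022, "+"),
        ("r3", "chr1", 1000, 1022, "+"),  # r3 maps to two loci
        ("r3", "chr2", 5000, 5022, "-"),
        ("r4", "chr3", 200, 222, "+"),    # unannotated locus: left uncounted
    ]
    print(dict(count_features(alignments, features)))  # {'miR-A': 2.5, 'miR-B': 0.5}
```

Because the whole genome is the alignment target, r4 still has a genomic position even though no annotated miRNA covers it; aligning only to miRBase sequences would have forced it onto a miRNA or discarded it.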

      Have fun with it. miRNAs do lots of interesting things and have many useful roles!



      • #4
        Which GTFs to use for annotation of sRNA?

        Hello everyone,

        Following on from ErikFas's query about using the normal human reference genome for sRNA-seq analysis, I wanted to ask whether a regular GTF/GFF from Ensembl or UCSC can be used to annotate sRNAs, or whether there are sRNA-specific GTFs.

        Thanks a lot!
