  • RIN values for RNAseq

    Hi All,

    We have some RNA samples from frozen tissue which we want to have sequenced using the Illumina platform. However, some of the samples have low RIN values on the Agilent Bioanalyzer. We would really like to get the sequence but can't afford to blow the money it would cost. We have a range of RIN values from 3.6 to 10. BGI (our potential sequence service provider) suggests a RIN > 7 for samples.

    My question is, has anybody had a good (preferably!) experience of sequencing low RIN value RNA. Did you use DSN in the library prep?

    Also, in another project, we are thinking of sending material from FFPE tissue - any experience there?

    Thanks a lot,
    Peadar

  • #2
    Hi Pogaora,
    RIN is a metric offered by the Agilent Bioanalyzer as an estimate of the extent of degradation of total RNA.

    Generally the amplicons sequenced by second-generation sequencers are quite limited in length (inserts of less than 400 bp). Therefore, the length of your RNA, per se, is not an issue. The real issue is this: when my RNA is somewhat degraded, what bias do I introduce by the method I use to avoid the large and small ribosomal subunit RNAs, which often contribute >90% of total RNA? Basically, there are a handful of methodologies employed:

    (1) Isolate polyA+ RNAs
    (2) Remove ribosomal by hybridization+immobilization
    (3) Prime first strand cDNA synthesis from the polyA tail
    (4) Normalize the total RNA sample to bring the highly expressed RNAs (eg, rRNA) down near to the level of more lowly expressed RNA (eg, messenger RNA) -- this would include the DSN method you mention.
    (5) Degrade all messages without a 5' cap. (Epicentre Terminator exonuclease, TEX)

    (1) or (2) are very common. If you have a low RIN value, neither works well. Poly A purification of degraded RNA (or priming cDNA synthesis from the polyA tail) will pull out only the most 3' segment of the RNA population. That is, the full length of all your messenger RNA is present, for the most part, even in a highly degraded sample. But because the more 5' sequence has become detached from the polyA tail, it is no longer accessible. Below a RIN of 7 you will likely see strong 3' bias in your sequences. This may be especially problematic because 3' untranslated regions (UTRs) of messages may harbor repetitive and/or low-complexity sequences, which may make it hard to map them uniquely.

    Conversely, ribosomal RNA removal methodologies that work by hybridization generally target a handful of well-conserved sites on the ribosomal RNA. In a degraded sample, rRNA fragments that no longer carry one of those sites escape capture, so you will see higher percentages of rRNA sequence resulting from this depletion method as the amount of degradation increases. Epicentre's "Ribo-zero" kits are touted as being able to remove ribosomal RNA from somewhat more degraded RNA than, well, the other kit, which is Invitrogen's "Ribo-minus". I presume they do this by increasing the number of rRNA binding oligo sites.

    Normalization methodologies may well be very attractive in situations where the actual RNA counts are unimportant (eg, de novo transcriptome). Normalization methodologies rely on the kinetics of hybridization. Most current kits normalize by denaturing double-stranded cDNA, allowing a limited hybridization time, followed by degradation of double-stranded molecules with a duplex-specific nuclease (DSN). The more highly expressed molecules stand a better chance of finding their complementary strand in a given amount of time and so they can be preferentially removed on that basis. Multiple cycles of this process will probably be necessary to remove the majority of the rRNA. The old school method of fractionating the single-stranded from the double-stranded molecules was hydroxylapatite (HA) chromatography. Not sure why it has fallen out of favor.
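As an illustration of the hybridization kinetics described above, here is a minimal sketch that assumes ideal second-order reannealing, where the fraction of a species still single-stranded after time t is 1 / (1 + k * C0 * t). The rate constant, time, starting abundances and species names are made-up values for illustration, not measurements or kit parameters.

```python
def fraction_single_stranded(conc, rate_constant, time):
    """Fraction of a species still single-stranded after reannealing.

    Assumes ideal second-order kinetics: 1 / (1 + k * C0 * t).
    """
    return 1.0 / (1.0 + rate_constant * conc * time)


def normalize(abundances, rate_constant, time, cycles=1):
    """Apply repeated denature / anneal / digest-duplex cycles.

    After each cycle only the single-stranded fraction of every
    species is assumed to survive the nuclease treatment.
    """
    remaining = dict(abundances)
    for _ in range(cycles):
        remaining = {
            name: conc * fraction_single_stranded(conc, rate_constant, time)
            for name, conc in remaining.items()
        }
    return remaining


# Hypothetical starting abundances in arbitrary units (rRNA at 90% of total).
start = {"rRNA": 900.0, "abundant_mRNA": 90.0, "rare_mRNA": 10.0}
after = normalize(start, rate_constant=0.01, time=1.0, cycles=3)

for name in start:
    print(f"{name}: {start[name]:.1f} -> {after[name]:.1f}")
```

In this toy model the abundant species lose a much larger share per cycle than the rare one, so the abundances converge over successive cycles, which is consistent with the point that several cycles may be needed for rRNA.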

    I do not think TEX (method 5) would work well with degraded RNA because TEX requires a 5' phosphate to initiate exonucleolytic activity. I think (and please anyone that knows otherwise correct me if I am wrong here) most processes that result in RNA degradation result in 5'-hydroxyls and 3'-phosphates.

    That is the overview. The answer is going to be application specific as well. If you just want to count transcripts, ignore splice variants and you know that 3'-UTRs are uniquely mappable in your genome of interest, then degradation will have little net negative effect for you. If you are going to use the Applied Biosystems (SOLiD) RNAseq library construction methodology you have to keep in mind that ends present in your sample prior to fragmentation probably are not ligatable to the adapters in that kit. Although treatment with T4-PNK prior to fragmentation might mitigate that issue.

    A further issue is that degradation might occur during ribosomal depletion. So, for libraries where this issue is critical, running a pico RNA chip prior to fragmentation would be useful. If, for example, you saw that nearly all of your polyA+ RNA was less than 500 nt prior to fragmentation, you would know nearly all your resulting sequence will be localized to less than 500 bases from the polyadenylation site. However, some protocols make this check nearly impossible (eg, Illumina's TruSeq RNA prep kit). As RNAseq becomes a replacement for microarray-based expression analysis methodologies, there is a strong need for less expensive and higher-throughput library construction methodologies. So it is not surprising that this would begin to interfere with QC steps that previously would have warned against continuing with a given sample. It is a trade-off that needs to be weighed.

    --
    Phillip


    • #3
      Has anyone looked at RNA-seq data from samples with low RIN? We recently sequenced a few samples and were unable to detect any kind of 3' sequence bias from within the raw reads. Not what I expected.


      • #4
        Were the samples actually degraded, or just gave a low RIN score for some other reason? What species/tissue was the RNA from? What library construction and sequencing method did you use? What method did you use to detect "any kind of 3' sequence bias from within the raw reads"?

        As implied above, yes, I have seen 3' bias from RNA with lower RIN scores using a polyA mRNA isolation method. When we repeated the experiment using RiboZero ribo-depletion we did not see the 3' bias.

        --
        Phillip


        • #5
          Phillip,

          Samples were from a few different species - mammalian, worm and reptile. RINs ranged from 4-6. Typically such samples would not even pass QC but these were all we had. We had also seen good samples from the same species so believe that the low RIN is adequately telling of degradation.

          Illumina TruSeq RNA-seq kit, library protocol involves polyA purification.

          To detect 3' bias, we did the following:
          1) Took the assembled transcripts (this had already been done by the bioinformaticians)
          2) Aligned raw reads back to the assembled transcripts and calculated the ratio of coverage at the 3' end (defined as the last 15% of the full length of a polyadenylated transcript) to total coverage of that transcript.

          In all these cases (n = 4), we picked up only a small fraction of assembled transcripts with a greater than 5x ratio of coverage at the 3' end versus total coverage, as calculated above (fewer than 10 instances, where I would expect a lot more).
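One possible reading of the calculation described above can be sketched as follows. The post does not say whether "coverage" means mean depth or read count, so this sketch assumes mean per-base depth, and it assumes the depth list is ordered 5' to 3'. The function name `three_prime_ratio` and the example depth profiles are hypothetical.

```python
def three_prime_ratio(depth, end_fraction=0.15):
    """Mean depth over the 3'-most window divided by mean depth overall.

    depth: per-base coverage of one transcript, ordered 5' -> 3'.
    end_fraction: share of transcript length treated as the 3' end.
    Returns None when the transcript has no coverage at all.
    """
    n = len(depth)
    if n == 0:
        raise ValueError("empty transcript")
    total_mean = sum(depth) / n
    if total_mean == 0:
        return None
    window = max(1, round(n * end_fraction))
    end_mean = sum(depth[-window:]) / window
    return end_mean / total_mean


# Hypothetical 100-base transcripts.
uniform = [10] * 100               # even coverage along the transcript
biased = [0] * 85 + [20] * 15      # every read falls in the last 15%

print(three_prime_ratio(uniform))
print(three_prime_ratio(biased))
```

Under this reading, perfectly uniform coverage gives a ratio of 1.0 and the ratio cannot exceed 1/0.15 (about 6.7) even when every read falls in the 3' window, so a 5x cut-off would sit close to that ceiling. A different definition of the ratio would behave differently.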

          I am wondering if
          1) Assembly algorithms filter out biased transcripts, so that they never make it to assembly? (I need to still understand the assembly process as that is not done by me)

          2) Our calculation method described is flawed? Maybe 15% is too long?

          3) Anything else?

          How did you detect 3' bias in your data set?

          Thanks,
          Nandita


          • #6
            So naively one would expect that the extent of degradation present in the large/small sub-unit rRNAs reflects that of the whole sample. RIN is meant to assess that degradation. But it does not always get it right. RNA prep methods, for example, that pull down all the small RNAs (eg Trizol) can result in lower RINs, or even a failure to calculate a RIN for a lane. And, obviously, strange rRNA patterns generally result in low RIN scores whether the samples are degraded or not.

            I generally check for 3' bias by loading up a .bam file in IGV and looking at some fairly highly expressed genes. If I see a pattern where most of the reads are mapping towards the 3' ends of genes and hardly any towards the 5' ends, then I have evidence of an RNA degradation issue.

            We try to get people to re-isolate RNA if it looks degraded. So bias is usually not a big issue.

            Most of our libraries were made with either the SOLiD Total RNA seq kit or Illumina RNA TruSeq. Either of these is susceptible to 3' bias when fed degraded total RNA that is ribodepleted using oligo dT capture methods.

            Normally I think of samples with a RIN of 4-6 being moderately degraded. As to whether libraries constructed from such samples would generate 5x as many reads mapped to the 3' 15% of the transcript or not, I could not say. Sounds like your data says "no".

            --
            Phillip


            • #7
              We routinely do degraded and FFPE RNA samples. With mRNA-seq it's a disaster; however, whole transcriptome RNA-seq seems to be less sensitive to degraded RNA. For this we use the RiboZero FFPE kits and a standard TruSeq whole transcriptome library with 50 or 2x50 bp reads. And since we prefer whole transcriptome for our research it works out fairly well.

              We prefer RINs over 8, but sometimes with our samples and our collaborators' we do not have a choice.


              • #8
                Hi Phillip,

                I saw your post and I was wondering if you could help. I'm trying to understand the consequences of low RIN values for differential gene expression analysis.

                We have run Illumina TruSeq on samples with low RIN values (ie 3.5) in parallel with samples with higher RIN values (>7). A typical sample with low RIN has ~380M mapped reads and a typical sample with high RIN has ~420M. So, I do see fewer mapped reads in samples with lower RIN values. However, would this really matter when assessing differential gene expression, since we will be using FPKM?
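For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) can be sketched as below. Only the two library sizes come from the post; the gene length and fragment counts are invented so that the arithmetic is easy to follow.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Hypothetical 2,000 bp gene whose counts scale with library size.
low_rin = fpkm(fragments=3800, length_bp=2000, total_mapped=380_000_000)
high_rin = fpkm(fragments=4200, length_bp=2000, total_mapped=420_000_000)

print(low_rin, high_rin)
```

Because the count is divided by the total number of mapped fragments, a difference in library size alone cancels out in this example. FPKM only rescales by gene length and depth, though, so it does not by itself correct for where along the transcript the reads fall.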

                Also, you suggest to have a look at some highly expressed genes with IGV and look for 3' bias. Attached are the images of GAPDH in the BAMs for two samples with disparate RIN values (B70 RIN=3.5 and B71 RIN=7.1). I don't see much of a 3' bias (this is the case for other highly expressed genes and across samples). Is that your assessment as well? If so, can I trust the data generated with samples with low RIN values? and what is really the consequence of low RIN values?

                Thank you in advance for the help.
                Kike
                Attached Files


                • #9
                  An easy way to check if you have 3' bias is to use the program RSeQC. With that you can calculate the gene body coverage.


                  • #10
                    https://code.google.com/p/rseqc/
                    Also, sam-stats -R calculates coverage, bias and a skewness metric for RNA-seq. Median skewness can be a single-metric summary of coverage bias per sample.

                    More about coverage skewness:


                    • #11
                      Thank you DonDolowy and earonesty.

                      We do use RNAseQC which does coverage. I have never used sam-stats and I like the idea of the skewness metric.

                      In our datasets I haven't been able to find a strong correlation between RIN value and 3' bias (more like a weak trend). This is important for us since our samples may never have great RIN values. In addition, it is not clear to me what 3' bias does to differential gene expression analysis and, if it does something, to what extent. Any thoughts on this?

                      Kike


                      • #12
                        You might want to check out this paper: http://www.plosone.org/article/info%...l.pone.0091851


                        • #13
                          I've found that outliers in median coverage skewness tend to have higher variability, thus reducing the power if these samples are included in group comparisons.


                          • #14
                            Coverage biases with rRNA depletion and poly-A selection

                            There are pretty considerable coverage biases with rRNA depletion and poly(A) selection methods, that up to now have not been well characterized: http://blog.genohub.com/rrna-depleti...as-in-rna-seq/.

                            I'd be interested in learning about tools / analysis methods that attempt to account for these exon - level expression biases.

                            - Genohub


                            • #15
                              sam-stats with the -R option outputs a "skewness" value for each transcript. Because of huge variation in reference, there's no absolute good number for this value, but you can compare the median skewness among samples and see which samples had more/less bias. Degraded samples generally have more bias.
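The comparison of median skewness across samples could be sketched like this. It assumes the per-transcript skewness values have already been parsed into plain dictionaries; it does not read sam-stats output or reproduce how sam-stats computes skewness. The sample names, transcript names, values and the median-absolute-deviation cut-off are all hypothetical.

```python
from statistics import median


def median_skewness(per_transcript):
    """Median of the per-transcript skewness values for one sample."""
    return median(per_transcript.values())


def flag_outliers(sample_medians, n_mads=3.0):
    """Return samples whose median skewness sits far from the cohort.

    Distance is measured in median absolute deviations (MADs) from the
    median of all sample medians. The cut-off is an arbitrary choice.
    """
    centre = median(sample_medians.values())
    mad = median(abs(v - centre) for v in sample_medians.values())
    if mad == 0:
        return []
    return sorted(
        name for name, value in sample_medians.items()
        if abs(value - centre) > n_mads * mad
    )


# Hypothetical per-transcript skewness values for five samples.
samples = {
    "S1": {"t1": 0.05, "t2": 0.10, "t3": 0.30},
    "S2": {"t1": 0.02, "t2": 0.12, "t3": 0.40},
    "S3": {"t1": 0.11, "t2": 0.01, "t3": 0.20},
    "S4": {"t1": 0.09, "t2": 0.03, "t3": 0.50},
    "S5": {"t1": 0.60, "t2": 0.40, "t3": 0.90},
}

medians = {name: median_skewness(values) for name, values in samples.items()}
print(medians)
print(flag_outliers(medians))
```

Since there is no absolute good value for skewness, the sketch only ranks samples relative to each other, which matches the advice above to compare medians among samples rather than against a fixed threshold.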
