  • How many million reads are required to get 20x coverage for rat RNA

    Hi All,

    I'm looking for some advice: we are trying to obtain an average coverage of 20x for RNA-Seq of rat tissues. How many million reads should we aim for per library to reach that coverage?

    Many thanks in advance!


    Wing

  • #2
    Well, it is very hard to tell, because different tissues express different parts of the transcriptome.

    If $$$ is not an issue, 100M 2x100 reads would very likely be overkill for what you want.

    Good luck!



    • #3
      "20X coverage" for RNA-Seq is difficult to define since the copy number varies for transcripts across at least 4 orders of magnitude within a tissue. Therefore estimating "coverage" for RNA-Seq is not nearly as straightforward as it is for DNA applications.

      For very highly expressed transcripts, as little as 1 million reads will easily give you 20X coverage.

      But for rare transcripts, you can collect 1 billion or more reads and still never get to 20X coverage.

      And of course this varies depending on which tissue you are studying as well: a transcript may be easy to study in liver but virtually absent in brain.

      For mRNA sequencing (TruSeq Stranded mRNA Kits) we usually recommend 50 million paired-end 2 X 75 bp reads. You can always go to 100M if you want deeper coverage, but beyond that the cost-benefit ratio of collecting more reads on a single sample falls off dramatically.
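      To make the point about expression level concrete, here is a rough back-of-the-envelope sketch, not a standard tool. It assumes each sequenced fragment contributes a fixed number of bases, that a transcript captures a fixed fraction of all fragments, and that reads land uniformly along the transcript; it ignores mapping losses, duplicates, and positional bias. The transcript length and abundance fractions are hypothetical illustrations, not measured values.

```python
def mean_coverage(n_fragments: float, bases_per_fragment: int,
                  frac_of_reads: float, transcript_len: int) -> float:
    """Approximate mean per-base depth over one transcript.

    Assumes uniform read placement and that every sequenced base maps
    (no filtering or duplicate removal).
    """
    return n_fragments * bases_per_fragment * frac_of_reads / transcript_len


def fragments_for_depth(target_depth: float, bases_per_fragment: int,
                        frac_of_reads: float, transcript_len: int) -> float:
    """Invert mean_coverage: fragments needed to reach target_depth."""
    return target_depth * transcript_len / (bases_per_fragment * frac_of_reads)


if __name__ == "__main__":
    # Hypothetical 2 kb transcript, 2 x 75 bp paired-end (150 bp per fragment).
    length, per_frag = 2_000, 150
    # Hypothetical abundances spanning several orders of magnitude.
    for frac in (1e-3, 1e-5, 1e-7):
        need = fragments_for_depth(20, per_frag, frac, length)
        print(f"fraction {frac:.0e}: ~{need / 1e6:,.1f}M fragments for 20x")
```

      Under these assumptions the depth needed for 20x on the same 2 kb transcript ranges from roughly 0.3M fragments (abundant) to about 2.7 billion (rare), which is why a single "20x" target does not translate into one read count for RNA-Seq.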



      • #4
        Thanks so much, y'all. This is very useful and practical help!

        Wing



        • #5
          With all that said, allow me to twist the question a bit.

          Say I already have some Affy microarray data and I want to improve on, or at least confirm, the array data with RNA-Seq. The Affy chip used was the HG ST gene array, and the experiment was done with n=3. We now want to use n=3 for RNA-Seq as well. Will 20M clean PE 2x100 reads give similar or better coverage than the array data?

          Thanks

          Wing



          • #6
            Long, long ago, when we did SOLiD runs, an ABI applications specialist told us 5M reads were equivalent to an Affy chip. But I don't know what that was based on.
            Possibly there are comparisons in the literature?
            --
            Phillip



            • #7
              People have a lot of opinions on the amount of coverage needed for RNA-Seq - it almost turns into a religious debate! Generally speaking, 10M reads should give you 'array-like' coverage. 20M PE reads (which I'm defining as 20M clusters) would be even better. If cost is a major issue, you could either reduce the number of clusters or go for SE reads. PE is nice, but unless you're going to do the hard work of trying to figure out splice isoforms, it's probably not necessary.

              Good luck with the experiment!
              AllSeq - The Sequencing Marketplace
              www.AllSeq.com



              • #8
                I would highly recommend this blog post from CoreGenomics, which tries to address this issue using the data published by the SEQC group last year:

                I've been working with microarrays since 2000 and ever since RNA-seq came on the scene the writing has been on the wall. RNA-seq has so man...


                The bottom line is that many independent groups have come to the same conclusion: 10M to 20M single-end 50 bp reads (from libraries made with polyA mRNA preps) will give gene-level expression values that are better than an Affy array.

                These days, with the lower price of sequencing, I default to 25M paired-end 2x75 bp reads. This data will persist for a long time and can be used by many different pipelines for more advanced analysis of splicing, fusions, and novel transcript discovery than can be done with 50 bp SE reads alone.
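                Depth in this thread is quoted sometimes as clusters (read pairs) and sometimes as individual reads, so it can help to convert each configuration to reads and raw bases before comparing them. This is a minimal sketch that treats clusters as the unit, as in the "20M PE reads (which I'm defining as 20M clusters)" post; the `RunConfig` class is a hypothetical helper, and the yields are raw sequenced bases before trimming, filtering, or mapping.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    clusters: int   # read pairs for paired-end, single reads for single-end
    read_len: int   # bases per read
    paired: bool    # True for paired-end (two reads per cluster)

    @property
    def reads(self) -> int:
        """Individual reads: two per cluster for paired-end."""
        return self.clusters * (2 if self.paired else 1)

    @property
    def bases(self) -> int:
        """Raw sequenced bases before trimming or filtering."""
        return self.reads * self.read_len


if __name__ == "__main__":
    # Configurations mentioned in this thread.
    configs = {
        "20M clusters, PE 2x100": RunConfig(20_000_000, 100, True),
        "10M clusters, SE 50": RunConfig(10_000_000, 50, False),
        "25M clusters, PE 2x75": RunConfig(25_000_000, 75, True),
    }
    for name, cfg in configs.items():
        print(f"{name}: {cfg.reads / 1e6:.0f}M reads, {cfg.bases / 1e9:.1f} Gb")
```

                Counting clusters rather than reads keeps SE and PE quotes on the same footing for gene-level counting, since both reads of a pair come from the same fragment; the extra bases from PE mainly help with splicing and isoform work rather than gene-level sensitivity.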



                • #9
                  Note that regardless of depth of coverage, you may well not be able to "confirm" some array results with an independent RNA-seq experiment. Just because you detect any given gene as significantly differentially expressed in one experiment does not mean you will do so in the other experiment. Sometimes the overlap in DEGs is great, but sometimes it can be quite low.

                  You may get better correspondence (better confirmation) in the end by comparing ontology enrichment of the genes selected in the two experiments than by directly comparing significant gene lists. This is particularly true given that n=3 is a minimal number of biological replicates.

                  Array equivalence is a two-part issue to my mind. The first part is equivalent sensitivity: how much RNA-seq coverage will give you equivalent statistical sensitivity for detecting change? How much coverage you need to pick up either the equivalent number of DEGs or largely the same set of DEGs is a different issue. Typically, the coverage needed for the former is far less than for the latter. 5-10M reads per sample will equal or exceed array sensitivity, but you'd be better off with 30-50M reads per sample if you want a good chance of getting high overlap in detected DEGs in both experiments (in my experience).
                  Michael Black, Ph.D.
                  ScitoVation LLC. RTP, N.C.
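                  One way to put a number on how well two experiments "confirm" each other is to measure the overlap of their DEG lists against a shared background of genes measurable on both platforms. This is a minimal sketch using the Jaccard index and a one-sided hypergeometric test built from the standard library; the function names and gene IDs are hypothetical placeholders, and in practice the background should be restricted to genes detectable by both the array and RNA-seq.

```python
from math import comb


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_pvalue(a: set[str], b: set[str], background: set[str]) -> float:
    """One-sided hypergeometric P(overlap >= observed).

    Background has N genes, K of them in list a; drawing len(b) genes,
    count how many fall in a. Both lists must be subsets of background.
    """
    if not (a <= background and b <= background):
        raise ValueError("DEG lists must be subsets of the background")
    n_bg, k, n = len(background), len(a), len(b)
    observed = len(a & b)
    total = comb(n_bg, n)
    # Sum the hypergeometric probabilities from the observed overlap upward.
    tail = sum(comb(k, i) * comb(n_bg - k, n - i)
               for i in range(observed, min(k, n) + 1))
    return tail / total


if __name__ == "__main__":
    background = {f"gene{i}" for i in range(1000)}  # placeholder gene IDs
    array_degs = {f"gene{i}" for i in range(0, 50)}
    rnaseq_degs = {f"gene{i}" for i in range(30, 90)}
    print(f"Jaccard: {jaccard(array_degs, rnaseq_degs):.2f}")
    print(f"Overlap p-value: "
          f"{overlap_pvalue(array_degs, rnaseq_degs, background):.2e}")
```

                  A low Jaccard index with a small p-value would mean the overlap is modest but still far above chance, which is the situation the post describes when list-level confirmation looks weak.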
