  • How many reads are acceptable from an RNA seq experiment

    Hi

    We have data from an RNA-seq experiment, 48 samples, v2.5 Illumina. We had roughly the recommended number of clusters and an even distribution between the samples, so we've ended up with roughly 6-7 million paired reads, or 12-14 million single reads, per sample.

    I've heard people claim that you need at least 20-25 million reads per sample, so I'm wondering if anyone knows of, or has, an article that has looked at what a good read number is for an RNA-seq experiment. The data quality is really nice; if someone asks me how our runs look, I always show the FastQC report from this run...

    /Petter

  • #2
    I'd guess it depends on the analysis you want to do on the data, or the purpose of your experiment. Generally, for SNP calling, I'd suppose this number of reads is sufficient. However, if you are looking at gene expression, especially detecting differential expression of lowly expressed genes, then more reads would probably help.

    I'd love to see the FastQC results to see how good RNA-Seq data can look. The libraries I am working with look good after preprocessing (adapter clipping + quality trimming), but I have never seen one that looked good enough in the raw data.
    Also, it would be great if you could tell us how much total RNA you used, and a bit about pre-amplification of the library: whether it was performed, how many cycles, etc.

    Thank you.



    • #3
      This is a hotly debated topic; see e.g. http://blog.fejes.ca/?p=607 where Anthony Fejes discusses a paper claiming that 500 million reads are needed to estimate transcription levels ... There has been a kind of mini-trend lately, with several papers claiming that RNA-seq is actually not that good compared to microarrays unless you have very deep coverage.

      As cedance said, it really depends on what you are interested in. I have performed some simulations where I downsampled the data and looked at the resulting abundance estimates for isoforms from Cufflinks and other tools, and haven't seen that much difference beyond 10 million paired-end reads so far. Looking at the number of detected transcripts, it always grows with sequencing depth, but again the curve is almost flat after 10-20M reads in the cases I've looked at.
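The downsampling check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the pipeline actually used for the Cufflinks comparisons: the function names, the toy counts and the `min_reads` detection threshold are all assumptions made for the example. It subsamples reads without replacement from a table of per-transcript read counts and reports how many transcripts are still detected at each depth.

```python
import random


def downsample_counts(counts, n_reads, seed=0):
    """Draw n_reads reads without replacement from per-transcript counts."""
    total = sum(counts.values())
    if n_reads > total:
        raise ValueError("cannot draw more reads than the library contains")
    rng = random.Random(seed)
    # One label per read; fine for a toy example, too memory-hungry for real libraries.
    pool = [name for name, c in counts.items() for _ in range(c)]
    sampled = {}
    for name in rng.sample(pool, n_reads):
        sampled[name] = sampled.get(name, 0) + 1
    return sampled


def n_detected(counts, min_reads=1):
    """Number of transcripts supported by at least min_reads reads."""
    return sum(1 for c in counts.values() if c >= min_reads)


def saturation_curve(counts, depths, min_reads=1, seed=0):
    """Return (depth, transcripts detected) pairs for each requested depth."""
    return [
        (d, n_detected(downsample_counts(counts, d, seed), min_reads))
        for d in depths
    ]


# Hypothetical toy library: one abundant transcript and a few rarer ones.
toy_counts = {"tx_high": 9000, "tx_mid": 900, "tx_low": 90, "tx_rare": 10}

if __name__ == "__main__":
    for depth, found in saturation_curve(toy_counts, [10, 100, 1000, 10000]):
        print(depth, found)
```

On real data one would plot the curve and look for the depth at which it flattens, which is the comparison behind the 10-20M figure quoted above.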



      • #4
        To make it even more complex, we have seen that polyA+ RNA gives a much higher fraction of reads mapping to exons than total RNA (rRNA depleted), where there are instead lots of intronic reads. Our explanation is that total RNA-seq captures lots of nascent transcripts that have not yet been fully transcribed, while polyA+ RNA-seq captures mainly mature transcripts (see http://dx.doi.org/10.1038/nsmb.2143).

        So I think fewer reads are required for polyA+ RNA-seq compared to total RNA-seq if you are interested in mRNA expression.



        • #5
          you should read this:



          • #6
            Thanks for all the answers. I've decided to resequence a couple of samples to a much higher depth, as well as doing some data pooling, to see how things look in our system. I'm assuming that the coverage needed will depend on read length as well as read depth, and since we have 101 bp reads we might be better off. I'm also uncertain about the number of transcripts to expect: we're working with a highly specialised cell type, not a cell line, so I'm expecting fewer transcripts, far from all that could exist, in comparison to the vast numbers found in immortalised cell lines.

            I'm also curious whether it depends much on the highly expressed genes in the sample, since they "steal" a lot of the data being produced. I know that it's possible to select the genes that one is interested in, but has anyone tried to remove the genes that are uninteresting/highly expressed to increase the coverage of the other genes? This would allow higher coverage even of genes that you don't know exist, in contrast to positive selection, where you only find what you expected to find.
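To get a feel for how much the most abundant genes "steal", one can compute the share of reads they take and the depth the remaining genes would gain if those genes were removed before sequencing. The sketch below is a back-of-the-envelope calculation only: the gene names and counts are made up, and it assumes perfect depletion, an unchanged total number of reads, and freed reads spreading proportionally over the remaining genes.

```python
def top_gene_share(counts, k):
    """Fraction of all reads taken by the k most abundant genes."""
    total = sum(counts.values())
    top = sum(sorted(counts.values(), reverse=True)[:k])
    return top / total


def depth_gain_after_depletion(counts, k):
    """Fold increase in reads for the remaining genes if the top k are removed.

    Assumes perfect depletion and the same total number of reads, with the
    freed reads spread proportionally over the remaining genes.
    """
    total = sum(counts.values())
    top = sum(sorted(counts.values(), reverse=True)[:k])
    if top == total:
        raise ValueError("no reads left after removing the top genes")
    return total / (total - top)


# Hypothetical counts with one dominant gene.
toy_counts = {
    "gene_dominant": 800_000,
    "gene_a": 100_000,
    "gene_b": 60_000,
    "gene_c": 40_000,
}

if __name__ == "__main__":
    print(top_gene_share(toy_counts, 1))
    print(depth_gain_after_depletion(toy_counts, 1))
```

The gain grows quickly as the dominant share approaches 1, so the calculation is mostly useful for deciding whether depletion is worth considering at all for a given sample type.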

            I also wanted to attach a figure to show what I call high-quality data, since cedance asked for it, but the forum asks for a URL and I only have those figures on my computer, so I can't. Is there a nice (fast and simple) way of doing this?



            • #7
              Pettervikman,
              About posting images/urls to images, I use imageshack to upload images and paste the url here with the URL button.



              • #8
                A new try for the figures




                • #9
                  That looks really great. Could you also post the plots for "Sequence duplication levels" and "Per base sequence content"? These are the ones I am not quite satisfied with in our data.



                  • #10




                    Here are the per-base content and duplication levels. Since we've used poly-A tail pulldown, I'm not surprised by the initial increase in A/T. The duplication levels are much higher than I'd accept for a genomic project, but since there's much less diversity in the transcriptome I'm fine with this. Consider that there are hard end points that really can't be changed (the 5' and 3' ends of transcripts) and maybe 10-15k transcripts to start with.

                    Another question, though. After running Cufflinks with RABT (-g), the assembled transcripts look a lot nicer. That said, does anyone know why some transcripts are labelled OK despite their FPKM_low being 0? I'm also wondering about transcripts labelled FAIL that have positive numbers in coverage, fpkm and fpkm_high.

                    To sum it up: why are transcripts with positive numbers in coverage, fpkm and fpkm_high but 0 in fpkm_low sometimes labelled OK, sometimes LOWDATA and sometimes FAIL?



                    • #11
                      Thanks again. I am sorry I don't/haven't used cufflinks, yet.
                      1 more question!!: why is poly-A pulldown responsible for initial increase in A/T?



                      • #12
                        Petter, those data look super. Did you get them sequenced in Uppsala?



                        • #13
                          Thanks! They were sequenced here on "my" HiSeq. We have a HiSeq here at CRC in Malmö, and we're part of Lund University/LUDC (Lund University Diabetes Center).

                          The pulldown uses a poly-T tail that binds somewhere in the poly-A tail (just to be super clear), hopefully close to the 3' end of the CDS/3' non-coding region. But if it binds further down, a few As or Ts will be sequenced before the actual transcript sequence, hence the slight increase in A/T.



                          • #14
                            Originally posted by cedance
                            Thanks again. I am sorry I don't/haven't used cufflinks, yet.
                            1 more question!!: why is poly-A pulldown responsible for initial increase in A/T?

                            It isn't. The non-random base distribution in the first 10 bases is attributed to hexamer-primed 2nd strand synthesis. (The hexamers do not prime perfectly randomly.)

                            --
                            Phillip
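For anyone wanting to check the first-bases bias on their own reads without opening FastQC, per-position base composition is easy to tabulate. A minimal sketch follows; the function name and the toy reads are invented for illustration, and it expects plain sequence strings rather than a FASTQ file.

```python
from collections import Counter


def per_base_content(reads, n_positions=10):
    """Fraction of A, C, G, T (and N) at each of the first n_positions."""
    columns = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            columns[i][base] += 1
    table = []
    for col in columns:
        depth = sum(col.values())
        # Positions beyond the end of every read get an empty row.
        table.append({b: col[b] / depth for b in "ACGTN"} if depth else {})
    return table


# Hypothetical reads, only to show the output shape.
toy_reads = ["ACGTAC", "ACGTTT", "TCGAAC", "ACGAAC"]

if __name__ == "__main__":
    for pos, row in enumerate(per_base_content(toy_reads, 6), start=1):
        print(pos, row)
```

A skew confined to the first ten or so positions, with flat composition afterwards, would be the pattern described in this post.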



                            • #15
                              Thanks pmiquel. I didn't know that. But I've heard that it's much more common in RNA-seq experiments than in DNA-seq, hence the poly-A tail story. But you're saying that it only depends on the 2nd strand synthesis?
