  • RNA-seq results interpretation - help needed

    Hello,

    I am using a standard RNA-seq pipeline, TopHat followed by DESeq, to determine differential expression between my cell lines from total RNA sequencing. I use 2-3 replicates per cell line, with ~30-40 million reads each. What surprises me is that for ~9% of all transcripts, I get zero expression in all replicates of one of the cell lines. Exactly zero: no reads at all for these transcripts. It is not even possible to calculate the log2 ratio for these genes, since log2(0) is undefined. Should I conclude that these genes are completely shut down in this cell line? Is this common?

    Thanks!
    Last edited by rebrendi; 09-01-2012, 12:03 PM.
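To make the question concrete, here is a minimal Python sketch for counting such transcripts. It assumes a hypothetical input format (a dict mapping each transcript ID to its per-replicate read counts for one cell line); the function names are illustrative and not part of TopHat or DESeq. The second function lists transcripts that are zero in every replicate of one line but have reads in the other, which are exactly the cases where the log2 ratio is undefined.

```python
from typing import Dict, List

# Hypothetical format: transcript ID -> read counts, one entry per replicate.
Counts = Dict[str, List[int]]


def all_zero_fraction(counts: Counts) -> float:
    """Fraction of transcripts with zero reads in every replicate."""
    if not counts:
        return 0.0
    n_zero = sum(1 for reps in counts.values() if all(c == 0 for c in reps))
    return n_zero / len(counts)


def zero_in_one_line(a: Counts, b: Counts) -> List[str]:
    """Transcripts with no reads in any replicate of `a` but some reads in `b`.

    For these, log2(b / a) is undefined because the denominator is 0.
    """
    return sorted(
        t for t, reps in a.items()
        if t in b and all(c == 0 for c in reps) and any(c > 0 for c in b[t])
    )


# Toy counts (made up) for three replicates of two cell lines.
line_a = {"tx1": [0, 0, 0], "tx2": [12, 9, 15], "tx3": [0, 1, 0], "tx4": [0, 0, 0]}
line_b = {"tx1": [5, 3, 4], "tx2": [10, 11, 8], "tx3": [2, 0, 1], "tx4": [0, 0, 0]}

print(all_zero_fraction(line_a))          # 0.5
print(zero_in_one_line(line_a, line_b))   # ['tx1']
```

Note that tx4 is zero in both lines, so it is counted in the all-zero fraction but not flagged as line-specific.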

  • #2
    I would say it's normal, yes. At least this kind of thing is what I typically observe.

    • #3
      Originally posted by kopi-o:
      I would say it's normal, yes. At least this kind of thing is what I typically observe.
      And did you consider all of those transcripts to be truly not expressed, or just missing signal?

      • #4
        Well, of course if the seq depth is very low you will get zero counts for transcripts that are really expressed. Also discarding multi-mapping reads could lead to this sort of effect. But in general, I tend to assume most of the all-zero transcripts are really not expressed.

        Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
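One way to run that check without new sequencing is to downsample an existing library. The sketch below is a hypothetical illustration, not any particular tool's method: it binomially thins a single count table (keeping each read independently with probability p) and reports the resulting depth and fraction of zero-count genes, and plotting those pairs gives the curve described above. The per-read loop is only suitable for toy data; on real tables a vectorized binomial draw (e.g. with NumPy) would be much faster.

```python
import random
from typing import Dict, List, Sequence, Tuple


def thin_counts(counts: Dict[str, int], p: float, rng: random.Random) -> Dict[str, int]:
    """Keep each read independently with probability p (binomial thinning)."""
    return {t: sum(1 for _ in range(c) if rng.random() < p) for t, c in counts.items()}


def zero_fraction_vs_depth(
    counts: Dict[str, int], fractions: Sequence[float], seed: int = 0
) -> List[Tuple[int, float]]:
    """For each subsampling fraction, return (simulated depth, fraction of zero-count genes)."""
    rng = random.Random(seed)
    results = []
    for p in fractions:
        thinned = thin_counts(counts, p, rng)
        depth = sum(thinned.values())
        n_zero = sum(1 for c in thinned.values() if c == 0)
        results.append((depth, n_zero / len(counts)))
    return results


# Toy gene counts for a single library (made up).
toy = {"geneA": 500, "geneB": 40, "geneC": 3, "geneD": 1, "geneE": 0}
for depth, frac in zero_fraction_vs_depth(toy, [0.01, 0.1, 0.5, 1.0]):
    print(depth, frac)
```

For the "all-zero across replicates" version, one would thin each replicate separately and then apply the all-replicates-zero test per gene.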

        • #5
          Originally posted by kopi-o:
          Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
          Yes, that would be the best check. I actually have, for one of the cell lines, two replicate experiments with 30,000 and 5,000 mapped reads. Both of them have these ~8-9% of transcripts with zero reads.

          • #6
            30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL, and I suspect that in itself leads to a larger fraction of zero-count genes.

            • #7
              Originally posted by kopi-o:
              30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL, and I suspect that in itself leads to a larger fraction of zero-count genes.
              I am using Eldorado; it contains much more than RefSeq, so more noise. But I get non-zero expression for these 9% of transcripts in one cell line and zero expression in the other, so this is not an annotation artifact. Sorry, I made a typo in my last post: I have 30 million and 5 million mapped reads in these two replicate experiments. What do you think?
              Last edited by rebrendi; 09-01-2012, 01:28 PM.

              • #8
                OK,

                (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts! (Again, it may be that ENSEMBL, which is relatively generous regarding inclusion, systematically yields higher values.) For other organisms like Drosophila, the fraction is lower.

                (2) If we forget about this zero-count business for a while and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed ones, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598

                Basically, one uses as controls a set of genomic regions for which there is no evidence of expression in any source. Then, by counting how many reads fall into these "gold standard negative" regions, one can calculate a false positive rate for a range of RPKM values. By finding a good compromise between a low false positive rate and a low false negative rate (calculated from annotated transcripts), one can get an estimate for an RPKM cutoff.
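A rough Python sketch of the cutoff selection just described. It assumes you already have RPKM values for the negative-control regions and for annotated transcripts; the function names are hypothetical, and picking the cutoff that minimizes FPR + FNR is only one possible "compromise", not necessarily the criterion used in the paper. RPKM uses the usual definition: count × 10^9 / (feature length in bp × total mapped reads).

```python
from typing import Sequence, Tuple


def rpkm(count: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)


def error_rates(neg: Sequence[float], pos: Sequence[float], cutoff: float) -> Tuple[float, float]:
    """FPR: share of negative-control regions called expressed (RPKM >= cutoff).
    FNR: share of annotated transcripts called not expressed (RPKM < cutoff)."""
    fpr = sum(1 for x in neg if x >= cutoff) / len(neg)
    fnr = sum(1 for x in pos if x < cutoff) / len(pos)
    return fpr, fnr


def pick_cutoff(neg: Sequence[float], pos: Sequence[float], candidates: Sequence[float]) -> float:
    """Candidate cutoff with the smallest FPR + FNR (ties go to the first candidate)."""
    return min(candidates, key=lambda c: sum(error_rates(neg, pos, c)))


# Made-up RPKM values, for illustration only.
neg = [0.0, 0.0, 0.1, 0.2, 0.4]   # "gold standard negative" regions
pos = [0.8, 1.5, 3.0, 5.0, 10.0]  # annotated transcripts
print(pick_cutoff(neg, pos, [0.1, 0.5, 2.0]))  # 0.5
```

In practice one would scan a finer grid of candidate cutoffs and probably weight FPR and FNR according to which error matters more for the downstream analysis.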

                • #9
                  You'll never be able to tell which genes are truly not expressed. That's how science works. We can only see what is; you can never see what isn't!!!!!

                  In this case, you can always argue that if you sequenced a little deeper, a given gene would show some expression.
                  --------------
                  Ethan
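This point can be quantified under a simple assumption: if read sampling is Poisson and a transcript makes up a fraction f of the library, the chance of seeing zero reads at depth N is exp(-f·N). The sketch below assumes purely technical (Poisson) noise and the same f in every replicate, which biological replicates will not strictly satisfy; the function names are hypothetical.

```python
import math
from typing import Sequence


def prob_zero_reads(fraction: float, depth: int) -> float:
    """P(0 reads) for a transcript making up `fraction` of the library (Poisson mean fraction*depth)."""
    return math.exp(-fraction * depth)


def prob_zero_all_replicates(fraction: float, depths: Sequence[int]) -> float:
    """P(0 reads in every replicate), assuming independent replicates with the same fraction."""
    return math.prod(prob_zero_reads(fraction, d) for d in depths)


# A transcript at 1 read per 10 million, sequenced at 30M and 5M reads.
print(prob_zero_reads(1e-7, 30_000_000))
print(prob_zero_all_replicates(1e-7, [30_000_000, 5_000_000]))
```

The first value is about 0.05 and the second about 0.03: a transcript at that abundance would still come out all-zero in both replicates roughly 3% of the time, and deeper sequencing only shrinks that probability without ever making it exactly zero.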

                  • #10
                    Originally posted by kopi-o:
                    (2) If we forget about this zero-count business for a while and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed ones, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598
                    Thank you, great article!

                    • #11
                      Originally posted by kopi-o:
                      (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts!
                      So these were all-zero in all 134 replicates, or just in some fraction of them?

                      • #12
                        In all 134.
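A zero in every replicate can still be turned into an upper bound, which fits the point that absence can never be proven outright. Under Poisson sampling, observing 0 reads in N pooled reads gives a one-sided 95% upper bound on the transcript's read fraction of -ln(0.05)/N ≈ 3/N (the "rule of three"). The sketch below assumes the replicates can be pooled (one shared true fraction), which is a simplification for biological replicates; the function name and the depths in the usage line are hypothetical.

```python
import math
from typing import Sequence


def zero_count_upper_bound(depths: Sequence[int], confidence: float = 0.95) -> float:
    """One-sided upper bound on a transcript's read fraction after 0 reads in all replicates.

    Pools the replicates (assumes one shared true fraction) and solves
    exp(-f * N) = 1 - confidence for f, where N is the pooled depth.
    """
    total = sum(depths)
    return -math.log(1.0 - confidence) / total


# Hypothetical best case: 134 samples, each at the stated maximum of 60M reads.
print(zero_count_upper_bound([60_000_000] * 134))  # roughly 3.7e-10
```

Because the post says "up to 60M" reads per sample, the real pooled depth is lower and the real bound correspondingly looser.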
