SEQanswers

Old 09-01-2012, 11:33 AM   #1
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78
RNA-seq results interpretation - help needed

Hello,

I am using a standard RNA-seq procedure: TopHat for alignment, followed by DESeq to determine differential expression between my cell lines from total RNA sequencing. I have 2-3 replicates per cell line, with ~30-40 million reads. What surprises me is that for ~9% of all transcripts, I am getting zero expression in all replicates of one of the cell lines. Exactly zero, no reads at all for these transcripts. It is not even possible to calculate the log2 ratio for these genes, since the log of 0 is undefined. Should I conclude that these genes are completely shut down in this cell line? Is this common?
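
One common workaround for the undefined log2 ratio is to add a small pseudocount to both counts before dividing. A minimal sketch in Python, assuming a pseudocount of 1 and counts that are already normalized for library size. The function name and the example numbers are made up for illustration, and this is not what DESeq does internally:

```python
import math

def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2 ratio of two normalized counts; the pseudocount avoids log(0)."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))

# Hypothetical normalized counts: expressed in cell line A, zero in cell line B.
print(log2_fold_change(127, 0))
# Zero in both cell lines gives a ratio of 1, i.e. no change.
print(log2_fold_change(0, 0))
```

The result depends on the pseudocount chosen, so it is better treated as a ranking aid than as a real fold-change estimate.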

Thanks!

Last edited by rebrendi; 09-01-2012 at 12:03 PM.
Old 09-01-2012, 12:33 PM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I would say it's normal, yes. At least this kind of thing is what I typically observe.
Old 09-01-2012, 12:36 PM   #3
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78

Quote:
Originally Posted by kopi-o
I would say it's normal, yes. At least this kind of thing is what I typically observe.

And did you conclude that all those transcripts are not expressed, or just that the signal is missing?
Old 09-01-2012, 12:44 PM   #4
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Well, of course, if the sequencing depth is very low you will get zero counts for transcripts that are really expressed. Discarding multi-mapping reads could also lead to this sort of effect. But in general, I tend to assume that most of the all-zero transcripts are really not expressed.
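
To put rough numbers on the depth effect: if a transcript accounts for a fraction f of all mappable reads and N reads are sequenced, a simple Poisson model gives an expected count of N*f and a probability of exp(-N*f) of seeing zero reads. A sketch in Python; the Poisson assumption and the example fraction are illustrative only, and real data are noisier than this:

```python
import math

def prob_zero_reads(depth, fraction):
    """P(count == 0) under a Poisson model with mean depth * fraction."""
    return math.exp(-depth * fraction)

# Hypothetical transcript that makes up one in ten million mappable reads.
for depth in (5_000_000, 30_000_000, 60_000_000):
    print(depth, round(prob_zero_reads(depth, 1e-7), 3))
```

If the replicates are independent, the chance that all of them show zero is the product of the per-replicate probabilities, so all-zero across several deep replicates is hard to explain by sampling alone.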

Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
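
A rough way to get such a curve from a single sample is to thin the counts down to several depths and record the fraction of genes left with zero reads. A sketch in plain Python; binomial thinning is just one possible downsampling method, and the toy counts are invented:

```python
import random

def downsample(counts, keep_prob, rng):
    """Binomial thinning: keep each read independently with probability keep_prob."""
    return [sum(1 for _ in range(c) if rng.random() < keep_prob) for c in counts]

def zero_fraction(counts):
    """Fraction of genes with no reads at all."""
    return sum(1 for c in counts if c == 0) / len(counts)

rng = random.Random(0)
# Invented per-gene counts for one sample (a real sample has tens of thousands of genes).
counts = [0, 0, 1, 2, 3, 5, 8, 20, 50, 120, 400, 1500]

for keep_prob in (1.0, 0.5, 0.1, 0.01):
    thinned = downsample(counts, keep_prob, rng)
    print(keep_prob, round(zero_fraction(thinned), 2))
```

The per-read loop is slow for millions of reads, so for real data a vectorized binomial draw would be the practical choice. Plotting the zero fraction against keep_prob times the original depth shows where the curve flattens.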
Old 09-01-2012, 12:51 PM   #5
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78

Quote:
Originally Posted by kopi-o
Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.

Yes, that would be the best check. Actually, for one of the cell lines I have two replicate experiments, with 30,000 and 5,000 mapped reads. Both of them have ~8-9% of transcripts with zero reads.
Old 09-01-2012, 01:01 PM   #6
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

30,000 and 5,000 mapped reads, respectively, seem awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use Ensembl, and I suspect that in itself leads to a larger fraction of zero-count genes.
Old 09-01-2012, 01:25 PM   #7
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78

Quote:
Originally Posted by kopi-o
30,000 and 5,000 mapped reads, respectively, seem awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use Ensembl, and I suspect that in itself leads to a larger fraction of zero-count genes.

I am using Eldorado, which contains much more than RefSeq, so there is more noise. But I am getting non-zero expression for these 9% of transcripts in one cell line and zero expression in another, so this is not an annotation artifact. Sorry, I made a typo in my last post: I have 30 million and 5 million mapped reads in these two replicate experiments. What do you think?

Last edited by rebrendi; 09-01-2012 at 01:28 PM.
Old 09-02-2012, 02:56 AM   #8
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

OK,

(1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of Ensembl genes with all-zero counts! (Again, it may be that Ensembl, which is relatively generous regarding inclusion, will systematically yield higher values.) For other organisms like Drosophila, the fraction is lower.

(2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598

Basically, one uses as controls a set of genomic regions for which there is no evidence of expression in any source. Then, by counting how many reads fall into these "gold standard negative" regions, one can calculate a false positive rate for a range of RPKM values. By finding a good compromise between a low false positive rate and a low false negative rate (calculated from annotated transcripts), one can get an estimate for an RPKM cutoff.
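
The idea can be sketched in Python roughly as follows. The RPKM formula is the standard one; the selection rule (minimize FPR + FNR over a list of candidate cutoffs), the function names, and all numbers are illustrative assumptions and not necessarily what the paper does:

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def pick_cutoff(negative_rpkms, annotated_rpkms, candidates):
    """Return (cutoff, fpr, fnr) minimizing FPR + FNR over the candidate cutoffs.

    FPR: fraction of "gold standard negative" regions at or above the cutoff.
    FNR: fraction of annotated transcripts below the cutoff.
    """
    best = None
    for cutoff in candidates:
        fpr = sum(v >= cutoff for v in negative_rpkms) / len(negative_rpkms)
        fnr = sum(v < cutoff for v in annotated_rpkms) / len(annotated_rpkms)
        if best is None or fpr + fnr < best[1] + best[2]:
            best = (cutoff, fpr, fnr)
    return best

# Invented RPKM values, for illustration only.
negatives = [0.0, 0.0, 0.05, 0.1, 0.2, 0.4]
annotated = [0.0, 0.2, 0.5, 1.5, 4.0, 12.0, 30.0, 80.0]
print(pick_cutoff(negatives, annotated, [0.1, 0.3, 1.0, 3.0]))
```

Weighting FPR and FNR equally is a simplification; if false positives are more costly for a given analysis, the sum can be weighted accordingly.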
Old 09-02-2012, 03:01 AM   #9
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308

You'll never be able to tell which genes are truly not expressed. That's how science works. We can only see what is; you can never see what isn't!!!!!

In this case you can always argue that if you sequenced a little deeper, a given gene would show some expression.
__________________
--------------
Ethan
Old 09-02-2012, 03:26 AM   #10
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78

Quote:
Originally Posted by kopi-o
(2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598
Thank you, great article!
Old 09-02-2012, 03:27 AM   #11
rebrendi
ng
 
Location: LA

Join Date: May 2008
Posts: 78

Quote:
Originally Posted by kopi-o
(1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of Ensembl genes with all-zero counts!

So were these all-zero in all 134 replicates, or just in some fraction of them?
Old 09-02-2012, 03:36 AM   #12
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

In all 134.