#1 |
Senior Member
Location: USA Join Date: Aug 2010
Posts: 103
Hi all,
I am a Cufflinks user and I am trying out other popular gene expression analysis tools such as edgeR and DESeq. In most of my projects we have only one Normal and one Tumor sample. Although there has been a lot of discussion, it is still unclear to me whether edgeR or DESeq is "better" than cuffdiff when there are no biological replicates. Any advice will be appreciated.
#2 |
Senior Member
Location: Research Triangle Park, NC Join Date: Aug 2009
Posts: 245
In the complete absence of replicates, I don't think any statistical tool is going to be worth a dang for differential gene expression. All you can do is look at simple differences in counts, with no means at all of assessing the significance of those differences. The statistics cannot compensate for a complete lack of adequate data for the analysis in question, and without some minimal number of replicates (3 is really the minimum, 4 or more would be far better), there is no way to assign statistical significance.
I know the vignettes for tools like edgeR talk about good performance "...even for experiments with minimal levels of biological replication" (quoting from the edgeR manual), but note the use of the word "minimal". A complete absence of replication is not minimal replication, and in the complete absence of replication you cannot perform statistical tests of significance for differences. And since you have no statistical power at all, comparing different analytical tools seems pointless to me.
#3 |
Member
Location: MPI Join Date: Jun 2010
Posts: 17
I have to agree with mbblack. You should try to gain more statistical power by getting at least 3 replicates per treatment. Otherwise your comparison is not really meaningful.
#4 |
Senior Member
Location: USA Join Date: Aug 2010
Posts: 103
Many thanks, mbblack and lexa.
Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data. They even want a short, "reliable" list of DE or differentially spliced genes that makes sense, which we cannot provide without replicates. It is really a dilemma. It is important for biologists to talk with bioinformaticians before they submit samples for sequencing.
#5 |
Senior Member
Location: Kansas City Join Date: Mar 2008
Posts: 197
What you could do is run both and show them the resulting gene lists for both and the intersection (venn diagram?)
#6 |
Member
Location: MPI Join Date: Jun 2010
Posts: 17
That's hard. Anyway, you could try to get a 'reliable' gene set by running different methods and taking the overlap, perhaps keeping only genes called by at least 2 different methods. Then do a literature search for the genes you found; some of them may already be described.
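For anyone who wants to try the overlap idea from the two posts above, here is a minimal Python sketch. The gene lists are made up for illustration, and the helper name `consensus_genes` is hypothetical; it is not part of Cufflinks, edgeR or DESeq. Agreement between methods run on the same unreplicated data does not make a call statistically valid, it only shows which genes are robust to the choice of tool.

```python
def consensus_genes(calls_by_method, min_methods=2):
    """Return genes flagged by at least `min_methods` of the given methods.

    `calls_by_method` maps a method name to an iterable of gene IDs.
    The result maps each retained gene to the sorted list of methods
    that reported it.
    """
    support = {}
    for method, genes in calls_by_method.items():
        for gene in set(genes):  # ignore duplicate IDs within one list
            support.setdefault(gene, set()).add(method)
    return {
        gene: sorted(methods)
        for gene, methods in support.items()
        if len(methods) >= min_methods
    }


# Hypothetical gene lists, invented purely for illustration.
calls = {
    "cuffdiff": ["TP53", "MYC", "EGFR", "KRAS"],
    "edgeR": ["MYC", "EGFR", "BRCA1"],
    "DESeq": ["EGFR", "KRAS", "MYC"],
}

consensus = consensus_genes(calls, min_methods=2)
for gene in sorted(consensus):
    print(gene, consensus[gene])
```

Raising `min_methods` to the number of tools gives the strict intersection (the centre of the Venn diagram); lowering it to 1 gives the union.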
#7 | |
Member
Location: Iowa Join Date: Oct 2008
Posts: 28
edgeR does mention a method for dealing with lack of replication by assigning a variance value
Quote:
#8 |
Senior Member
Location: USA Join Date: Aug 2010
Posts: 103
As far as I remember, I tried that a long time ago. I found that the result is sensitive to the selected dispersion coefficient.
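The sensitivity described here can be illustrated with a rough Python sketch. It is not edgeR's exact test: it assumes equal library sizes, a negative binomial variance of mu + dispersion * mu^2, and a simple normal approximation for the difference between two counts. The counts, the function names and the three BCV values are all made up for illustration.

```python
import math


def nb_z_pvalue(y1, y2, dispersion):
    """Two-sided p-value for y1 vs y2 under a normal approximation.

    Assumes equal library sizes and a negative binomial variance
    mu + dispersion * mu**2, with mu estimated as the mean of the
    two counts. This is a rough stand-in, NOT edgeR's exact test.
    """
    mu = (y1 + y2) / 2.0
    if mu == 0:
        return 1.0
    var_diff = 2.0 * (mu + dispersion * mu ** 2)
    z = (y1 - y2) / math.sqrt(var_diff)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical (normal, tumor) counts, invented for illustration.
counts = {
    "geneA": (100, 200),
    "geneB": (50, 400),
    "geneC": (1000, 1300),
    "geneD": (20, 25),
}


def called(dispersion, alpha=0.05):
    """Genes whose approximate p-value falls below alpha."""
    return sorted(
        gene
        for gene, (normal, tumor) in counts.items()
        if nb_z_pvalue(normal, tumor, dispersion) < alpha
    )


# BCV is the square root of the dispersion; 0.0 reduces to Poisson.
for bcv in (0.0, 0.1, 0.4):
    print(f"BCV={bcv}: {called(bcv ** 2)}")
```

With these toy numbers the list shrinks from three genes under a Poisson assumption to one gene at a BCV of 0.4, so the "DE list" is largely a function of a value that cannot be estimated from the data.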
#9 | |
Senior Member
Location: Research Triangle Park, NC Join Date: Aug 2009
Posts: 245
Quote:
They really need to do a proper pilot study, with 3-5 replicates to see just what they have to work with. Otherwise, all you can tell them is what is different, but with no statistical ranking of significance nor any idea of how variable those differences may be. It is not that you have minimal statistical power without replicates, you have none. All you have is simple numeric differences of some count or normalized values, and nothing more. And you have no idea at all if those differences are real biological differences, or random experimental noise. And there is nothing unique to RNAseq data about that - you cannot compute statistics on a simple difference between two single numbers.
__________________
Michael Black, Ph.D. ScitoVation LLC. RTP, N.C.
#10 | |
Senior Member
Location: USA Join Date: Aug 2010
Posts: 103
I could not agree more. Inferring a short list of DE genes from an expensive (compared to array data) RNA-Seq run on a single pair of samples is some collaborators' dream. Some even prefer to spend money on sequencing more cell line types rather than replicates. I find it hard to persuade them.
Without replicates, all we can provide is a list of DE genes based on statistical models such as Poisson, but this will never reflect the truth without sufficient replicates.
Quote:
#11 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
Knowing that it is unwise to do experiments without replication, I find myself in exactly that situation (pooled samples).
I've analysed these data with older versions of DESeq, but now would also like to try edgeR. I can't seem to decipher from the vignette exactly how one does this analysis without replicates. Is anyone able to help me out or share a script? It's pretty clear that both the DESeq and edgeR camps are now strongly discouraging such efforts (does DESeq2 even still incorporate such analyses?), but I still need to give it a go in this case. Thanks!
#12 |
Senior Member
Location: Research Triangle Park, NC Join Date: Aug 2009
Posts: 245
To be honest, my opinion is that the first option mentioned in the edgeR vignette is really the only valid approach to follow in that situation. To quote from page 18:
"1. Be satisfied with a descriptive analysis, that might include an MDS plot and an analysis of fold changes. Do not attempt a significance analysis. This may be the best advice."
In other words, make your argument for significantly differentially expressed genes based solely on the magnitude of measured differences between samples and accept that you cannot perform any reliable or valid statistical significance testing. I just think it is pointless to spend a lot of time running algorithms or code on a data set that fundamentally cannot be analyzed statistically. Basically, what is the point of the effort if the stats are meaningless or open to vigorous negative criticism?
__________________
Michael Black, Ph.D. ScitoVation LLC. RTP, N.C.
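As a rough illustration of the descriptive route (fold changes only, no significance testing), here is a minimal Python sketch. It assumes raw counts for one normal and one tumor library, scales each to counts per million, and adds a pseudocount before taking the log2 ratio. The counts, the pseudocount of 1.0 and the function name `log2_fold_changes` are illustrative assumptions, not anything prescribed by the edgeR guide.

```python
import math


def log2_fold_changes(normal, tumor, pseudocount=1.0):
    """Per-gene log2(tumor / normal) on counts-per-million values.

    `normal` and `tumor` map gene IDs to raw counts. The pseudocount
    avoids division by zero and damps ratios for very low counts.
    Only genes present in both libraries are reported.
    """
    normal_total = sum(normal.values())
    tumor_total = sum(tumor.values())
    if normal_total == 0 or tumor_total == 0:
        raise ValueError("each library needs at least one mapped read")
    result = {}
    for gene in normal.keys() & tumor.keys():
        normal_cpm = normal[gene] / normal_total * 1e6
        tumor_cpm = tumor[gene] / tumor_total * 1e6
        result[gene] = math.log2(
            (tumor_cpm + pseudocount) / (normal_cpm + pseudocount)
        )
    return result


# Hypothetical raw counts, invented for illustration.
normal_counts = {"geneA": 100, "geneB": 400, "geneC": 500}
tumor_counts = {"geneA": 400, "geneB": 800, "geneC": 800}

lfc = log2_fold_changes(normal_counts, tumor_counts)

# Rank by magnitude of change; this is a ranking, not a test.
ranked = sorted(lfc.items(), key=lambda item: abs(item[1]), reverse=True)
for gene, value in ranked:
    print(f"{gene}\t{value:+.2f}")
```

Note that geneB doubles in raw counts but has a log2 fold change of zero once library size is accounted for, which is why some normalization is needed even for a purely descriptive ranking.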
#13 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
Thanks, option 1 is basically what we are doing, but we are also trying to scrutinize the data in as many ways as possible. We pooled 10 individuals per library, and our results seem not hopeless, in that we can see some of the things we know we should see, and these do hold up to DESeq stats ("working without replicates"). But it's the novel stuff that is more problematic. We'll be finding out via qPCR and in situs, I suppose, how well these stats hold up. But yes, not so optimistic. I should also say that we have 3 groups, not 2, so we at least have a bit more information on variability.
Last edited by chrisbala; 05-23-2013 at 08:23 AM.