SEQanswers

03-17-2011, 02:14 AM   #1
Claudia34
Junior Member
Location: Geneva, Switzerland
Join Date: Sep 2010
Posts: 9
RNA-seq statistical analysis without replicate

Dear all,

I am using RNA-seq to analyze small non-coding RNAs.
I am running the differential expression analysis with edgeR (and will use baySeq and maybe DESeq to reinforce my analyses).
I don't have, and will not have, any replicates for my samples, so I'm relying on your expertise to tell me whether any statistical analysis is relevant without replicates.

Thanks for your answers

Claudia

03-17-2011, 03:20 AM   #2
NicoBxl
not just another member
Location: Belgium
Join Date: Aug 2010
Posts: 264

No, it isn't relevant.

04-03-2011, 02:43 PM   #3
kerhard
Member
Location: Oakland
Join Date: Feb 2011
Posts: 27

Hi, I'm in the same boat (attempting to compare two libraries without having replicates - they are similar to RNA-seq libraries) and am wondering whether this is going to be possible or not.

Quote:
Originally Posted by NicoBxl
No it isn't relevant

Could you be a bit more specific about your answer here? With respect to edgeR, I just read in the abstract of the Applications Note describing the program [Robinson MD, McCarthy DJ and Smyth GK, Bioinformatics, vol. 26(1), 2010] that:

"An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated."

So to me, it sounds like you *must* have replicates to use edgeR.

Might anyone know of a method that does not require any replicates?

04-03-2011, 11:09 PM   #4
NicoBxl
not just another member
Location: Belgium
Join Date: Aug 2010
Posts: 264

You have to check the p-value for each small RNA after the DE analysis. If you have only one sample per condition, the p-values will be large (and the DE analysis therefore irrelevant).

But even with one sample per condition, you can apply DESeq (I don't know about edgeR).

So it's very dangerous to jump to conclusions with no replicates.

04-03-2011, 11:15 PM   #5
kerhard
Member
Location: Oakland
Join Date: Feb 2011
Posts: 27
Analyzing RNA-seq libraries with no replicates

Perusing the vignette for DESeq (mentioned above) by Simon Anders entitled "Analysing RNA-Seq data with the 'DESeq' package", I found this bit, which I think is quite informative and to the point:

###hope it's ok to post this here, the pdf is posted on the web for free...###

"Proper replicates are essential to interpret a biological experiment.
After all, if one compares two conditions and find a difference, how else
would one know that this difference is due to the different conditions and
would not have arisen between replicates, as well, just due to noise?
Hence, any attempt to work without any replicates will lead to conclusions
of very limited reliability. Nevertheless, such experiments are often
undertaken, especially in HTS, and the DESeq package can deal with them,
even though the soundness of the results may depend very much on the
circumstances.

Our primary assumption is still that the mean is a good predictor for the
variance. Hence, if a number of genes with similar expression level are
compared between replicates, we expect that their variation is of
comparable magnitude. Once we accept this assumption, we may argue as
follows: Given two samples from different conditions and a number of genes
with comparable expression levels, of which we expect only a minority to
be influenced by the condition, we may take the variance estimated from
comparing their count rates across conditions as ersatz for a proper
estimate of the variance across replicates. After all, we assume most
genes to behave the same within replicates as across conditions, and
hence, the estimated variance should not change too much due to the
influence of the hopefully few differentially expressed genes.
Furthermore, the differentially expressed genes will only cause the
variance estimate to be too high, so that the test will err to the side of
being too conservative, i.e., we only lose power."

I'm not sure if there is a way to change the parameters of edgeR analysis to account for not having replicates, but I also haven't looked very hard yet. I am going to try out DESeq and see if it is sufficient for the level of analysis I want out of my libraries for the time being. No reason to blow another Illumina lane if it's not really necessary....
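
To make the vignette's argument a bit more concrete, here is a minimal sketch in base R on simulated counts. It is not DESeq's actual procedure: the simulated means, the dispersion of 0.1, the 100 changed genes and the ten expression bins are all assumptions chosen for illustration. It treats the two samples as if they were replicates, computes a per-gene variance, and pools it over genes of similar expression level.

```r
# Illustrative only: simulated counts, base R, no DESeq required.
set.seed(1)
n_genes <- 2000
mu   <- rexp(n_genes, rate = 1 / 200) + 1        # hypothetical true expression levels
disp <- 0.1                                      # hypothetical common dispersion
a <- rnbinom(n_genes, mu = mu, size = 1 / disp)  # single sample, condition A
b <- rnbinom(n_genes, mu = mu, size = 1 / disp)  # single sample, condition B

# A minority of genes (100 of 2000) truly change, here four-fold.
de <- sample(n_genes, 100)
b[de] <- rnbinom(100, mu = 4 * mu[de], size = 1 / disp)

# "Blind" estimate: ignore the condition labels and treat a and b as replicates.
m <- (a + b) / 2
v <- (a - b)^2 / 2   # sample variance of two observations

# Pool genes of comparable expression level and average their variances.
bins        <- cut(rank(m), breaks = 10, labels = FALSE)
pooled      <- tapply(v, bins, mean)
pooled_null <- tapply(v[-de], bins[-de], mean)   # same, without the changed genes

print(round(cbind(all_genes = pooled, without_de = pooled_null)))
```

Comparing the two columns shows how much the truly changed genes inflate the pooled estimate, which is the sense in which the vignette says the test errs on the conservative side.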

04-03-2011, 11:17 PM   #6
kerhard
Member
Location: Oakland
Join Date: Feb 2011
Posts: 27

Quote:
Originally Posted by NicoBxl
You have to check the p-value for each small RNA after the DE analysis. If you have only one sample per condition, the p-values will be large (and the DE analysis therefore irrelevant).

But even with one sample per condition, you can apply DESeq (I don't know about edgeR).

So it's very dangerous to jump to conclusions with no replicates.

Thanks! That makes a lot of sense now.

09-29-2011, 12:26 PM   #7
byou678
Member
Location: Maryland
Join Date: Aug 2011
Posts: 52

Kerhard's post makes sense! I also read "Analysing RNA-Seq data with the 'DESeq' package", and I have another question: all the DESeq code in the vignette is for the example with replicates. For example, it says:
Then, the minimal set of commands to run a full analysis is:
> cds <- newCountDataSet( countsTable, conds )
> cds <- estimateSizeFactors( cds )
> cds <- estimateVarianceFunctions( cds )
> res <- nbinomTest( cds, "T", "N")


So how should the commands be changed to fit two conditions without replicates?

Thanks for the response and further discussions.




10-20-2011, 04:23 AM   #8
pschwien
Junior Member
Location: Walnut Creek, CA, USA
Join Date: Jan 2009
Posts: 4

Hi byou678,

A few pages later in the DESeq documentation you'll find the answer. The only thing you have to change is the estimateVarianceFunctions call, like this:
cds <- estimateVarianceFunctions(cds, method="blind")

Regards,
Patrick


10-20-2011, 06:33 AM   #9
lynn012
Junior Member
Location: china
Join Date: Sep 2010
Posts: 9

I have the same situation.
When I run the minimal set of commands:
> cds <- newCountDataSet( countsTable, conds )
> cds <- estimateSizeFactors( cds )
> cds <- estimateVarianceFunctions( cds, method="blind")
> res <- nbinomTest( cds, "A", "B")
Error: condA %in% levels(conditions(cds)) is not TRUE
What's the problem?
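
The error text itself points at the likely cause: the call is checking that the condition name it was given is one of the levels of conditions(cds), and "A" is not. Below is a small base R sketch of that kind of check. The labels "T" and "N" are hypothetical stand-ins for whatever was actually passed as conds, and label_ok is a helper made up for this example, not a DESeq function.

```r
# Hypothetical condition labels, standing in for the conds vector
# that was passed to newCountDataSet().
conds <- factor(c("T", "N"))

# Made-up helper mirroring the check named in the error message.
label_ok <- function(label, conds) label %in% levels(conds)

print(levels(conds))          # the labels that exist in the data set
print(label_ok("A", conds))   # FALSE: "A" is not one of them
print(label_ok("T", conds))   # TRUE
```

If that is what happened here, passing the two labels that levels(conditions(cds)) actually reports to nbinomTest() should get past this particular error.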

10-21-2011, 07:05 AM   #10
pschwien
Junior Member
Location: Walnut Creek, CA, USA
Join Date: Jan 2009
Posts: 4

Could an expert in the field comment on my finding that only DESeq and the recently published NOISeq allow DE testing with no replicates? All the other tools I looked at (edgeR, DEGseq, baySeq, Cufflinks) need at least one of the conditions to be in duplicate. Is this correct?

Thanks in advance!

10-23-2011, 08:00 PM   #11
Dario1984
Senior Member
Location: Sydney, Australia
Join Date: Jun 2011
Posts: 166

I've used Cufflinks and edgeR without replicates. They work without problems in such situations.

10-24-2011, 09:18 AM   #12
mbblack
Senior Member
Location: Research Triangle Park, NC
Join Date: Aug 2009
Posts: 245

Quote:
Originally Posted by pschwien
Could an expert in the field comment on my finding that only DESeq and the recently published NOISeq allow DE testing with no replicates? All the other tools I looked at (edgeR, DEGseq, baySeq, Cufflinks) need at least one of the conditions to be in duplicate. Is this correct?

Thanks in advance!

The issue is not whether you can find some tool that will compute a mathematical solution and spit out a p-value. The real question is whether your biological interpretation of those numbers has any real significance or meaning in the absence of replicates. And the growing consensus seems to be that no, such results are largely valueless, as they have no statistical rigor.

In other words, you can have little to no confidence that your apparently significant results are, in fact, significant at all and not just the product of random events.

I can fully understand those situations where it is truly impossible to replicate. I've handled data from some human studies where tissue or samples were extraordinarily hard to come by, and we were lucky to get enough tissue for a single experiment per individual (we did not use NGS though, we went with Affy arrays so I could at least use probe level data to gain some statistical rigor in DE analysis - used SScore in R).

But other than the situation where it truly is not possible, I think replication should be considered an absolutely essential condition for an experiment. Especially once you factor in the issue of multiplicity in the huge number of tests being compared in a DE analysis, the absence of replicates makes it impossible to impose any statistically rigorous significance threshold.
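
As a rough illustration of the multiplicity point, here is a base R sketch on simulated p-values. The 20,000 tests and the 0.05 cutoff are assumptions for the example, and every p-value is drawn under the null, so any "hit" is a false positive.

```r
# Illustrative only: p-values simulated under the null (no gene truly changes).
set.seed(42)
n_tests <- 20000                      # hypothetical number of genes tested
p <- runif(n_tests)                   # null p-values are uniform on [0, 1]

n_raw <- sum(p < 0.05)                # calls at an unadjusted 0.05 threshold
p_bh  <- p.adjust(p, method = "BH")   # Benjamini-Hochberg adjustment
n_bh  <- sum(p_bh < 0.05)             # calls after adjustment

print(c(unadjusted = n_raw, benjamini_hochberg = n_bh))
```

An adjustment like this only controls the error rate if the p-values going in are valid in the first place, and without replicates there is no sound variance estimate to base them on.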

Last edited by mbblack; 10-24-2011 at 09:29 AM.

10-24-2011, 03:00 PM   #13
Dario1984
Senior Member
Location: Sydney, Australia
Join Date: Jun 2011
Posts: 166

That's a good point. That design would give more reproducible conclusions. Here is a good article I read some time ago that discusses exactly that.

01-16-2013, 07:59 PM   #14
feeldead
Junior Member
Location: china
Join Date: Oct 2010
Posts: 6

Have you tried GFOLD? It is a tool specifically designed for the no-replicate case.
http://bioinformatics.oxfordjournals...ts515.abstract

Tags
replicate, rna-seq, statistics
