SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq without biological replicates (dataset: Marioni et al.) Andrea Apolloni Bioinformatics 3 01-13-2012 04:58 AM
DESeq: question about with replicates and without any replicates. nb509 RNA Sequencing 2 10-25-2011 06:04 AM
DESeq for a small sets of sequences without replicates starlight Bioinformatics 6 09-05-2011 10:39 AM
DESeq analysis without replicates for 16 tissues johannes.helmuth Bioinformatics 0 05-25-2011 01:53 AM
DESeq: question about baseMean. Also, replicates. Azazel Bioinformatics 5 05-18-2011 10:51 PM

Reply
 
Thread Tools
Old 10-13-2010, 02:12 PM   #1
austinpa
Junior Member
 
Location: los angeles

Join Date: Aug 2010
Posts: 4
Default DESeq without replicates

Hello,

I am unable to get method=blind to work to calculate variance without replicates. Or actually any "method" argument from the vignette. Is this now obsolete? Are the only variance calculations supported now "pooled" or with replicates? Has anyone else had trouble with this recently?

Thanks,
Austin
austinpa is offline   Reply With Quote
Old 11-04-2010, 11:14 AM   #2
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Quote:
Originally Posted by austinpa View Post
Hello,

I am unable to get method=blind to work to calculate variance without replicates. Or actually any "method" argument from the vignette. Is this now obsolete? Are the only variance calculations supported now "pooled" or with replicates? Has anyone else had trouble with this recently?

Thanks,
Austin
try
Code:
 method="blind"
RockChalkJayhawk is offline   Reply With Quote
Old 11-05-2010, 01:29 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.
Simon Anders is offline   Reply With Quote
Old 11-05-2010, 04:50 AM   #4
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Quote:
Originally Posted by Simon Anders View Post
... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.
Do you also have some more documentation on the 'method' argument? I can't seem to find it.
RockChalkJayhawk is offline   Reply With Quote
Old 11-05-2010, 06:25 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by RockChalkJayhawk View Post
Do you also have some more documentation on the 'method' argument? I can't seem to find it.
Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'metho', you have an old DESeq version.

Simon
Simon Anders is offline   Reply With Quote
Old 11-05-2010, 10:49 AM   #6
austinpa
Junior Member
 
Location: los angeles

Join Date: Aug 2010
Posts: 4
Default DESeq without replicates

Great. Thanks Simon. I just needed the newer version of DESeq and R.

Austin


Quote:
Originally Posted by Simon Anders View Post
... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.
austinpa is offline   Reply With Quote
Old 01-03-2011, 08:09 PM   #7
taoxiang180
Junior Member
 
Location: sichuan provine

Join Date: Jun 2010
Posts: 9
Default

Hi, everyone.
I am a nevice to R and DESeq. Working without any replicates, I went through each step of differential gene expression analysis from the DESeq manual, But the results really puzzled me: some gene have a pval<0.05, but have a padj of 1, how the padj was calculated? and how to set the padj threshold?Any pointers will be appreciated.Thanks!

Last edited by taoxiang180; 01-03-2011 at 08:47 PM.
taoxiang180 is offline   Reply With Quote
Old 01-03-2011, 08:11 PM   #8
taoxiang180
Junior Member
 
Location: sichuan provine

Join Date: Jun 2010
Posts: 9
Default

Hi, everyone.
I am a novice to R and DESeq. Working without any replicates, I went through each step of differential gene expression analysis from the DESeq manual, But the results really puzzled me: some gene have a pval<0.05, but have a padj of 1, how the padj was calculated? and how to set the padj threshold?Any pointers will be appreciated.Thanks!
taoxiang180 is offline   Reply With Quote
Old 01-04-2011, 04:59 AM   #9
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Quote:
Originally Posted by taoxiang180 View Post
Hi, everyone.
I am a novice to R and DESeq. Working without any replicates, I went through each step of differential gene expression analysis from the DESeq manual, But the results really puzzled me: some gene have a pval<0.05, but have a padj of 1, how the padj was calculated? and how to set the padj threshold?Any pointers will be appreciated.Thanks!
The padj corrects for multiple testing to limit your false discovery rate. Using this value, you can essentially interpret P<0.05 to mean that you are 95% sure there is a difference in expression.
RockChalkJayhawk is offline   Reply With Quote
Old 01-04-2011, 07:28 AM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by RockChalkJayhawk View Post
The padj corrects for multiple testing to limit your false discovery rate. Using this value, you can essentially interpret P<0.05 to mean that you are 95% sure there is a difference in expression.
Not quite. Especially, your phrasing "95% sure" is to imprecise to capture the difference between an unadjusted p value and one adjusted for FDR control.

I've just googled a bit, looking for a good primer to explain this concept (which is of vital importance: don't even think about analysing genomics data if you don't know what the multiple hypothesis testing problem is), but most of the stuff is too technical for a reader without statistics training. This paper here might explain it reasonably well, but is a bit lengthy:

Pounds SB. Estimation and control of multiple testing error rates for microarray studies.
Briefings in Bioinformatics. 2006;7(1):25-36.
http://bib.oxfordjournals.org/cgi/co...bstract/7/1/25

(DESeq uses Benjamini and Hochberg's method. See the article to learn what this means.)

Simon
Simon Anders is offline   Reply With Quote
Old 01-04-2011, 07:41 AM   #11
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Quote:
Originally Posted by Simon Anders View Post
Not quite. Especially, your phrasing "95% sure" is to imprecise to capture the difference between an unadjusted p value and one adjusted for FDR control.

I've just googled a bit, looking for a good primer to explain this concept (which is of vital importance: don't even think about analysing genomics data if you don't know what the multiple hypothesis testing problem is), but most of the stuff is too technical for a reader without statistics training. This paper here might explain it reasonably well, but is a bit lengthy:

Pounds SB. Estimation and control of multiple testing error rates for microarray studies.
Briefings in Bioinformatics. 2006;7(1):25-36.
http://bib.oxfordjournals.org/cgi/co...bstract/7/1/25

(DESeq uses Benjamini and Hochberg's method. See the article to learn what this means.)

Simon
Thanks for pointing that out. That's what I get for not finishing my coffee before I type.

Perhaps a better way to say it is that the padj changes the pvalues so that only 5% are likely false positives - it has nothing to do with the specific gene being differentially expressed.
RockChalkJayhawk is offline   Reply With Quote
Old 01-04-2011, 08:02 AM   #12
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Exactly.

So for those to lazy to read the paper:

If you have 10,000 genes and you use a threshold of 0.05 on the raw p values, you should get around 500 false positives (5% of 10,000). So if there are 1000 genes with p<0.05, about half of them might be false positives.

If you use a threshold of 0.05 on the adjusted p values, you will find fewer genes, let's say, 100, but now, you know only 5% of these are false positives (here: 5 of 100 genes).

Simon
Simon Anders is offline   Reply With Quote
Old 01-04-2011, 05:37 PM   #13
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190
Default

The main problem is no replicates, this means the variance is calculated using both your samples pooled so a large difference will have a large variance and thus less statistical significance. You may want to ignore the p-values completely and just look at highest and lowest fold change, because you have no replicates you will need to confirm pretty much everything with (replicate) qPCR.
Jeremy is offline   Reply With Quote
Old 01-04-2011, 10:33 PM   #14
taoxiang180
Junior Member
 
Location: sichuan provine

Join Date: Jun 2010
Posts: 9
Default

Thanks for so much recommendation!
When work without any replicates and have a padj=1, what can I do next using DESeq?
(Obviously, we can not make a conclution that there are no differential expression between two groups)
According to the Yoav Benjamini's method(2001), once the P-value are
available from any statistical software, the extra FDR calculation can be done easily within a spreadsheet software such as excel using the built-in functions, can I calculate FDR in this way?
taoxiang180 is offline   Reply With Quote
Old 01-04-2011, 10:47 PM   #15
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

As Jeremy said, adjusted p values are of little use in your situation. Doing experiments without replicates is simply a bad idea.

Just start from the largest fold change (or maybe from the smallest unadjusted p value) and keep doing qPCR (with at least, say, five, biological replicates!) for all genes until you run out of patience.

And you don't have to use Excel, you can use R's 'p.adjust' function for the multiple testing calculation, For your convenience, DESeq runs 'p.adjust' with 'method="BH"' (Benjamini-Hochberg) on the 'pval' column of the result and reports this in the 'padj' column.

Alternatively, you may want to give this, quite new, idea a try:

Wu Z, Jenkins BD, Rynearson TA, et al. Empirical Bayes Analysis of Sequencing-based Transcriptional Profiling without Replicates.
BMC Bioinformatics. 2010;11(1):564. http://www.biomedcentral.com/1471-2105/11/564
Simon Anders is offline   Reply With Quote
Old 01-05-2011, 11:15 PM   #16
taoxiang180
Junior Member
 
Location: sichuan provine

Join Date: Jun 2010
Posts: 9
Default

Quote:
Originally Posted by Simon Anders View Post
As Jeremy said, adjusted p values are of little use in your situation. Doing experiments without replicates is simply a bad idea.

Wu Z, Jenkins BD, Rynearson TA, et al. Empirical Bayes Analysis of Sequencing-based Transcriptional Profiling without Replicates.
BMC Bioinformatics. 2010;11(1):564. http://www.biomedcentral.com/1471-2105/11/564

Hi,Simon,Thanks again for your help.
For me, it is really a miserable work.
I will try it again following you advice.
taoxiang180 is offline   Reply With Quote
Old 01-29-2011, 08:01 PM   #17
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

From the DESeq paper:
Quote:
If neither condition has replicates, one can still perform an analysis based on the assumption that for most genes, there is no true differential abundance, and that a valid mean-variance relationship can be estimated from treating the two samples as if they were replicates.

Code:
           SampleA    SampleB
Gene1        10000      20000
Gene2        15000      25000
Does this mean the variance for Gene1SampleA is calculated from 10000 and 20000

--or--

The variance for Gene1SampleA depends on the whole SampleA variance involving 10000 and 15000?
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
My blog
Twitter
Zigster is offline   Reply With Quote
Old 01-30-2011, 12:46 AM   #18
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

There is no such thing as a "SampleA variance". Just calculating the sample variance of all the numbers in sample A would not give any meaningful number.

What DESeq does is the following. For each gene, it calculates the variance of the counts from all sample of one treatment group (or, in your case, simply from all samples). (Special care is taken here to take into account that different samples may have been sequenced to different depth.) As this variance is obtained from a very small number of samples, it is very imprecise. We now assume that genes of similar expression strength have similar variance and so take an average of all such genes to get a variance value to be used for a certain expression strength. (Technically, this is done with a local regression of the gamma family.)

Simon
Simon Anders is offline   Reply With Quote
Old 02-22-2011, 05:58 AM   #19
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

(Hi Simon, emailed you but that might have been blocked with attachments)

I notice DESeq is calling low or zero fold change genes as significantly differentially expressed in situations with no replicates and a small number of genes (e.g. miRNAs). Is this something you have encountered?


Last edited by Zigster; 02-23-2011 at 12:33 PM.
Zigster is offline   Reply With Quote
Old 02-22-2011, 06:05 AM   #20
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

This shouldn't happen. Maybe you send me some more details.

There is one issue with some numerical instability in the p value calculation that I have not yet got fully straightened out. Especially in samples with very large variance, one very rarely encounters the situation that hardly any genes are differentially expressed, and some genes with really small log fold changes get flagged as differentially expressed even though there are genes with much stronger fold change nearby which are not called. If this happen, please try to call 'nbinomTest' with the optional parameter 'eps=1e-8' (or an even lower value).

Simon
Simon Anders is offline   Reply With Quote
Reply

Tags
deseq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:11 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO