| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| DESeq without biological replicates (dataset: Marioni et al.) | Andrea Apolloni | Bioinformatics | 3 | 01-13-2012 05:58 AM |
| DESeq: question about with replicates and without any replicates. | nb509 | RNA Sequencing | 2 | 10-25-2011 07:04 AM |
| DESeq for a small sets of sequences without replicates | starlight | Bioinformatics | 6 | 09-05-2011 11:39 AM |
| DESeq analysis without replicates for 16 tissues | johannes.helmuth | Bioinformatics | 0 | 05-25-2011 02:53 AM |
| DESeq: question about baseMean. Also, replicates. | Azazel | Bioinformatics | 5 | 05-18-2011 11:51 PM |

#1
Junior Member
Location: Los Angeles Join Date: Aug 2010
Posts: 4
Hello,
I am unable to get method=blind to work for calculating the variance without replicates, or in fact any of the "method" arguments from the vignette. Is this now obsolete? Are the only variance calculations supported now "pooled" or with replicates? Has anyone else had trouble with this recently? Thanks, Austin

#2
Senior Member
Location: Rochester, MN Join Date: Mar 2009
Posts: 191
Quote:
Code:
method="blind"

#3
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
...and make sure you use a current version of DESeq. I added the 'method' argument only a few months ago.

#4
Senior Member
Location: Rochester, MN Join Date: Mar 2009
Posts: 191

#5
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994

#6
Junior Member
Location: Los Angeles Join Date: Aug 2010
Posts: 4

#7
Junior Member
Location: Sichuan Province Join Date: Jun 2010
Posts: 9
Hi, everyone.
I am a novice to R and DESeq. Working without any replicates, I went through each step of the differential expression analysis in the DESeq manual, but the results really puzzled me: some genes have pval < 0.05 but a padj of 1. How is padj calculated, and how should I set the padj threshold? Any pointers will be appreciated. Thanks!
Last edited by taoxiang180; 01-03-2011 at 09:47 PM.

#8
Junior Member
Location: Sichuan Province Join Date: Jun 2010
Posts: 9

#9
Senior Member
Location: Rochester, MN Join Date: Mar 2009
Posts: 191
Quote:

#10
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Quote:
I've just googled a bit, looking for a good primer to explain this concept (which is of vital importance: don't even think about analysing genomics data if you don't know what the multiple hypothesis testing problem is), but most of the material is too technical for a reader without statistics training. This paper might explain it reasonably well, but it is a bit lengthy: Pounds SB. Estimation and control of multiple testing error rates for microarray studies. Briefings in Bioinformatics. 2006;7(1):25-36. http://bib.oxfordjournals.org/cgi/co...bstract/7/1/25 (DESeq uses Benjamini and Hochberg's method; see the article to learn what this means.)
Simon

#11
Senior Member
Location: Rochester, MN Join Date: Mar 2009
Posts: 191
Quote:
Perhaps a better way to say it is that padj adjusts the p values so that, among the genes you call at padj < 0.05, only about 5% are expected to be false positives; it says nothing about whether any specific gene is differentially expressed.

#12
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Exactly.
So, for those too lazy to read the paper: if you have 10,000 genes and use a threshold of 0.05 on the raw p values, you should expect around 500 false positives (5% of 10,000, assuming nearly all genes are unaffected). So if 1000 genes have p < 0.05, about half of them might be false positives. If you use a threshold of 0.05 on the adjusted p values, you will find fewer genes, say 100, but now you know that only about 5% of these are false positives (here: 5 of 100 genes).
Simon
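Simon's back-of-the-envelope numbers can be reproduced in a few lines. This is only a sketch of the arithmetic, under the same simplifying assumption he makes (essentially every gene is a true null, so each raw test at p < 0.05 is a false positive with probability 0.05); the helper names are hypothetical.

```python
# Arithmetic from the post above (hypothetical helper names).
# Assumption: essentially all genes are unaffected (true nulls), so each raw
# test called at p < alpha is a false positive with probability alpha.


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives among n_tests null tests at raw p < alpha."""
    return n_tests * alpha


def approx_false_fraction(n_tests: int, alpha: float, n_called: int) -> float:
    """Rough fraction of the n_called genes (raw p < alpha) that are false positives.

    Capped at 1.0, because the expected count can exceed the number of calls.
    """
    if n_called <= 0:
        return 0.0
    return min(1.0, expected_false_positives(n_tests, alpha) / n_called)


def expected_false_discoveries(n_called: int, fdr: float) -> float:
    """Approximate false positives among n_called genes selected at FDR level fdr."""
    return n_called * fdr


if __name__ == "__main__":
    print(expected_false_positives(10_000, 0.05))      # 500.0
    print(approx_false_fraction(10_000, 0.05, 1_000))  # 0.5
    print(expected_false_discoveries(100, 0.05))       # 5.0
```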

#13
Senior Member
Location: Pathum Thani, Thailand Join Date: Nov 2009
Posts: 190
The main problem is the lack of replicates. Without them, the variance is estimated by pooling both of your samples, so a large difference between them also produces a large variance and thus less statistical significance. You may want to ignore the p-values completely and just look at the highest and lowest fold changes; because you have no replicates, you will need to confirm pretty much everything with (replicated) qPCR.
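Ranking genes by fold change to pick qPCR candidates is straightforward once counts are normalized for sequencing depth. The sketch below is not DESeq code: it assumes one count per gene per condition, externally supplied size factors, and a pseudocount of 0.5 (an arbitrary choice) so that zero counts do not give infinite ratios. The function names are hypothetical.

```python
import math


def log2_fold_changes(counts_a, counts_b, sf_a=1.0, sf_b=1.0, pseudocount=0.5):
    """log2(B / A) per gene from one library per condition (no replicates).

    counts_a, counts_b: dicts mapping gene -> raw count.
    sf_a, sf_b: size factors (sequencing-depth scaling) for the two libraries.
    pseudocount: assumed value added to avoid dividing by zero or taking log(0).
    """
    lfc = {}
    for gene in counts_a:
        a = counts_a[gene] / sf_a + pseudocount
        b = counts_b[gene] / sf_b + pseudocount
        lfc[gene] = math.log2(b / a)
    return lfc


def qpcr_candidates(lfc, n):
    """The n most up-regulated and the n most down-regulated genes, in rank order."""
    up = sorted(lfc, key=lfc.get, reverse=True)[:n]
    down = sorted(lfc, key=lfc.get)[:n]
    return up, down


if __name__ == "__main__":
    # Counts taken from the small table in post #17.
    a = {"Gene1": 10000, "Gene2": 15000}
    b = {"Gene1": 20000, "Gene2": 25000}
    lfc = log2_fold_changes(a, b)
    print(lfc)
    print(qpcr_candidates(lfc, 1))
```

Ranking only orders the candidates; as the post says, each one still needs confirmation with replicated qPCR.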

#14
Junior Member
Location: Sichuan Province Join Date: Jun 2010
Posts: 9
Thanks for all the recommendations!
When working without any replicates and getting padj = 1, what can I do next with DESeq? (Obviously, we cannot conclude that there is no differential expression between the two groups.) According to Yoav Benjamini's method (2001), once the p-values are available from any statistical software, the extra FDR calculation can easily be done in spreadsheet software such as Excel using its built-in functions. Can I calculate the FDR this way?

#15
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
As Jeremy said, adjusted p values are of little use in your situation. Doing experiments without replicates is simply a bad idea.
Just start from the largest fold change (or perhaps from the smallest unadjusted p value) and keep doing qPCR (with at least, say, five biological replicates!) for all genes until you run out of patience. And you don't have to use Excel: you can use R's 'p.adjust' function for the multiple testing calculation. For your convenience, DESeq runs 'p.adjust' with 'method="BH"' (Benjamini-Hochberg) on the 'pval' column of the result and reports this in the 'padj' column. Alternatively, you may want to give this quite new idea a try: Wu Z, Jenkins BD, Rynearson TA, et al. Empirical Bayes Analysis of Sequencing-based Transcriptional Profiling without Replicates. BMC Bioinformatics. 2010;11(1):564. http://www.biomedcentral.com/1471-2105/11/564
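To see what ends up in the 'padj' column, here is a minimal Python re-implementation of the Benjamini-Hochberg step-up adjustment, written to follow the formula R's `p.adjust(p, method = "BH")` uses: order the p values, multiply each by n/rank, take the running minimum from the largest p value downward, and cap at 1. It is the same calculation taoxiang180 asked about doing in a spreadsheet. This is an illustrative sketch, not DESeq's code; the name `p_adjust_bh` is hypothetical, and it assumes there are no missing p values (R's version drops NAs first).

```python
def p_adjust_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, following the formula used by
    R's p.adjust(p, method = "BH"). Assumes pvals has no missing values."""
    n = len(pvals)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also enforces the cap at 1
    for k, idx in enumerate(order):
        rank = n - k  # rank in ascending order: 1 = smallest p-value
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(p_adjust_bh([0.01, 0.04, 0.03, 0.005]))
    # A gene with raw p = 0.04 among 999 genes with p = 1.0 gets padj = 1.0:
    # 0.04 * 1000 / 1 = 40, and the running minimum from above is already 1.0.
    print(p_adjust_bh([0.04] + [1.0] * 999)[0])  # 1.0
```

The second example shows how a raw p value below 0.05 can still end up with a padj of 1: the adjustment depends on how many tests were run and on the other p values, not on that gene alone.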

#16
Junior Member
Location: Sichuan Province Join Date: Jun 2010
Posts: 9
Quote:
Hi Simon, thanks again for your help. For me, this is really miserable work. I will try again following your advice.

#17
(Jeremy Leipzig)
Location: Philadelphia, PA Join Date: May 2009
Posts: 116
From the DESeq paper:
Quote:
Code:
         SampleA  SampleB
Gene1    10000    20000
Gene2    15000    25000
--or-- does the variance for Gene1 in SampleA depend on the whole SampleA variance, involving 10000 and 15000?

#18
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
There is no such thing as a "SampleA variance". Simply calculating the sample variance of all the numbers in sample A would not give any meaningful number.
What DESeq does is the following. For each gene, it calculates the variance of the counts from all samples of one treatment group (or, in your case, simply from all samples). (Special care is taken here to account for the fact that different samples may have been sequenced to different depths.) As this variance is obtained from a very small number of samples, it is very imprecise. We therefore assume that genes of similar expression strength have similar variance, and take an average over all such genes to obtain the variance value used for a given expression strength. (Technically, this is done with a local regression of the gamma family.)
Simon
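The procedure described above can be approximated in a rough Python sketch. Loudly stated assumptions: size factors follow the median-of-ratios idea from the DESeq paper; the per-gene variance is the plain sample variance of size-factor-normalized counts pooled over all samples (the no-replicate case); and the gamma-family local regression is replaced by a much cruder stand-in that averages the raw variances of the k genes with the closest mean. The function names and `k` are hypothetical, and this omits further details of DESeq's actual estimator.

```python
import math
from statistics import mean, median, variance


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [mean(math.log(c) for c in row) for row in usable]
    return [
        math.exp(median(math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]


def per_gene_mean_var(counts, sf):
    """Mean and sample variance of size-factor-normalized counts per gene,
    pooling all samples (the no-replicate situation in this thread)."""
    stats = []
    for row in counts:
        norm = [c / s for c, s in zip(row, sf)]
        stats.append((mean(norm), variance(norm)))
    return stats


def smooth_variances(stats, k=3):
    """Crude stand-in for the gamma-family local regression: replace each
    gene's raw variance by the average raw variance of the k genes whose
    mean is closest to its own (the gene itself included)."""
    smoothed = []
    for m_i, _ in stats:
        nearest = sorted(stats, key=lambda s: abs(s[0] - m_i))[:k]
        smoothed.append(mean(v for _, v in nearest))
    return smoothed


if __name__ == "__main__":
    # The two-gene table from post #17 (rows = genes, columns = samples).
    counts = [[10000, 20000], [15000, 25000]]
    sf = size_factors(counts)
    stats = per_gene_mean_var(counts, sf)
    print(sf)
    print(smooth_variances(stats, k=2))
```

Note that each variance is computed per gene across samples, never across all genes within one sample, which is the distinction Simon draws above.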

#19
(Jeremy Leipzig)
Location: Philadelphia, PA Join Date: May 2009
Posts: 116
(Hi Simon, I emailed you, but the message might have been blocked because of the attachments.)
I notice DESeq calling genes with low or zero fold change significantly differentially expressed in situations with no replicates and a small number of genes (e.g. miRNAs). Is this something you have encountered?
Last edited by Zigster; 02-23-2011 at 01:33 PM.

#20
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
This shouldn't happen. Maybe you could send me some more details.
There is one issue with numerical instability in the p value calculation that I have not yet fully straightened out. Especially in samples with very large variance, one may (very rarely) encounter a situation where hardly any genes are differentially expressed, yet some genes with really small log fold changes get flagged as differentially expressed even though genes with much stronger fold changes nearby are not called. If this happens, please try calling 'nbinomTest' with the optional parameter 'eps=1e-8' (or an even lower value).
Simon