#21
Member
Location: Seattle, USA Join Date: Nov 2009
Posts: 12
Hi Simon,
I just started using your package today. I don't have any replicates either (shy...). Your vignette does not mention method="blind"; perhaps it is a new addition (as you already said). I'm wondering where I should use it. Being new to both expression analysis and R, I have to ask this dumb question: is it used when calculating the padj values? Thanks very much in advance, Gowthaman
#22
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
http://www.bioconductor.org/packages...tml/DESeq.html
#23
Junior Member
Location: shanghai Join Date: Nov 2009
Posts: 8
Hi Simon,
I am trying to use DESeq. I have 42,058 genes and get only 64 DE genes, which seems like very few. This is my session:

> head(ab)
                Num_reads.a Num_reads.b
Glyma01g00270.1           8           0
Glyma01g00320.1         833        1019
Glyma01g00380.1        1430        2019
Glyma01g00400.1        1275        1135
Glyma01g00400.2         236         108
Glyma01g00400.3          12           7
> conds <- c("A", "B")
> cds <- newCountDataSet(ab, conds)
> cds <- estimateSizeFactors(cds)
> cds <- estimateVarianceFunctions(cds, method = "blind")
> res2 <- nbinomTest(cds, "A", "B")
> plot(res2$baseMean, res2$log2FoldChange, log = "x", pch = 20, cex = .1,
+      col = ifelse(res2$padj < .1, "red", "black"))
> table(res_sig = res2$padj < .1, res2_sig = res2$padj < .1)
       res2_sig
res_sig FALSE  TRUE
  FALSE 41994     0
  TRUE      0    64

I know it's very dangerous to jump to conclusions with no replicates, but I think I should be able to get more DE genes. Could I look at the p-values instead of padj? I don't know how to do that. Can you give me any suggestions? Thanks! lei
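The `estimateSizeFactors` step in the session above normalizes for sequencing depth. The DESeq paper describes a median-of-ratios estimator: for each gene, take the geometric mean of its counts across samples. A sample's size factor is then the median, over genes, of that sample's count divided by the gene's geometric mean. The Python sketch below applies this idea to the six rows printed by `head(ab)`. It is an illustration under assumptions, not DESeq's code. Genes with a zero count are skipped because their geometric mean is zero. In practice the median would be taken over all 42,058 genes, not six.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of rows, one per gene, each a list of per-sample counts.
    Genes with any zero count are skipped, since their geometric mean is 0.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue  # geometric mean would be 0, ratio undefined
        log_geo_mean = sum(math.log(k) for k in row) / n_samples
        for j, k in enumerate(row):
            # count / geometric mean, computed on the log scale
            ratios[j].append(math.exp(math.log(k) - log_geo_mean))
    return [median(r) for r in ratios]


# The six rows shown by head(ab) in the post (columns a and b).
ab = [
    [8, 0],
    [833, 1019],
    [1430, 2019],
    [1275, 1135],
    [236, 108],
    [12, 7],
]
sf = size_factors(ab)
print(sf)
```

With only two samples, each ratio for sample b is the reciprocal of the corresponding ratio for sample a. On these rows, the two factors therefore come out as reciprocals, roughly 1.06 and 0.94.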
#24
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions. The whole point of replicates is to let you draw the line for significance, i.e., to know how much fold change you need to see before you consider an effect real. Without replicates you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that, e.g., the "blind" method gives you.
Just out of curiosity: Why don't you have replicates? Every other post here, somebody wants to do DE analysis without replication, and I am genuinely puzzled why. It cannot be budget reasons, because with multiplexing, sequencing two samples to half the depth is not much more expensive than sequencing one sample to full depth.
#25
Junior Member
Location: italy Join Date: Oct 2011
Posts: 7
Hi,
I'd like to use DESeq to analyze miRNome data from next-generation sequencing. Unfortunately, I don't have any replicates. After reading the paper "Differential expression analysis for sequence count data", I have two questions. First, is the miRNA dataset too small to support the assumption that most miRNAs show no true differential abundance? Second, without replicates, resVarA and resVarB are both NA (probably because of the factor 1/(m-1), where m is the number of replicates). How does the program calculate the p-value if the negative binomial parameters sigmaA and sigmaB are "incomplete"? Thanks in advance.
#26
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Quote:
Second, without replicates, resVarA and resVarB are both NA (probably because of the factor 1/(m-1), where m is the number of replicates). How does the program calculate the p-value if the negative binomial parameters sigmaA and sigmaB are "incomplete"?

It calculates one sigma by pretending that the two samples are replicates. See the paper for details. How come you do not have replicates?
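Here is a rough Python sketch of "pretending that the two samples are replicates". It uses the per-gene method-of-moments idea as I understand it from the DESeq paper:
1. Divide each count by its sample's size factor.
2. Take the sample variance across all samples, ignoring the conditions.
3. Subtract the expected Poisson (shot-noise) part.
4. Divide by the squared mean.

Several things here are assumptions. The function name is hypothetical. Clipping negative estimates to zero is a simplification made for this sketch. DESeq itself additionally fits a mean-dispersion trend rather than using these raw per-gene values directly.

```python
from statistics import mean, variance


def blind_raw_dispersion(counts, size_factors):
    """Hypothetical sketch: per-gene raw dispersion, treating all samples as replicates.

    counts: list of rows (one per gene) of per-sample counts.
    size_factors: one size factor per sample.
    """
    n = len(size_factors)
    inv_sf_mean = sum(1.0 / s for s in size_factors) / n
    out = []
    for row in counts:
        q = [k / s for k, s in zip(row, size_factors)]  # normalized counts
        m = mean(q)
        if m == 0:
            out.append(float("nan"))  # no information for all-zero genes
            continue
        w = variance(q)        # sample variance (divisor n - 1), conditions ignored
        z = m * inv_sf_mean    # expected Poisson (shot-noise) variance
        out.append(max(0.0, (w - z) / m ** 2))  # clip at 0 (simplification)
    return out


# Two genes, two samples, equal size factors: one unchanged, one tripled.
d = blind_raw_dispersion([[100, 100], [100, 300]], [1.0, 1.0])
print(d)
```

A gene that truly differs between the two conditions (the second row) inflates its own variance estimate. This is why the resulting variance is conservative and the "blind" approach yields few hits.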
#27
Member
Location: North Carolina Join Date: Feb 2011
Posts: 22
Okay, since this topic is active, I guess I will throw my relatively unusual dataset into the mix regarding how to properly calculate variances in DESeq. I have the following dataset:
- Cell line, untreated (2 biological reps)
- Cell line, treated with X (2 biological reps)
- Human sample, untreated (1 rep)
- Human sample, treated with X (1 rep)
- Human sample, treated with Y (1 rep)
I certainly understand that it would be great to have multiple replicates for the human samples, but without going into details, let's just say it isn't going to happen. What I have done so far is essentially three separate DESeq variance estimations before the nbinom analysis:
1st - Cell line treated + untreated, using the replicates
2nd - Human treated X + human untreated, using the "blind" parameter
3rd - Human treated Y + human untreated, again using the "blind" parameter
Since these samples are all similar, would it be advisable to estimate the variance on all samples simultaneously and then do the individual comparisons at the nbinom step? Essentially, I'm wondering whether adding this extra data might give a somewhat better variance estimate for the samples without replicates...
Also, can anyone recommend an R heatmap package that would allow me to include all these samples on a single heatmap?
Last edited by Gators; 11-08-2011 at 11:59 AM.
#28
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Your cell line replicates are probably isogenic, while your human samples are from different humans; hence, the variation between humans will be much larger than what you would expect between the cell lines. So I wonder what you mean by "these samples are all similar".
|
#29
Member
Location: North Carolina Join Date: Feb 2011
Posts: 22
Actually, the human samples were all from one donor. Cells were isolated from a human donor and then either treated with X, treated with Y, or left untreated. I would expect the cell line to be different; however, the cell line and the human-derived cells are all the same cell type, so on some level they should be similar despite the fact that the cell line has been immortalized.
Last edited by Gators; 11-08-2011 at 12:49 PM.
#30
Senior Member
Location: MA Join Date: Oct 2010
Posts: 160
I've installed the new version of DESeq (1.6.0), but when I type "?estimateVarianceFunctions" this is what I get: "estimateVarianceFunctions package REMOVED. Description: This function has been removed. Instead, use ‘estimateDispersions’." So has it been removed from the new version, or what does this mean? Thanks
#31
Junior Member
Location: Brazil Join Date: Nov 2011
Posts: 1
I have a table of counts in Excel containing the gene names, the conditions, and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?
#32
Member
Location: North Carolina Join Date: Feb 2011
Posts: 22
#33
Senior Member
Location: Berlin Join Date: Jul 2011
Posts: 156
You might need an R tutorial if you are not familiar with it; you could start here: http://cran.r-project.org/doc/manuals/R-intro.html.
#34
Member
Location: tx Join Date: Dec 2009
Posts: 46
Hi, I'm new to RNA-Seq.
We are looking at data without replicates (bad, I know, but the cost was prohibitive). Can someone explain how to interpret padj values of 1? I believe this is a measure of the FDR (type I error)? In the data below, do we appear to have 4 genes that are significantly DE? I know that without replicates we are underestimating the true DE discovery. Charles

| deseq_id | gene_counts(nano) | gene_counts(ctrl) | baseMean | baseMeanA | baseMeanB | foldChange | log2FoldChange | pval | padj |
|---|---|---|---|---|---|---|---|---|---|
| 9600 | 174 | 13 | 83.74641874 | 16.32121019 | 151.1716273 | 9.262280527 | 3.211367453 | 0.000890382 | 0.740886757 |
| 10604 | 227 | 19 | 110.5361169 | 23.85407643 | 197.2181574 | 8.267692025 | 3.047484649 | 0.001206005 | 0.771936026 |
| 8063 | 593 | 59 | 294.6365205 | 74.07318469 | 515.1998562 | 6.955281569 | 2.79810892 | 0.001591703 | 0.88218547 |
| 9821 | 245 | 23 | 120.8662944 | 28.87598725 | 212.8566016 | 7.371405167 | 2.881939658 | 0.001793433 | 0.88218547 |
| 680 | 61 | 4 | 29.00943031 | 5.021910827 | 52.9969498 | 10.55314434 | 3.399601013 | 0.002231307 | 1 |
| 8550 | 402 | 44 | 202.2498031 | 55.24101909 | 349.2585872 | 6.322450109 | 2.660483748 | 0.002796612 | 1 |
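The numeric columns in Charles's results are related by simple arithmetic. My understanding of the old DESeq output, which I have not verified against its code, is as follows:
- With one sample per condition, baseMeanA and baseMeanB are that sample's normalized count (raw count divided by its size factor).
- baseMean is their average.
- foldChange is baseMeanB / baseMeanA.
- log2FoldChange is the base-2 logarithm of foldChange.

The Python check below uses the first row (deseq_id 9600). That condition A corresponds to the ctrl column is inferred from the numbers, not stated in the post: 13 / 16.32... and 19 / 23.85... give the same implied size factor.

```python
import math

# First row of the results (deseq_id 9600).
count_nano, count_ctrl = 174, 13
base_mean_a, base_mean_b = 16.32121019, 151.1716273

# Implied size factors: raw count / normalized count (A appears to be ctrl).
sf_ctrl = count_ctrl / base_mean_a
sf_nano = count_nano / base_mean_b

base_mean = (base_mean_a + base_mean_b) / 2   # mean over the two samples
fold_change = base_mean_b / base_mean_a       # B relative to A
log2_fold_change = math.log2(fold_change)

print(round(base_mean, 4), round(fold_change, 4), round(log2_fold_change, 4))
print(round(sf_ctrl, 4), round(sf_nano, 4))
```

A large fold change on its own says nothing about significance here: it is the padj column that has to be small.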
#35
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
No, you have nothing.
An FDR of 0.1 (i.e., 10%), for example, means that your gene list contains an estimated 10% false positives at most. To get such a list, you take all genes with padj < 0.1. Thus, padj = 1 means that you cannot include the gene even if you are willing to accept 99% false positives.
What I never understand is why people claim that lack of money prevented them from doing replicates. First, you have now wasted all the money you paid for the sequencing run, because without replicates it is highly unlikely that you will ever get useful results. Second, while it may have been expensive to obtain replicate samples, it is not expensive to sequence additional samples. After all, having twice as many samples does not mean that you need twice as many lanes. You simply use multiplexing to sequence each sample to only half the depth, and you still get more statistical power than with fewer samples at greater depth. The only extra expense is the additional library prep kits, not the sequencing itself.
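The padj column described here is, as far as I know, the Benjamini-Hochberg adjustment: DESeq's nbinomTest calls R's p.adjust with method "BH". Sort the n p-values in increasing order. The adjusted value for the i-th smallest is the minimum, over all j ≥ i, of p_(j) · n / j. Genes with padj below the chosen FDR level, e.g. 0.1, form the reported list. A minimal Python sketch follows; the five p-values are made up for illustration.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
padj = bh_adjust(p)
print([round(x, 3) for x in padj])
print(sum(x < 0.1 for x in padj), "genes at FDR 10%")
```

With tens of thousands of genes, the factor n / rank becomes very large. A raw p-value around 0.002, as in Charles's data, can therefore end up with a padj near or at 1.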
#36
Member
Location: tx Join Date: Dec 2009
Posts: 46
Duly noted.
c
#37
Member
Location: phoenix Join Date: Oct 2011
Posts: 59
#38
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
If no replicates are provided, there is no way to know the real biological variability, and hence there are at least two options:
(i) You can ignore the issue by (implicitly) postulating the biological variance to be zero. Unfortunately, this is the option most commonly chosen in the literature, despite the fact that it is clearly untenable and will lead to nearly all strongly expressed genes being called differentially expressed if you have sequenced deeply. Cuffdiff, in the versions described in the papers, also suffered from this flaw, but I don't know what the current version does. One way to find out might be to apply the tool of your choice first to a dataset with replicates and then to only two samples from this dataset, one from each treatment group, and compare whether you get more or fewer hits. If you get more significant hits with less data, this would hint at biological variation not being properly accounted for.

(ii) If you think that only very few genes are differentially expressed, you can pretend that your two samples are replicates with respect to the majority of genes, and use this to assess variability. You might strongly overestimate variance that way and dramatically lose power. In other words, you only consider those genes as differentially expressed that differ between the two samples so much more than nearly all other genes that they "stick out" very prominently. This is what DESeq's "blind" approach attempts. Obviously, you typically get only very few hits this way, and even these could be just fluke findings. See the vignette and the paper for details.

Wu et al. (BMC Bioinformatics 2010, 11:564) tried to find a middle ground here, but I have not heard about any practical experiences with their approach. Has anybody here tried it?
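Option (i) calls nearly all strongly expressed genes significant, and a small calculation shows why. The Python sketch below uses a Wald-type z-score for the log ratio of two single counts. It models the variance of each count as mu + alpha · mu², the negative binomial form; setting alpha = 0 corresponds to assuming zero biological variance (Poisson). By the delta method, Var(log k) ≈ 1/mu + alpha. The following are all assumptions made for illustration, and this is not the exact test DESeq performs:
- the normal approximation;
- equal size factors;
- the dispersion value alpha = 0.1.

```python
import math


def log_ratio_z(k1, k2, alpha):
    """Approximate z-score for log(k2 / k1) between two single samples.

    Delta method: Var(log k) ~ 1/mu + alpha when Var(k) = mu + alpha * mu**2.
    alpha = 0 means no biological variance (pure Poisson counts).
    """
    se = math.sqrt(1.0 / k1 + 1.0 / k2 + 2.0 * alpha)
    return math.log(k2 / k1) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same 30% increase at shallow, moderate and deep sequencing.
for k1, k2 in [(10, 13), (1000, 1300), (100000, 130000)]:
    p_pois = two_sided_p(log_ratio_z(k1, k2, alpha=0.0))
    p_nb = two_sided_p(log_ratio_z(k1, k2, alpha=0.1))  # alpha = 0.1 is hypothetical
    print(k1, k2, f"Poisson p={p_pois:.2g}", f"NB p={p_nb:.2g}")
```

Under the Poisson assumption, the p-value for the same 30% change shrinks toward zero as depth grows. With a non-zero dispersion, the standard error cannot fall below sqrt(2 · alpha), so the p-value stays large however deep you sequence.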
#39
Junior Member
Location: Essonne Join Date: Oct 2009
Posts: 7
I am trying to use DESeq. "?estimateVarianceFunctions" gives me: ... "Usage: estimateVarianceFunctions(cds, method = c("normal", "blind", "pooled"), pool = NULL, locfit_extra_args = list(), lp_extra_args = list(), modelFrame = NULL)", but when I use it, I get an error message:
cds <- estimateVarianceFunctions(cds, method = "blind")
Error: attempt to apply non-function
I don't understand why. Before this line, I run
cds <- newCountDataSet(countsTable, conds)
cds <- estimateSizeFactors(cds)
and those work, but "estimateVarianceFunctions" does not. Can you help me, please? Thanks
#40
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
You must have managed to override the definition of estimateVarianceFunctions further up in your session. Independent of that, please update to a current version of R and Bioconductor.