SEQanswers

Old 02-24-2011, 03:59 PM   #21
ragowthaman
Member
 
Location: Seattle, USA

Join Date: Nov 2009
Posts: 12
Default Where to use method?

Hi Simon,
I just started using your package today. I too don't have any replicates (shy...). Your vignette did not mention method="blind". Perhaps it's a new addition (you said that already).

I am just wondering where I should use it. Being new to both expression analysis and R, I ask this dumb question: is it used when calculating the padj values?

Thanks very much in advance,
Gowthaman
Old 02-25-2011, 12:03 AM   #22
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by ragowthaman
Your vignette did not mention method="blind". Perhaps it's a new addition (you said that already).
You are reading an outdated version of the vignette. The current version of DESeq is here:
http://www.bioconductor.org/packages...tml/DESeq.html
Old 07-14-2011, 07:04 PM   #23
wangleibio
Junior Member
 
Location: shanghai

Join Date: Nov 2009
Posts: 8
Default DESeq without replicates

Hi Simon,
I am trying to use DESeq. I have 42058 genes and get only 64 DE genes, which seems like very few.
> head(ab)
                Num_reads.a Num_reads.b
Glyma01g00270.1           8           0
Glyma01g00320.1         833        1019
Glyma01g00380.1        1430        2019
Glyma01g00400.1        1275        1135
Glyma01g00400.2         236         108
Glyma01g00400.3          12           7
> conds <- c("A", "B")
> cds <- newCountDataSet(ab, conds)
> cds <- estimateSizeFactors(cds)
> cds <- estimateVarianceFunctions(cds, method = "blind")
> res2 <- nbinomTest(cds, "A", "B")
> plot(
+   res2$baseMean,
+   res2$log2FoldChange,
+   log = "x", pch = 20, cex = .1,
+   col = ifelse(res2$padj < .1, "red", "black"))
> table(res_sig = res2$padj < .1, res2_sig = res2$padj < .1)
       res2_sig
res_sig FALSE  TRUE
  FALSE 41994     0
  TRUE      0    64

I know it's very dangerous to jump to conclusions with no replicates, but I think I should be able to get more DE genes. Should I be looking at the p-value as well as padj?
I do not know how to do it. Can you give me any suggestions?
Thanks!

lei
Old 07-14-2011, 11:25 PM   #24
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions from them. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how much fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that, e.g., the "blind" method gives you.

Just out of curiosity: Why don't you have replicates? Every other post here, somebody wants to do DE analysis without replication, and I am genuinely puzzled why. It cannot be budget reasons, because with multiplexing, sequencing two samples to half the depth is not that much more expensive than one sample to full depth.
Old 11-08-2011, 07:31 AM   #25
concitacantarella
Junior Member
 
Location: italy

Join Date: Oct 2011
Posts: 7
Default

Hi,
I'd like to use DESeq to analyze miRNome data from next-generation sequencing.
Unfortunately, I don't have any replicates.
After reading the paper "Differential expression analysis for sequence count data" I have two questions:
first - Is the miRNA dataset too small to consider the assumption that for
most of them there is no true differential abundance?
second - Without replicates, resVarA and resVarB are both NA (probably due to the factor 1/(m-1), where m is the number of replicates). How does the program calculate the p-value if the parameters sigmaA and sigmaB of the negative binomial distribution are "incomplete"?

Thanks in advance
Old 11-08-2011, 11:29 AM   #26
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
first - Is the miRNA dataset too small to consider the assumption that for
most of them there is no true differential abundance?
The normalization is quite robust with respect to this. The test for differential expression will not get you very far without replicates, unless you expect that there are only very few but very strong effects.

Quote:
second - Without replicates, resVarA and resVarB are both NA (probably due to the factor 1/(m-1), where m is the number of replicates). How does the program calculate the p-value if the parameters sigmaA and sigmaB of the negative binomial distribution are "incomplete"?
It calculates one sigma, by pretending that the two samples are replicates. See the paper for details.

How come you do not have replicates?
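
The point about the NA variances and the single pooled sigma can be illustrated with a rough base R sketch. It is not DESeq's actual code: it reuses the six count rows posted earlier in this thread as toy data, computes median-of-ratios size factors as described in the DESeq paper, and then takes a plain per-gene sample variance across the two columns, whereas DESeq additionally fits a smooth mean-variance relationship. The variable names are made up for illustration.

```r
# Toy counts taken from the head(ab) output posted earlier in this thread.
counts <- cbind(
  a = c(8, 833, 1430, 1275, 236, 12),
  b = c(0, 1019, 2019, 1135, 108, 7)
)

# With a single sample per condition, a per-condition variance is undefined:
# the 1/(m - 1) factor divides by zero, and R reports NA.
var_a_only <- var(counts[1, "a"])

# Median-of-ratios size factors, as described in the DESeq paper.
# Genes with a zero count are left out of the geometric mean.
positive <- rowSums(counts == 0) == 0
log_geo_mean <- rowMeans(log(counts[positive, ]))
size_factors <- apply(counts[positive, ], 2, function(k) {
  exp(median(log(k) - log_geo_mean))
})

# "Blind" idea: ignore the condition labels and treat both columns as
# replicates of one condition, so m = 2 and a variance can be computed.
normalized <- sweep(counts, 2, size_factors, "/")
blind_mean <- rowMeans(normalized)
blind_var <- apply(normalized, 1, var)

print(data.frame(blind_mean, blind_var))
```

Because any true differential expression between the two samples is counted as noise here, the variance estimate tends to be too large, which is why the resulting test is so conservative.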

Old 11-08-2011, 11:55 AM   #27
Gators
Member
 
Location: North Carolina

Join Date: Feb 2011
Posts: 22
Default

Okay, since this topic is active, I guess I will throw my relatively unique dataset into the mix regarding how to properly calculate variances in DESeq. I have the following dataset:
Cell line untreated (2 biological reps)
Cell line treated X (2 biological reps)
Human sample untreated (1 rep)
Human sample treated X (1 rep)
Human sample treated Y (1 rep)

I certainly understand that it would be great to have multiple replicates for the human samples, but without going into details, let's just say it isn't gonna happen. What I have done so far is to essentially do 3 DESeq variance estimations before the nbinom analysis.
1st - Cell line treated + untreated - using the replicates
2nd - Human treated X + Human untreated - using the "blind" parameter
3rd - Human treated Y + Human untreated - again using the "blind" parameter

Since these samples are all similar, I wonder if it would be advisable to calculate the variance simultaneously on all samples, then do the individual comparisons at the nbinom step? Essentially wondering if adding this extra data might provide a somewhat better variance calculation for those samples without replicates...

Also, can anyone recommend an R heatmap package that would allow me to include all these samples on a single heatmap?

Last edited by Gators; 11-08-2011 at 11:59 AM.
Old 11-08-2011, 12:07 PM   #28
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Your cell line replicates are probably isogenic while your human samples are from different humans, and hence, the variation between humans will be much larger than what you expect between the cell lines. I hence wonder what you mean by "these samples are all similar".
Old 11-08-2011, 12:31 PM   #29
Gators
Member
 
Location: North Carolina

Join Date: Feb 2011
Posts: 22
Default

Actually the human samples were all one donor. Cells were isolated from a human donor, then were either treated with X, treated with Y, or untreated. The cell line I would expect to be different, however they (the cell line and human-derived cells) are all the same cell type, so on some level they should be similar despite the fact that the cell line has been immortalized.

Last edited by Gators; 11-08-2011 at 12:49 PM.
Old 11-08-2011, 04:43 PM   #30
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160
Default

Quote:
Originally Posted by Simon Anders
Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

Simon
Hi,

I've installed the new version of DESeq (1.6.0), but when I type "?estimateVarianceFunctions"
this is what I get:

estimateVarianceFunctions    package:DESeq    R Documentation

REMOVED

Description:

This function has been removed. Instead, use
‘estimateDispersions’.

So has it been removed from the new version, or what does this mean?

Thanks
Old 11-08-2011, 05:46 PM   #31
roseadele
Junior Member
 
Location: Brazil

Join Date: Nov 2011
Posts: 1
Default DESeq

I have a table in Excel containing the names of the genes, the conditions, and the number of reads for each gene in each condition. How can I use this data in the DESeq package? How do I get the table into R?
Old 11-08-2011, 08:11 PM   #32
Gators
Member
 
Location: North Carolina

Join Date: Feb 2011
Posts: 22
Default

Quote:
Originally Posted by cascoamarillo

So has it been removed from the new version, or what does this mean?

Thanks
Yes, it has been replaced by estimateDispersions.
Old 11-09-2011, 01:08 AM   #33
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156
Default

Quote:
Originally Posted by roseadele
I have a table in Excel containing the names of the genes, the conditions, and the number of reads for each gene in each condition. How can I use this data in the DESeq package? How do I get the table into R?
Googling "read data from Excel R" gives me 136 million results, and the first ten looked clear and simple. The rest of the information is found in the DESeq manual (search for "Analysing RNA-Seq data with the "DESeq" package"), which is nicely written with clear examples.
You might need an R tutorial if you are not familiar with R; you could start here: http://cran.r-project.org/doc/manuals/R-intro.html.
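
One common route, sketched here under the assumption that the Excel sheet is first saved as a CSV file with gene names in the first column and one column of read counts per sample, is base R's read.csv. The file name and the sample column names below are made up for illustration; the gene IDs and counts are borrowed from an earlier post in this thread.

```r
# Stand-in for a count table exported from Excel via "Save As > CSV".
# The file name and the sample columns are made up for illustration.
csv_path <- file.path(tempdir(), "counts_example.csv")
writeLines(c(
  "gene,untreated,treated",
  "Glyma01g00320.1,833,1019",
  "Glyma01g00380.1,1430,2019",
  "Glyma01g00400.1,1275,1135"
), csv_path)

# row.names = 1 turns the first column (gene names) into row names,
# leaving a purely numeric table of counts, one column per sample.
countsTable <- read.csv(csv_path, row.names = 1)

# One condition label per column, in column order.
conds <- factor(c("untreated", "treated"))

print(countsTable)
str(conds)
```

A table and condition vector of this shape are what the newCountDataSet(countsTable, conds) calls shown elsewhere in this thread take as input.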
Old 12-22-2011, 09:12 AM   #34
crh
Member
 
Location: tx

Join Date: Dec 2009
Posts: 46
Default DESeq w/o replicates - padj

Hi, I am new to RNA-Seq.

We are looking at data w/o replicates (bad I know, but $$ prohibited).
Can someone explain how to interpret padj values of 1? I believe this is a measure of FDR (type I error)?

In the data below, we appear to have 4 genes that are significantly DE?
I know that w/o replicates we are underestimating the true number of DE genes.

Charles

deseq_id  gene_counts(nano)  gene_counts(ctrl)  baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
9600      174                13                 83.74641874  16.32121019  151.1716273  9.262280527  3.211367453     0.000890382  0.740886757
10604     227                19                 110.5361169  23.85407643  197.2181574  8.267692025  3.047484649     0.001206005  0.771936026
8063      593                59                 294.6365205  74.07318469  515.1998562  6.955281569  2.79810892      0.001591703  0.88218547
9821      245                23                 120.8662944  28.87598725  212.8566016  7.371405167  2.881939658     0.001793433  0.88218547
680       61                 4                  29.00943031  5.021910827  52.9969498   10.55314434  3.399601013     0.002231307  1
8550      402                44                 202.2498031  55.24101909  349.2585872  6.322450109  2.660483748     0.002796612  1
Old 12-22-2011, 11:41 AM   #35
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

No, you have nothing.

An FDR of 0.1 (i.e., 10%), for example, means that your gene list contains at most an estimated 10% of false positives. To get such a list, you take all genes with padj<.1.

Thus, padj=1 means that you cannot include the gene even if you are willing to accept 99% false positives.
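
As a small illustration of how padj relates to the raw p-values, here is a sketch using base R's p.adjust with the Benjamini-Hochberg method, which to my understanding is the adjustment DESeq applies. The six p-values are made up for illustration and are not taken from the table above.

```r
# Made-up p-values for six genes (illustration only).
pval <- c(0.0001, 0.004, 0.03, 0.2, 0.6, 0.9)

# Benjamini-Hochberg adjustment; base R's p.adjust implements it.
padj <- p.adjust(pval, method = "BH")

# Gene list at an estimated FDR of 10%.
hits <- which(padj < 0.1)

print(data.frame(pval, padj, hit = padj < 0.1))
```

The adjusted value depends on how many genes were tested: a raw p-value of 0.001 among thousands of genes with nothing else small around it ends up with a padj close to 1, which is the situation in the table above.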

What I never understand is why people claim that lack of money precluded them from doing replicates. First, you now have wasted all the money you paid for the sequencing run, because without replicates it is highly unlikely to ever get useful results.

Second, while it may have been expensive to obtain replicate samples it is not expensive to sequence additional samples. After all, having twice as many samples does not mean that you need to use twice as many lanes. You simply use multiplexing to sequence each sample to only half the depth and still get more statistical power than with fewer samples at more depth. The only extra expense is the additional library prep kits, not the sequencing itself.
Old 12-22-2011, 12:44 PM   #36
crh
Member
 
Location: tx

Join Date: Dec 2009
Posts: 46
Default No replicates

Duly noted.

c
Old 02-16-2012, 12:56 AM   #37
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59
Default

Quote:
Originally Posted by Simon Anders
The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions from them. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how much fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that, e.g., the "blind" method gives you.
Is the "guesswork" similar to what Cuffdiff does when replicates are not provided? There seems to be some mathematical modeling in Cuffdiff that I don't completely understand. Is there an explanation of the 'blind' method anywhere that a non-statistician could understand?
Old 02-16-2012, 02:05 AM   #38
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

If no replicates are provided, there is no way to know the real biological variability, and hence there are at least two options:

(i) You can ignore the issue by (implicitly) postulating the biological variance to be zero. Unfortunately, this is the option most commonly chosen in the literature, despite the fact that it is clearly untenable and will lead to nearly all strongly expressed genes being called differentially expressed if you have sequenced deeply. Cuffdiff, in the versions described in the papers, also suffered from this flaw, but I don't know what the current version does. A way to find out might be to check whether you get more or fewer hits if you apply the tool of your choice first on a dataset with replicates and then on only two samples from this dataset, one from each treatment group. If you get more significant hits with less data, this would hint at biological variation not being properly accounted for.

(ii) If you think that only very few genes are differentially expressed, you can pretend that your two samples are replicates with respect to the majority of genes, and use this to assess variability. You might strongly overestimate variance that way and dramatically lose power. In other words: you only consider those genes as differentially expressed that differ so much more between the two samples than nearly all other genes that they "stick out" very prominently. This is what DESeq's "blind" approach attempts. Obviously, you typically only get very few hits this way, and even these could be just fluke findings. See the vignette and the paper for details.

Wu et al. (BMC Bioinformatics 2010, 11:564) tried to find a middle ground here but I have not heard about any practical experiences with their approach. Anybody here tried that?
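
The problem with option (i) can be seen in a small simulation, sketched here in base R. It is not taken from any of the tools discussed: the number of genes, the mean count, and the dispersion are assumed values chosen for illustration, and the Poisson-based test is a simple conditional binomial test.

```r
set.seed(1)

n_genes <- 2000
mu <- 1000          # expected count per gene (deep sequencing)
dispersion <- 0.1   # assumed biological squared coefficient of variation

# Two samples drawn from the SAME distribution: no gene is truly
# differentially expressed, all differences are biological noise.
k1 <- rnbinom(n_genes, mu = mu, size = 1 / dispersion)
k2 <- rnbinom(n_genes, mu = mu, size = 1 / dispersion)

# Option (i): assume zero biological variance (pure Poisson noise).
# Under that model, k1 given k1 + k2 is Binomial(k1 + k2, 0.5).
n <- k1 + k2
p_lower <- pbinom(k1, n, 0.5)
p_upper <- 1 - pbinom(k1 - 1, n, 0.5)
pval_poisson <- pmin(1, 2 * pmin(p_lower, p_upper))

# Fraction of genes wrongly called significant at the 5% level.
false_call_rate <- mean(pval_poisson < 0.05)
print(false_call_rate)
```

With these settings, the fraction of genes called significant lies far above the nominal 5%, even though no gene differs between the two simulated conditions.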
Old 03-01-2012, 01:55 AM   #39
gstitan
Junior Member
 
Location: Essonne

Join Date: Oct 2009
Posts: 7
Default prb with DESeq with estimateVarianceFunctions

Quote:
Originally Posted by Simon Anders
Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

Simon
Hey Simon,

I am trying to use DESeq. "?estimateVarianceFunctions" gives me:
...
"Usage:

estimateVarianceFunctions(cds, method = c( "normal", "blind", "pooled" ),
pool = NULL, locfit_extra_args = list(), lp_extra_args = list(),
modelFrame = NULL )"

but when I use it, I get an error message:
"cds <- estimateVarianceFunctions(cds,method="blind")
Error: attempt to apply non-function"

So I don't understand why. Before this line, I run
cds <- newCountDataSet(countsTable,conds)

cds <- estimateSizeFactors(cds)

and that works, but estimateVarianceFunctions does not.

Can you help me please ?

Thanks
Old 03-01-2012, 02:21 AM   #40
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

You must have managed to override the definition of estimateVarianceFunctions further up in your session. Independent of that, please update to a current version of R and Bioconductor.