SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Reply
 
Thread Tools
Old 10-21-2013, 08:06 PM   #41
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Quote:
Originally Posted by Michael Love View Post
There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.
Hi Michael,

First of all, thank you for all the help.

I tried fitType="mean" with the code:

Code:
#Count matrix input
Cele_SPvsLR = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsLR),
  condition = factor(c("SP", "SP", "LR", "LR")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsLR,
                              colData = CeleDesign,
                              design = ~ condition)
dds


#Est size factor = normalize for library size
dds<- estimateSizeFactors(dds)
ddsMean <- estimateDispersions(dds, fitType="mean")
ddsMean <- nbinomWaldTest(ddsMean)
#For a parametric dispersion
dds <- DESeq(dds)
#plot dispersion
plotDispEsts(ddsMean)
plotDispEsts(dds)

#Differential expression analysis
resultsNames(ddsMean)
res <- results(ddsMean, name= "condition_SP_vs_LR") 
res <- res[order(res$padj),]
head(res)
plotMA(ddsMean)
sum(res$padj < .1, na.rm=TRUE)
This resulted in:

Code:
> sum(res$padj < .1, na.rm=TRUE)
[1] 0
But when I tried fitType="local" I get:

Code:
> sum(res$padj < .1, na.rm=TRUE)
[1] 337
(*see attached plots)

The other "weird" (at least for me) thing is that by filtering by overall count to my fitType="local" results I get less genes:

Code:
> plot(res$baseMean, pmin(-log10(res$pvalue), 50),
+ log="x", xlab="mean of normalized counts",
+ ylab=expression(-log[10](pvalue)))
Warning message:
In xy.coords(x, y, xlabel, ylabel, log) :
  19407 x values <= 0 omitted from logarithmic plot
> abline(v=10, col="red", lwd=1)
> use <- res$baseMean >= 10 & !is.na(res$pvalue)
> table(use)
use
 FALSE   TRUE 
 93445 105376 
> resFilt <- res [use, ]
> resFilt$padj <- p.adjust(resFilt$pvalue, method="BH")
> sum(resFilt$padj < .1, na.rm=TRUE)
[1] 152
Just as a refresher, I'm working with 4 samples (2 replicates/reproductive stages).

Do you think that it is a bad idea to use fitType="local" given that it is the fitType that gives me results?

Can you help me understand why by filtering my "local" results I am obtaining less genes?

Thanks, thanks, and thanks.
Attached Images
File Type: png SPvsLR_new_plotDistEsts(ddsLocal).png (56.8 KB, 28 views)
File Type: png SPvsLR_new_plotMA(ddsLocal).png (63.0 KB, 17 views)
File Type: png SPvsLR_new_plotMA(ddsMean).png (61.3 KB, 19 views)
File Type: png SPvsLR_plotDispEsts(ddsMean).png (49.0 KB, 18 views)
File Type: png SPvsLR_plotIndFilt.png (43.9 KB, 21 views)
alisrpp is offline   Reply With Quote
Old 10-22-2013, 04:46 AM   #42
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

The local fit helps you out by dipping down at the high counts, giving you lower estimates of dispersion there than by taking the mean overall. The large circles are genes whose dispersion estimates are not shrunk because they deviate too far from the fitted line. The local fit is useable as well I would say.

I presume -- because you haven't posted the output of sessionInfo() -- that you could be using a version of DESeq2 which performs automatic independent filtering to optimize the number of rejections. Check the Details section of ?results and the argument 'independentFiltering'.
Michael Love is offline   Reply With Quote
Old 10-22-2013, 07:37 AM   #43
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Hi!
Quick question; what is the default normalisation method in DESeq2?
Cant find anything in the manual.

Thanks!
sindrle is offline   Reply With Quote
Old 10-22-2013, 07:45 AM   #44
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

estimateSizeFactors() is the same function used for normalization as in DESeq, so the method is the same median ratio method as in DESeq. I can try to make this more clear in the man page for DESeq().
Michael Love is offline   Reply With Quote
Old 10-22-2013, 07:59 AM   #45
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Ok, but typing in "estimateSizeFactors()" to look for options gives no info just (object, ...)

But what is the default? Im curious because Im running edgeR with "upper quartile".

Thanks!
sindrle is offline   Reply With Quote
Old 10-22-2013, 08:06 AM   #46
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If you're trying to do a comparison, the edgeR equivalent would be the "RLE" method in calcNormFactors(). estimateSizeFactors() doesn't take an option in this regard (see help(estimateSizeFactors) for what options it does have).
dpryan is offline   Reply With Quote
Old 10-22-2013, 08:08 AM   #47
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Ah!
Always a good answer from you. Thanks!
sindrle is offline   Reply With Quote
Old 10-22-2013, 12:38 PM   #48
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Quote:
Originally Posted by Michael Love View Post
The local fit helps you out by dipping down at the high counts, giving you lower estimates of dispersion there than by taking the mean overall. The large circles are genes whose dispersion estimates are not shrunk because they deviate too far from the fitted line. The local fit is useable as well I would say.

I presume -- because you haven't posted the output of sessionInfo() -- that you could be using a version of DESeq2 which performs automatic independent filtering to optimize the number of rejections. Check the Details section of ?results and the argument 'independentFiltering'.
Here is my sessionInfo()

Code:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gplots_2.11.3           MASS_7.3-29             KernSmooth_2.23-10      caTools_1.14           
 [5] gdata_2.13.2            gtools_3.1.0            RColorBrewer_1.0-5      DESeq2_1.2.0           
 [9] RcppArmadillo_0.3.920.1 Rcpp_0.10.5             GenomicRanges_1.14.1    XVector_0.2.0          
[13] IRanges_1.20.0          BiocGenerics_0.8.0     

loaded via a namespace (and not attached):
 [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0       bitops_1.0-6        
 [5] DBI_0.2-7            genefilter_1.44.0    lattice_0.20-24      locfit_1.5-9.1      
 [9] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-4     
[13] tools_3.0.2          XML_3.95-0.2         xtable_1.7-1
I'm still having some questions about the independent filtering. I am understanding that I have to perform it right after the:

Code:
dds<- estimateSizeFactors(dds)
ddsLocal <- estimateDispersions(dds, fitType="local")
ddsLocal <- nbinomWaldTest(ddsLocal)
res <- results(ddsLocal, name= "condition_MR_vs_BR") 
res <- res[order(res$padj),]
With the code:

Code:
use <- res$baseMean >= 10 & !is.na(res$pvalue)
resFilt <- res [use, ]
resFilt$padj <- p.adjust(resFilt$pvalue, method="BH")
And then continue with (for example):

Code:
#filter for upregulated and downregulated genes
resSig <- resFilt[ resFilt$padj < .1, ]
write.table(resSig[ order( resSig$log2FoldChange, -resSig$baseMean ), ]
            ,"~\\DESeq2\\BRvsMR_old_DownRegulated.txt")
write.table(resSig[ order( -resSig$log2FoldChange, -resSig$baseMean ),
                    ],"~\\DESeq2\\BRvsMR_old_UpRegulated.txt")
But i also found this code:

Code:
filterThreshold <- 2.0
keep <- rowMeans ( counts(ddsLocal, normalized=TRUE)) > filterThreshold
And now i'm really confused. What is this for? (the last code) When i have to use it? Am I doing something wrong?

Thank you again for the help.
alisrpp is offline   Reply With Quote
Old 10-22-2013, 12:51 PM   #49
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

You are using DESeq2 v1.2 and reading the documentation off the Bioc website for version v1.0.

This is why I recommended you to look up the man page for ?results.

And it's always safer to use:

browseVignettes("DESeq2")

to ensure you are reading the documentation for the version you are using.
Michael Love is offline   Reply With Quote
Old 10-22-2013, 02:33 PM   #50
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Quote:
Originally Posted by Michael Love View Post
You are using DESeq2 v1.2 and reading the documentation off the Bioc website for version v1.0.

This is why I recommended you to look up the man page for ?results.

And it's always safer to use:

browseVignettes("DESeq2")

to ensure you are reading the documentation for the version you are using.
Ups! Thanks!

By using:

Code:
browseVignettes("DESeq2")
i am obtaining

Code:
/library/DESeq2/doc/DESeq2.pdf not found
Any idea about why this is happening?
alisrpp is offline   Reply With Quote
Old 10-23-2013, 02:56 AM   #51
i4u412
Junior Member
 
Location: Stockholm

Join Date: Oct 2013
Posts: 1
Default

Hi

I just started using DEseq2, so please correct if i am wrong.
My sample description has five time points (with 2 replicates).
(t0,t0,t1,t1,t2,t2,t3,t3,t4,t4,t5,t5)

if i set the timepoint design, i always get the differential expression through initial time point (control).

In addition to this, i also want to have differential expression with respect to the immediate time points (like t1 to t0, t2 to t1, t3 to t2, t4 to t3 and t5 to t4). Can you please tell me how to set the factor levels to get this type of comparison.

thank you

Last edited by i4u412; 10-23-2013 at 04:57 AM.
i4u412 is offline   Reply With Quote
Old 10-24-2013, 09:12 AM   #52
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Hi,

I'm using DESeq2_1.2.0 and I am having this error:

Code:
> resSig <- res[ res$padj < .1, ]
Error in normalizeSingleBracketSubscript(i, x, byrow = TRUE, exact = FALSE) : 
  subscript contains NAs
It's really weird because this error started today and I have been using the same datasets and the same Rscript for the last 3 weeks without this error coming up.

Just in case, here is my code:

Code:
#Count matrix input
Cele_SPvsLR_old = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsLR_old),
  condition = factor(c("SP", "SP", "LR", "LR")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsLR_old,
                              colData = CeleDesign,
                              design = ~ condition)
dds

#Est size factor = normalize for library size
dds<- estimateSizeFactors(dds)
ddsLocal <- estimateDispersions(dds, fitType="local")
ddsLocal <- nbinomWaldTest(ddsLocal)
plotDispEsts(ddsLocal)

#Differential expression analysis
resultsNames(ddsLocal)
res <- results(ddsLocal, name= "condition_SP_vs_LR") 
res <- res[order(res$padj),]
head(res)
plotMA(ddsLocal, ylim=c(-2,2), main="DESeq2")
sum(res$padj < .1, na.rm=TRUE)

#filter for significant genes 
resSig <- res[ res$padj < 0.1, ]
And my sessionInfo()

Code:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.2.0            RcppArmadillo_0.3.920.1 Rcpp_0.10.5             GenomicRanges_1.14.1   
[5] XVector_0.2.0           IRanges_1.20.0          BiocGenerics_0.8.0     

loaded via a namespace (and not attached):
 [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7           
 [5] genefilter_1.44.0    grid_3.0.2           lattice_0.20-24      locfit_1.5-9.1      
 [9] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2        
[13] survival_2.37-4      tools_3.0.2          XML_3.95-0.2         xtable_1.7-1
Can anyone give me an explanation about why am I having this error now but not in the past using exactly the same Rscript and datasets? Does anyone know how to fix it?

Thanks!

Last edited by alisrpp; 10-24-2013 at 11:50 AM.
alisrpp is offline   Reply With Quote
Old 10-27-2013, 01:52 AM   #53
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Please use
Code:
resSig <- res[ which( res$padj < .1 ), ]
to get rid of the NAs.
Simon Anders is offline   Reply With Quote
Old 10-27-2013, 09:24 AM   #54
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Can I ask why you set it to .1 ?
sindrle is offline   Reply With Quote
Old 11-12-2013, 01:56 AM   #55
fpesce
Junior Member
 
Location: London

Join Date: Oct 2013
Posts: 4
Default

Hi,
I am using DESeq2 version 1.2.5 and when I use this design:

> design(dds) <- formula(~ AGE + SEX + CONDITION)

Defining the variables:
> colData(dds)$CONDITION <- relevel(colData(dds)$CONDITION, "R")
> colData(dds)$SEX <- relevel(colData(dds)$SEX, "M")
Whereas AGE is a continuous variable

I get this error:
> dds <- DESeq(dds)
using pre-existing size factors
estimating dispersions
you had estimated dispersions, replacing these
gene-wise dispersion estimates
error: inv(): matrix appears to be singular

On the other hand, there are no problems with these combinations:
> design(dds) <- formula(~ AGE)
> design(dds) <- formula(~ CONDITION)
> design(dds) <- formula(~ SEX + CONDITION)

While I get the same error with:
> design(dds) <- formula(~ AGE + CONDITION)

Any clue what's going on here?
Thank you so much in advance for your help
fpesce is offline   Reply With Quote
Old 11-12-2013, 02:53 AM   #56
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Could "AGE" have been converted to a factor at some point?
dpryan is offline   Reply With Quote
Old 11-12-2013, 03:10 AM   #57
fpesce
Junior Member
 
Location: London

Join Date: Oct 2013
Posts: 4
Default

Quote:
Originally Posted by dpryan View Post
Could "AGE" have been converted to a factor at some point?
No it has not. I checked:

> is(colData(dds)$AGE)
[1] "numeric" "vector" "atomic" "vectorORfactor"

Thanks for the suggestion
fpesce is offline   Reply With Quote
Old 11-13-2013, 11:13 AM   #58
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Can you please post colData(dds) (or some scrubbed version of it)? This helps me figure out what the design matrix looks like, and therefore why some designs are causing errors.
Michael Love is offline   Reply With Quote
Old 11-14-2013, 01:20 AM   #59
fpesce
Junior Member
 
Location: London

Join Date: Oct 2013
Posts: 4
Default

Quote:
Originally Posted by Michael Love View Post
Can you please post colData(dds) (or some scrubbed version of it)? This helps me figure out what the design matrix looks like, and therefore why some designs are causing errors.
Here it is:
> colData(dds)
DataFrame with 122 rows and 4 columns
CONDITION KEEP AGE SEX
<factor> <integer> <numeric> <factor>
ID1 A 1 47.06913 F
ID2 A 1 45.33333 M
ID3 A 1 59.63039 F
ID4 A 1 49.00753 M
ID5 A 1 34.94319 M
... ... ... ... ...
ID118 B 1 30.00684 M
ID119 B 1 41.90828 F
ID120 B 1 39.04449 F
ID121 B 1 39.64682 M
ID122 B 1 16.70363 M
fpesce is offline   Reply With Quote
Old 11-14-2013, 04:49 AM   #60
IsBeth
Member
 
Location: Spain

Join Date: Nov 2013
Posts: 28
Red face Estimate Dispersions with DESeq2

Hello! =)

I'm a beginner (Blutiger Anfänger) at bioinformatics and so I've got a beginners question, sorry! Nowadays I'm trying to run DESeq2 with a dataset without replicates. It's a dataset with two columns (2 conditions, "control" and "infected").
Which method do you recommend me to estimate the dispersion of my data? ( "pooled", "pooled-CR", "per-condition", "blind"?). I was reading about these methods but it's difficult for me to say which one I should use.

If I just use the command "estimateDispersions" without specifying the method to make use of, which one is applied?

What should I be careful with?


Thanks in advance!
IsBeth is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:34 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO