SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Reply
 
Thread Tools
Old 11-03-2014, 01:51 PM   #81
TomHarrop
Member
 
Location: New Zealand

Join Date: Jul 2014
Posts: 20
Default

Quote:
Originally Posted by Michael Love View Post
The estimator for the amount of shrinkage changed (this and all changes are noted in NEWS). Note that rlog is just one of the possible transformations, which we have found useful in many situations, but certainly it can shrink too much or too little given the diversity of datasets.

I'd try out the varianceStabilizingTranformation, or even log2(counts(dds, normalized=TRUE) + pc) for varying pseudocount.
Hi Michael,

Forgot to thank you for your suggestions. Simple log-transformation with pseudocount reveals the differences in distribution of expression values in the degraded libraries.

Cheers,

Tom
TomHarrop is offline   Reply With Quote
Old 11-10-2014, 04:47 AM   #82
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi Raphael,

In version 1.4, DESeq() replaces outliers for certain sample sizes as described in the help file ?DESeq, and that we therefore don't recommend using the replaceOutliers function manually anymore. Also see ?DESeq for description of how the original and replacement counts are stored. If you have further questions, maybe best to start a new thread, and post the code you are using. Also, you might as well update to the current Bioconductor release 3.0 / DESeq2 version 1.6.
Michael Love is offline   Reply With Quote
Old 01-06-2015, 04:41 PM   #83
harlequin
Junior Member
 
Location: Europe

Join Date: Oct 2012
Posts: 7
Default

Hi,

Can someone briefly explain how the "filterThreshold" attribute -which as far as I can tell is a threshold for the mean of normalized counts- of the results object is calculated in DESEq2? I've read the function manual but it's not entirely clear to me. Is it picked by an optimization algorithm so that the number of genes with adjusted P-values < 0.1 is maximum? Thanks!
harlequin is offline   Reply With Quote
Old 01-07-2015, 02:52 PM   #84
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Is it picked by an optimization algorithm so that the number of genes with adjusted P-values < 0.1 is maximum?
Yes, exactly so.
Simon Anders is offline   Reply With Quote
Old 01-12-2015, 07:36 AM   #85
zhuya5607
Member
 
Location: Stockholm

Join Date: Jun 2012
Posts: 18
Unhappy plotDispEsts() doesn't work



I can successfully follow all the commands in DESeq2 manual (updated version on December 16, 2014) except plotDispEsts(dds).

plotDispEsts(dds) gives the error message:
no slot of name "fitInfo" for this object of class "DESeqDataSet"

the following were the basic steps i ran

countData <- read.table("count_table_123vs789.txt",header=TRUE,row.names=1)
countData <- as.matrix(countData)
condition <- factor(c(rep("mirA",3),rep("mirB",3)))

coldata <- data.frame(row.names=colnames(countData),condition)
dds <- DESeqDataSetFromMatrix(countData=countData, colData=coldata, design=~condition)
dds <- DESeq(dds)
res <- results(dds)
plotDispEsts(dds)

Everything ran smoothly but the last step plotDispEsts(dds).
Can anyone please help me?
zhuya5607 is offline   Reply With Quote
Old 01-12-2015, 10:26 AM   #86
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

You have an older version of the older package "DESeq" masking the DESeq2 version.

We fixed the dispatch of plotDispEsts for DESeq/DESeq2/DEXSeq a year ago by creating a generic, so that all these functions can work with all libraries loaded, but if you have an outdated version of DESeq, you still experience the problem.

biocLite()

and update all. You might need to update your version of R as well. What's your

sessionInfo()
Michael Love is offline   Reply With Quote
Old 01-13-2015, 02:53 AM   #87
zhuya5607
Member
 
Location: Stockholm

Join Date: Jun 2012
Posts: 18
Default

Quote:
Originally Posted by Michael Love View Post
You have an older version of the older package "DESeq" masking the DESeq2 version.

We fixed the dispatch of plotDispEsts for DESeq/DESeq2/DEXSeq a year ago by creating a generic, so that all these functions can work with all libraries loaded, but if you have an outdated version of DESeq, you still experience the problem.

biocLite()

and update all. You might need to update your version of R as well. What's your

sessionInfo()
Thanks for your reply! You are probably right about this, I did have a old version of DESeq installed before. However, i used biocLite("DESeq") to update,
the same problem is still there. Is my R version too old?

Here is my sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.600.0 Rcpp_0.11.3 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[6] IRanges_1.22.10 BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.26.1 Biobase_2.24.0 DBI_0.3.1 DESeq_1.16.0 RColorBrewer_1.1-2 RSQLite_1.0.0
[7] XML_3.98-1.1 XVector_0.4.0 annotate_1.42.1 genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.0
[13] lattice_0.20-29 locfit_1.5-9.1 splines_3.1.0 stats4_3.1.0 survival_2.37-7 tools_3.1.0
[19] xtable_1.7-4
zhuya5607 is offline   Reply With Quote
Old 01-13-2015, 06:02 AM   #88
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

I don't see why the problem should exist if you have updated BiocGenerics, DESeq, DESeq2 and DEXSeq to the same version of Bioconductor.

I recommend users stay with the current release of Bioconductor, which would be one version ahead of the one you have, by using the current release of R.

But if you don't have time to upgrade, or are in the middle of an analysis and need to save the state, a workaround is to use double colon to specify the library:
Code:
DESeq2::plotDispEsts
Michael Love is offline   Reply With Quote
Old 01-13-2015, 06:43 AM   #89
zhuya5607
Member
 
Location: Stockholm

Join Date: Jun 2012
Posts: 18
Default

Quote:
Originally Posted by Michael Love View Post
I don't see why the problem should exist if you have updated BiocGenerics, DESeq, DESeq2 and DEXSeq to the same version of Bioconductor.

I recommend users stay with the current release of Bioconductor, which would be one version ahead of the one you have, by using the current release of R.

But if you don't have time to upgrade, or are in the middle of an analysis and need to save the state, a workaround is to use double colon to specify the library:
Code:
DESeq2::plotDispEsts
OK, I updated R this time, install DESeq, DESeq2 and DEXSeq again using biocLite().
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.600.0 Rcpp_0.11.3 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[6] IRanges_1.22.10 BiocGenerics_0.10.0 BiocInstaller_1.14.3

loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 Biobase_2.24.0 DBI_0.3.1 DESeq_1.16.0 genefilter_1.46.1
[7] geneplotter_1.42.0 grid_3.1.2 lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.1-2 RSQLite_1.0.0
[13] splines_3.1.2 stats4_3.1.2 survival_2.37-7 tools_3.1.2 XML_3.98-1.1 xtable_1.7-4
[19] XVector_0.4.0


However, i noticed a warning message when i load DESeq2

> library(DESeq2)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following object is masked _by_ ‘.GlobalEnv’:

plotDispEsts


The following objects are masked from ‘package parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find,
get, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Rcpp
Loading required package: RcppArmadillo


OK, even I tried
Code:
DESeq2::plotDispEsts
, still not working
standardGeneric for "plotDispEsts" defined from package "BiocGenerics"

function (object, ...)
standardGeneric("plotDispEsts")
<environment: 0x7fa45c1e1148>
Methods may be defined for arguments: object
Use showMethods("plotDispEsts") for currently available ones.

> showMethods("plotDispEsts")
Function: plotDispEsts (package BiocGenerics)
object="DESeqDataSet"


########Problem solved now###
Because i was loading DESeq2 in the folder where old DESeq and DEXseq were installed, the old plotDispEsts always mask the new plotDispEsts object.
I just changed a new working directory and reload DESeq2, then problem solved. Thanks for you help, Michael!

Last edited by zhuya5607; 01-13-2015 at 07:11 AM. Reason: Problem solved!
zhuya5607 is offline   Reply With Quote
Old 01-22-2015, 10:59 AM   #90
Neuromancer
Member
 
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
Default Can an updated ENSEMBL database release have an effect on DESeq2 results?

Well, obviously it can, but the changes in the output I see are rather dramatic.

Here's in brief my situtation:

Data from an RNAseq experiment (mouse) was
mapped to GRCm38p2 in Feb. 2014, using STAR,
counted with HTSeq (latest version),
DEG estimations done with DESeq2 (latest version at that time)
and Ensemble release e75 GTF file.
Results were interesting and made sense, biologically.

Now we remapped everything to GRCm38p3
using Ensemble release e78 GTF and latest versions of those programms (which have not changed much).
Results are radically different, with some important changes gone, (which we validated by qPCR!)
(example: a gene with log2FC=-0,63, padj=0.0688 in the first mapping, and log2FC=-0.21, padj=0.22 in the second) - however, the counts for each sample and on average (baseMean) are very very similar, so I guess it's not the mapping that makes the difference.

Notably the number of annotated genes found in e78 is much higher compared to e75 so I expect this has an influence on the adjusted p-value - but how can it affect the log2FC!? Could it be that the number of genes tested affects the gene models / dispersions that DESeq2 estimates?

I would be happy about any comments.

Thanks.
Neuromancer is offline   Reply With Quote
Old 01-22-2015, 11:14 AM   #91
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Presumably e78 added a bunch of overlapping genes that are resulting in heavily altered counts for those genes.
dpryan is offline   Reply With Quote
Old 01-22-2015, 11:36 PM   #92
Neuromancer
Member
 
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
Default

Quote:
Originally Posted by dpryan View Post
Presumably e78 added a bunch of overlapping genes that are resulting in heavily altered counts for those genes.
Well, at least the raw counts are very(!) similar, not to say: almost exactly the same for most genes.
Also, with more/better genome information, I think, the DEG estimations should be getting better, not worse, right? We confirmed a couple of genes by qPCR and in many cases they are showing surprisingly accurate matches of the RNAseq results, but then again, we do find some genes where this nice match is completely lost using e78...
Neuromancer is offline   Reply With Quote
Old 01-22-2015, 11:48 PM   #93
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I wonder if the additional genes in the analysis are skewing the prior in either the fits or the p-value adjustment (just adding a background uniform p-value distribution in that case). That'd be the other likely possibility. What happens if you simply exclude the added genes from the analysis?
dpryan is offline   Reply With Quote
Old 01-23-2015, 06:03 AM   #94
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

re: changing the gene model from e75 to e78: Although it might seem at first that having the same read count for a given gene should mean that the results are identical, consider that all the parameter estimation steps involve looking at all the genes: including the library size normalization, the dispersion estimation, the posterior log fold changes, and the p-value adjustment (as Devon points out). If you want less dependence between genes, you could try using the setting betaPrior=FALSE with the DESeq() function, and then the inference is on maximum likelihood estimates of log fold change. This is a choice for the investigator, if they prefer slightly noisier LFC estimates, but an analysis such that individual genes will have less effect on each other's fold changes. But there will still be dependence between genes on dispersion estimation and p-value adjustment, so you shouldn't expect the p-values to be identical after changing gene models. The inference is nearly identical in terms of hypothesis testing, although genes on the margin of a significance boundary can have p-values move slightly one way or the other.
Michael Love is offline   Reply With Quote
Old 01-23-2015, 07:49 AM   #95
Neuromancer
Member
 
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
Default

Dear Michael and Devon,

thank you for taking the time to think about this and write an answer.
These are importnat consideradtions and we will take them into account now.

The other thing that has changed between the two analyses is of course the DESeq2 version (v1.4.1 vs 1.6.3), which might also influence results slightly, I guess.

Removing the new IDs is not really an option because it not only new ones, but also substitutes and other changes that are updated with new ensembl releases.

We'll try the option beta-prior=false to see what effect this has on our data set.

Thanks again,

Roman
Neuromancer is offline   Reply With Quote
Old 01-29-2015, 07:26 AM   #96
Gonza
Member
 
Location: Ithaca, NY

Join Date: Mar 2013
Posts: 78
Default

Hi all,

I am new to DEseq2. Maybe this is silly question, but I do not understand why meanSdPlot fails.... any ideas to solve this problem?


library("vsn")
par(mfrow=c(1,3))
notAllZero <- (rowSums(counts(dds))>0)
meanSdPlot(log2(counts(dds,normalized=TRUE)[notAllZero,] + 1))
meanSdPlot(assay(rld[notAllZero,]))
meanSdPlot(assay(vsd[notAllZero,]))

Error: could not find function "meanSdPlot"


Thanks
Gonza is offline   Reply With Quote
Old 02-03-2015, 03:49 AM   #97
IsBeth
Member
 
Location: Spain

Join Date: Nov 2013
Posts: 28
Default

Hello!

I'm trying DESeq2 with synthetic RNA-seq count data without replicates (40% of the synthetic genes are differentially expressed (DE), with a up - and downregulation ratio of 50%). Although DESeq2 detects these ratios correctly, it doesn't detect the DE genes. The result table is empty, and my MAplot doesn't show differential expression either. Is there any solution to this? Do I have to change any parameter?
IsBeth is offline   Reply With Quote
Old 02-03-2015, 04:35 AM   #98
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

When you ran DESeq(), you should have seen the following message printed to console:

Code:
estimating dispersion by treating samples as replicates.
read the ?DESeq section on 'Experiments without replicates'
Did you read this section?
Michael Love is offline   Reply With Quote
Old 02-03-2015, 06:34 AM   #99
IsBeth
Member
 
Location: Spain

Join Date: Nov 2013
Posts: 28
Default

Yes, I did. And I'm sorry, since I've seen that this problem is treated in other posts (such as "DESeq2 without biol replicates"). I tried the following code, but it doesn't work by now (I need to obtain a result table with statistics like the adjusted p-values and lfc):

design_vector <- c("C", "T")

coldat=DataFrame(grp=factor(design_vector), each=1)
dds <- DESeqDataSetFromMatrix(raw_filter, colData=coldat, design =~grp)
dds <- DESeq(dds)

rld <- rlogTransformation( dds )
NOTE: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.

deseq2.res <- results(dds)


What do I have to change? Cause I'm sure I'm doing it completely wrong
IsBeth is offline   Reply With Quote
Old 02-03-2015, 06:50 AM   #100
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

I guess the point I want to emphasize is the quote from the original DESeq paper, regarding the without-replicates analysis:

"Some overestimation of the variance may be expected, which will make that approach conservative."
Furthermore, "while one may not want to draw strong conclusions from such an analysis, it
may still be useful for exploration and hypothesis generation."

The analysis without replicates is very conservative, and only recommended for exploration (for example, you can look at which log fold changes are largest in the MA plot).

So to answer your question from before:

"it doesn't detect the DE genes. The result table is empty, and my MAplot doesn't show differential expression either"

The without-replicates analysis is not expected to produce small p-values or adjusted p-values. We have overestimated the variance by treating the samples from different conditions as replicates for the purpose of estimating variance.
Michael Love is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:03 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO