SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Reply
 
Thread Tools
Old 02-03-2015, 07:07 AM   #101
IsBeth
Member
 
Location: Spain

Join Date: Nov 2013
Posts: 28
Default

Quote:
The without-replicates analysis is not expected to produce small p-values or adjusted p-values. We have overestimated the variance by treating the samples from different conditions as replicates for the purpose of estimating variance
Ok, so when estimating the variance this way, it is not sufficient to detect DE genes? I know that to work with no replicates is not appropiate, but in these cases I need to provide some information of the data for the end user . Isn't there some way to obtain a DE genes list (even if the results are to be taken with care)?
(I think that it was possible with DESeq)
IsBeth is offline   Reply With Quote
Old 02-03-2015, 07:26 AM   #102
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

DESeq() in DESeq2 also estimates p-values, and adjusted p-values (for those genes which pass independent filtering step).

I would guess that you are filtering with a p-value or adjusted p-value threshold which is less than all the p-values or adjusted p-values in the object returned by results(), and generating a 0-row dataframe.
Michael Love is offline   Reply With Quote
Old 02-04-2015, 02:45 AM   #103
IsBeth
Member
 
Location: Spain

Join Date: Nov 2013
Posts: 28
Default

Hi Michael,

You were absolutely right I had filtered the object returned from the results() function with a minimum log2foldChange of 1.5 and an adjusted p-value of 0.01. It was definitively too much. The lfc of the result data is less than 1 in most cases, and the adj p-value is superior than one...

With no filtering, I get the results table with the statistics I need, but no DE genes are detected and so none are plotted (with red points). Is there some way tu increase the statistical power (maybe in the results() function)?Or to change some parameters in order to mark genes as differentially expressed (even if their values are not magnific)?

Thank you for your time =)
IsBeth is offline   Reply With Quote
Old 02-18-2015, 07:48 PM   #104
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Here's a bit random and fascinating observation I've made when using DESeq2.

I've had three groups of experiments - with conditions labelled as "B", "12", and "24". I've ran pairwise comparison with donor-sensitive design on all three groups, getting differential expression for them in the same way:

B vs 12 (cond1 = B, cond2 = 12)
B vs 24 (cond1 = B, cond2 = 24)
12 vs 24 (cond1 = 12, cond2 = 24)

so, in first two cases, log2FC was reported for cond1/cond2 (i.e. positive numbers were for genes down-regulated at 12 or 24). In the latter case, log2FC was reported for cond2/cond1.

The differences were subtle enough that it took me a long long while to catch this error. I figured it's probably due to some sorting - either internal or in the condition file. Maybe it would make sense to write explicitly in the logs which one is cond1 and which one is cond2?

Cheers
apredeus is offline   Reply With Quote
Old 02-18-2015, 10:46 PM   #105
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

How exactly have you called the "results" function?

In any case, the first line of the output of the results function indicates the exact comparison that is reported. Can you check whether this line was correct in your case?
Simon Anders is offline   Reply With Quote
Old 02-18-2015, 11:06 PM   #106
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Quote:
Originally Posted by Simon Anders View Post
How exactly have you called the "results" function? In any case, the first line of the output of the results function indicates the exact comparison that is reported. Can you check whether this line was correct in your case?
Oh, I see what the problem was. I have an R function that does the following:

res <- results(ddsMatrix)
resOrdered <- res[order(res$padj),]

Thus I did not see the output you are talking about. Still though, maybe it's a good idea to print it along with messages "estimating size factors" etc?

At any rate, thank you for your reply!
apredeus is offline   Reply With Quote
Old 02-19-2015, 06:45 AM   #107
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

We cannot print it during model fitting, because we fit a model without knowing what comparison will be made. Will it be B vs A? Or C vs B? Or all the interaction terms together as a likelihood ratio test? This is why it is printed when you print the results object to the console, and the information is also stored as metadata columns:

mcols(res)

Check the DESeq2 vignette section on factor levels, which addresses your question.

R's factor function chooses a level ordering alphabetically unless you specify a meaningful order.
Michael Love is offline   Reply With Quote
Old 02-19-2015, 12:51 PM   #108
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

I see. I'll just add that call to my script then, should be enough. Thank you!
apredeus is offline   Reply With Quote
Old 03-26-2015, 03:13 AM   #109
TomHarrop
Member
 
Location: New Zealand

Join Date: Jul 2014
Posts: 20
Default

Hi again,

I'm using the LRT with DESeq2 as follows.

Code:
> design(dds)
~batch + stage
> dds < - DESeq(dds, test = "LRT", reduced = ~ batch)
I see that I can access the calculated deviance for each gene:

Code:
> mcols(mcols(dds), use.names=TRUE)[c(20,25),]
DataFrame with 2 rows and 2 columns
                    type                                   description
             <character>                                   <character>
LRTStatistic     results LRT statistic: '~ batch + stage' vs '~ batch'
deviance         results                    deviance of the full model
Just so I can illustrate to some colleagues what I'm doing using some actual data, I'm wondering, for a given gene, if it's possible to retrieve the expression values predicted by the full and reduced models from dds after running the LRT?

Thanks,

Tom
TomHarrop is offline   Reply With Quote
Old 03-26-2015, 05:07 AM   #110
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

The fitted means, mu_ij in the paper, for the full model are in assays(dds)[["mu"]]. Not that these include the size factor scaling, so you would have to divide those out to get the q_ij in the paper. See the formula in the vignette or paper for more details.

We don't store the reduced design fitted means, but you can generate these easily by running the GLM without a log fold change prior:

dds.reduced <- dds
design(dds.reduced) <- ~ batch
dds.reduced <- nbinomWaldTest(dds.reduced, betaPrior=FALSE)
assays(dds.reduced)[["mu"]]
Michael Love is offline   Reply With Quote
Old 03-26-2015, 08:34 AM   #111
TomHarrop
Member
 
Location: New Zealand

Join Date: Jul 2014
Posts: 20
Default

Perfect, thanks.
TomHarrop is offline   Reply With Quote
Old 03-26-2015, 10:57 AM   #112
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

Hi,

I have recently have changed from DESeq to DESeq2. So I need some help in designing my contrast for the binomial tests. This is my design table:

Code:
sample	tissue	gut_microbiota
1	5231	Si5	GF
2	5231	PC	GF
3	5231	liver	GF
4	5232	Si5	GF
5	5232	PC	GF
6	5232	liver	GF
7	5233	Si5	GF
8	5233	PC	GF
9	5233	liver	GF
10	5234	Si5	GF
11	5234	PC	GF
12	5234	liver	GF
13	5161	Si5	mono
14	5161	PC	mono
15	5161	liver	mono
16	5162	Si5	mono
17	5162	PC	mono
18	5162	liver	mono
19	5163	Si5	mono
20	5163	PC	mono
21	5163	liver	mono
22	5164	Si5	prevExci
23	5164	PC	prevExci
24	5164	liver	prevExci
25	5164	liver	prevExci
26	5165	Si5	prevExci
27	5165	PC	prevExci
28	5165	liver	prevExci
29	5166	Si5	prevExci
30	5166	PC	prevExci
31	5166	liver	prevExci
32	5167	Si5	prev
33	5167	PC	prev
34	5167	liver	prev
35	5168	Si5	prev
36	5168	PC	prev
37	5168	liver	prev
38	5169	Si5	prev
39	5169	PC	prev
40	5169	liver	prev
41	5170	Si5	prev
42	5170	PC	prev
43	5170	liver	prev
44	5171	Si5	mono
45	5171	PC	mono
46	5171	liver	mono
47	5172	Si5	mono
48	5172	PC	mono
49	5172	liver	mono
50	5173	Si5	mono
51	5173	PC	mono
52	5173	liver	mono
53	5174	Si5	prev
54	5174	PC	prev
55	5174	liver	prev
56	5175	Si5	prev
57	5175	PC	prev
58	5175	liver	prev
59	5176	Si5	prev
60	5176	PC	prev
61	5176	liver	prev
62	5177	Si5	prevMono
63	5177	PC	prevMono
64	5177	liver	prevMono
65	5178	Si5	prevMono
66	5178	PC	prevMono
67	5178	liver	prevMono
68	5179	Si5	prevMono
69	5179	PC	prevMono
and this is how I have designed my count data:
dds = DESeqDataSetFromHTSeqCount(sampleTable=sample_table, directory='~/htseq/', design= ~ tissue + gut_microbiota)

Now I would like to test if any gut_microbiota level has significant effects on any of my genes at each separate tissue; (i.e. running the test for each tissue separately). How should I design my contrast formula? and How I should specify specific tissue at each test?

Thanks!
rozitaa is offline   Reply With Quote
Old 03-26-2015, 11:13 AM   #113
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

I ran a code like this (hope it is the correct one?!):

Code:
dds_LRT_liver <- nbinomLRT(dds[, dds$tissue=='liver'], full=~gut_microbiota, reduced=~1)
dds_LRT_PC <- nbinomLRT(dds[, dds$tissue=='PC'], full=~gut_microbiota, reduced=~1)
dds_LRT_Si5 <- nbinomLRT(dds[, dds$tissue=='Si5'], full=~gut_microbiota, reduced=~1)
rozitaa is offline   Reply With Quote
Old 06-02-2015, 05:16 AM   #114
enrico16
Junior Member
 
Location: UK

Join Date: Jan 2015
Posts: 3
Default

See: https://support.bioconductor.org/p/68277/

--

Hi Michael,

Is the Cook's distance-based flagging of p-values performed when using a continuous variable?

I don't see how the condition:

"At least 3 replicates are required for flagging, as it is difficult to judge which sample
might be an outlier with only 2 replicates."


can realistically be met for continuous variables.

If that's the case, how can I remove genes that display such a behavior?
Please let me know if this needs clarification.

Thanks!

Last edited by enrico16; 06-02-2015 at 07:00 AM.
enrico16 is offline   Reply With Quote
Old 06-10-2015, 05:42 AM   #115
Joe Dever
Junior Member
 
Location: EU

Join Date: Mar 2015
Posts: 3
Default

Hi,

I am not familiar with R but I used DESeq in the past following its vignette.
I can't understand how to use DESeq2. I tried to follow the section for HT-Seq input but then it switches to loading the Pasilla dataset.

Is there any tutorial anywhere showing clearly how to use DESeq2 with HTseq-count, like the DESeq vignette did? Please advise.
Joe Dever is offline   Reply With Quote
Old 06-10-2015, 06:39 AM   #116
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

The first step is the only step which matters how you generate the counts. This is the step when you build the DESeqDataSet. So you only need to follow the htseq steps up until DESeqDataSetFromHTSeqCount(). After that, you have the DESeqDataSet, which you can save as "dds", and then you can follow the rest of the steps below. You can either follow this instruction in the vignette, or you can also read the help file for this particular function by typing ? and the function name into R and press enter:

?DESeqDataSetFromHTSeqCount

Note that you can name the object however you like. To follow the rest of the workflow you can choose 'dds':

dds <- DESeqDataSetFromHTSeqCount(...)

where you will fill in ... with the appropriate code from the vignette and help.
Michael Love is offline   Reply With Quote
Old 06-11-2015, 01:30 AM   #117
Joe Dever
Junior Member
 
Location: EU

Join Date: Mar 2015
Posts: 3
Default

Quote:
Originally Posted by Michael Love View Post
The first step is the only step which matters how you generate the counts. This is the step when you build the DESeqDataSet. So you only need to follow the htseq steps up until DESeqDataSetFromHTSeqCount(). After that, you have the DESeqDataSet, which you can save as "dds", and then you can follow the rest of the steps below. You can either follow this instruction in the vignette, or you can also read the help file for this particular function by typing ? and the function name into R and press enter:

?DESeqDataSetFromHTSeqCount

Note that you can name the object however you like. To follow the rest of the workflow you can choose 'dds':

dds <- DESeqDataSetFromHTSeqCount(...)

where you will fill in ... with the appropriate code from the vignette and help.
Thanks for your reply, I'll try again.
Joe Dever is offline   Reply With Quote
Old 06-23-2015, 12:02 PM   #118
amolinaro
Junior Member
 
Location: Toronto, Canada

Join Date: Jun 2015
Posts: 3
Default

Hi all,

I'm doing single cell RNAseq on a heterogeneous population of cells & I would like to do DE analysis to identify different subpopulations. I plan on using the tophat/cufflinks/monocle pipeline since it was specifically designed for single cell data, but I would also like to apply a raw count method to verify the results. Can DESeq2 be used for DE analysis of single cell data? If so, how would I go about doing that and what would the output look like? I'm very new to bioinformatics so apologies if these are silly questions...

Thanks in advance!
amolinaro is offline   Reply With Quote
Old 06-23-2015, 09:16 PM   #119
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Hello

single cell data is very peculiar, and as a rule I'd say, no - you can not use regular methods for DE analysis of single-cell data. Look at SCDE by P. Kharchenko et. al:

http://pklab.med.harvard.edu/scde/index.html

Overall, people more often use PCA than DE with single-cell data. As you work with them some more, you probably will see why
apredeus is offline   Reply With Quote
Old 06-29-2015, 03:50 AM   #120
Caitriona McEvoy
Junior Member
 
Location: Dublin

Join Date: Aug 2014
Posts: 3
Default

Hi all,
I am very new to r & DEseq2 but have been following the excellent vignette. I have encountered a problem though. I can create the object rld as per the command in the vignette, but when I try to run PCA as suggested I repeatedly encounter the following error:

plotPCA(rld, intgroup=c("Fibrosis", "Sample.ID"))

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘plotPCA’ for signature ‘"SummarizedExperiment"’
I have also tried:
plotPCA( DESeqTransform(rld ) )
and get the following:
Error in plotPCA(DESeqTransform(rld)) :
error in evaluating the argument 'x' in selecting a method for function 'plotPCA': Error: could not find function "DESeqTransform"
Can anyone advise?
Caitriona
Caitriona McEvoy is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:12 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO