Seqanswers Leaderboard Ad

**Simon Anders** · 02-18-2015, 11:46 PM

How exactly have you called the "results" function?

In any case, the first line of the output of the results function indicates the exact comparison that is reported. Can you check whether this line was correct in your case?

**apredeus** · 02-19-2015, 12:06 AM

Originally posted by Simon Anders View Post

How exactly have you called the "results" function? In any case, the first line of the output of the results function indicates the exact comparison that is reported. Can you check whether this line was correct in your case?

Oh, I see what the problem was. I have an R function that does the following:

res <- results(ddsMatrix)
resOrdered <- res[order(res$padj),]

Thus I did not see the output you are talking about. Still though, maybe it's a good idea to print it along with messages "estimating size factors" etc?

At any rate, thank you for your reply!

**Michael Love** · 02-19-2015, 07:45 AM

We cannot print it during model fitting, because we fit a model without knowing what comparison will be made. Will it be B vs A? Or C vs B? Or all the interaction terms together as a likelihood ratio test? This is why it is printed when you print the results object to the console, and the information is also stored as metadata columns:

mcols(res)

Check the DESeq2 vignette section on factor levels, which addresses your question.

R's factor function chooses a level ordering alphabetically unless you specify a meaningful order.

**apredeus** · 02-19-2015, 01:51 PM

I see. I'll just add that call to my script then, should be enough. Thank you!

**TomHarrop** · 03-26-2015, 03:13 AM

Hi again,

I'm using the LRT with DESeq2 as follows.

Code:

> design(dds)
~batch + stage
> dds < - DESeq(dds, test = "LRT", reduced = ~ batch)

I see that I can access the calculated deviance for each gene:

Code:

> mcols(mcols(dds), use.names=TRUE)[c(20,25),]
DataFrame with 2 rows and 2 columns
                    type                                   description
             <character>                                   <character>
LRTStatistic     results LRT statistic: '~ batch + stage' vs '~ batch'
deviance         results                    deviance of the full model

Just so I can illustrate to some colleagues what I'm doing using some actual data, I'm wondering, for a given gene, if it's possible to retrieve the expression values predicted by the full and reduced models from dds after running the LRT?

Thanks,

Tom

**Michael Love** · 03-26-2015, 05:07 AM

The fitted means, mu_ij in the paper, for the full model are in assays(dds)[["mu"]]. Not that these include the size factor scaling, so you would have to divide those out to get the q_ij in the paper. See the formula in the vignette or paper for more details.

We don't store the reduced design fitted means, but you can generate these easily by running the GLM without a log fold change prior:

dds.reduced <- dds
design(dds.reduced) <- ~ batch
dds.reduced <- nbinomWaldTest(dds.reduced, betaPrior=FALSE)
assays(dds.reduced)[["mu"]]

**TomHarrop** · 03-26-2015, 08:34 AM

Perfect, thanks.

**rozitaa** · 03-26-2015, 10:57 AM

Hi,

I have recently have changed from DESeq to DESeq2. So I need some help in designing my contrast for the binomial tests. This is my design table:

Code:

sample	tissue	gut_microbiota
1	5231	Si5	GF
2	5231	PC	GF
3	5231	liver	GF
4	5232	Si5	GF
5	5232	PC	GF
6	5232	liver	GF
7	5233	Si5	GF
8	5233	PC	GF
9	5233	liver	GF
10	5234	Si5	GF
11	5234	PC	GF
12	5234	liver	GF
13	5161	Si5	mono
14	5161	PC	mono
15	5161	liver	mono
16	5162	Si5	mono
17	5162	PC	mono
18	5162	liver	mono
19	5163	Si5	mono
20	5163	PC	mono
21	5163	liver	mono
22	5164	Si5	prevExci
23	5164	PC	prevExci
24	5164	liver	prevExci
25	5164	liver	prevExci
26	5165	Si5	prevExci
27	5165	PC	prevExci
28	5165	liver	prevExci
29	5166	Si5	prevExci
30	5166	PC	prevExci
31	5166	liver	prevExci
32	5167	Si5	prev
33	5167	PC	prev
34	5167	liver	prev
35	5168	Si5	prev
36	5168	PC	prev
37	5168	liver	prev
38	5169	Si5	prev
39	5169	PC	prev
40	5169	liver	prev
41	5170	Si5	prev
42	5170	PC	prev
43	5170	liver	prev
44	5171	Si5	mono
45	5171	PC	mono
46	5171	liver	mono
47	5172	Si5	mono
48	5172	PC	mono
49	5172	liver	mono
50	5173	Si5	mono
51	5173	PC	mono
52	5173	liver	mono
53	5174	Si5	prev
54	5174	PC	prev
55	5174	liver	prev
56	5175	Si5	prev
57	5175	PC	prev
58	5175	liver	prev
59	5176	Si5	prev
60	5176	PC	prev
61	5176	liver	prev
62	5177	Si5	prevMono
63	5177	PC	prevMono
64	5177	liver	prevMono
65	5178	Si5	prevMono
66	5178	PC	prevMono
67	5178	liver	prevMono
68	5179	Si5	prevMono
69	5179	PC	prevMono

and this is how I have designed my count data:
dds = DESeqDataSetFromHTSeqCount(sampleTable=sample_table, directory='~/htseq/', design= ~ tissue + gut_microbiota)

Now I would like to test if any gut_microbiota level has significant effects on any of my genes at each separate tissue; (i.e. running the test for each tissue separately). How should I design my contrast formula? and How I should specify specific tissue at each test?

Thanks!

**rozitaa** · 03-26-2015, 11:13 AM

I ran a code like this (hope it is the correct one?!):

Code:

dds_LRT_liver <- nbinomLRT(dds[, dds$tissue=='liver'], full=~gut_microbiota, reduced=~1)
dds_LRT_PC <- nbinomLRT(dds[, dds$tissue=='PC'], full=~gut_microbiota, reduced=~1)
dds_LRT_Si5 <- nbinomLRT(dds[, dds$tissue=='Si5'], full=~gut_microbiota, reduced=~1)

**enrico16** · 06-02-2015, 05:16 AM

See: https://support.bioconductor.org/p/68277/

--

Hi Michael,

Is the Cook's distance-based flagging of p-values performed when using a continuous variable?

I don't see how the condition:

"At least 3 replicates are required for flagging, as it is difficult to judge which sample
might be an outlier with only 2 replicates."

can realistically be met for continuous variables.

If that's the case, how can I remove genes that display such a behavior?
Please let me know if this needs clarification.

Thanks!

**Joe Dever** · 06-10-2015, 05:42 AM

Hi,

I am not familiar with R but I used DESeq in the past following its vignette.
I can't understand how to use DESeq2. I tried to follow the section for HT-Seq input but then it switches to loading the Pasilla dataset.

Is there any tutorial anywhere showing clearly how to use DESeq2 with HTseq-count, like the DESeq vignette did? Please advise.

**Michael Love** · 06-10-2015, 06:39 AM

The first step is the only step which matters how you generate the counts. This is the step when you build the DESeqDataSet. So you only need to follow the htseq steps up until DESeqDataSetFromHTSeqCount(). After that, you have the DESeqDataSet, which you can save as "dds", and then you can follow the rest of the steps below. You can either follow this instruction in the vignette, or you can also read the help file for this particular function by typing ? and the function name into R and press enter:

?DESeqDataSetFromHTSeqCount

Note that you can name the object however you like. To follow the rest of the workflow you can choose 'dds':

dds <- DESeqDataSetFromHTSeqCount(...)

where you will fill in ... with the appropriate code from the vignette and help.

**Joe Dever** · 06-11-2015, 01:30 AM

Originally posted by Michael Love View Post

The first step is the only step which matters how you generate the counts. This is the step when you build the DESeqDataSet. So you only need to follow the htseq steps up until DESeqDataSetFromHTSeqCount(). After that, you have the DESeqDataSet, which you can save as "dds", and then you can follow the rest of the steps below. You can either follow this instruction in the vignette, or you can also read the help file for this particular function by typing ? and the function name into R and press enter:

?DESeqDataSetFromHTSeqCount

Note that you can name the object however you like. To follow the rest of the workflow you can choose 'dds':

dds <- DESeqDataSetFromHTSeqCount(...)

where you will fill in ... with the appropriate code from the vignette and help.

Thanks for your reply, I'll try again.

**amolinaro** · 06-23-2015, 12:02 PM

Hi all,

I'm doing single cell RNAseq on a heterogeneous population of cells & I would like to do DE analysis to identify different subpopulations. I plan on using the tophat/cufflinks/monocle pipeline since it was specifically designed for single cell data, but I would also like to apply a raw count method to verify the results. Can DESeq2 be used for DE analysis of single cell data? If so, how would I go about doing that and what would the output look like? I'm very new to bioinformatics so apologies if these are silly questions...

Thanks in advance!

**apredeus** · 06-23-2015, 09:16 PM

Hello

single cell data is very peculiar, and as a rule I'd say, no - you can not use regular methods for DE analysis of single-cell data. Look at SCDE by P. Kharchenko et. al:

SCDE by Kharchenko Lab at Harvard DBMI

http://pklab.med.harvard.edu/scde/index.html

Overall, people more often use PCA than DE with single-cell data. As you work with them some more, you probably will see why

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 21 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 23 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 18 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News