
**dpryan** · 09-29-2014, 06:21 AM

1) Use DESeq2 rather than DESeq

2) Don't use RPKM for statistics unless you really really know what you're doing (in which case you probably wouldn't want to use it anyway).

3) Use patient as a blocking factor in a GLM (you can do this with DESeq2, edgeR, limma, etc.).

4) Human experiments are pretty noisy, so advise the biologists to drastically increase the number of patients (if possible) next time.

**SylvainL** · 09-29-2014, 07:53 AM

Hi dpryan,

Yes, I will use the raw counts as explained in both manuals of DESeq and DESeq2.

I noticed in the DESeq2 vignette that you can use "patient" as a parameter, and I wondered whether it would do what I want. So you confirm I can NOT do it with DESeq... (A pity, because I'm using an old version of R, 2.14, since some other packages I really need for other analyses are not compatible with R 3.0. But it's OK, I will see if I can install two versions of R on the same machine.)

Regarding the noise, that's also why I first wanted to treat all the samples as triplicates without the patient factor, expecting that the patient-to-patient noise would not be significant, so that I could focus on the real changes due to the different stimuli. But doing so, one of their genes of interest (confirmed by qPCR) is not among the differentially expressed genes.

These experiments are much more complex than I described (many stimuli and time courses), so getting triplicates was already not that bad considering they work with primary cells...

**dpryan** · 09-29-2014, 09:06 AM

You can do that with DESeq as well, it just lacks a lot of beneficial functionality.

**SylvainL** · 09-29-2014, 09:48 AM

How can I do it with DESeq? Can I add the "patient" parameter in addition to the "condition" one, like count ~ patient + condition? Sorry if my question seems naive...

**dpryan** · 09-29-2014, 09:57 AM

Yup, exactly. You'll have to use the GLM functions (fitNbinomGLMs and nbinomGLMTest) rather than nbinomTest, but that's not a problem. I should note that the same design (~patient+condition or more complicated things like ~patient+treatment*condition*time) can be used in edgeR, limma, and pretty much any other similar R package.
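
As a rough illustration of what a ~patient+condition design expands to, here is a minimal base-R sketch. The layout (three patients, each measured under three conditions) and the level names are assumptions made for the example, not something taken from the actual data set.

```r
# Hypothetical layout: 3 patients (A, B, C), each sampled under 3 conditions.
samples <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 3)),
  condition = factor(rep(c("none", "stim1", "stim2"), times = 3),
                     levels = c("none", "stim1", "stim2"))
)

# Expand the formula into a design matrix: one intercept (patient A, no
# stimulus), two patient offsets and two condition effects.
design <- model.matrix(~ patient + condition, data = samples)
print(colnames(design))

# Residual degrees of freedom left for estimating variability:
# 9 samples minus 5 coefficients.
resid_df <- nrow(design) - ncol(design)
print(resid_df)
```

The patientB and patientC columns are what soak up differences in basal expression, while the condition columns carry the stimulus effects shared across patients.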

**SylvainL** · 10-01-2014, 12:09 AM

Hi,

I have an additional question: once I have fitted both the full and the reduced models (in my case patient + condition, and condition alone, since I want to see how taking the patient into account improves the DE analysis) and calculated the adjusted p-values, I am a bit stuck...

Here is the part of my script...

fit1 <- fitNbinomGLMs(cds, count ~ patient + condition)
fit0 <- fitNbinomGLMs(cds, count ~ condition)
pvalsGLM <- nbinomGLMTest(fit1, fit0)
padjGLM <- p.adjust(pvalsGLM, method="BH")

Now, how can I get the final fold changes? Are they the same as those calculated by the function nbinomTest (or, in this case, by fitNbinomGLMs, I guess), so that I only have to replace the p-values with padjGLM?

I looked in the vignette and it is still unclear to me...

Sorry if my question is really too stupid.

Sylvain

**SylvainL** · 10-01-2014, 05:13 AM

Actually I have some doubts...

When I look at the DE genes between two conditions (without considering the patient), I get 450 significant differentially expressed genes...
When I do the script above, I have only 13!!

So it seems obvious I am doing something wrong. I am a bit concerned that when I compare patient + condition to condition only, the output will be the genes that behave differently between the patients. That is exactly the opposite of what I want, but I am not really sure how to get the genes of interest.

So what I want are the genes which have similar fold-changes after the stimulus, but these genes may have different basal expression between the patients...

I wonder if I should not just consider every patient separately, then do DE without replicates and then simply compare the 3 fold-changes obtained by each gene in each patient... Not the most elegant and probably not the fastest...

Sylvain

**dpryan** · 10-01-2014, 05:24 AM

It's quite possible that what you're seeing is an effect due to the uncontrolled patient effect (i.e., false positives) in the result with 450 DE genes. This is particularly true given the N=3 (your residual degrees of freedom are pretty much shot).

The comparison you're interested in is ~patient+condition vs. ~patient (or just use a Wald test and look at the appropriate coefficient in the ~patient+condition test).
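
To make the full-versus-reduced comparison concrete, here is a minimal sketch for a single gene. It is only a stand-in: it uses a Poisson GLM from base R instead of DESeq's negative binomial GLM (so there is no dispersion estimate), and the counts and sample layout are made up for illustration.

```r
# Hypothetical layout and made-up counts for one gene: patient B has a lower
# baseline, and stim1 raises expression about 10-fold in every patient.
samples <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 3)),
  condition = factor(rep(c("none", "stim1", "stim2"), times = 3),
                     levels = c("none", "stim1", "stim2")),
  count     = c(100, 1000, 110,   20, 210, 25,   95, 980, 100)
)

# Full model: patient as blocking factor plus the condition effect.
full    <- glm(count ~ patient + condition, family = poisson, data = samples)
# Reduced model: drop only the term being tested, keep the blocking factor.
reduced <- glm(count ~ patient, family = poisson, data = samples)

# Likelihood ratio test for "any effect of condition" (2 degrees of freedom).
lrt <- anova(reduced, full, test = "Chisq")
p_condition <- lrt[2, "Pr(>Chi)"]
print(lrt)

# Wald-style alternative: inspect one coefficient of the full model.
# The estimate is on the natural-log scale; exp() gives the fold change.
stim1_fold <- exp(coef(full)["conditionstim1"])
print(stim1_fold)  # roughly 10-fold for these made-up counts
```

Dropping patient instead of condition from the reduced model would test the patient effect, which is the opposite question.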

**SylvainL** · 10-01-2014, 05:34 AM

Hi dpryan,

so to summarize: if I want to get the genes affected by a stimulus in all 3 patients, in terms of fold change, even if the basal levels of these genes differ between the 3 patients, I have to compare ~condition+patient to ~patient...

And then I take padjGLM to know which genes are significant, and the fold changes will be in fit1?

Is it OK to do it like that?

The difference between ignoring the patient and taking it into account is really larger than I thought... Really good to know!!

Thanks in advance

s.

**dpryan** · 10-01-2014, 05:42 AM

Note that comparing ~patient+condition versus ~patient will be looking for changes due to any condition (it's like a one-way ANOVA). Maybe you want this, maybe not, depends on the question you're trying to answer. But yes, the "patient" factor absorbs the differences in basal expression.

Yes, use p.adjust() and use the adjusted p-values to determine significance (note that you can use a looser cutoff than the usual 0.05).
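
A small sketch of that step, with made-up p-values standing in for the output of the GLM test; the 0.1 threshold is just one example of a looser cutoff, not a recommendation from this thread.

```r
# Made-up raw p-values for six genes (illustration only).
pvals <- c(0.0001, 0.004, 0.019, 0.05, 0.2, 0.6)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
padj <- p.adjust(pvals, method = "BH")
print(padj)

# Number of genes called significant at two different FDR thresholds.
n_strict <- sum(padj < 0.05)
n_loose  <- sum(padj < 0.10)
print(c(strict = n_strict, loose = n_loose))
```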

**SylvainL** · 10-01-2014, 06:04 AM

Thanks,

actually, after calculating the dispersion on the whole dataset, I wanted to subset it to only the two conditions I want to compare, to get only the changes due to one stimulus (compared to the "no stimulus" condition).

s.

**SylvainL** · 10-01-2014, 11:24 PM

Hi,

after one night thinking about this analysis, I have one more question (sorry to bother, but I really dislike doing something without understanding at least a minimum):
When I compare ~patient+condition to ~patient, I understand that the output (after filtering by a p-value threshold) will contain the genes that are differentially expressed between my conditions, even if their basal expression level differs between the patients... I want to be sure I will also get the genes affected by the treatment when their basal expression is similar across patients...

To be sure: will I get all the genes having similar fold changes after the treatment, with or without similar basal expression levels between patients?

Many thanks in advance,

s.

**dpryan** · 10-01-2014, 11:34 PM

I assume that "treatment" and "condition" are the same here (otherwise, just show an example of how your samples are grouped).

Comparing ~patient+condition to ~patient will find everything affected by condition regardless of whether there's a patient-effect. So don't worry that you'll be screwing things up for genes for which there is no patient effect (you will get slightly different results for them, but I think most people would argue that that's OK).

**SylvainL** · 10-01-2014, 11:47 PM

Yes, treatment and condition are the same...

My samples are grouped like this:
sample#1: patient A, no stimulus
sample#2: patient A, stimulus 1
sample#3: patient A, stimulus 2
sample#4: patient B, no stimulus
sample#5: patient B, stimulus 1
sample#6: patient B, stimulus 2
sample#7: patient C, no stimulus
sample#8: patient C, stimulus 1
sample#9: patient C, stimulus 2

Let's say that for gene a, the basal level of expression (in the no stimulus condition) is the same in all the patients, but this gene is induced 10-fold by stimulus 1 in each patient. I expect to find this gene among the significant genes (if the basal expression level is high enough...).
Now, for gene b, let's say it is still induced 10-fold by stimulus 1 in all the patients, but its level in patient B is 5-fold lower than in patients A and C (so its expression level in patient B in the stimulus 1 condition is still more or less 5-fold lower than in patients A and C in the stimulus 1 condition). I still expect to find this gene among the significantly differentially expressed genes...
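
The gene a / gene b scenario can be checked with a toy calculation. This is only a sketch under simplifying assumptions: the counts are invented to match the description, only the "no stimulus" and "stimulus 1" samples are used, and an ordinary linear model on log2 values stands in for the negative binomial GLM.

```r
# Invented counts matching the description: both genes are induced 10-fold by
# stim1 in every patient; gene b starts 5-fold lower in patient B.
d <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 2)),
  condition = factor(rep(c("none", "stim1"), times = 3),
                     levels = c("none", "stim1")),
  gene_a    = c(100, 1000, 100, 1000, 100, 1000),
  gene_b    = c(100, 1000,  20,  200, 100, 1000)
)

# Fit log2 expression with patient as a blocking factor.
fit_a <- lm(log2(gene_a) ~ patient + condition, data = d)
fit_b <- lm(log2(gene_b) ~ patient + condition, data = d)

# The condition coefficient is the log2 fold change shared across patients.
lfc_a <- unname(coef(fit_a)["conditionstim1"])
lfc_b <- unname(coef(fit_b)["conditionstim1"])
print(c(gene_a = lfc_a, gene_b = lfc_b))  # both equal log2(10)

# The lower baseline of gene b in patient B lands in the patient coefficient.
baseline_shift_b <- unname(coef(fit_b)["patientB"])
print(baseline_shift_b)  # log2(1/5)
```

In this toy setting both genes get the same stimulus effect, and the patient term only records where each patient starts from.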

s.

Differential Expression in different patients
