**dpryan** · 09-15-2015, 01:36 AM

Unbalanced designs aren't a problem, you just have lower power with the variants containing fewer samples.

**Fischer** · 09-15-2015, 01:50 AM

Thank you for the reply!
Then do you think an analysis with DESeq2 is appropriate in my case?
In practice, I would just have lower accuracy in the results for these variants, right?

**dpryan** · 09-15-2015, 02:17 AM

Sure, I'd still use DESeq2 if this were my dataset.

**Fischer** · 09-15-2015, 02:33 AM

Thank you again for the reply! I have another question, if you can help me again.
Because we had a problem in our lab, some of the samples (15%) were extracted with the HiSeq, while the remaining ones were extracted with the MiSeq. So the initial frequencies of miRNAs in the samples are different because two different instruments were used (HiSeq frequencies are higher). Could this be a problem for the data analysis, or does DESeq2 solve this problem with normalization?

**dpryan** · 09-15-2015, 04:41 AM

By "extracted" I assume you mean "sequenced". Were the HiSeq and MiSeq libraries prepared at the same time? If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model.
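As a rough illustration of why sequencing depth alone is handled by library size normalization, here is a minimal base R sketch of the median-of-ratios idea that DESeq2 uses by default for size factors. The count values and the helper name `median_of_ratios` are made up for illustration; this is a simplified re-implementation, not DESeq2's own code.

```r
# Toy counts: 4 miRNAs x 4 samples (hypothetical numbers).
# s3 and s4 mimic the same libraries as s1 and s2, sequenced 10x deeper.
counts <- matrix(c(10, 20, 30, 40,
                   12, 18, 33, 37,
                   100, 200, 300, 400,
                   120, 180, 330, 370),
                 nrow = 4,
                 dimnames = list(paste0("miR", 1:4), paste0("s", 1:4)))

# Median-of-ratios size factors: compare each sample to a per-gene
# geometric-mean reference and take the median ratio on the log scale.
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

size_factors <- median_of_ratios(counts)

# Dividing each column by its size factor puts samples on a common scale.
normalized <- sweep(counts, 2, size_factors, "/")
print(round(size_factors, 3))
print(round(normalized, 2))
```

After scaling, the shallow and deep versions of the same library end up with the same normalized values, which is the situation described above when only the machine differs.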

**Fischer** · 09-15-2015, 04:58 AM

Originally posted by dpryan:

Were the HiSeq and MiSeq libraries prepared at the same time?

Yes, they were prepared at the same time and with the same kit.

Originally posted by dpryan:

If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model.

The only difference is the sequencing machine. We used both the HiSeq and the MiSeq, so some samples have a higher number of reads than others.

**dpryan** · 09-15-2015, 05:13 AM

OK, in theory that should be OK. In practice, though, it's good to make a PCA plot and then see if samples start clustering by machine. If that's the case then you have a notable machine effect and can just add a variable to your model. Alternatively, you could see if svaseq finds a meaningful batch effect worthy of compensation.
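A minimal sketch of that check in base R, assuming simulated counts and a hypothetical `machine` factor (with real data the counts would come from the count table, and an rlog or variance-stabilized matrix would replace the simple log transform used here):

```r
set.seed(1)
n_genes <- 200
# Hypothetical machine labels for 12 samples.
machine <- factor(rep(c("HiSeq", "MiSeq"), times = c(3, 9)))

# Simulated counts as a stand-in for the real count table.
counts <- matrix(rpois(n_genes * length(machine), lambda = 50),
                 nrow = n_genes,
                 dimnames = list(NULL, paste0("s", seq_along(machine))))

# Simple depth scaling followed by a log transform.
depth <- colSums(counts)
log_expr <- log2(t(t(counts) / depth) * mean(depth) + 1)

# PCA on samples (rows = samples after transposing).
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Colour points by machine; clear separation along PC1 or PC2
# would suggest a machine effect.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(machine), pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
legend("topright", legend = levels(machine),
       col = seq_along(levels(machine)), pch = 19)
```

In this simulation there is no machine effect built in, so the two colours should be intermixed; samples grouping by colour on real data would be the warning sign.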

**Fischer** · 09-15-2015, 05:50 AM

OK, I created a new variable that identifies HiSeq/MiSeq and I refit the model with these commands ("categories" is the "disease variants" variable, "machine" is the new variable):

pg2 <- newCountDataSet(countTable,categories)

countD <- counts(pg2)

colData <- data.frame(row.names=colnames(countD), condition=categories, mach=machine)

cds <- DESeqDataSetFromMatrix(countData=countD,colData=colData, design=~condition+mach)

dds <- DESeq(cds)

Is the model correct?
Then I made a PCA plot:

rld <- rlog(dds)
plotPCA(rld, intgroup=c("mach"))

This is the result:

Attached file: Rplot.jpeg (50.6 KB)

**dpryan** · 09-15-2015, 06:21 AM

I guess there is a batch effect (glad I suggested you check!). You might also figure out what's going on with those 2 samples leading to PC1.
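For readers following along, a hedged sketch of a design that accounts for the machine is shown below. The sample sheet and factor levels are hypothetical, and simulated counts from `makeExampleDESeqDataSet` stand in for the real data. One detail worth noting: `results()` reports the last variable in the design by default, so the variable of interest is usually placed last and the nuisance variable first.

```r
# Hypothetical sample sheet: 12 samples, two variants, two machines.
sample_info <- data.frame(
  row.names = paste0("sample", 1:12),
  condition = factor(rep(c("variant1", "variant2"), each = 6)),
  mach      = factor(rep(c("HiSeq", "MiSeq", "MiSeq"), times = 4))
)

# Nuisance variable first, variable of interest last.
design_formula <- ~ mach + condition

# The DESeq2 part runs only if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  # Simulated counts; column names are sample1..sample12.
  sim <- makeExampleDESeqDataSet(n = 1000, m = 12)
  dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                                colData = sample_info,
                                design = design_formula)
  dds <- DESeq(dds)
  # Variant effect, adjusted for machine.
  res <- results(dds, contrast = c("condition", "variant2", "variant1"))
  summary(res)
}
```

Stating the contrast explicitly avoids relying on the order of the design terms when extracting results.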

**Fischer** · 09-16-2015, 12:00 AM

Thank you so much for your suggestion!
Because these two samples behave strangely, in your opinion, can I remove them from the analysis? For the design it wouldn't be a problem because they are "disease variant 2" samples.

These are the results without these two samples, with the model design:
~variant+mach

Attached file: Rplot.jpeg (35.5 KB)

**dpryan** · 09-16-2015, 12:06 AM

You should try to see if there's a good reason why they're doing that first (not to mention also doing some hierarchical clustering). In general, though, I would say that those samples are good candidates for exclusion if they can't be otherwise explained (e.g., due to having much lower coverage).
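A minimal base R sketch of those two checks (per-sample coverage and hierarchical clustering), using simulated counts in which two samples are deliberately made shallow. The 25%-of-median cutoff is an arbitrary illustration, not a recommended threshold, and the data are left unnormalized here so that the depth effect is visible; with real data one would typically cluster rlog or variance-stabilized values.

```r
set.seed(2)
# Simulated counts: 200 genes x 8 samples (hypothetical data).
counts <- matrix(rpois(200 * 8, lambda = 100), nrow = 200,
                 dimnames = list(NULL, paste0("s", 1:8)))
# Make two samples much shallower to mimic low-coverage libraries.
counts[, c("s7", "s8")] <- rpois(200 * 2, lambda = 5)

# Check 1: total reads per sample; flag samples far below the median.
depth <- colSums(counts)
low_depth <- names(depth)[depth < 0.25 * median(depth)]
print(depth)
print(low_depth)

# Check 2: hierarchical clustering of samples on log-transformed counts.
log_expr <- log2(counts + 1)
hc <- hclust(dist(t(log_expr)), method = "average")
plot(hc, main = "Sample clustering")

# Cut the tree into two groups to see which samples split off.
clusters <- cutree(hc, k = 2)
print(clusters)
```

If the samples that stand apart in the dendrogram are also the ones flagged for low depth, coverage is a plausible explanation for their behaviour.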

**Neuromancer** · 09-16-2015, 01:22 AM

Hi DESeq2 experts,

I have a closely related question. My group design is as follows:
Control
A, n=4
B, n=8

KO
C, n=4
D, n=12

Groups A,C are untreated, B,D treated.
So far so good, I used DESeq2 to compare AvsB and CvsD and now I am looking at the differences of these comparisons (rather than directly comparing BvsD, which I am also doing, but that's not the question here).
As you can imagine, I get more DE genes in CvsD, as D has 50% more samples than B, while A and C have the same number of samples. But it's a lot more (AvsB: ~1000; CvsD: ~2500, so 2.5x more, using the same FDR/log2FC cutoffs of course).

So my question is: Is my "meta-comparison", i.e. looking at what is different in both comparisons actually valid? And is the 2.5-fold difference in DE genes more likely to be a result of group D having higher n (so CvsD has more power than AvsB) or could it also be due to experimental condition, which would be great as that would be biologically meaningful (which was of course the hypothesis)?

To be more precise: in my CvsD comparison I get a highly interesting group of genes, so good enrichment of this pathway, while in my AvsB comparison I don't get any of those - and now I'm afraid that this might be due to design rather than biology!

Any suggestions would be much appreciated.

**dpryan** · 09-16-2015, 01:31 AM

Comparing lists made with p-value and fold-change thresholds is the path of last resort. Your design lends itself nicely to a factorial treatment, and the questions it answers are likely the ones that make the most biological sense... so just do that instead.
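A sketch of what such a factorial model might look like for the group sizes given above. The factor names `genotype` and `treatment` and the simulated counts are assumptions for illustration. The interaction term asks directly whether the treatment effect differs between KO and Control, which is the question the list comparison was trying to answer.

```r
# Sample sheet matching the described groups: A = 4, B = 8, C = 4, D = 12.
sample_info <- data.frame(
  row.names = paste0("sample", 1:28),
  genotype  = factor(rep(c("Control", "KO"), times = c(12, 16)),
                     levels = c("Control", "KO")),
  treatment = factor(rep(c("untreated", "treated", "untreated", "treated"),
                         times = c(4, 8, 4, 12)),
                     levels = c("untreated", "treated"))
)

# Main effects plus interaction.
design_formula <- ~ genotype + treatment + genotype:treatment

# Inspect the design matrix; the last column is the interaction term.
mm <- model.matrix(design_formula, sample_info)
print(colnames(mm))
print(table(sample_info$genotype, sample_info$treatment))

# The DESeq2 part runs only if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  # Simulated counts; column names are sample1..sample28.
  sim <- makeExampleDESeqDataSet(n = 1000, m = 28)
  dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                                colData = sample_info,
                                design = design_formula)
  dds <- DESeq(dds)
  # The interaction is the last coefficient in this design.
  interaction_coef <- tail(resultsNames(dds), 1)
  res_interaction <- results(dds, name = interaction_coef)
  summary(res_interaction)
}
```

Genes significant for the interaction coefficient are those whose response to treatment differs by genotype, tested in a single model rather than inferred from two separately thresholded lists.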

**Neuromancer** · 09-16-2015, 01:48 AM

Thanks @dpryan!

You are probably right. I'll have a look at factorial design then.


Problem: DESeq2 analysis with very unbalanced design
