SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2 experimental design/analysis crh Bioinformatics 1 05-05-2015 11:30 PM
DESeq2 help.. design pinuscatcus Bioinformatics 1 04-10-2015 05:53 AM
Unbalanced block design analysis with DESeq2 youssefd Bioinformatics 6 03-13-2015 03:43 AM
Deseq2 design Wei-HD Bioinformatics 11 09-04-2014 11:13 AM
Paired design versus unpaired design in DESeq2 KristenC RNA Sequencing 1 05-29-2014 11:05 AM

Reply
 
Thread Tools
Old 09-15-2015, 01:18 AM   #1
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default Problem: DESeq2 analysis with very unbalanced design

Hello,

I have a question about my DESeq2 analysis.
I have to compare expression of miRNAs in different disease variants but I have a problem, because the design is not balanced ; this is my data:
Disease variant 1: 5 replicates
Disease variant 2: 26 replicates
Disease variant 3: 5 replicates
Disease variant 4: 8 replicates
Disease variant 1,3,4 have fewer patients than the variant 2 because are rare variants of the disease, then their frequency in a population is very low (it is difficult to find replicates!).
Can deseq2 working well with these unbalanced samples?

PS. Sorry if there are errors in the text, but I don't speak English very well
Thank you very much in advance,
Fischer
Fischer is offline   Reply With Quote
Old 09-15-2015, 01:36 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Unbalanced designs aren't a problem, you just have lower power with the variants containing fewer samples.
dpryan is offline   Reply With Quote
Old 09-15-2015, 01:50 AM   #3
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Thank you for reply!
Then do you think that is correct an analysis with deseq2 in my case?
Practically I only have low accuracy in the results of these variants, right?
Fischer is offline   Reply With Quote
Old 09-15-2015, 02:17 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Sure, I'd still use DESeq2 if this were my dataset.
dpryan is offline   Reply With Quote
Old 09-15-2015, 02:33 AM   #5
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Thank you again for reply! I have another question, if you can help me again..
Becouse we had a problem in our lab, some of the samples (15%) were extracted with the Hiseq, while the remaining with the Myseq.. so the initial frequencies of miRNAs in the samples are differents because of the use of two different instruments (Hiseq frequencies are higher).. It could be a problem for the data analysis or DESeq2 solves this problem with normalization?
Fischer is offline   Reply With Quote
Old 09-15-2015, 04:41 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

By "extracted" I assume you mean "sequenced". Were the HiSeq and MiSeq libraries prepared at the same time? If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model.
dpryan is offline   Reply With Quote
Old 09-15-2015, 04:58 AM   #7
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Quote:
Originally Posted by dpryan View Post
Were the HiSeq and MiSeq libraries prepared at the same time?
Yes, they were prepared at the same time and with the same kit.

Quote:
Originally Posted by dpryan View Post
If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model.
The only difference is in the sequencer machine. We used both Hiseq and Miseq, so some samples have an higher number of reads than other.
Fischer is offline   Reply With Quote
Old 09-15-2015, 05:13 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

OK, in theory that should be OK. In practice, though, it's good to make a PCA plot and then see if samples start clustering by machine. If that's the case then you have a notable machine effect and can just add a variable to your model. Alternatively, you could see if svaseq finds a meaningful batch effect worthy of compensation.
dpryan is offline   Reply With Quote
Old 09-15-2015, 05:50 AM   #9
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Ok, I created a new variable that identify Hiseq/Miseq and I redid the model with these commands ("categories" is the "disease variants" variable, "machine" is the new variable ):

pg2 <- newCountDataSet(countTable,categories)

countD <- counts(pg2)

colData <- data.frame(rownames=colnames(countD), condition=categories, mach=machine)

cds <- DESeqDataSetFromMatrix(countData=countD,colData=colData, design=~condition+mach)

dds <- DESeq(cds)

Is the model correct?
then I made PCA:

rld <- rlog(dds)
plotPCA(rld, intgroup=c("mach"))

this is the result:

Attached Images
File Type: jpeg Rplot.jpeg (50.6 KB, 33 views)

Last edited by Fischer; 09-15-2015 at 05:54 AM.
Fischer is offline   Reply With Quote
Old 09-15-2015, 06:21 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I guess there is a batch effect (glad I suggested you check!). You might also figure out what's going on with those 2 samples leading to PC1.
dpryan is offline   Reply With Quote
Old 09-16-2015, 12:00 AM   #11
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Thank you so much for your suggestion!
Because these two samples have a strange behavior, in your opinion, can I delete them from analysis? For design it wouldn't be a problem because they are "disease variant 2" samples.

this is the results without these two samples, and with a model design:
~variant+mach

Attached Images
File Type: jpeg Rplot.jpeg (35.5 KB, 25 views)

Last edited by Fischer; 09-16-2015 at 12:05 AM.
Fischer is offline   Reply With Quote
Old 09-16-2015, 12:06 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You should try to see if there's a good reason why they're doing that first (not to mention also doing some hierarchical clustering). In general, though, I would say that those samples are good candidates for exclusion if they can't be otherwise explained (e.g., due to having much lower coverage).
dpryan is offline   Reply With Quote
Old 09-16-2015, 01:22 AM   #13
Neuromancer
Member
 
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
Default

Hi DESeq2 experts,

I have a very related question. My group design is as following:
Control
A, n=4
B, n=8

KO
C, n=4
D, n=12

Groups A,C are untreated, B,D treated.
So far so good, I used DESeq2 to compare AvsB and CvsD and now I am looking at the differences of these comparisons (rather than directly comparing BvsD, which I am also doing, but that's not the question here).
As you can imagine I get a more DE genes in CvsD, as D has 50% more samples than B, while A and C have the same number of samples. But it's a lot more (AvsB: ~1000; CvsD: ~2500, so 2.5x more, using same FDR/log2FC cutoffs of course).

So my question is: Is my "meta-comparison", i.e. looking at what is different in both comparisons actually valid? And is the 2.5-fold difference in DE genes more likely to be a result of group D having higher n (so CvsD has more power than AvsB) or could it also be due to experimental condition, which would be great as that would be biologically meaningful (which was of course the hypothesis)?

To be more precise: in my CvsD comparison I get a highly interesting group of genes, so good enrichment of this pathway, while in my AvsB comparison I don't get any of those - and now I'm afraid that this might be due to design rather than biology!

Any suggestions would be much appreciated.
Neuromancer is offline   Reply With Quote
Old 09-16-2015, 01:31 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Comparing lists that made based on p-value and fold-change thresholds is the path of last resort. Your design lends itself nicely to a factorial treatment and those are the questions that likely make the most biological sense...so just do that instead.
dpryan is offline   Reply With Quote
Old 09-16-2015, 01:48 AM   #15
Neuromancer
Member
 
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
Default

Thanks @dpryan!

You are probably right. I'll have a look at factorial design then.
Neuromancer is offline   Reply With Quote
Old 09-16-2015, 03:08 AM   #16
Fischer
Junior Member
 
Location: Italy

Join Date: Mar 2015
Posts: 8
Default

Thank you again, finally we decided to exclude these samples from the analysis!
I have only last question for you about interpreting the results.. when I make the MAplot of a comparison, how do I know what level is represented by the red line?
For example:

res2 <- results(dds,contrast=c("condition","V1","V2"))
resOrd_res2 <- res2[order(res2$padj),]
plotMA(resOrd_res2,alpha=0.05,main="1A vs. 1B",ylim=c(-3,3))

In V1 vs V2 comparison I found 1 miRNA DE, that is represented (by a red point) in MAplot below the red line.
Is this miRNA under-expressed in the Variant 1 with respect to the Variant 2 (then V2 is the red line), or vice versa?

Last edited by Fischer; 09-16-2015 at 03:16 AM.
Fischer is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:23 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO