11-17-2014, 06:04 AM
Unbalanced block design analysis with DESeq2


I have RNA-seq samples from 11 patients. For each patient,
samples were taken from two different locations (Ileum and Caecum),
and each sample has one of two inflammation statuses (inflamed, I, or non-inflamed, NI).
However, the design is not balanced, as I do not have 4 samples for each patient.
Below is a description of the dataset:

Patient  Location  Status
1        Caecum    NI
1        Ileum     NI
3        Caecum    I
3        Ileum     NI
4        Ileum     I
5        Caecum    NI
5        Caecum    I
5        Caecum    NI
6        Ileum     I
6        Caecum    I
7        Caecum    NI
7        Ileum     I
8        Ileum     NI
8        Ileum     I
9        Ileum     NI
9        Ileum     I
9        Caecum    I
10       Ileum     NI
10       Caecum    NI
12       Ileum     NI
12       Caecum    NI
14       Ileum     NI
14       Ileum     I
14       Ileum     I
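
To make the imbalance concrete, here is a minimal bookkeeping sketch in Python (standard library only, independent of DESeq2). It transcribes the sample sheet above and counts samples per Location/Status cell and per patient. The names `SAMPLES`, `cell_counts` and `samples_per_patient` are illustrative, not part of any package.

```python
from collections import Counter

# Sample sheet transcribed from the table above: (patient, location, status).
SAMPLES = [
    (1, "Caecum", "NI"), (1, "Ileum", "NI"),
    (3, "Caecum", "I"), (3, "Ileum", "NI"),
    (4, "Ileum", "I"),
    (5, "Caecum", "NI"), (5, "Caecum", "I"), (5, "Caecum", "NI"),
    (6, "Ileum", "I"), (6, "Caecum", "I"),
    (7, "Caecum", "NI"), (7, "Ileum", "I"),
    (8, "Ileum", "NI"), (8, "Ileum", "I"),
    (9, "Ileum", "NI"), (9, "Ileum", "I"), (9, "Caecum", "I"),
    (10, "Ileum", "NI"), (10, "Caecum", "NI"),
    (12, "Ileum", "NI"), (12, "Caecum", "NI"),
    (14, "Ileum", "NI"), (14, "Ileum", "I"), (14, "Ileum", "I"),
]


def cell_counts(samples):
    """Number of samples in each (location, status) cell."""
    return Counter((loc, status) for _, loc, status in samples)


def samples_per_patient(samples):
    """Number of samples contributed by each patient."""
    return Counter(patient for patient, _, _ in samples)


if __name__ == "__main__":
    for cell, n in sorted(cell_counts(SAMPLES).items()):
        print(cell, n)
    for patient, n in sorted(samples_per_patient(SAMPLES).items()):
        print("patient", patient, "->", n, "sample(s)")
```

A fully balanced design would have 4 samples per patient (one per Location/Status cell); the per-patient counts show how far this dataset is from that.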

I am interested in testing whether there is a difference between Locations in general, between Statuses in general,
for Status within a given Location, and for the interaction between Status and Location.

Therefore, I used DESeq2 with the following design:
~ Patient + Location + Status + Location:Status
From what I understood, this will remove the variation due to Patient and test for
the effects of Location, Status and the interaction between the two.
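
One thing that can be checked without any count data is whether this design is estimable at all given the missing cells. The sketch below (plain Python, not DESeq2 code) builds a treatment-coded design matrix for ~ Patient + Location + Status + Location:Status from the sample sheet and computes its rank by exact Gaussian elimination. The reference levels (Caecum, NI) and the helper names `design_matrix` and `matrix_rank` are assumptions made for illustration; the rank does not depend on which levels are chosen as reference.

```python
from fractions import Fraction

# Sample sheet transcribed from the post: (patient, location, status).
SAMPLES = [
    (1, "Caecum", "NI"), (1, "Ileum", "NI"),
    (3, "Caecum", "I"), (3, "Ileum", "NI"),
    (4, "Ileum", "I"),
    (5, "Caecum", "NI"), (5, "Caecum", "I"), (5, "Caecum", "NI"),
    (6, "Ileum", "I"), (6, "Caecum", "I"),
    (7, "Caecum", "NI"), (7, "Ileum", "I"),
    (8, "Ileum", "NI"), (8, "Ileum", "I"),
    (9, "Ileum", "NI"), (9, "Ileum", "I"), (9, "Caecum", "I"),
    (10, "Ileum", "NI"), (10, "Caecum", "NI"),
    (12, "Ileum", "NI"), (12, "Caecum", "NI"),
    (14, "Ileum", "NI"), (14, "Ileum", "I"), (14, "Ileum", "I"),
]


def design_matrix(samples):
    """Treatment-coded columns: intercept, patient dummies, Location,
    Status, and the Location:Status interaction."""
    patients = sorted({p for p, _, _ in samples})
    rows = []
    for patient, loc, status in samples:
        ileum = 1 if loc == "Ileum" else 0        # Caecum as reference (assumption)
        inflamed = 1 if status == "I" else 0      # NI as reference (assumption)
        row = [1]                                 # intercept
        row += [1 if patient == q else 0 for q in patients[1:]]  # first patient is reference
        row += [ileum, inflamed, ileum * inflamed]
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


if __name__ == "__main__":
    X = design_matrix(SAMPLES)
    print("columns:", len(X[0]), "rank:", matrix_rank(X))
```

If the rank equals the number of columns, every coefficient is estimable in principle. That says nothing about how precisely each one is estimated, which is the real concern with so few complete blocks.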

To check the difference for Location in general, I extracted the results of the DESeq2 analysis
using contrast = c("Location", "Ileum", "Caecum"). For Status, I used contrast = c("Status", "NI", "I").

To test for the effect of Status given the Location, I used
contrast = list("LocationCaecum.StatusI", "LocationCaecum.StatusNI").

To check for the effect of Location on Status (i.e. whether inflamed Caecum shows an effect
beyond that of Caecum alone and inflammation alone), I used
contrast = list(c("StatusI", "LocationCaecum.StatusI"), c("StatusNI", "LocationCaecum.StatusNI"))

Considering the high number of missing samples (i.e. incomplete blocks), does the design
formula I am using make sense? For instance, when comparing on Location, does the pairing have any value,
given that it will only work for the following 8 samples (or am I wrong)?

Patient  Location  Status
5        Caecum    NI
5        Caecum    I
8        Ileum     NI
8        Ileum     I
9        Ileum     NI
9        Ileum     I
14       Ileum     NI
14       Ileum     I
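
To double-check which patients actually provide within-patient pairs, the following Python sketch (standard library only, with illustrative helper names) enumerates two kinds of pairs from the full sample sheet: patients with both statuses at the same location, and patients with both locations at the same status.

```python
from collections import defaultdict

# Sample sheet transcribed from the post: (patient, location, status).
SAMPLES = [
    (1, "Caecum", "NI"), (1, "Ileum", "NI"),
    (3, "Caecum", "I"), (3, "Ileum", "NI"),
    (4, "Ileum", "I"),
    (5, "Caecum", "NI"), (5, "Caecum", "I"), (5, "Caecum", "NI"),
    (6, "Ileum", "I"), (6, "Caecum", "I"),
    (7, "Caecum", "NI"), (7, "Ileum", "I"),
    (8, "Ileum", "NI"), (8, "Ileum", "I"),
    (9, "Ileum", "NI"), (9, "Ileum", "I"), (9, "Caecum", "I"),
    (10, "Ileum", "NI"), (10, "Caecum", "NI"),
    (12, "Ileum", "NI"), (12, "Caecum", "NI"),
    (14, "Ileum", "NI"), (14, "Ileum", "I"), (14, "Ileum", "I"),
]


def status_pairs_within_location(samples):
    """(patient, location) combinations that have both an I and an NI sample."""
    seen = defaultdict(set)
    for patient, loc, status in samples:
        seen[(patient, loc)].add(status)
    return sorted(key for key, statuses in seen.items() if statuses == {"I", "NI"})


def location_pairs_within_status(samples):
    """(patient, status) combinations sampled at both Ileum and Caecum."""
    seen = defaultdict(set)
    for patient, loc, status in samples:
        seen[(patient, status)].add(loc)
    return sorted(key for key, locs in seen.items() if locs == {"Caecum", "Ileum"})


if __name__ == "__main__":
    print("Status pairs (same location):", status_pairs_within_location(SAMPLES))
    print("Location pairs (same status):", location_pairs_within_status(SAMPLES))
```

On this sample sheet the first function returns patients 5, 8, 9 and 14, which matches the list above. The second returns a different set of patients, so the pairs available for a Location comparison are not the same ones as for a Status comparison.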
Also, is it OK to use the whole dataset for these tests, or is it better to subset it first
and then run a separate DESeq2 analysis for each case? For example, select all NI samples, then compare them between Ileum and Caecum with the design below:

design = formula(~ Patient + Location)

Thank you very much in advance,
youssefd