SEQanswers

Thread: DESeq2 error : the model matrix is not full rank (http://seqanswers.com/forums/showthread.php?t=33032)

NicoBxl 08-22-2013 11:35 PM

DESeq2 error : the model matrix is not full rank
 
Hi,

I'm getting an error in a DESeq2 analysis. I have 30 samples with a three-factor design. I put a Replicate column in my design to account for technical replicates, but I don't know if that's a good idea...

When I build the DESeq2 data set:

Code:

dds <- DESeqDataSetFromMatrix(countData = OAR.readCount, colData = design, design = ~ Stranded + Replicate + condition)

I get this error:

invalid class “DESeqDataSet” object: the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others

Does anyone have an idea how to solve this?

Thanks,

N.

Code:

Sample        condition        Stranded        Replicate
A.1        Ctrl        No        A
B.1        Ctrl        No        B
C.1        Ctrl        No        C
D.1        Tum        No        D
E.1        Tum        No        E
F.1        Tum        No        F
G.1        Tum        No        G
H.1        Tum        No        H
I.1        Tum        No        I
J.1        Tum        No        J
K.1        Tum        No        K
L.1        Tum        No        L
M.1        Ctrl        Yes        M
N.1        Tum        Yes        N
O.1        Tum        Yes        O
P.1        Tum        Yes        P
E.2        Tum        Yes        E
F.2        Tum        Yes        F
I.2        Tum        Yes        I
Q.1        Tum        Yes        Q
K.2        Tum        Yes        K
R.1        Tum        Yes        R
S.1        Tum        Yes        S
T.1        Tum        Yes        T
L.2        Tum        Yes        L
O.2        Tum        Yes        O
T.2        Tum        Yes        T
I.3        Tum        Yes        I
L.3        Tum        Yes        L


Wolfgang Huber 08-25-2013 06:57 AM

Dear N.

Try removing either one of the columns "Sample" or "Replicate"; they seem to be redundant with each other.

Best wishes
Wolfgang
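
Editor's sketch: the redundancy can be checked directly in base R, without DESeq2. The snippet below re-types the sample sheet from the first post and compares the number of model-matrix columns with the matrix rank for a few candidate designs. The helper name rank_check is hypothetical, and qr() is used here only as a convenient rank check; it is not claimed to be what DESeq2 runs internally. Under these assumptions, the rank deficiency appears between "Replicate" and "condition": every Replicate label belongs to exactly one condition.

```r
# Sample sheet re-typed from the table in the first post (29 rows).
rep_id <- c(LETTERS[1:12],
            "M", "N", "O", "P", "E", "F", "I", "Q", "K",
            "R", "S", "T", "L", "O", "T", "I", "L")
design <- data.frame(
  condition = factor(ifelse(rep_id %in% c("A", "B", "C", "M"), "Ctrl", "Tum")),
  Stranded  = factor(rep(c("No", "Yes"), times = c(12, 17))),
  Replicate = factor(rep_id)
)

# Hypothetical helper: number of model-matrix columns vs. numerical rank.
rank_check <- function(formula, data) {
  mm <- model.matrix(formula, data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

full    <- rank_check(~ Stranded + Replicate + condition, design)
no_rep  <- rank_check(~ Stranded + condition, design)
no_cond <- rank_check(~ Stranded + Replicate, design)
print(rbind(full, no_rep, no_cond))

# TRUE when every Replicate label maps to a single condition,
# i.e. Replicate is nested within condition.
nested <- all(tapply(design$condition, design$Replicate,
                     function(x) length(unique(x))) == 1)
print(nested)
```

A design is full rank when "columns" equals "rank". Which term to drop is an analysis decision; the sketch only shows where the linear dependence sits.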

rozitaa 03-26-2015 05:28 AM

Hi,

I get the same error, but I don't have the same columns as N. Can anyone help me with this? My sample table looks like this:

Code:

muss_log        tissue        gut_microbiota
1        5231        Si5        GF
2        5231        PC        GF
3        5231        liver        GF
4        5232        Si5        GF
5        5232        PC        GF
6        5232        liver        GF
7        5233        Si5        GF
8        5233        PC        GF
9        5233        liver        GF
10        5234        Si5        GF
11        5234        PC        GF
12        5234        liver        GF
13        5161        Si5        mono
14        5161        PC        mono
15        5161        liver        mono
16        5162        Si5        mono
17        5162        PC        mono
18        5162        liver        mono
19        5163        Si5        mono
20        5163        PC        mono
21        5163        liver        mono
22        5164        Si5        prevExci
23        5164        PC        prevExci
24        5164        liver        prevExci
25        5164        liver        prevExci
26        5165        Si5        prevExci
27        5165        PC        prevExci
28        5165        liver        prevExci
29        5166        Si5        prevExci
30        5166        PC        prevExci
31        5166        liver        prevExci
32        5167        Si5        prev
33        5167        PC        prev
34        5167        liver        prev
35        5168        Si5        prev
36        5168        PC        prev
37        5168        liver        prev
38        5169        Si5        prev
39        5169        PC        prev
40        5169        liver        prev
41        5170        Si5        prev
42        5170        PC        prev
43        5170        liver        prev
44        5171        Si5        mono
45        5171        PC        mono
46        5171        liver        mono
47        5172        Si5        mono
48        5172        PC        mono
49        5172        liver        mono
50        5173        Si5        mono
51        5173        PC        mono
52        5173        liver        mono
53        5174        Si5        prev
54        5174        PC        prev
55        5174        liver        prev
56        5175        Si5        prev
57        5175        PC        prev
58        5175        liver        prev
59        5176        Si5        prev
60        5176        PC        prev
61        5176        liver        prev
62        5177        Si5        prevMono
63        5177        PC        prevMono
64        5177        liver        prevMono
65        5178        Si5        prevMono
66        5178        PC        prevMono
67        5178        liver        prevMono
68        5179        Si5        prevMono
69        5179        PC        prevMono


dpryan 03-26-2015 06:11 AM

What's your design?

rozitaa 03-26-2015 06:16 AM

Quote:

Originally Posted by dpryan (Post 163203)
What's your design?


Code:

dds = DESeqDataSetFromHTSeqCount(sampleTable=sample_table, directory='~/dataAnalysis/Petia/barley_mice_rna-seq/htseq/', design= ~ muss_log + tissue + gut_microbiota)

dpryan 03-26-2015 06:19 AM

You've already accounted for "gut_microbiota" with "muss_log"; the latter determines the former.

rozitaa 03-26-2015 06:38 AM

Quote:

Originally Posted by dpryan (Post 163205)
You've already accounted for "gut_microbiota" with "muss_log"; the latter determines the former.

How? I don't understand that! It's true that each replicate can have only one gut_microbiota status, but for each sample I also have three measurements (one per tissue), and I want to account for these within-sample correlations. For example, the GF status covers 4 different samples: 5231, 5232, 5233 and 5234. I cannot see how gut_microbiota can cover muss_log!

dpryan 03-26-2015 07:23 AM

Quote:

How?
Let's take a simpler example:

Code:

df <- data.frame(Sample=factor(c(1:10)), Group=c(rep("A",5), rep("B",5)))
mm <- model.matrix(~Sample+Group, df)
mm

Have a look at "mm". The last column is simply the sum of columns 6-10, meaning that if you estimate those coefficients, then the last coefficient is completely determined by them (or inversely, if you estimate it then the others are already determined). Something along these lines is also the case for the model matrix in your design.
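
Editor's sketch: the statement about the last column can be confirmed numerically. The snippet rebuilds the same toy model matrix and compares its column count with its rank using base R's qr(); the variable names n_cols, mm_rank and same are illustrative only.

```r
df <- data.frame(Sample = factor(1:10), Group = rep(c("A", "B"), each = 5))
mm <- model.matrix(~ Sample + Group, df)

n_cols  <- ncol(mm)      # intercept + 9 Sample columns + GroupB
mm_rank <- qr(mm)$rank   # number of linearly independent columns

# Is the GroupB column exactly the sum of columns 6-10 (Sample6..Sample10)?
same <- all(mm[, "GroupB"] == rowSums(mm[, 6:10]))

print(c(columns = n_cols, rank = mm_rank))
print(same)
```

The rank comes out one lower than the number of columns, which is the condition the "not full rank" error reports.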

rozitaa 03-26-2015 08:00 AM

Quote:

Originally Posted by dpryan (Post 163214)
Let's take a simpler example:

Code:

df <- data.frame(Sample=factor(c(1:10)), Group=c(rep("A",5), rep("B",5)))
mm <- model.matrix(~Sample+Group, df)
mm

Have a look at "mm". The last column is simply the sum of columns 6-10, meaning that if you estimate those coefficients, then the last coefficient is completely determined by them (or inversely, if you estimate it then the others are already determined). Something along these lines is also the case for the model matrix in your design.


Thanks!
So I will remove muss_log!

chammer 03-14-2017 06:10 AM

I have been trying to use DESeq2 for analysing RNA-seq data in R and I have a small question about that.

So to start, the conditions table looks like this:

Code:

sample        donor    virus    vpu    sex
DonorA1_01    A1    none    mock    male
DonorA1_02    A1    CH293    wt    male
DonorA1_03    A1    CH293    stop    male
DonorA1_04    A1    CH293    R50K    male
DonorA1_05    A1    CH293    teth_count    male
DonorA1_06    A1    CH077    wt    male
DonorA1_07    A1    CH077    stop    male
DonorA1_08    A1    CH077    R50K    male
DonorA1_09    A1    CH077    teth_count    male
DonorA1_10    A1    STC01    wt    male
DonorA1_11    A1    STC01    stop    male
DonorA1_12    A1    STC01    R50K    male
DonorA1_13    A1    STC01    teth_count    male
DonorX_01    X    none    mock    female
DonorX_02    X    CH293    wt    female
DonorX_03    X    CH293    stop    female
DonorX_04    X    CH293    R50K    female
DonorX_05    X    CH293    teth_count    female
DonorX_06    X    CH077    wt    female
DonorX_07    X    CH077    stop    female
DonorX_08    X    CH077    R50K    female
DonorX_09    X    CH077    teth_count    female
DonorX_10    X    STC01    wt    female
DonorX_11    X    STC01    stop    female
DonorX_12    X    STC01    R50K    female
DonorX_13    X    STC01    teth_count    female
DonorY_01    Y    none    mock    male
DonorY_02    Y    CH293    wt    male
DonorY_03    Y    CH293    stop    male
DonorY_04    Y    CH293    R50K    male
DonorY_05    Y    CH293    teth_count    male
DonorY_06    Y    CH077    wt    male
DonorY_07    Y    CH077    stop    male
DonorY_08    Y    CH077    R50K    male
DonorY_09    Y    CH077    teth_count    male
DonorY_10    Y    STC01    wt    male
DonorY_11    Y    STC01    stop    male
DonorY_12    Y    STC01    R50K    male
DonorY_13    Y    STC01    teth_count    male
DonorZ_01    Z    none    mock    female
DonorZ_02    Z    CH293    wt    female
DonorZ_03    Z    CH293    stop    female
DonorZ_04    Z    CH293    R50K    female
DonorZ_05    Z    CH293    teth_count    female
DonorZ_06    Z    CH077    wt    female
DonorZ_07    Z    CH077    stop    female
DonorZ_08    Z    CH077    R50K    female
DonorZ_09    Z    CH077    teth_count    female
DonorZ_10    Z    STC01    wt    female
DonorZ_11    Z    STC01    stop    female
DonorZ_12    Z    STC01    R50K    female
DonorZ_13    Z    STC01    teth_count    female

Now when I specify the model for differential expression analysis as dds <- DESeqDataSetFromTximport(txi, samples, ~vpu+donor+virus), I get an error message:

Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

I get the same problem if the design is ~ vpu + virus or ~ donor + sex.

I understand that this is because of collinearity among the variables, but I am not sure how to resolve the issue, since in my case all covariates are collinear with each other (not just a single pair of variables).

Any help on this will be highly appreciated!

Thanks!
Chris
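
Editor's sketch: nobody in the thread answered this post, so the following is only a diagnostic under stated assumptions, not a recommendation from the thread. It rebuilds the conditions table above in base R and reports columns vs. rank for the designs mentioned. In this table "mock" occurs only with virus "none", and sex is fixed per donor, which is where the dependence comes from. The last design merges virus and vpu into a single factor (the column name group and the helper rank_check are hypothetical); in this sketch that design is full rank, but whether it answers the biological question still has to be decided by the analyst.

```r
donors <- c("A1", "X", "Y", "Z")

# 13 samples per donor: one mock sample plus 3 viruses x 4 vpu variants.
per_donor <- rbind(
  data.frame(virus = "none", vpu = "mock", stringsAsFactors = FALSE),
  expand.grid(virus = c("CH293", "CH077", "STC01"),
              vpu   = c("wt", "stop", "R50K", "teth_count"),
              stringsAsFactors = FALSE)
)
samples <- do.call(rbind, lapply(donors, function(d) {
  data.frame(donor = d, per_donor, stringsAsFactors = FALSE)
}))
samples$sex <- ifelse(samples$donor %in% c("A1", "Y"), "male", "female")

# Hypothetical combined factor: one level per virus/vpu combination.
samples$group <- paste(samples$virus, samples$vpu, sep = "_")

# Hypothetical helper: number of model-matrix columns vs. numerical rank.
rank_check <- function(formula, data) {
  mm <- model.matrix(formula, data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

original  <- rank_check(~ vpu + donor + virus, samples)
vpu_virus <- rank_check(~ vpu + virus, samples)
donor_sex <- rank_check(~ donor + sex, samples)
combined  <- rank_check(~ donor + group, samples)
print(rbind(original, vpu_virus, donor_sex, combined))
```

Only rows where "columns" equals "rank" describe a design that can be fitted as specified.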


All times are GMT -8.