  • DESeq2 error: the model matrix is not full rank

    Hi,

    I've got an error in a DESeq2 analysis. I have 30 samples with a 3-factor design. I added a Replicate column to the design to account for technical replicates, but I don't know if that's a good idea...

    When I run DESeq2:

    Code:
    library(DESeq2)
    dds <- DESeqDataSetFromMatrix(countData = OAR.readCount, colData = design, design = ~ Stranded + Replicate + condition)
    I get this error:

    invalid class “DESeqDataSet” object: the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others

    Does anyone have an idea how to solve this?

    Thanks,

    N.

    Code:
    Sample	condition	Stranded	Replicate
    A.1	Ctrl	No	A
    B.1	Ctrl	No	B
    C.1	Ctrl	No	C
    D.1	Tum	No	D
    E.1	Tum	No	E
    F.1	Tum	No	F
    G.1	Tum	No	G
    H.1	Tum	No	H
    I.1	Tum	No	I
    J.1	Tum	No	J
    K.1	Tum	No	K
    L.1	Tum	No	L
    M.1	Ctrl	Yes	M
    N.1	Tum	Yes	N
    O.1	Tum	Yes	O
    P.1	Tum	Yes	P
    E.2	Tum	Yes	E
    F.2	Tum	Yes	F
    I.2	Tum	Yes	I
    Q.1	Tum	Yes	Q
    K.2	Tum	Yes	K
    R.1	Tum	Yes	R
    S.1	Tum	Yes	S
    T.1	Tum	Yes	T
    L.2	Tum	Yes	L
    O.2	Tum	Yes	O
    T.2	Tum	Yes	T
    I.3	Tum	Yes	I
    L.3	Tum	Yes	L

  • #2
    Dear N.

    Try removing one of the columns "Sample" or "Replicate"; they seem to be redundant with each other.

    Best wishes
    Wolfgang
    Wolfgang Huber
    EMBL
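A minimal base-R sketch (not part of the original thread) of how to check this before calling DESeq2. It retypes the colData from N.'s table by hand (the name `design_df` and the helper `rank_report` are hypothetical) and compares the rank of the model matrix with its number of columns. Because every Replicate level occurs under only one condition, the conditionTum column is a sum of Replicate columns, so the full formula is rank deficient, while dropping Replicate gives a full-rank matrix.

```r
# colData retyped from the table in the first post (29 rows as posted);
# `design_df` is a hypothetical name, not from the thread
design_df <- data.frame(
  condition = c(rep("Ctrl", 3), rep("Tum", 9), "Ctrl", rep("Tum", 16)),
  Stranded  = c(rep("No", 12), rep("Yes", 17)),
  Replicate = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
                "M", "N", "O", "P", "E", "F", "I", "Q", "K", "R", "S", "T",
                "L", "O", "T", "I", "L"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: a design is estimable only if rank == number of columns
rank_report <- function(formula, data) {
  mm <- model.matrix(formula, data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

rank_report(~ Stranded + Replicate + condition, design_df)  # 22 columns, rank 21
rank_report(~ Stranded + condition, design_df)              # 3 columns, rank 3

# Names of the column(s) that QR pivoting pushed out as linearly dependent
mm <- model.matrix(~ Stranded + Replicate + condition, design_df)
colnames(mm)[qr(mm)$pivot[-seq_len(qr(mm)$rank)]]
```

Keeping Replicate and dropping condition would also be full rank, but then there is no condition coefficient left to test, which is why the redundancy has to be resolved on the Replicate side.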


    • #3
      Hi,

      I also get the same error, but I don't have the same columns as N. Can anyone help me with this? Mine look like this:

      Code:
      muss_log	tissue	gut_microbiota
      1	5231	Si5	GF
      2	5231	PC	GF
      3	5231	liver	GF
      4	5232	Si5	GF
      5	5232	PC	GF
      6	5232	liver	GF
      7	5233	Si5	GF
      8	5233	PC	GF
      9	5233	liver	GF
      10	5234	Si5	GF
      11	5234	PC	GF
      12	5234	liver	GF
      13	5161	Si5	mono
      14	5161	PC	mono
      15	5161	liver	mono
      16	5162	Si5	mono
      17	5162	PC	mono
      18	5162	liver	mono
      19	5163	Si5	mono
      20	5163	PC	mono
      21	5163	liver	mono
      22	5164	Si5	prevExci
      23	5164	PC	prevExci
      24	5164	liver	prevExci
      25	5164	liver	prevExci
      26	5165	Si5	prevExci
      27	5165	PC	prevExci
      28	5165	liver	prevExci
      29	5166	Si5	prevExci
      30	5166	PC	prevExci
      31	5166	liver	prevExci
      32	5167	Si5	prev
      33	5167	PC	prev
      34	5167	liver	prev
      35	5168	Si5	prev
      36	5168	PC	prev
      37	5168	liver	prev
      38	5169	Si5	prev
      39	5169	PC	prev
      40	5169	liver	prev
      41	5170	Si5	prev
      42	5170	PC	prev
      43	5170	liver	prev
      44	5171	Si5	mono
      45	5171	PC	mono
      46	5171	liver	mono
      47	5172	Si5	mono
      48	5172	PC	mono
      49	5172	liver	mono
      50	5173	Si5	mono
      51	5173	PC	mono
      52	5173	liver	mono
      53	5174	Si5	prev
      54	5174	PC	prev
      55	5174	liver	prev
      56	5175	Si5	prev
      57	5175	PC	prev
      58	5175	liver	prev
      59	5176	Si5	prev
      60	5176	PC	prev
      61	5176	liver	prev
      62	5177	Si5	prevMono
      63	5177	PC	prevMono
      64	5177	liver	prevMono
      65	5178	Si5	prevMono
      66	5178	PC	prevMono
      67	5178	liver	prevMono
      68	5179	Si5	prevMono
      69	5179	PC	prevMono


      • #4
        What's your design?


        • #5
          Originally posted by dpryan
          What's your design?

          Code:
          library(DESeq2)
          dds <- DESeqDataSetFromHTSeqCount(sampleTable = sample_table,
                                            directory = '~/dataAnalysis/Petia/barley_mice_rna-seq/htseq/',
                                            design = ~ muss_log + tissue + gut_microbiota)


          • #6
            You've already accounted for "gut_microbiota" with "muss_log": the latter determines the former.
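To make this concrete, here is a minimal base-R sketch (not from the thread) using a subset of the table in post #3: six mice, three tissues each. It assumes muss_log is treated as a factor with one level per mouse; if it were left numeric, R would fit a single slope rather than one coefficient per mouse. The names `mouse_status` and `sample_table` are illustrative reconstructions, not the poster's actual objects. Each gut_microbiota column turns out to be a sum of mouse columns, so the rank falls short of the column count.

```r
# Subset of the posted sample table: 6 mice x 3 tissues (illustrative only)
mouse_status <- c("5231" = "GF",   "5232" = "GF",
                  "5161" = "mono", "5162" = "mono",
                  "5167" = "prev", "5168" = "prev")
sample_table <- expand.grid(tissue   = c("Si5", "PC", "liver"),
                            muss_log = names(mouse_status),
                            stringsAsFactors = FALSE)
sample_table$gut_microbiota <- unname(mouse_status[sample_table$muss_log])
# Treat every column, including the mouse ID, as a factor
sample_table[] <- lapply(sample_table, factor)

mm <- model.matrix(~ muss_log + tissue + gut_microbiota, sample_table)
ncol(mm)     # 10 coefficients requested
qr(mm)$rank  # only 8 are estimable

# gut_microbiota is constant within each mouse, so adding the indicator
# columns of the two "prev" mice reproduces the gut_microbiotaprev column
all(mm[, "gut_microbiotaprev"] == mm[, "muss_log5167"] + mm[, "muss_log5168"])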


            • #7
              Originally posted by dpryan
              You've already accounted for "gut_microbiota" with "muss_log", the latter determines the former.

              How? I don't understand that. It's true that each replicate can only have one gut_microbiota status, but for each sample I also have three measurements (one for each of three tissues), and I want to account for this within-sample correlation. For example, the GF status is shared by 4 different samples (5231, 5232, 5233, 5234), so I cannot see how gut_microbiota can cover muss_log!


              • #8
                How?
                Let's take a simpler example:

                Code:
                # 10 samples: the first 5 in group A, the last 5 in group B
                df <- data.frame(Sample = factor(1:10), Group = c(rep("A", 5), rep("B", 5)))
                mm <- model.matrix(~ Sample + Group, df)
                mm
                Have a look at "mm". The last column (GroupB) is simply the sum of columns 6-10 (Sample6 to Sample10), meaning that if you estimate those coefficients, the last coefficient is completely determined by them (and conversely, estimating it already fixes their sum). The same kind of dependency exists in the model matrix for your design.
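The claim about "mm" can be checked directly in base R. This short sketch (not from the thread) rebuilds the same toy data, compares the rank with the column count, and confirms that GroupB equals the sum of columns 6-10.

```r
df <- data.frame(Sample = factor(1:10), Group = c(rep("A", 5), rep("B", 5)))
mm <- model.matrix(~ Sample + Group, df)

ncol(mm)     # 11 columns: intercept, Sample2..Sample10, GroupB
qr(mm)$rank  # 10, so one column is redundant

# GroupB is 1 exactly for samples 6-10, i.e. the sum of columns 6-10
all(mm[, "GroupB"] == rowSums(mm[, 6:10]))  # TRUE
```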


                • #9
                  Thanks!
                  So I will remove muss_log!


                  • #10
                    I have been trying to use DESeq2 to analyse RNA-seq data in R, and I have a small question about it.

                    The conditions table looks like this:

                    Code:
                    sample         donor    virus    vpu    sex
                    DonorA1_01    A1    none    mock    male
                    DonorA1_02    A1    CH293    wt    male
                    DonorA1_03    A1    CH293    stop    male
                    DonorA1_04    A1    CH293    R50K    male
                    DonorA1_05    A1    CH293    teth_count    male
                    DonorA1_06    A1    CH077    wt    male
                    DonorA1_07    A1    CH077    stop    male
                    DonorA1_08    A1    CH077    R50K    male
                    DonorA1_09    A1    CH077    teth_count    male
                    DonorA1_10    A1    STC01    wt    male
                    DonorA1_11    A1    STC01    stop    male
                    DonorA1_12    A1    STC01    R50K    male
                    DonorA1_13    A1    STC01    teth_count    male
                    DonorX_01    X    none    mock    female
                    DonorX_02    X    CH293    wt    female
                    DonorX_03    X    CH293    stop    female
                    DonorX_04    X    CH293    R50K    female
                    DonorX_05    X    CH293    teth_count    female
                    DonorX_06    X    CH077    wt    female
                    DonorX_07    X    CH077    stop    female
                    DonorX_08    X    CH077    R50K    female
                    DonorX_09    X    CH077    teth_count    female
                    DonorX_10    X    STC01    wt    female
                    DonorX_11    X    STC01    stop    female
                    DonorX_12    X    STC01    R50K    female
                    DonorX_13    X    STC01    teth_count    female
                    DonorY_01    Y    none    mock    male
                    DonorY_02    Y    CH293    wt    male
                    DonorY_03    Y    CH293    stop    male
                    DonorY_04    Y    CH293    R50K    male
                    DonorY_05    Y    CH293    teth_count    male
                    DonorY_06    Y    CH077    wt    male
                    DonorY_07    Y    CH077    stop    male
                    DonorY_08    Y    CH077    R50K    male
                    DonorY_09    Y    CH077    teth_count    male
                    DonorY_10    Y    STC01    wt    male
                    DonorY_11    Y    STC01    stop    male
                    DonorY_12    Y    STC01    R50K    male
                    DonorY_13    Y    STC01    teth_count    male
                    DonorZ_01    Z    none    mock    female
                    DonorZ_02    Z    CH293    wt    female
                    DonorZ_03    Z    CH293    stop    female
                    DonorZ_04    Z    CH293    R50K    female
                    DonorZ_05    Z    CH293    teth_count    female
                    DonorZ_06    Z    CH077    wt    female
                    DonorZ_07    Z    CH077    stop    female
                    DonorZ_08    Z    CH077    R50K    female
                    DonorZ_09    Z    CH077    teth_count    female
                    DonorZ_10    Z    STC01    wt    female
                    DonorZ_11    Z    STC01    stop    female
                    DonorZ_12    Z    STC01    R50K    female
                    DonorZ_13    Z    STC01    teth_count    female
                    Now, when I specify the model for the differential expression analysis as dds <- DESeqDataSetFromTximport(txi, samples, ~ vpu + donor + virus), I get this error message:

                    Error in checkFullRank(modelMatrix) :
                    the model matrix is not full rank, so the model cannot be fit as specified.
                    One or more variables or interaction terms in the design formula are linear
                    combinations of the others and must be removed.

                    I get the same problem if the model is "vpu + virus" or "donor + sex".

                    I understand that this is caused by collinearity among the variables, but I am not sure how to resolve it, because in my case all the covariates are collinear with each other (not just one pair of variables).

                    Any help on this will be highly appreciated!

                    Thanks!
                    Chris
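A base-R sketch (not from the thread) that rebuilds Chris's table and pinpoints the dependencies: virus "none" occurs exactly when vpu is "mock", so those two indicator columns carry the same information, and sex is fixed by donor. One common workaround, offered here as an assumption rather than as the thread's answer, is to merge virus and vpu into a single hypothetical `group` factor and keep donor (which already absorbs sex); that design comes out full rank. The helper `is_full_rank` is also hypothetical.

```r
# Rebuild the 52-row conditions table from the post (4 donors x 13 conditions)
sex_of <- c(A1 = "male", X = "female", Y = "male", Z = "female")
per_donor <- data.frame(
  virus = c("none", rep(c("CH293", "CH077", "STC01"), each = 4)),
  vpu   = c("mock", rep(c("wt", "stop", "R50K", "teth_count"), times = 3))
)
samples <- do.call(rbind, lapply(names(sex_of), function(d)
  data.frame(donor = d, per_donor, sex = sex_of[[d]])))
# Hypothetical combined factor: one level per virus/vpu combination
samples$group <- paste(samples$virus, samples$vpu, sep = "_")
samples[] <- lapply(samples, factor)

# Hypothetical helper: TRUE when every coefficient in the formula is estimable
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

# virus == "none" exactly when vpu == "mock"; sex is determined by donor
all((samples$virus == "none") == (samples$vpu == "mock"))  # TRUE
is_full_rank(~ vpu + donor + virus, samples)  # FALSE
is_full_rank(~ vpu + virus, samples)          # FALSE
is_full_rank(~ donor + sex, samples)          # FALSE
is_full_rank(~ donor + group, samples)        # TRUE
```

With a combined factor, individual comparisons (for example CH293_stop vs CH293_wt) would then be requested as contrasts between group levels rather than as separate virus and vpu coefficients.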
