Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 paired data: "model is not full rank"

    Dear all,

    I have a question that is similar to others I've seen, but I am still not sure how to solve it.

    Data is:
    * RNA-seq from Illumina Hiseq machine
    * Mus musculus
    * 24 samples

    Input is count data from htseq-count.

    I have different types of mice (control, mutant1, mutant2) and different parts of the brain sequenced for each (part1, part2).

    So data is paired, e.g. part1 and part2 from the same mouse are paired.

    Here is what my sample table looks like (ordered by condition), pairs basically represents the different mice:

    sampleName fileName condition pairs
    Control_part1_1 Control_part1_1_htseq.txt Control_part1 10
    Control_part1_2 Control_part1_2_htseq.txt Control_part1 12
    Control_part2_1 Control_part2_1_htseq.txt Control_part2 10
    Control_part2_2 Control_part2_2_htseq.txt Control_part2 12
    mutant1_part1_4 mutant1_part1_4_htseq.txt mutant1_part1 1
    mutant1_part1_5 mutant1_part1_5_htseq.txt mutant1_part1 2
    mutant1_part1_1 mutant1_part1_1_htseq.txt mutant1_part1 3
    mutant1_part1_2 mutant1_part1_2_htseq.txt mutant1_part1 4
    mutant1_part1_3 mutant1_part1_3_htseq.txt mutant1_part1 5
    mutant1_part2_1 mutant1_part2_1_htseq.txt mutant1_part2 1
    mutant1_part2_2 mutant1_part2_2_htseq.txt mutant1_part2 2
    mutant1_part2_3 mutant1_part2_3_htseq.txt mutant1_part2 3
    mutant1_part2_4 mutant1_part2_4_htseq.txt mutant1_part2 4
    mutant1_part2_5 mutant1_part2_5_htseq.txt mutant1_part2 5
    mutant2_part1_1 mutant2_part1_1_htseq.txt mutant2_part1 6
    mutant2_part1_2 mutant2_part1_2_htseq.txt mutant2_part1 7
    mutant2_part1_3 mutant2_part1_3_htseq.txt mutant2_part1 8
    mutant2_part1_4 mutant2_part1_4_htseq.txt mutant2_part1 9
    mutant2_part2_1 mutant2_part2_1_htseq.txt mutant2_part2 6
    mutant2_part2_2 mutant2_part2_2_htseq.txt mutant2_part2 7
    mutant2_part2_3 mutant2_part2_3_htseq.txt mutant2_part2 8
    mutant2_part2_4 mutant2_part2_4_htseq.txt mutant2_part2 9

    # I first tried unpaired analysis
    se <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
    directory=".",
    design=~condition)
    se1 <- DESeq(se)

    # then paired analysis, taking into account the "pairs" column
    se.p <- se1
    design(se.p) <- formula(~pairs+condition)

    se.p <- DESeq(se.p)

    and I get the following error message:

    "Error in checkFullRank(modelMatrix) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    One or more variables or interaction terms in the design formula are linear
    combinations of the others and must be removed.
    See the section 'Model matrix not full rank' in vignette('DESeq2')
    "

    So I went to section "3.12 Model matrix not full rank" of the manual, and now I am a bit confused, as I don't quite understand why it fails (I don't really see the linear combination) and I don't know how to modify the design accurately so as to take into account the paired information...

    The example in the vignette modifies the "ind" column (which corresponds to my "pairs" column) the following way:
    ## grp ind cnd ind.n
    ## 1 X 1 A 1
    ## 2 X 1 B 1
    ## 3 X 2 A 2
    ## 4 X 2 B 2
    ## 5 Y 3 A 1
    ## 6 Y 3 B 1
    ## 7 Y 4 A 2
    ## 8 Y 4 B 2

    If I modify my design similarly, I would get (I just copy these 3 columns as it gets messy):

    condition pairs pairs.n
    Control_part1 10 1
    Control_part1 12 2
    Control_part2 10 1
    Control_part2 12 2
    mutant1_part1 1 1
    mutant1_part1 2 2
    mutant1_part1 3 3
    mutant1_part1 4 4
    mutant1_part1 5 5
    mutant1_part2 1 1
    mutant1_part2 2 2
    mutant1_part2 3 3
    mutant1_part2 4 4
    mutant1_part2 5 5
    mutant2_part1 6 1
    mutant2_part1 7 2
    mutant2_part1 8 3
    mutant2_part1 9 4
    mutant2_part2 6 1
    mutant2_part2 7 2
    mutant2_part2 8 3
    mutant2_part2 9 4

    Is it correct to use this design? Any suggestion?

    Thanks!

  • #2
    Pairs 10+12 is the same as Control_part1 and Control_part2. You have similar linear combinations with the mutant conditions.

    I would suggest a slightly modified version of that trick from the DESeq2 vignette. Instead of using pairs.n, use something like:

    Code:
    df$pairs.n2 <- factor(c(1,12,1,12,1:5,1:5,1,7:9,1,7:9))
    "~condition+pairs.n2" should be full rank and still model the same thing.

    Comment


    • #3
      Hi Devon,

      Thanks a lot for your answer!

      There is something unclear to me regarding this, and regarding what I had previously posted.
      There are several "1s" present in the pairs information, so I am wondering if the model can get confused with those..

      Let's take an example from my data:

      condition pairs.n2
      Control_part1 1
      Control_part2 1
      mutant1_part1 1

      Control_part1 and Control_part2 are from the same mouse, but mutant1_part1 is a completely different mouse.
      By putting 1 for the 3 of them, doesn't the model consider that they are dependent?

      Thanks!
      Last edited by sbcn; 09-10-2015, 05:02 AM.

      Comment


      • #4
        They would only end up being dependent if they had the same condition. In reality, pairs.n2 with a value of 1 will get incorporated into the intercept, so this will become the baseline level for the other coefficients. That ends up being the whole trick and is in fact why you were having rank issues before.

        BTW, sorry if this explanation is "as clear as mud". I have yet to come up with a good way of explaining issues related to model matrices without a white board to draw on.

        Comment


        • #5
          Thanks Devon!
          I don't think your explanation is unclear, it is just the whole concept that is quite hard to understand...

          Given your last answer: does it mean that my model is wrong, in the sense that Control_part1 and Control_part2 won't be considered dependent, although they were extracted from the same mouse, because they don't have the same condition?

          Comment


          • #6
            We're using somewhat different meanings of dependent. I meant that they won't be described by the same coefficients.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM
            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Yesterday, 06:37 PM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Yesterday, 06:07 PM
            0 responses
            9 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-22-2024, 10:03 AM
            0 responses
            50 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-21-2024, 07:32 AM
            0 responses
            67 views
            0 likes
            Last Post seqadmin  
            Working...
            X