  • chammer
    Junior Member
    • Feb 2014
    • 5

    DESeq2 collinearity issue

    Hi everybody,

    I have been trying to use DESeq2 to analyze RNA-seq data and ran into a problem.

    The conditions table looks like this:

    Code:
    sample	donor	virus	vpu
    DonorA1_01	A1	none	mock
    DonorA1_02	A1	CH293	wt
    DonorA1_03	A1	CH293	stop
    DonorA1_04	A1	CH293	R50K
    DonorA1_05	A1	CH293	teth_count
    DonorA1_06	A1	CH077	wt
    DonorA1_07	A1	CH077	stop
    DonorA1_08	A1	CH077	R50K
    DonorA1_09	A1	CH077	teth_count
    DonorA1_10	A1	STC01	wt
    DonorA1_11	A1	STC01	stop
    DonorA1_12	A1	STC01	R50K
    DonorA1_13	A1	STC01	teth_count
    DonorX_01	X	none	mock
    DonorX_02	X	CH293	wt
    DonorX_03	X	CH293	stop
    DonorX_04	X	CH293	R50K
    DonorX_05	X	CH293	teth_count
    DonorX_06	X	CH077	wt
    DonorX_07	X	CH077	stop
    DonorX_08	X	CH077	R50K
    DonorX_09	X	CH077	teth_count
    DonorX_10	X	STC01	wt
    DonorX_11	X	STC01	stop
    DonorX_12	X	STC01	R50K
    DonorX_13	X	STC01	teth_count
    DonorY_01	Y	none	mock
    DonorY_02	Y	CH293	wt
    DonorY_03	Y	CH293	stop
    DonorY_04	Y	CH293	R50K
    DonorY_05	Y	CH293	teth_count
    DonorY_06	Y	CH077	wt
    DonorY_07	Y	CH077	stop
    DonorY_08	Y	CH077	R50K
    DonorY_09	Y	CH077	teth_count
    DonorY_10	Y	STC01	wt
    DonorY_11	Y	STC01	stop
    DonorY_12	Y	STC01	R50K
    DonorY_13	Y	STC01	teth_count
    DonorZ_01	Z	none	mock
    DonorZ_02	Z	CH293	wt
    DonorZ_03	Z	CH293	stop
    DonorZ_04	Z	CH293	R50K
    DonorZ_05	Z	CH293	teth_count
    DonorZ_06	Z	CH077	wt
    DonorZ_07	Z	CH077	stop
    DonorZ_08	Z	CH077	R50K
    DonorZ_09	Z	CH077	teth_count
    DonorZ_10	Z	STC01	wt
    DonorZ_11	Z	STC01	stop
    DonorZ_12	Z	STC01	R50K
    DonorZ_13	Z	STC01	teth_count
    When I specify the model for differential expression analysis as dds <- DESeqDataSetFromTximport(txi, samples, ~vpu+donor+virus), I get an error message:

    Error in checkFullRank(modelMatrix) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    One or more variables or interaction terms in the design formula are linear
    combinations of the others and must be removed.

    I get the same error if the design is only ~vpu+virus.

    I understand that this is caused by collinearity among the variables, but I am not sure how to resolve it.

    Any help would be highly appreciated!

    Thanks!
    Chris
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
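    The problem is that "virus==none" is the same as "vpu==mock": both pick out exactly the same samples (the mock-infected ones), so one column of the model matrix is a linear combination of the others.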
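To see the collinearity concretely, here is a minimal sketch in base R (no DESeq2 needed) that rebuilds the conditions table from the first post and compares the number of model-matrix columns with the numerical rank of that matrix. The helper design_rank is hypothetical, written only for this illustration; DESeq2's own check is the checkFullRank call named in the error message.

```r
# Rebuild the conditions table from the post: 4 donors x 13 samples each.
# Explicit factor levels keep the reference levels independent of locale.
donors <- c("A1", "X", "Y", "Z")
virus_per_donor <- c("none", rep(c("CH293", "CH077", "STC01"), each = 4))
vpu_per_donor <- c("mock", rep(c("wt", "stop", "R50K", "teth_count"), times = 3))

samples <- data.frame(
  donor = factor(rep(donors, each = 13), levels = donors),
  virus = factor(rep(virus_per_donor, times = 4),
                 levels = c("none", "CH293", "CH077", "STC01")),
  vpu   = factor(rep(vpu_per_donor, times = 4),
                 levels = c("mock", "wt", "stop", "R50K", "teth_count"))
)

# virus == "none" and vpu == "mock" select exactly the same samples.
stopifnot(identical(samples$virus == "none", samples$vpu == "mock"))

# Hypothetical helper: model-matrix column count vs. rank from a QR decomposition.
design_rank <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

design_rank(~ vpu + donor + virus, samples)  # columns 11, rank 10
design_rank(~ vpu + virus, samples)          # columns 8, rank 7
```

With these reference levels, the four non-mock vpu indicator columns sum to the same vector as the three non-"none" virus columns (both mark "infected"), which is why the rank falls one short of the column count in both designs.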


    • chammer
      Junior Member
      • Feb 2014
      • 5

      #3
      Thanks a lot!
      For the most important comparison, vpu "wt" vs. "stop", leaving out the non-infected samples works.
      But if I want to compare non-infected vs. infected samples, how can this be resolved? With an additional dummy variable?
      What would that look like?

      Thanks!
      Chris


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        My guess (since I haven't a clue about the background to your experiment) is that you want vpu="wt" for the mock-infected samples. The vpu and virus variables would then no longer encode the same split of the samples.

        *Edit*: Alternatively, set the virus="none" samples to the virus they're mock-infected with (e.g., CH293), which I suspect will be clearer.
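Both suggestions can be checked the same way before building the DESeq2 object. The sketch below, again base R only, rebuilds the conditions table, applies each recoding, and confirms that ~vpu+donor+virus becomes full rank. The helper design_rank and the object names fix_vpu and fix_virus are hypothetical, chosen only for this illustration, and CH293 is just the example virus from the reply above.

```r
# Same conditions table as in the thread (4 donors x 13 samples).
donors <- c("A1", "X", "Y", "Z")
virus_per_donor <- c("none", rep(c("CH293", "CH077", "STC01"), each = 4))
vpu_per_donor <- c("mock", rep(c("wt", "stop", "R50K", "teth_count"), times = 3))

samples <- data.frame(
  donor = factor(rep(donors, each = 13), levels = donors),
  virus = factor(rep(virus_per_donor, times = 4),
                 levels = c("none", "CH293", "CH077", "STC01")),
  vpu   = factor(rep(vpu_per_donor, times = 4),
                 levels = c("mock", "wt", "stop", "R50K", "teth_count"))
)

# Hypothetical helper: model-matrix column count vs. rank from a QR decomposition.
design_rank <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

# Option 1: label the mock-infected samples as vpu "wt", then drop the unused level.
fix_vpu <- samples
fix_vpu$vpu[fix_vpu$vpu == "mock"] <- "wt"
fix_vpu$vpu <- droplevels(fix_vpu$vpu)

# Option 2: label them with the virus they were mock-infected with (e.g. CH293).
fix_virus <- samples
fix_virus$virus[fix_virus$virus == "none"] <- "CH293"
fix_virus$virus <- droplevels(fix_virus$virus)

design_rank(~ vpu + donor + virus, fix_vpu)    # columns 10, rank 10
design_rank(~ vpu + donor + virus, fix_virus)  # columns 10, rank 10
```

In this sketch, the non-infected vs. infected question then becomes a contrast on a single factor: virus "none" vs. an infecting virus under option 1, or vpu "mock" vs. "wt" under option 2. Which reading fits the biology is the experimenter's call.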


        • chammer
          Junior Member
          • Feb 2014
          • 5

          #5
          Perfect, thanks a lot for your help, Devon!

          Best,
          Chris
