  • DESeq2 collinearity issue

    Hi everybody,

    I have been trying to use DESeq2 for analyzing RNA-seq data, and ran into a problem.

    The conditions table looks like this:

    Code:
    sample	donor	virus	vpu
    DonorA1_01	A1	none	mock
    DonorA1_02	A1	CH293	wt
    DonorA1_03	A1	CH293	stop
    DonorA1_04	A1	CH293	R50K
    DonorA1_05	A1	CH293	teth_count
    DonorA1_06	A1	CH077	wt
    DonorA1_07	A1	CH077	stop
    DonorA1_08	A1	CH077	R50K
    DonorA1_09	A1	CH077	teth_count
    DonorA1_10	A1	STC01	wt
    DonorA1_11	A1	STC01	stop
    DonorA1_12	A1	STC01	R50K
    DonorA1_13	A1	STC01	teth_count
    DonorX_01	X	none	mock
    DonorX_02	X	CH293	wt
    DonorX_03	X	CH293	stop
    DonorX_04	X	CH293	R50K
    DonorX_05	X	CH293	teth_count
    DonorX_06	X	CH077	wt
    DonorX_07	X	CH077	stop
    DonorX_08	X	CH077	R50K
    DonorX_09	X	CH077	teth_count
    DonorX_10	X	STC01	wt
    DonorX_11	X	STC01	stop
    DonorX_12	X	STC01	R50K
    DonorX_13	X	STC01	teth_count
    DonorY_01	Y	none	mock
    DonorY_02	Y	CH293	wt
    DonorY_03	Y	CH293	stop
    DonorY_04	Y	CH293	R50K
    DonorY_05	Y	CH293	teth_count
    DonorY_06	Y	CH077	wt
    DonorY_07	Y	CH077	stop
    DonorY_08	Y	CH077	R50K
    DonorY_09	Y	CH077	teth_count
    DonorY_10	Y	STC01	wt
    DonorY_11	Y	STC01	stop
    DonorY_12	Y	STC01	R50K
    DonorY_13	Y	STC01	teth_count
    DonorZ_01	Z	none	mock
    DonorZ_02	Z	CH293	wt
    DonorZ_03	Z	CH293	stop
    DonorZ_04	Z	CH293	R50K
    DonorZ_05	Z	CH293	teth_count
    DonorZ_06	Z	CH077	wt
    DonorZ_07	Z	CH077	stop
    DonorZ_08	Z	CH077	R50K
    DonorZ_09	Z	CH077	teth_count
    DonorZ_10	Z	STC01	wt
    DonorZ_11	Z	STC01	stop
    DonorZ_12	Z	STC01	R50K
    DonorZ_13	Z	STC01	teth_count

    When I specify the model for differential expression analysis as dds <- DESeqDataSetFromTximport(txi, samples, ~vpu+donor+virus), I get this error message:

    Error in checkFullRank(modelMatrix) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    One or more variables or interaction terms in the design formula are linear
    combinations of the others and must be removed.

    The same error occurs if the design is only ~vpu+virus.

    I understand that this is caused by collinearity among the variables, but I am not sure how to resolve the issue.

    Any help would be highly appreciated!

    Thanks!
    Chris

  • #2
    The problem is that "virus==none" is the same as "vpu==mock".
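
To make the point in #2 concrete, here is a minimal sketch in plain Python (not DESeq2 itself) that rebuilds the sample table from the first post, treatment-codes the factors, and compares the number of design columns with the matrix rank. The helper names `design_matrix` and `rank` are made up for this illustration, and taking the first level seen as the reference is an assumption; the rank does not depend on which level is the reference.

```python
from fractions import Fraction


def design_matrix(rows, factors):
    """Intercept plus one 0/1 column per non-reference level (treatment coding)."""
    names, levels = ["Intercept"], {}
    for f in factors:
        seen = list(dict.fromkeys(r[f] for r in rows))  # levels in order of appearance
        levels[f] = seen[1:]  # assumption: the first level seen is the reference
        names += [f"{f}_{lvl}" for lvl in levels[f]]
    matrix = [
        [1] + [int(r[f] == lvl) for f in factors for lvl in levels[f]]
        for r in rows
    ]
    return names, matrix


def rank(matrix):
    """Matrix rank via exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # this column depends on the columns already used
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                k = m[i][c] / m[rk][c]
                m[i] = [a - k * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


# Sample table from the first post: 4 donors x (1 mock + 3 viruses x 4 vpu variants).
samples = []
for donor in ("A1", "X", "Y", "Z"):
    samples.append({"donor": donor, "virus": "none", "vpu": "mock"})
    for virus in ("CH293", "CH077", "STC01"):
        for vpu in ("wt", "stop", "R50K", "teth_count"):
            samples.append({"donor": donor, "virus": virus, "vpu": vpu})

for factors in (["vpu", "virus"], ["vpu", "donor", "virus"]):
    names, X = design_matrix(samples, factors)
    print("~" + "+".join(factors), "columns:", len(names), "rank:", rank(X))
```

For both designs the rank comes out one lower than the number of columns (7 of 8, and 10 of 11). The missing dimension is the one shared by "virus==none" and "vpu==mock": the four non-mock vpu columns sum to the same vector as the three virus columns.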


    • #3
      Thanks a lot!
      For the most important comparison, vpu "wt" vs. "stop", leaving out the uninfected samples works.
      But if I want to compare uninfected vs. infected samples, how could this be resolved? By using an additional dummy variable?
      What would that look like?

      Thanks!
      Chris


      • #4
        My guess (since I haven't a clue about the background to your experiment) is that you want vpu="wt" for the mock-infected samples. The combination of vpu and virus would then be distinct.

        *Edit*: Alternatively, set the virus="none" samples to what they're mock infected with (e.g., CH293), which I suspect will be clearer.
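
As a rough check of the two recodings suggested in #4, the sketch below (plain Python, not DESeq2) tests whether a treatment-coded ~vpu+virus design is full rank for the original labels and for each recoding. It uses one donor's samples only, since the pattern repeats for every donor. The helpers `full_rank` and `one_donor` are hypothetical names for this illustration, and which recoding is biologically appropriate is a separate question that the code does not address.

```python
from fractions import Fraction


def full_rank(rows, factors):
    """True if the treatment-coded design (intercept + dummies) has full column rank."""
    # Assumption: the first level seen for each factor is the reference level.
    levels = {f: list(dict.fromkeys(r[f] for r in rows))[1:] for f in factors}
    m = [
        [Fraction(1)] + [Fraction(int(r[f] == lvl)) for f in factors for lvl in levels[f]]
        for r in rows
    ]
    ncols, rk = len(m[0]), 0
    for c in range(ncols):  # exact Gaussian elimination
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                k = m[i][c] / m[rk][c]
                m[i] = [a - k * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk == ncols


def one_donor(mock_virus, mock_vpu):
    """One donor's 13 samples, with the given labels on the mock-infected sample."""
    rows = [{"virus": mock_virus, "vpu": mock_vpu}]
    rows += [
        {"virus": v, "vpu": p}
        for v in ("CH293", "CH077", "STC01")
        for p in ("wt", "stop", "R50K", "teth_count")
    ]
    return rows


layouts = {
    "original (virus=none, vpu=mock)": one_donor("none", "mock"),
    "recode vpu (virus=none, vpu=wt)": one_donor("none", "wt"),
    "recode virus (virus=CH293, vpu=mock)": one_donor("CH293", "mock"),
}
for label, rows in layouts.items():
    print(label, "-> full rank:", full_rank(rows, ["vpu", "virus"]))
```

Only the original labelling fails the check. With either recoding, one of the two factors no longer has a level that exists solely to mark the mock samples, so the duplicated column disappears.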


        • #5
          Perfect, thanks a lot for your help, Devon!

          Best,
          Chris
