
  • Batch effect help

    I am attempting to analyse RNA-seq data using either edgeR or DESeq2 to determine differentially expressed genes. The experimental conditions are as follows:

    Mice were either fed a high fat diet or a low fat diet.
    Each cage had 3 mice in it, but each group has a total of 6 mice. (i.e., for the low fat diet, there are 6 mice, which have come from 2 different cages).

    I am therefore trying to correct for any batch effect from the cages the mice were raised in, and this is where I am having a problem.

    Essentially, I have the following (I have also attached an image of the table):

    sample  treatment  cage
    1       low        1
    2       low        1
    3       low        1
    4       low        2
    5       low        2
    6       low        2
    7       high       3
    8       high       3
    9       high       3
    10      high       4
    11      high       4
    12      high       4

    I have tried to include cage in the design matrix, but I keep getting the same error about the matrix not being of full rank. I know the experimental design is not ideal, but it is what we have to work with.
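The rank error follows from the coding in the table above: every high-fat sample sits in cage 3 or cage 4, so the treatment column is the sum of two cage columns. Below is a minimal Python sketch of that check. It assumes an additive treatment + cage design with default treatment (dummy) coding; the `matrix_rank` helper and the column order are illustrative and are not what edgeR or DESeq2 run internally.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Sample annotation exactly as in the table: cages numbered 1 to 4.
treatment = ["low"] * 6 + ["high"] * 6
cage = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

# Columns: intercept, treatment=high, cage=2, cage=3, cage=4
# ("low" and cage 1 are the assumed reference levels).
design = [
    [1, int(t == "high"), int(c == 2), int(c == 3), int(c == 4)]
    for t, c in zip(treatment, cage)
]

n_cols = len(design[0])
rank = matrix_rank(design)
print(f"columns: {n_cols}, rank: {rank}")  # columns: 5, rank: 4
```

Five columns but rank four means one coefficient cannot be estimated, which is what the error reports.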

    If anyone knows a way to correct for the different cages, that would be great.

    Thanks for any help,
    E

  • #2
    Unless the cage mates are litter mates, a cage effect would be unlikely.

    That aside, try this:

    treatment  cage
    low        1
    low        1
    low        1
    low        2
    low        2
    low        2
    high       1
    high       1
    high       1
    high       2
    high       2
    high       2

    The design is then ~treatment*cage.
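As a rough check that this re-coding removes the rank problem, the Python sketch below builds the four distinct design rows for treatment*cage and confirms that they form a triangular matrix with ones on the diagonal, which is invertible. It assumes default treatment (dummy) coding with "low" and cage 1 as reference levels; the variable names are illustrative only.

```python
# Re-coded annotation: cage is numbered 1 or 2 within each treatment.
treatment = ["low"] * 6 + ["high"] * 6
cage = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]

def design_row(t, c):
    """Row for treatment*cage: intercept, treatment=high, cage=2, interaction."""
    th = int(t == "high")
    c2 = int(c == 2)
    return (1, th, c2, th * c2)

design = [design_row(t, c) for t, c in zip(treatment, cage)]

# One representative row per treatment/cage group, ordered so that the
# resulting 4 x 4 matrix is lower triangular.
group_order = [("low", 1), ("high", 1), ("low", 2), ("high", 2)]
unique_rows = [design_row(t, c) for t, c in group_order]

# A lower-triangular matrix with a nonzero diagonal is invertible, so the
# four design columns are linearly independent.
is_unit_lower_triangular = all(
    row[i] == 1 and all(v == 0 for v in row[i + 1:])
    for i, row in enumerate(unique_rows)
)

print(len(design[0]), "columns; full rank:", is_unit_lower_triangular)
```

Every sample's row is one of these four, so the 12 x 4 design matrix has rank 4, equal to its number of columns.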
    Last edited by dpryan; 10-27-2015, 06:21 AM. Reason: Had an extra 3 sample!


    • #3
      Thanks for the reply.
      I didn't expect a cage effect, but on a PCA plot the counts cluster by cage more than by treatment, so I think there may be one.

      Wouldn't your suggestion say that samples 1-3 and 7-9 are from the same cage, and that the other six share a second cage?

      Sorry, I am not very good with the stats; I am much better with the other aspects of bioinformatics.

      Thanks


      • #4
        Yes, it says they're from the same cage, but the interaction term then allows them to have different actual effects. This is just a trick to avoid the rank deficiency issue and you'll see it used whenever batches occur completely within experimental groups.


        • #5
          I should probably expand upon my last reply. What that grouping and the design effectively do is fit an effect for each Treatment_Cage pairing, of which there are 4. Since the cage is just a nuisance variable for you, I formulated things so that you can get the "cage-corrected treatment effect" without also having to deal with contrasts.
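To make the "one effect per Treatment_Cage pairing" point concrete, here is a small worked example in Python. The four group means are hypothetical numbers, not data from this thread, and the mapping assumes standard treatment (dummy) coding with "low" and cage 1 as reference levels.

```python
# Hypothetical mean log-expression of one gene in each treatment/cage group.
group_mean = {
    ("low", 1): 5.0,
    ("low", 2): 6.0,
    ("high", 1): 7.0,
    ("high", 2): 9.5,
}

# With four coefficients and four groups, the model reproduces each group mean.
intercept = group_mean[("low", 1)]
treatment_high = group_mean[("high", 1)] - group_mean[("low", 1)]
cage_2 = group_mean[("low", 2)] - group_mean[("low", 1)]
interaction = (group_mean[("high", 2)] - group_mean[("low", 2)]) - treatment_high

def fitted(treatment, cage):
    """Fitted value for a group under the treatment*cage design."""
    t = int(treatment == "high")
    c = int(cage == 2)
    return intercept + treatment_high * t + cage_2 * c + interaction * t * c

# Each of the four pairings gets its own fitted level.
for key, mean in group_mean.items():
    assert fitted(*key) == mean

# High versus low averaged over both cage labels, for comparison.
averaged = treatment_high + interaction / 2

print(treatment_high, interaction, averaged)  # 2.0 1.5 2.75
```

Under this coding, the treatment coefficient on its own is the high versus low difference at cage label 1, while the interaction term absorbs how that difference changes at cage label 2.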


          • #6
            That sounds like exactly what I need. Thanks very much for your help; I shall include that in the analysis when I am next back in the office.

            Thanks
