  • DESeq - High Count Variability across Samples

    Hello,

    I am using DESeq to compare gene expression between two groups, each with ten samples (biological replicates). Unfortunately, most control samples have substantially fewer reads than the experimental samples, and the sizeFactors range from 0.54 to 1.47 across all samples. When I apply a variance stabilizing transformation to the normalized data and cluster the samples in a distance-matrix heatmap, the samples largely group by total reads rather than by treatment. I am unsure whether the normalization method employed by DESeq can handle this much variation in reads across samples and across groups. Does anyone have suggestions for handling the normalization in this situation, or for assessing the overall effect of treatment? Thanks for any suggestions; I'd be happy to provide more details.
    -David

  • #2
    Size factors in your range are quite common, and DESeq's main functionality, i.e., testing for differential expression, copes well with them. Hence, just go ahead and run your tests.

    The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
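To make the size factors discussed above concrete, here is a minimal sketch of median-of-ratios normalization, the approach DESeq uses to estimate size factors. It is written in plain Python for illustration only and is not the package's code; the function name `estimate_size_factors` and the toy counts are hypothetical.

```python
import math
from statistics import median

def estimate_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # A zero count makes the geometric mean zero, so such genes are skipped.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The per-sample median ratio to the geometric-mean reference is the size factor.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [7, 14],
]
size_factors = estimate_size_factors(toy_counts)

# Dividing each count by its sample's size factor puts samples on a common scale.
normalized = [[c / s for c, s in zip(gene, size_factors)] for gene in toy_counts]
```

In this toy case the two size factors differ by a factor of two, and the normalized counts of each gene agree between the samples, which is why a depth difference alone does not disturb the test.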

    • #3
      Thanks for the info. Do you know a convenient way to assess global changes in gene expression across samples, so that the samples can be grouped in this case? In the vignette, for example, the blinded dispersion estimates followed by the VST and distance matrix allowed an unbiased grouping of similar samples (given similar sizeFactors). What if one were to measure the covariance of each sample with every other sample, using normalized ratios of individual gene counts to the average gene counts across all samples? Would this allow some sort of grouping of samples by positive vs. negative covariance? Or would you run into the same problem of high-variance genes skewing the comparison? If so, could one group the genes by expression or variance and then try this? Thanks again. I am currently trying to generate a list of differentially expressed genes that I am confident are related to the treatment and not to high inter-animal variability. I have checked some with qPCR, with mixed results so far...
      -David
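One simple way to look at sample grouping, offered here only as an illustrative sketch and not as anything recommended in this thread, is to compute Euclidean distances between samples on log2-transformed, size-factor-normalized counts. The log transform with a pseudocount is a plain stand-in and is not DESeq's VST; the function `sample_distances`, the counts, and the size factors below are all hypothetical.

```python
import math

def sample_distances(counts, size_factors, pseudocount=1.0):
    """Pairwise Euclidean distances between samples.

    counts: list of genes, each a list of raw counts (one per sample).
    size_factors: one normalization factor per sample.
    Distances are computed on log2(count / size_factor + pseudocount).
    """
    n = len(size_factors)
    logged = [
        [math.log2(c / s + pseudocount) for c, s in zip(gene, size_factors)]
        for gene in counts
    ]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((g[i] - g[j]) ** 2 for g in logged))
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical data: samples A and B share an expression profile, but B has
# twice the depth; sample C has the depth of A and a changed second gene.
example_counts = [
    [100, 200, 100],
    [50, 100, 200],
    [30, 60, 30],
]
example_size_factors = [1.0, 2.0, 1.0]  # assumed known for this illustration

dist = sample_distances(example_counts, example_size_factors)
```

Here the normalized distance between A and B is zero despite the depth difference, while A and C remain apart. With real data, low-count genes are noisy on the log scale, so the result depends on the pseudocount and on which genes are included.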

      • #4
        I am not quite sure I understand your problem. You want to know which genes changed due to treatment and want to guard against within-group variability. This is the default use case for DESeq, and you will get a statistically sound result if you follow the standard work-flow (which does not use the VST).

        Hence, why again do you want to use the VST? You will need to explain your setup in more detail.

        BTW, checking by qPCR is only very rarely useful. It helps to rule out technical noise (if you think that qPCR is more precise than RNA-Seq), but as your main worry is sample-to-sample variation due to biological causes (i.e., actual expression differences rather than measurement errors), measuring the same samples with another technique will not tell you anything new.

        • #5
          Dear Simon,
          I am using DESeq to analyse RNA-seq data, but I am still experimenting with the package to learn how to use it properly for my particular kind of data. In this analysis I have two 'control' (replicate) samples and only one 'test' sample (and unfortunately I will not have replicates for this condition). My goal for now is just to see whether I can use the two control samples as replicates, since the 'controlled' conditions under which the plant material was collected were slightly different.

          Regarding your previous post, I'm not sure I understood it correctly.

          Originally posted by Simon Anders
          The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
          So does this mean that if the size factors vary a lot, we cannot trust the results obtained after the VST?
          I am seeing results "similar" to those reported in the DESeq vignette, although in my case there are fewer replicates.
          Specifically, if I build heatmaps (for count data and sample-to-sample distances) using VST data, my two 'control' replicates cluster together. But when I use untransformed counts, one of the 'control' samples clusters with the 'test' sample.

          What intrigues me now is the fact that the size factors are

          test:1.8420157
          control1:0.8258893 (control1 is the one that clusters differently)
          control2:0.6850067

          So my question is this: can I just trust these results and accept my two controls as replicates, or is this a case where "heatmaps might become misleading"?

          thank you in advance

          Pedro
          Last edited by pbarros; 12-06-2012, 09:26 AM.
