  • dav1dmartin
    Junior Member
    • Aug 2010
    • 2

    DESeq - High Count Variability across Samples

    Hello,

    I am using DESeq to compare gene expression between two groups, with ten samples (biological replicates) in each group. Unfortunately, most of the control samples have considerably fewer reads than the experimental samples, and the sizeFactors range from 0.54 to 1.47 across all samples. When I apply a variance stabilizing transformation to the normalized data and group the samples in a distance matrix (heatmap), the samples largely group by total reads instead of by treatment. I am unsure whether the normalization method employed by DESeq can handle this wide variation in reads across samples and across groups. Does anyone have suggestions for handling the normalization in this situation, or for assessing the overall effect of treatment? Thanks for any suggestions; I'd be happy to provide more details.
    -David
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Size factors in that range are quite common, and DESeq's main functionality, i.e., testing for differential expression, copes well with them. Hence, just go ahead and run your tests.

    The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
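For readers who want to see what a size factor is, here is a minimal sketch of the median-of-ratios idea behind DESeq's size factors. It is written in Python with NumPy rather than with the package's own R code, and the function name `size_factors` and the toy counts are assumptions made up for illustration, not data from this thread.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count array."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a nonzero count in every sample so the logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene mean of the logs is the log geometric mean across samples;
    # it plays the role of a pseudo-reference sample.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # A sample's size factor is the median over genes of count / reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))

# Toy data (made up): sample 2 is sample 1 sequenced twice as deeply.
toy = np.array([[10, 20], [100, 200], [50, 100], [0, 3]])
sf = size_factors(toy)
normalized = toy / sf  # divide each sample's column by its size factor
```

In this toy case the two size factors differ by a factor of two, and dividing by them puts the expressed genes of both samples on the same scale. Because the median is used, a minority of strongly changed genes has little influence on the estimate.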


    • dav1dmartin
      Junior Member
      • Aug 2010
      • 2

      #3
      Thanks for the info. Do you know a convenient way to assess global changes in gene expression across samples, in order to group samples in this case? In the vignette, for example, the blinded dispersion estimates followed by the VST and a distance matrix allowed an unbiased grouping of similar samples (given similar sizeFactors). What if one were to measure the covariance of each sample with every other sample, using normalized ratios of individual gene counts to the average gene counts across all samples? Would this allow some sort of grouping between samples with positive vs. negative covariance? Or would one run into the same problem of high-variance genes skewing the comparison? If so, could one group the genes according to expression or variance and try this? Thanks again. I am currently trying to generate a list of differentially expressed genes which I am confident are related to the treatment and not to high inter-animal variability. I have checked some with qPCR, with mixed results so far...
      -David
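A rough sketch of the comparison described above, for illustration only: each gene is expressed as a log ratio to its average across samples, and the samples are then correlated on those ratios. This uses Python with NumPy rather than DESeq; the function name `sample_correlation`, the pseudocount of 1, and the toy counts are assumptions, and correlation is used instead of raw covariance so that the values are comparable between sample pairs.

```python
import numpy as np

def sample_correlation(norm_counts, pseudocount=1.0):
    """Correlate samples on per-gene log ratios to the across-sample average."""
    # Log-transform normalized counts; the pseudocount avoids log(0).
    logged = np.log2(np.asarray(norm_counts, dtype=float) + pseudocount)
    # Subtracting each gene's mean log value gives a log ratio of that gene
    # to its (geometric) average across all samples.
    log_ratios = logged - logged.mean(axis=1, keepdims=True)
    # rowvar=False treats columns (samples) as variables and rows (genes)
    # as observations, so the result is samples x samples.
    return np.corrcoef(log_ratios, rowvar=False)

# Toy normalized counts (made up): columns are A1, A2, B1, B2.
toy = np.array([
    [100, 110, 200, 210],
    [50, 45, 20, 22],
    [30, 33, 31, 29],
    [80, 85, 160, 150],
    [60, 58, 25, 27],
])
corr = sample_correlation(toy)
```

Samples from the same toy group correlate positively and samples from different groups negatively. Genes with large log ratios dominate the result, which is the same concern about high-variance genes raised in the post, so this is a sketch of the idea rather than a recommended procedure.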


      • Simon Anders
        Senior Member
        • Feb 2010
        • 995

        #4
        I am not quite sure I understand your problem. You want to know which genes changed due to treatment and want to guard against within-group variability. This is the default use case for DESeq, and you will get a statistically sound result if you follow the standard work-flow (which does not use the VST).

        Hence, why again do you want to use the VST? You will need to explain your setup in more detail.

        BTW, checking by qPCR is only very rarely useful. It helps to rule out technical noise (if you think that qPCR is more precise than RNA-Seq), but as your main worry is sample-to-sample variation due to biological causes (i.e., actual expression differences rather than measurement errors), measuring the same samples with another technique will not tell you anything new.


        • pbarros
          Junior Member
          • Jul 2012
          • 7

          #5
          DESeq - High Count Variability across Samples

          Dear Simon,
          I am using DESeq to analyse RNA-seq data, but I'm still experimenting with the package to learn how to use it properly for my particular kind of data... In this analysis I have two 'control' (replicate) samples and only one 'test' sample (and unfortunately I will not have replicates for this condition). My goal for now is just to see whether or not I can use the two control samples as replicates, since the 'controlled' conditions in which the plant material was collected were slightly different.

          Regarding your previous post, I'm not sure I understood it correctly.

          Originally posted by Simon Anders
          The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
          So does this mean that if there is some (high) variation between size factors, we may not be able to trust the results obtained after the VST?
          I am seeing results "similar" to those reported in the DESeq vignette, although in my case the number of replicates is smaller.
          Specifically, if I build heatmaps (for count data and sample-to-sample distances) using VST data, my two replicates for the 'control' condition cluster together. But when I use untransformed counts, one of the 'control' samples clusters with the 'test' sample.

          What intrigues me now is the fact that the size factors are

          test:1.8420157
          control1:0.8258893 (control1 is the one that clusters differently)
          control2:0.6850067

          So my question is this: can I just "trust" these results and accept my two controls as replicates, or is this a case where "heatmaps might become misleading"...?

          Thank you in advance,

          Pedro
          Last edited by pbarros; 12-06-2012, 09:26 AM.
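To illustrate why heatmaps built from untransformed counts and from transformed data can disagree, here is a small sketch in Python with NumPy. It is not DESeq code and it does not use the VST: the log2 of size-factor-normalized counts stands in for the transformation, and the helper names and the toy counts are assumptions made up for this example. It cannot decide whether the two controls in the post above are true replicates.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples array (all counts > 0)."""
    log_counts = np.log(np.asarray(counts, dtype=float))
    log_reference = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_reference, axis=0))

def pairwise_distances(matrix):
    """Euclidean distance between every pair of columns (samples)."""
    cols = np.asarray(matrix, dtype=float).T  # samples x genes
    diff = cols[:, None, :] - cols[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data (made up). Columns: control_a, control_b, treated.
# control_b is control_a sequenced three times as deeply; treated has the
# same depth as control_b, with the last two genes changed 1.5x and 0.5x.
base = np.array([100, 50, 200, 10, 400, 80], dtype=float)
fold = np.array([1, 1, 1, 1, 1.5, 0.5])
counts = np.column_stack([base, 3 * base, 3 * base * fold])

raw_d = pairwise_distances(counts)
norm_d = pairwise_distances(np.log2(counts / size_factors(counts) + 1))
```

On the raw toy counts, control_b is closer to the treated sample than to control_a, purely because of sequencing depth. After dividing by the size factors and taking logs, the two controls are closest to each other. Real data add noise and dispersion that this sketch ignores.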
