  • DESeq2 normalization

    Hi,
    I have single-cell RNA-Seq data on which I am running a differential expression analysis with DESeq2. The dataset is not typical: the variability between replicates is much higher than usual, and some libraries have a skewed distribution in which a relatively small number of genes contributes a large proportion of the reads. Things get complicated at the normalization step. I get size factors ranging from 14 down to 0.03, although the total number of raw counts does not vary by more than 2.2-fold between individual libraries. Is there a way to fix the normalization? I guess at some point I will have to exclude the outlier libraries, but I would first like to try to improve the normalization before throwing out potentially useful data. My gut feeling is that things will be better if I can fix the normalization, because DESeq has worked for other single-cell experiments where the size factors were more along the expected lines. I have also tried the FPKM normalization and the method in Cuffdiff, but it is much worse, probably because the libraries are very 3' biased. Thanks for your thoughts and input on this.

  • #2
    You can specify your own size factors with the sizeFactors replacement function, i.e. sizeFactors(dds) <- ... for a DESeqDataSet dds.
    For example, you could calculate the size factors by dividing each sample's total number of reads by the total for the sample with the fewest reads.
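
    A minimal sketch of that total-count approach, assuming a made-up toy matrix (the names counts, totals and sf_total are only for illustration, and dds stands for a hypothetical DESeqDataSet that is not built here):

    ```r
    # Toy count matrix: rows are genes, columns are samples (made-up numbers).
    counts <- cbind(
      s1 = c(5, 10, 85),
      s2 = c(10, 20, 170),
      s3 = c(20, 40, 240)
    )

    # Total number of reads per sample.
    totals <- colSums(counts)

    # Scale every sample relative to the sample with the fewest reads,
    # so the smallest library gets a size factor of 1.
    sf_total <- totals / min(totals)
    print(sf_total)

    # With a real DESeqDataSet (hypothetical object `dds`) these could then be
    # assigned with:
    #   sizeFactors(dds) <- sf_total
    ```

    Size factors obtained this way track library size only, so a few very highly expressed genes can still dominate them.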

    I was under the impression, though, that the method used by DESeq2 was more robust in precisely this case, where the distribution is skewed in favour of a small number of genes.
    estimateSizeFactors uses the median ratio method. When I read the article by Anders et al., though, the formula was not so clear to me, but that is probably due to my limited knowledge of mathematics.
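
    A rough base-R sketch of the median ratio idea may help here: take the geometric mean of each gene across samples, divide each sample's counts by it, and use the median of those ratios as that sample's size factor. The matrix is made up and the names (counts, log_geo_means, median_ratio, sf_median) are only for illustration; this mirrors the idea and is not the DESeq2 source code.

    ```r
    # Toy count matrix: rows are genes, columns are samples (made-up numbers).
    # The last gene is very highly expressed and behaves differently in sample a.
    counts <- cbind(
      a = c(10, 20, 30, 40, 1000),
      b = c(20, 40, 60, 80,  500)
    )

    # Log of the per-gene geometric mean across samples.
    # A gene with a zero in any sample gets -Inf here and is skipped below.
    log_geo_means <- rowMeans(log(counts))

    # Size factor for one sample: median over genes of count / geometric mean,
    # computed on the log scale.
    median_ratio <- function(cnts) {
      keep <- is.finite(log_geo_means) & cnts > 0
      exp(median((log(cnts) - log_geo_means)[keep]))
    }

    sf_median <- apply(counts, 2, median_ratio)
    totals <- colSums(counts)

    print(sf_median)
    print(totals)
    ```

    Here the total counts (1100 vs 700) suggest that sample b is the smaller library, while the median ratio puts b at twice the depth of a, because a single highly expressed gene dominates a's total. Note that genes with a zero in any sample drop out of the calculation, which may matter for sparse single-cell data.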

    Are you sure you didn't just make a mistake while generating the count files?
    A range from 14 to 0.03 just sounds ridiculous.


    • #3
      I'd recommend looking at sample-sample scatterplots and examining the size factor ratio for each pair of samples. Here's a small example showing that the size factors calculated by DESeq/DESeq2 are more robust than ratios of the total number of reads:

      Code:
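      library(DESeq2)
      # Toy counts for two samples: y is exactly 2x for the first 50 genes,
      # but for the last 10 genes x jumps to 101:110, so y is only about equal to x.
      x <- c(1:50, 101:110)
      y <- 2 * (1:60)
      m <- cbind(x, y)
      plot(m)
      # Red line: slope given by the ratio of the median-ratio size factors
      sf <- estimateSizeFactorsForMatrix(m)
      abline(0, sf[2] / sf[1], col = "red")
      # Blue line: slope given by the ratio of the total counts
      cs <- colSums(m)
      abline(0, cs[2] / cs[1], col = "blue")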


      • #4
        Normally single-cell experiments have spike-ins that are intended for normalization. Does your dataset have them?
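
        If spike-ins are available, one option is to estimate the size factors from the spike-in rows only. A minimal base-R sketch of that idea, assuming made-up counts and assuming the spike-ins can be recognised by an "ERCC-" prefix in the row names (both are assumptions for illustration, as are the names is_spike, spikes and sf_spike):

        ```r
        # Toy count matrix: three endogenous genes plus three spike-ins (made-up numbers).
        counts <- rbind(
          geneA = c(s1 = 100, s2 = 400),
          geneB = c(50, 10),
          geneC = c(0, 30),
          "ERCC-00002" = c(200, 100),
          "ERCC-00004" = c(40, 20),
          "ERCC-00009" = c(10, 5)
        )

        # Assumed naming convention: spike-in rows start with "ERCC-".
        is_spike <- grepl("^ERCC-", rownames(counts))
        spikes <- counts[is_spike, , drop = FALSE]

        # Median-of-ratios size factors computed on the spike-in rows only.
        log_geo_means <- rowMeans(log(spikes))
        sf_spike <- apply(spikes, 2, function(cnts) {
          keep <- is.finite(log_geo_means) & cnts > 0
          exp(median((log(cnts) - log_geo_means)[keep]))
        })
        print(sf_spike)

        # With a real DESeqDataSet (hypothetical object `dds`), the same restriction
        # can be requested through the controlGenes argument:
        #   dds <- estimateSizeFactors(dds, controlGenes = is_spike)
        ```

        Whether spike-in based size factors are appropriate depends on how consistently the spike-ins were added to each cell.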
