  • DiffBind read counts normalisations

    Is there a way to turn off the normalisation by the library size (either full or effective) in DiffBind?

    The problem is that the BAM files I want to use are already normalised in a specific way, which I don't want to change. I am looking for a way to turn off the normalisation in DiffBind. I found the option SCORE, which can be set to DBA_RAW_READS, but it doesn't seem to affect the subsequent differential analysis by any of the available methods. Does anyone know a way around this problem?

  • #2
    We ended up just running the underlying DESeq2/edgeR with manually set sizeFactors/normFactors. Seemed easier than going through DiffBind and not being 100% sure if normalization was actually off...
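To make the idea of manually set size factors concrete, here is a toy sketch in plain Python. It is not DESeq2 or edgeR code: the function names `median_of_ratios` and `normalize` and the count matrix are made up for illustration. It shows a median-of-ratios style estimate, and that fixing every size factor at 1 leaves the counts exactly as supplied.

```python
import math
from statistics import median

def median_of_ratios(counts):
    """Estimate one size factor per sample (column) by median-of-ratios.

    counts: list of rows (regions), each row a list of per-sample counts.
    Rows containing a zero are skipped, since their log is undefined.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable row across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, size_factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]

# Hypothetical counts: sample 2 is sequenced exactly twice as deep as sample 1.
counts = [[10, 20], [30, 60], [5, 10], [8, 16]]

estimated = median_of_ratios(counts)          # data-driven size factors
fixed = [1.0, 1.0]                            # "normalization off"

rescaled = normalize(counts, estimated)       # depth difference removed
untouched = normalize(counts, fixed)          # identical to the input counts
```

With the factors fixed at 1, whatever scaling was already applied to the input is what the downstream model sees, which is the effect being asked for in this thread.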



    • #3
      There isn't a way to prevent the underlying differential analysis method from normalizing the read counts. The score parameter only changes the scoring method used for plots of global (non-analyzed) data. As fanli suggests, the best way is to use an underlying method (edgeR/DESeq2) directly if you want to do something more sophisticated.



      • #4
        On a related note, is there any preference for how to incorporate information from control (e.g. IgG) libraries? From what I understand, DiffBind subtracts control reads in each binding interval. But is it equally valid to use the overall number of control reads in all binding intervals as a second normalization factor?



        • #5
          This has generated a lot of discussion recently. Subtracting the reads, as DiffBind does by default, is a bit of a kludge that potentially violates the assumptions of the statistics underlying the methods for estimating dispersion for the negative binomial. The reads could instead be incorporated into the normalization using offsets -- essentially treating them like a copy number variation (cf. the ABCD-DNA paper of Robinson et al., Genome Research 2012). This is mostly important for cases where you have different controls for different conditions.
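The difference between subtracting control reads and using them as an offset can be sketched with a toy example in plain Python. This is only an illustration under stated assumptions, not DiffBind's or edgeR's implementation: the function names, the floor of 1 on subtracted counts, and the pseudocount are all hypothetical choices.

```python
import math

def subtract_control(chip, control, floor=1):
    """Subtraction approach: the result is no longer a raw count.

    The floor value is an assumption made here so the result stays positive.
    """
    return max(chip - control, floor)

def offset_log2_fold_change(chip_a, chip_b, ctrl_a, ctrl_b, pseudo=1.0):
    """Offset approach for one region, one sample per condition.

    The ChIP counts stay untouched; the control coverage enters on the log
    scale, so the log2 fold change is reduced by the log2 ratio of controls.
    """
    raw = math.log2((chip_b + pseudo) / (chip_a + pseudo))
    offset = math.log2((ctrl_b + pseudo) / (ctrl_a + pseudo))
    return raw - offset

# Hypothetical region: ChIP signal doubles, but so does the control coverage
# (as it would for a copy number gain), so no real change in binding remains.
lfc = offset_log2_fold_change(chip_a=99, chip_b=199, ctrl_a=9, ctrl_b=19)

# Subtraction on the same kind of data distorts low counts.
low = subtract_control(chip=5, control=12)     # clipped to the floor
high = subtract_control(chip=40, control=12)   # 28
```

The offset version only matters when the controls differ between conditions; if `ctrl_a` and `ctrl_b` are equal, the offset term is zero and the raw fold change is returned unchanged.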

          Currently our preferred method is to use the control reads to generate blacklists of regions with anomalous coverage, then remove all reads in these regions from all samples. We then do peak calling with the blacklist-filtered read files, including the control tracks. For the differential analysis, we basically ignore the control reads, as they have already been used to a) remove regions where false positives/negatives are likely and b) identify regions of interest. We use the Bioconductor package GreyListChIP to generate the blacklists from control reads prior to filtering.
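The filtering step in this workflow can be sketched as follows. This is a minimal plain-Python illustration, not the GreyListChIP API: reads and blacklist regions are represented as hypothetical `(chrom, start, end)` tuples with half-open coordinates, and the function names are made up.

```python
def overlaps(read, region):
    """True if two half-open intervals on the same chromosome overlap."""
    read_chrom, read_start, read_end = read
    reg_chrom, reg_start, reg_end = region
    return read_chrom == reg_chrom and read_start < reg_end and reg_start < read_end

def filter_reads(reads, blacklist):
    """Drop every read that overlaps any blacklisted region."""
    return [r for r in reads if not any(overlaps(r, b) for b in blacklist)]

# Hypothetical blacklist derived from control coverage.
blacklist = [("chr1", 100, 200)]

# Hypothetical reads; the same filter is applied to every sample,
# ChIP and control alike, before peak calling.
samples = {
    "chip_rep1": [("chr1", 50, 100), ("chr1", 90, 140), ("chr1", 250, 300)],
    "control":   [("chr1", 150, 200), ("chr2", 120, 170)],
}

filtered = {name: filter_reads(reads, blacklist) for name, reads in samples.items()}
```

The linear scan over the blacklist is fine for a sketch; with real data an interval index or an existing genomics toolkit would be the practical choice.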

          Here's a link to a recent post on this from another perspective: https://support.bioconductor.org/p/82099/



          • #6
            I was actually wondering whether there is a difference between running DESeq2 on its own and running it inside DiffBind. In my experiments on the same data, the results of the two approaches are very different. From what I have read online, my understanding is that there is indeed a difference: DESeq2's statistical analysis assumes that most genes don't change, whereas DiffBind assumes that everything outside the peaks doesn't change but the peaks do. I might be completely wrong, so please correct me if I am.



            • #7
              Is there a way then to specify sizeFactors manually?
