  • Kleido
    Junior Member
    • Jan 2014
    • 4

    What "high duplication rate" means

    Hi all,
    Could someone define "high duplication rate" in the context of ChIP-seq data analysis?
    Thanks
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Am I correct in guessing that you ran FastQC on your reads and it showed a really high duplication rate? If so, I would typically consider that normal unless whatever you're pulling down is rather non-specific in where it's located/binds.

    • Kleido
      Junior Member
      • Jan 2014
      • 4

      #3
      Actually, the bioinformatician who ran the analysis said that, and he couldn't explain it to me, so I'm just trying to understand what it means.

      • SNPsaurus
        Registered Vendor
        • May 2013
        • 525

        #4
        That probably refers to PCR duplicates... that is, even though you may have 90 reads at a location, they are likely to be 90 copies of the same original DNA fragment and so should not be considered independent binding events of your protein to DNA. This happens when low-complexity libraries are heavily amplified, which is common for ChIP-Seq.

        You can tell whether duplicates are a problem by looking at where the reads start. Because the DNA was sheared randomly, you would expect reads to start at many different locations across a genomic region where your protein was cross-linked. If the reads start at only a few locations and there are multiple reads at each of those starts, then those are likely to be duplicates.

        As dpryan mentioned, with ChIP-Seq perfectly good data can look highly duplicated, since there might be very high coverage of reads in a constrained space. So it takes some actual examination of how the coverage looks to tell if it is just stacked high or duplicated.
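The start-site check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: reads are represented as hypothetical `(chrom, five_prime_pos, strand)` tuples already extracted from an alignment, and `duplication_summary` is a made-up helper name, not a function from any existing tool.

```python
from collections import Counter

def duplication_summary(reads):
    """Summarize how reads stack on start sites.

    reads: iterable of (chrom, five_prime_pos, strand) tuples
           (a hypothetical, simplified representation of aligned reads).
    """
    counts = Counter(reads)          # reads per distinct start site
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "total_reads": total,
        "distinct_starts": distinct,
        # fraction of reads that repeat an already-seen start site
        "duplication_rate": 1 - distinct / total if total else 0.0,
        # tallest pile of reads sharing one start site
        "max_stack": max(counts.values(), default=0),
    }

# Toy region A: 90 reads that all share one start (looks like PCR duplicates)
stacked = [("chr1", 1000, "+")] * 90

# Toy region B: 90 reads spread over 45 starts, 2 reads each
spread = [("chr1", 1000 + i, "+") for i in range(45)] * 2

print(duplication_summary(stacked))
print(duplication_summary(spread))
```

Both toy regions have the same depth of 90 reads, but the first has a single start site and the second has 45. A summary like this does not replace looking at the coverage itself, since a strongly enriched site can legitimately have several reads per start.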
        Last edited by SNPsaurus; 01-30-2014, 11:23 PM.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          Just to add to SNPsaurus' response, keep in mind that gauging duplication rate is difficult if you have single-end reads. In that case, the maximum coverage of a single position after removing what appear to be PCR duplicates is twice your read length, since at most one read is kept per start position on each strand. Of course, for ChIP-seq this cap is unrealistic, so unless you have paired-end reads you're probably better off ignoring PCR duplicates.
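To see where the "twice the read length" cap comes from, here is a small sketch. It assumes that single-end duplicates are identified by 5' position and strand alone, so at most one read survives per (position, strand) pair. The function name `max_dedup_coverage` and the default `position` are hypothetical, chosen only for illustration.

```python
def max_dedup_coverage(read_length, position=1000):
    """Count the distinct single-end reads that can overlap one position
    when only one read is kept per (5' position, strand)."""
    # Forward reads starting at s cover s .. s + read_length - 1
    fwd = {(s, "+") for s in range(position - read_length + 1, position + 1)}
    # Reverse reads with 5' end at e cover e - read_length + 1 .. e
    rev = {(e, "-") for e in range(position, position + read_length)}
    return len(fwd | rev)

for length in (36, 50, 100):
    print(length, max_dedup_coverage(length))
```

With 50 bp reads, for example, no position can exceed 100x after duplicate removal under this assumption, which a strong ChIP-seq peak can easily surpass with genuinely independent fragments.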

          • Kleido
            Junior Member
            • Jan 2014
            • 4

            #6
            Thank you dpryan and SNPsaurus, that's very helpful.
