  • Failed ChIP-seq experiments

    Hi All,

    I was wondering whether you could share your experiences and thoughts on the following scenario.

    Say you are analyzing someone else's data and, as a first port of call, you do some quality checking based on whichever guidelines are available (such as the ENCODE guidelines for ChIP-seq). During your quality assessment, you observe that the data is of extremely poor quality (let's continue with the ChIP-seq example and say you observe 85-95% PCR duplication, etc.). At this point you think this experiment probably should be repeated, but due to time restrictions you continue with stringent parameters (including the PCR duplicates) and take the resulting overlapping peaks from 2 different methods (observed with IDR < 0.01 for 2 samples). You present the handful of peaks observed to the owner of the data, but they aren't happy with what they are seeing. They ask you to relax the parameters and flood the system with noise, violating any guidelines, just for the sake of having some data to base future experiments on.

    - Have you come across a scenario like this?
    - What are your thoughts on using less stringent methods on an extremely poor data set?

    Thanks in advance for any responses.

  • #2
    Are you sure they're PCR duplicates rather than real duplicates? If so, how do you know?

    • #3
      You know because a library of a million 'effective reads' (after dedup) distributed in the largish genome of a mammal isn't useful, and no antibody is so good that you'll only get the enriched regions. And if they were real, you'd still have many starts in a small region, not a start here, one there, one over yonder...

      I basically stand my ground. People come to me because I have the experience in these data sets, and my advice is: find the error and repeat the experiment.

      Do not waste time with this unusable data set. Mostly they come around when I start repeating the old 'if your positive control is no different from your negative control you can draw no conclusions from this experiment' mantra.

      If they're desperate or dense enough not to grasp the above point, it's best to cut your losses and suggest they find somebody else to look at their data.

      • #4
        Thanks ffinkernagel for sharing your experience.

        To answer your question, Brian Bushnell, the way that I assessed PCR duplicates is through the following:

        1) In FastQC, look at the duplication level tab; this will give a rough estimate.

        Result: 90-95% duplication level.

        2) On the file containing aligned reads (usually a BAM file), calculate the fraction of non-redundant reads (NRF in the ENCODE guidelines) by dividing the number of unique genomic positions by the total number of uniquely mapped reads.

        Result: 12-17% of reads are non-redundant.

        3) Sort your aligned reads (BAM file) according to chromosomal location and perform samtools rmdup and Picard MarkDuplicates (with option REMOVE_DUPLICATES=TRUE). Calculate the percentage of reads remaining from the original and assess whether there is any difference between samtools and Picard.

        Result: between 95-99% of reads removed.

        4) Visualize your aligned reads in a genome viewer such as IGV. If the reads stack up on top of each other, with blank spaces in between stacks, rather than diagonally overlapping reads, you have PCR duplicates.
        Last edited by Anomilie; 05-27-2014, 06:42 PM.
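
The arithmetic behind steps 2 and 3 can be sketched in a few lines. This is only an illustration, assuming the alignments have already been reduced to (chromosome, start, strand) tuples for uniquely mapped reads; the function names and the toy data are hypothetical and are not taken from the ENCODE pipeline, samtools, or Picard.

```python
def non_redundant_fraction(alignments):
    """NRF = distinct genomic positions / all uniquely mapped reads.

    `alignments` is an iterable of (chrom, start, strand) tuples,
    one per uniquely mapped read (an assumed input format).
    """
    positions = list(alignments)
    if not positions:
        raise ValueError("no alignments given")
    return len(set(positions)) / len(positions)


def percent_removed(reads_before, reads_after):
    """Percentage of reads dropped by a duplicate-removal step."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    return 100.0 * (reads_before - reads_after) / reads_before


# Toy data: 10 reads, 8 of them piled on a single position.
toy = [("chr1", 100, "+")] * 8 + [("chr1", 250, "-"), ("chr2", 40, "+")]
nrf = non_redundant_fraction(toy)      # 3 distinct positions / 10 reads
removed = percent_removed(1000, 30)    # hypothetical read counts before/after dedup
print(f"NRF = {nrf:.2f}, removed = {removed:.1f}%")
```

Running the same two numbers on the output of each deduplication tool makes it easy to see whether they disagree.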

        • #5
          Anomilie,

          I do not get point 4 of your last post. Could you provide a visual for vertically stacked reads versus diagonally stacked reads from IGV? I am doing ATAC and I get lots of duplicate reads (50-90%) despite all efforts to optimise cell numbers and PCR cycles for library amplification.

          Thank you.

          • #6
            Point 4 is the most important one, or in my opinion, the only one really able to indicate a difference between PCR and real duplicates. If the coverage is randomly distributed (in terms of start coordinates), then duplication events are implied to be real duplicates that naturally result from high sequencing depth. However, if you have a few stacks in which all the reads line up perfectly with the same start location, and few or no reads starting between the stacks, this implies PCR duplicates.
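
For anyone who wants a number to go with the IGV picture, here is a rough sketch of the same idea. It assumes you have extracted the read start coordinates for one region on one strand; the summary fields and the toy coordinates are made up for illustration and are not an established QC metric.

```python
from collections import Counter


def start_stack_summary(starts):
    """Summarise how read start coordinates are distributed in a region.

    Many reads on very few distinct starts suggests PCR stacks;
    roughly one read per start suggests ordinary, scattered coverage.
    """
    if not starts:
        raise ValueError("no start coordinates given")
    counts = Counter(starts)
    n_reads = len(starts)
    return {
        "reads": n_reads,
        "distinct_starts": len(counts),
        "reads_per_start": n_reads / len(counts),
        "largest_stack_fraction": max(counts.values()) / n_reads,
    }


# Hypothetical region with three perfect stacks and nothing in between.
stacked = [1000] * 40 + [1400] * 35 + [2100] * 25
# Hypothetical region with 100 reads, each starting at a different position.
scattered = list(range(1000, 2000, 10))

print(start_stack_summary(stacked))
print(start_stack_summary(scattered))
```

The first region puts 100 reads on 3 start positions, the second puts 100 reads on 100 start positions. Where to draw the line between the two is a judgment call, so this complements looking at the reads in a browser rather than replacing it.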

            • #7
              The percentage of duplicated reads is not meaningful without knowing the total number of reads. It is possible for an excellent ChIP to yield only a few million unique reads; if you sequence this on one lane, most reads will be duplicates. And if you have one good replicate and one that failed, then the IDR will only give you the common false positives. Just call peaks on unique starts and look at the wiggle in a genome browser.
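
To put rough numbers on the depth argument, here is a back-of-the-envelope sketch. It assumes every unique fragment in the library is equally likely to be sequenced (simple Poisson sampling), which real libraries only approximate; the library size and read counts below are hypothetical.

```python
import math


def expected_duplicate_fraction(library_size, reads_sequenced):
    """Expected fraction of duplicate reads under uniform random sampling.

    library_size: number of distinct fragments in the library (assumed).
    reads_sequenced: number of reads drawn from it.
    Expected distinct fragments seen = C * (1 - exp(-N / C)).
    """
    if library_size <= 0 or reads_sequenced <= 0:
        raise ValueError("both arguments must be positive")
    expected_unique = library_size * (1.0 - math.exp(-reads_sequenced / library_size))
    return 1.0 - expected_unique / reads_sequenced


# Hypothetical library of 3 million unique fragments at two sequencing depths.
shallow = expected_duplicate_fraction(3_000_000, 3_000_000)
deep = expected_duplicate_fraction(3_000_000, 150_000_000)
print(f"3M reads:   {shallow:.1%} duplicates")
print(f"150M reads: {deep:.1%} duplicates")
```

Under these assumptions the same library goes from roughly 37% duplicates at 3 million reads to about 98% at 150 million reads, which is why the duplication percentage has to be read together with the total read count.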
