  • tec
    Member
    • Apr 2008
    • 14

    duplicate reads in ChIPSeq

    Hello community,

    we have a problem concerning an Illumina-sequenced ChIPSeq experiment.
    After mapping and viewing the reads in the UCSC Genome Browser, surprisingly 99% of the reads map to a limited set of unique locations. The corresponding reads share the same start and end coordinates, and there are no additional clusters of reads surrounding a location within the range of the original fragment length.

    Does anyone have an idea? I would very much appreciate your assistance

    tec
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #2
    I'm not really clear on what you're saying here. Do you find that 99% of your reads are the same sequence, with exactly the same start and end positions? If that's the case I'd suspect that you may have just ended up sequencing a primer rather than your library. Sometimes these primer sequences can map to a reference genome and give a false impression that you're seeing a real genomic sequence.

    Alternatively, are you saying that you have many clusters (if so, how many?), but that in each one you see just a single read duplicated many times, with no other overlapping reads? In this case I'd suspect a problem with your library preparation, probably in one of the PCR steps. This is assuming that your library was prepared using random fragmentation (sonication or similar). If your library was generated by restriction digestion then this is what you'd expect to see.

    Have you checked the mapping efficiency of your sequence (i.e. what proportion of clusters could be mapped to your reference)? This might give a clue as to what's gone wrong.
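
The two scenarios described above (one sequence dominating the whole run versus many positions that each hold one heavily duplicated read) can be told apart by counting reads per mapped position. The following is a minimal sketch, assuming the mapped reads have already been parsed into `(chrom, start, strand)` tuples; the function name `summarize_positions` and the toy data are hypothetical and not taken from any mapper's output format.

```python
from collections import Counter


def summarize_positions(reads):
    """Summarize duplication for mapped reads.

    reads: iterable of (chrom, start, strand) tuples, one per mapped read.
    (Hypothetical input layout; adapt the parsing to your mapper's output.)
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    top_position, top_count = counts.most_common(1)[0]
    return {
        "total_reads": total,
        "distinct_positions": len(counts),
        # Fraction of reads that repeat an already-seen position.
        "duplicate_fraction": 1 - len(counts) / total,
        "top_position": top_position,
        # Close to 1.0 suggests a single sequence (e.g. a primer) dominates.
        "top_fraction": top_count / total,
    }


# Toy data: three positions, ten reads in total.
reads = [("chr1", 100, "+")] * 6 + [("chr2", 500, "-")] * 3 + [("chr3", 42, "+")]
summary = summarize_positions(reads)
print(summary)
```

A `top_fraction` near 1 would point towards the primer scenario, while a high `duplicate_fraction` spread over many positions would point towards over-amplification of a low-complexity library.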

    Comment

    • tec
      Member
      • Apr 2008
      • 14

      #3
      duplicate reads in ChIPSeq

      Hello simonandrews,

      -> Alternatively are you saying that you have many clusters (if so, how many?), but that in each one you see just a single read duplicated many times, with no other overlapping reads?

      That's exactly what I see. I work with the human genome and can detect clusters on every chromosome. Mapping ~5 million single reads with seqmap gives ~15,000 unique read locations; all other reads fall on these same locations (duplicates). The mapping efficiency is ~65%, as expected.
      (Mapping with ELAND gives the same proportion.)

      The library was prepared using random fragmentation (sonication) and the initial fragment length is ~ 200 - 400 bp.

      I have no idea what's gone wrong. What could have happened during the library preparation?

      Thanks! tec
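
The figures quoted in this post can be turned into a rough duplication rate. This is only a back-of-the-envelope sketch: it assumes the ~65% mapping efficiency applies to all ~5 million reads and that the ~15,000 unique locations were counted among the mapped reads. The function name `duplication_from_counts` is made up for illustration.

```python
def duplication_from_counts(total_reads, mapping_efficiency, unique_locations):
    """Rough duplication statistics from summary counts (illustrative only)."""
    mapped = total_reads * mapping_efficiency
    return {
        "mapped_reads": mapped,
        # Share of mapped reads that repeat an already-seen location.
        "duplicate_fraction": 1 - unique_locations / mapped,
        "mean_reads_per_location": mapped / unique_locations,
    }


# Approximate values from the post: ~5 million reads, ~65% mapped,
# ~15,000 unique locations.
stats = duplication_from_counts(5_000_000, 0.65, 15_000)
print(f"mapped reads:        {stats['mapped_reads']:.0f}")
print(f"duplicate fraction:  {stats['duplicate_fraction']:.4f}")
print(f"reads per location:  {stats['mean_reads_per_location']:.0f}")
```

Under these assumptions the duplicate fraction comes out above 99%, which agrees with the "99% of the reads" figure in the first post, at roughly a couple of hundred reads per location on average.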

      Comment

      • simonandrews
        Simon Andrews
        • May 2009
        • 870

        #4
        My immediate thought would be that you could have had a step in your library prep where you lost virtually all of your input material, and that a subsequent PCR step dramatically amplified what was left and produced a large number of duplicated reads.

        Comment

        • tec
          Member
          • Apr 2008
          • 14

          #5
          OK, but how could this happen? (A virtually complete loss of material?)

          The library was prepared using the standard Illumina protocol and kit.
          We sequenced another ChIPSeq experiment and there was no such problem.

          Thanks! tec

          Comment

          • tec
            Member
            • Apr 2008
            • 14

            #6
            duplicate reads in ChIPSeq !?

            Hello all,

            the problem with duplicate reads still keeps me busy.
            Therefore we performed a TOPO cloning and resequencing check of the library.
            Surprisingly, over 75% of the clones were unique, which doesn't agree with the sequencing run!

            Does anyone have an idea?

            Thanks! tec

            Comment

            • dvh
              Member
              • Jul 2008
              • 35

              #7
              That's just a sampling issue.

              Say there are only 1000 unique molecules in the library:

              If you TOPO/Sanger sequence 100 clones, only a few will look like duplicates.

              But if you next-gen sequence 10,000 reads, most will look like duplicates.

              Make another library with more DNA input, less PCR...
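
The sampling argument above can be checked numerically. This is a minimal sketch, assuming reads are drawn uniformly at random, with replacement, from a library of equally abundant unique molecules (a simplification, since PCR rarely amplifies every molecule equally). Under that assumption the expected number of distinct molecules seen in n reads from a library of N molecules is N * (1 - (1 - 1/N)^n). The function names are hypothetical.

```python
import random


def expected_distinct(library_size, reads):
    """Expected number of distinct molecules seen when drawing `reads`
    times uniformly with replacement from `library_size` molecules."""
    return library_size * (1 - (1 - 1 / library_size) ** reads)


def simulate_distinct(library_size, reads, seed=0):
    """Monte Carlo check of the formula above (one random draw per read)."""
    rng = random.Random(seed)
    return len({rng.randrange(library_size) for _ in range(reads)})


# The example from the post: 1000 unique molecules, sampled at two depths.
for n in (100, 10_000):
    exp = expected_distinct(1000, n)
    sim = simulate_distinct(1000, n)
    print(f"reads={n:>6}  expected distinct={exp:7.1f}  "
          f"simulated distinct={sim:5d}  duplicate fraction={1 - exp / n:.3f}")
```

With 100 reads, roughly 95 are expected to be distinct, so only about 5% look like duplicates; with 10,000 reads nearly all 1000 molecules are seen and about 90% of the reads are duplicates. The same library can therefore look diverse in a small clone check and heavily duplicated in a deep sequencing run.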

              Comment

              • tec
                Member
                • Apr 2008
                • 14

                #8
                Originally posted by dvh
                That's just a sampling issue.

                Say there are only 1000 unique molecules in the library:

                If you TOPO/Sanger sequence 100 clones, only a few will look like duplicates.

                But if you next-gen sequence 10,000 reads, most will look like duplicates.

                Make another library with more DNA input, less PCR...
                I agree!
                But taking into account that another library showed exactly the same distribution in the TOPO/Sanger sequencing, and that its Illumina sequencing gave nice results, I am confused.
                Is it possible that something went wrong during the preparation of the flow cell (e.g. cluster generation) which could have led to that result?

                Comment
