  • Source of duplicate reads and possible ways of reducing them

    Hi,

    I was pondering possible sources of duplicate reads, as well as ways of reducing them if you were to increase the raw number of sequenced reads. I imagine it's related to the ePCR phase. It would be nice to reduce the number of duplicates significantly. Any ideas? Thanks!

    J

  • #2
    Remove PCR entirely!

    • #3
      That's funny.

      • #4
        It's more likely to be in your library construction, not in ePCR. While you can get errors during ePCR, it doesn't make sense that you would get over-amplification: ePCR takes place inside a microreactor with one molecule, so it would not explain over-representation.

        What library type and how much starting material?

        • #5
          I think pzumbo meant remove any pre-ePCR PCR steps. Removing them entirely should not be necessary, although reducing those PCR steps would probably help. Also look for any "bottlenecks" in the library prep. If the total number of input molecules drops drastically at any step, then you are also drastically reducing the complexity of your library. That is the thing about the pre-ePCR PCR steps: they make everything look okay, but once you sequence, you discover your library is terribly bottomed out...
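The bottleneck point can be made concrete with a minimal sketch. It assumes reads are drawn uniformly at random, with replacement, from a library of C distinct molecules; under that idealised model the expected number of distinct molecules seen after N reads is C(1 - e^(-N/C)). Uniform sampling is the best case, so real libraries with amplification bias will show more duplication than this. The run size and library complexities are hypothetical.

```python
import math


def expected_unique(n_reads: float, complexity: float) -> float:
    """Expected distinct molecules observed after n_reads draws, assuming
    uniform sampling with replacement from `complexity` molecules."""
    return complexity * (1.0 - math.exp(-n_reads / complexity))


def duplicate_fraction(n_reads: float, complexity: float) -> float:
    """Fraction of reads that repeat an already-sequenced molecule."""
    return 1.0 - expected_unique(n_reads, complexity) / n_reads


if __name__ == "__main__":
    n = 10_000_000  # hypothetical run size
    # Hypothetical complexities: a severe bottleneck (5 million distinct
    # molecules) versus progressively less bottlenecked libraries.
    for c in (5_000_000, 50_000_000, 500_000_000):
        print(f"complexity={c:>11,}  duplicates={duplicate_fraction(n, c):.1%}")
```

With these made-up numbers, the same 10 million reads go from more than half duplicates to about 1%, which is why a bottleneck anywhere in the prep shows up as duplication once you sequence deeply, however good the post-PCR yield looks.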

          --
          Phillip

          • #6
            'Traditional' PCR is known to be a cause of duplicate sequences; see:

            FRT-seq: amplification-free, strand-specific transcriptome sequencing (Lira Mamanova, Robert M Andrews, Keith D James, Elizabeth M Sheridan, Peter D Ellis, Cordelia F Langford, Tobias W B Ost, John E Collins & Daniel J Turner)

            ePCR is *thought* to be able to remove biases typically associated with traditional PCR.
            I'm sure, however, that there is a great divide between practice and theory. In fact, Prüfer et al. report:

            "As previously described, emulsion PCR can produce a substantial number of clusters of identical fragments if a low concentration of DNA is used. We identify these emulsion PCR duplicates using the following algorithm: reads are sorted into buckets according to the first six positive flow values. A new cluster containing two reads from a bucket is formed if these reads have at least 89% sequence similarity over the full length of the shorter read including the 454 adapter sequence. A read is added to an existing cluster if the same condition is met by any one of the sequences in the cluster (single-linkage clustering). The algorithm identified 736,426 of a total of 2,796,944 reads, or 26%, to be duplicates of other sequences." ("Computational challenges in the analysis of ancient DNA")
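The procedure in that excerpt can be sketched roughly as below. This is not Prüfer et al.'s code, and several details are assumptions because the excerpt does not specify them: each read is given as its base sequence (including the 454 adapter) plus its flowgram; flows are rounded to integer homopolymer calls and "positive" is taken to mean a call greater than zero; and similarity is ungapped identity over the shorter read. Single-linkage clustering is done with a union-find structure.

```python
from collections import defaultdict
from itertools import combinations


def bucket_key(flows, n=6):
    """Bucket key from the first n positive flow values.

    Assumption: flows are rounded to integer calls and "positive" means > 0.
    """
    calls = [round(f) for f in flows]
    return tuple(c for c in calls if c > 0)[:n]


def similarity(a, b):
    """Ungapped identity over the length of the shorter read (assumption:
    the excerpt does not say how reads were aligned)."""
    m = min(len(a), len(b))
    if m == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / m


class UnionFind:
    """Disjoint-set structure used to merge clusters (single linkage)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)


def find_duplicates(reads, threshold=0.89):
    """reads: list of (sequence_with_adapter, flowgram) tuples.

    Returns indices of reads flagged as duplicates: every member of a
    cluster except its first read (this counting rule is an assumption).
    """
    buckets = defaultdict(list)
    for idx, (_, flows) in enumerate(reads):
        buckets[bucket_key(flows)].append(idx)

    uf = UnionFind(len(reads))
    for members in buckets.values():
        # Any qualifying pair merges their clusters, so a read joins a
        # cluster if it matches any one member (single linkage).
        for i, j in combinations(members, 2):
            if similarity(reads[i][0], reads[j][0]) >= threshold:
                uf.union(i, j)

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[uf.find(idx)].append(idx)
    return sorted(i for members in clusters.values() for i in members[1:])
```

Comparing only within buckets keeps the all-pairs step cheap; the cost is that two reads whose first six flows differ (for example, because of an early miscalled homopolymer) are never compared. As a sanity check on the quoted figures, 736,426 / 2,796,944 is about 0.263, matching the 26% reported.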

            As I said, remove PCR entirely!

            • #7
              The so-called duplicate reads in your Prüfer et al. excerpt might not result from emPCR at all. They might just be repetitive DNA. Short repetitive elements, like SINEs, compose a significant fraction of mammalian genomes.

              That said, some of the duplicates probably did come from multiple beads in a single microreactor or background amplicon contamination of their lab.

              I think most people regard third-generation sequencers as being defined by their ability to sequence single molecules. So, we are heading in the no-PCR direction.

              But for the moment, except at the few locations where single-molecule sequencers are placed, limiting the number of pre-emPCR/bridge PCR cycles will suffice for most.

              And there are plenty of biases likely to derive from the new single-molecule technologies. So there is a "better the devil you know" argument to be made for using PCR judiciously.

              --
              Phillip

              • #8
                Yes, the theoretical is fun to quote, but I live in the real world.
                And in the real world, multiple templates in a microreactor result in a weak/mixed-signal bead that is easily thrown out by software, and in no way make repetitive elements.

                • #9
                  So, previously published results to the effect that, "emulsion PCR can produce a substantial number of clusters of identical fragments", are a lie? Interesting -- perhaps there are multiple "real" worlds, then?

                  • #10
                    also, in the pretend world whereby emulsion PCR produces duplicate fragments, it is thought to be a result, not of multiple templates in a single microreactor, but of emulsion inclusions containing multiple beads.

                    • #11
                      Originally posted by paul z
                      also, in the pretend world whereby emulsion PCR produces duplicate fragments, it is thought to be a result, not of multiple templates in a single microreactor, but of emulsion inclusions containing multiple beads.
                      How many 1 µm beads can you get into a reactor that still amplify efficiently with limited primers/reagents? I would guess that results in far fewer duplicates than bulk amplification.

                      See this paper if you haven't already...http://seqanswers.com/forums/showthread.php?t=1370

                      • #12
                        Originally posted by snetmcom
                        Yes, the theoretical is fun to quote, but I live in the real world.
                        And in the real world, multiple templates in a microreactor result in a weak/mixed-signal bead that is easily thrown out by software, and in no way make repetitive elements.
                        Trolling now, snetmcom? Yes, that would be one way you could go...

                        Here is what I wrote:

                        That said, some of the duplicates probably did come from multiple beads in a single microreactor or background amplicon contamination of their lab.
                        Nothing about multiple templates in a single microreactor.
                        Multiple beads in a single microreactor would create twin beads, identically templated.

                        --
                        Phillip

                        • #13
                          Use the CD-HIT program to remove duplicates.

                          • #14
                            Originally posted by ECO
                            How many 1 µm beads can you get into a reactor that still amplify efficiently with limited primers/reagents? I would guess that results in far fewer duplicates than bulk amplification.
                            It would depend on the diameter of the microreactor, obviously. Not much QC is done when the microreactors are created; it is a numbers game. Some of the microreactors will be too small to hold a single bead. Others will be large enough to hold several. Alternatively, microreactors could coalesce at some point during thermal cycling. Because most adjacent microreactors contain no template, such a coalescence will frequently bring one template together with several beads, resulting in identically templated beads.
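The numbers game can be illustrated with a minimal sketch. It assumes template molecules and beads are loaded into equal-sized microreactors independently, each following a Poisson distribution with a hypothetical mean load. It estimates how often a singly templated microreactor also holds two or more beads (twin beads) and how often a templated microreactor holds two or more templates (the mixed-signal beads that software filters out). Droplet size variation and coalescence, discussed above, are not modelled and would only add to the twin-bead rate.

```python
import math


def frac_two_or_more(lam: float) -> float:
    """P(X >= 2 | X >= 1) for X ~ Poisson(lam)."""
    p0 = math.exp(-lam)
    p1 = lam * p0
    return (1.0 - p0 - p1) / (1.0 - p0)


def twin_bead_fraction(lam_bead: float) -> float:
    """Among microreactors with one template and at least one bead, the share
    holding two or more beads (identically templated 'twin' beads).
    Assumes bead and template counts are independent Poisson variables."""
    return frac_two_or_more(lam_bead)


def mixed_template_fraction(lam_template: float) -> float:
    """Among microreactors with at least one template, the share holding two
    or more templates (mixed-signal beads, usually discarded)."""
    return frac_two_or_more(lam_template)


if __name__ == "__main__":
    # Hypothetical mean loads per microreactor, not measured values.
    for lam_bead in (0.1, 0.3, 1.0):
        print(f"bead load {lam_bead}: twin beads {twin_bead_fraction(lam_bead):.1%}")
    print(f"template load 0.1: mixed templates {mixed_template_fraction(0.1):.1%}")
```

The design point is that the two failure modes depend on different loads: diluting the template suppresses mixed beads, but it does nothing about twin beads, which are governed by how many beads end up sharing a microreactor.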

                            Another way to get a duplicate read, likely well understood by anyone running a sequencing core producing 3730XL Sanger reads, is signal bleed. I have heard of this occurring in 454 runs. You could look for these, because SOLiD (and 454) reads have coordinates: a duplicate caused by signal bleed should sit physically next to the read it copies.
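One way to act on the coordinate idea is sketched below: flag pairs of identical reads whose positions are within a small distance of each other. The record layout (read_id, x, y, sequence) and the max_dist cutoff are hypothetical; the actual coordinate encodings used by SOLiD and 454 would need to be parsed out of the read headers first.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Read(NamedTuple):
    read_id: str
    x: int  # hypothetical bead/well coordinate
    y: int
    seq: str


def neighbour_duplicates(reads, max_dist=3.0):
    """Return (id, id) pairs of identical reads lying within max_dist of each
    other; max_dist is a hypothetical signal-bleed cutoff."""
    by_seq = defaultdict(list)
    for r in reads:
        by_seq[r.seq].append(r)
    pairs = []
    for group in by_seq.values():
        for a, b in combinations(group, 2):
            if math.hypot(a.x - b.x, a.y - b.y) <= max_dist:
                pairs.append((a.read_id, b.read_id))
    return pairs
```

Identical reads far apart on the slide are deliberately not flagged here, since they point to other sources (library PCR, twin beads) rather than bleed-through.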

                            Originally posted by ECO
                            See this paper if you haven't already...http://seqanswers.com/forums/showthread.php?t=1370
                            Okay, but 454 is already there. Most of their library construction protocols use no pre-emPCR amplification steps. Certainly the Neanderthal genome paper Pzumbo is invoking does not. And, they see duplicate reads.

                            Also, at the risk of confusing the issue, if you are talking RNA-seq using the Ambion Whole Transcriptome kit, there is another source of "duplicate" reads: the RNase III digestion used to fragment the RNA. It will have strongly biased cleavage sites in most RNAs. But you would certainly not want to remove these reads if you were doing DGE, since that would throw your counts off!
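A toy sketch of why deduplication distorts DGE counts: when fragmentation favours a few cleavage sites, many independent reads from a highly expressed transcript share the same start position, so collapsing reads by (gene, start) caps each gene at its number of distinct sites. The genes, positions and read counts are made up for illustration.

```python
from collections import Counter


def raw_counts(alignments):
    """alignments: iterable of (gene, start) tuples, one per read."""
    return Counter(gene for gene, _ in alignments)


def dedup_counts(alignments):
    """Count each (gene, start) pair once, as position-based duplicate
    removal would."""
    return Counter(gene for gene, _ in set(alignments))


# Hypothetical data: geneA is highly expressed but cut at only three
# preferred sites; geneB is expressed at a much lower level.
alignments = [("geneA", s) for s in (100, 250, 400) for _ in range(50)]
alignments += [("geneB", s) for s in (100, 250, 400, 600)]
print(raw_counts(alignments))
print(dedup_counts(alignments))
```

Here the raw counts are 150 for geneA and 4 for geneB, but after deduplication geneA drops to 3 and falls below geneB, so the expression ranking is inverted.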

                            --
                            Phillip

                            • #15
                              This is one of my libraries in question, so maybe I can shed a little light. This was an Exon Capture library, my first. I followed the protocol, which asks for 12 cycles during the nick translation step. Regular fragment libraries ask for 2-10 cycles depending on the amount of starting DNA, which in this case was 3 µg, so I think 12 might be a little overkill. You only need 0.5 µg going into hybridization, and we ended up with closer to 2 µg after nick translation. There is also a 12-cycle post-hyb amplification. Since this was my first go at exon capture I did it by the book, but for the next batch of samples we are definitely going to reduce the post-hyb amplification, and it will be interesting to see how much this reduces duplication. We ended up with between 10 and 20 ng/µl for each sample, which seems really, really high to me.
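When cutting cycles, a rough estimate of how many are actually needed can help. This is a minimal sketch assuming exponential amplification with a constant per-cycle efficiency E (yield = input × (1 + E)^n) and no plateau; the 50 ng input and 90% efficiency are hypothetical, not values from the exon-capture protocol.

```python
import math


def cycles_needed(input_ng: float, target_ng: float, efficiency: float = 0.9) -> int:
    """Fewest whole cycles n with input_ng * (1 + efficiency) ** n >= target_ng.

    Idealised exponential model: constant per-cycle efficiency, no plateau.
    """
    if target_ng <= input_ng:
        return 0
    return math.ceil(math.log(target_ng / input_ng) / math.log(1.0 + efficiency))


def expected_yield(input_ng: float, cycles: int, efficiency: float = 0.9) -> float:
    """Product mass after `cycles` cycles under the same idealised model."""
    return input_ng * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical: 50 ng of template entering PCR, 500 ng (0.5 ug) wanted.
    n = cycles_needed(50, 500)
    print(f"{n} cycles -> ~{expected_yield(50, n):.0f} ng")
    # 12 cycles overshoots enormously on paper; real reactions plateau long
    # before that, and extra cycles only re-copy the same molecules.
    print(f"12 cycles -> ~{expected_yield(50, 12):.0f} ng (ignoring plateau)")
```

Amplification never adds unique molecules, so any cycles beyond what is needed to reach the required mass only raise the copy number of molecules already present.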
