  • Where does all the DNA go?

    The standard Roche protocol for shotgun library construction asks for 10 ug of input DNA to yield a few million templated beads for sequencing. Rule of thumb: 1 ug of 1 kb double stranded DNA is 1 trillion (1E+12) molecules[1].

    Get that? 10 trillion molecules to start with so that I can sequence less than 10 million of them. What happened to the other 9,999,990,000,000 molecules?

    Not really fair to the Roche protocol? Usually one ends up with enough library to sequence more than 10 million beads? Plug your own numbers in. My guess is the molecular yield from this technique will be no better than 0.1%.

    I do not mean to single out Roche here, I think protocols for all instrument systems are looking at fractions of a percent molecular yield. As long as one has plenty of DNA, maybe it does not matter. But sometimes DNA (or RNA) is limiting, no?

    And what if there is bias in the loss process? Most of us sweat adding a few more cycles of PCR into our library prep procedure because we know PCR can bias our results. But I have never met a single person who worried that the 99.9% (add as many nines as you care to) of DNA molecules being lost during library construction might have a sequence-composition biased component to their loss.

    If I get any response (other than a blank stare) from those designing these protocols about the molecular yield, it is usually that the yields in each step are not 100%. The implication, I presume, is that these yield losses are multiplicative. Fair enough, how many steps with 50% yield do I need to lose 99.9% of my DNA? That would be ten steps.
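
    A quick sanity check of that arithmetic (a minimal sketch; the 50% per-step yield is just the hypothetical figure used above):

    Code:
    # How many 50%-yield steps does it take to lose 99.9% of the input molecules?
    per_step_yield = 0.5
    remaining = 1.0
    steps = 0
    while remaining > 0.001:   # stop once less than 0.1% of the input is left
        remaining *= per_step_yield
        steps += 1
    print(f"{steps} steps at 50% yield leave {remaining:.4%} of the input")
    # -> 10 steps at 50% yield leave 0.0977% of the input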

    I do not think most library construction steps have yields as low as 50%. Instead, I think it more likely that:

    (A) A few steps have extremely low molecular yields and

    (B) The protocols we are using rely on our being able to visualize the molecules and their size distribution for purposes of quality control.

    I am going to ignore (B) for the purposes of the rest of this post.

    As for (A), most of the methodologies I see being developed for low amounts of starting material are focused on amplification. It might be worth taking a look at where DNA (or RNA) is being lost and tightening that up. A couple of places to look would be the % of ends successfully repaired after mechanical fragmentation of DNA, and chemical DNA damage. The latter may or may not be a non-issue. But think about it: how often do you worry about the redox state of your DNA? How about UV damage from the sunlight streaming in through your lab windows?

    Might 90% of the molecules in a typical DNA prep be impossible to replicate without repair beyond the end repair we normally deploy? Could that number be 99% or 99.9%? Real question. I would like to know.

    --
    Phillip

    (Notes)
    1. Okay, yeah, using some standard numbers, like 650 MW for a base pair, the number is really 926 billion molecules, not 1 trillion. But nothing I discuss here would be sensitive to less than 10% tolerances, so the difference is safe to ignore...
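
    For anyone who wants to re-run the footnote's arithmetic, here is a minimal sketch (assuming the ~650 g/mol per base pair used above and Avogadro's number):

    Code:
    AVOGADRO = 6.022e23     # molecules per mole
    MW_PER_BP = 650.0       # approximate g/mol per double-stranded base pair

    def molecules(mass_ug, fragment_bp):
        # dsDNA molecules in mass_ug micrograms of fragment_bp-long fragments
        return mass_ug * 1e-6 / (fragment_bp * MW_PER_BP) * AVOGADRO

    print(f"{molecules(1, 1000):.3e}")    # ~9.265e+11: the "1 ug of 1 kb = ~1 trillion" rule
    print(f"{molecules(10, 1000):.3e}")   # ~9.265e+12: the 10 ug Roche input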

  • #2
    genome?

    If that's the methodology, my question is

    So.....what genome ends up getting sequenced?

    Comment


    • #3
      Originally posted by Joann View Post
      If that's the methodology, my question is

      So.....what genome ends up getting sequenced?
      Not sure I follow you.

      Comment


      • #4
        Check out these papers: each suggests that the "I need enough DNA to visualize it" angle is why so much input DNA is required; you apparently can get by with very little DNA if you have a better way to track and quantitate it.

        Anyone here routinely using these protocols or similar ones? How do they really behave?

        BMC Genomics. 2009 Mar 19;10:116.
        Digital PCR provides sensitive and absolute calibration for high throughput sequencing.
        White RA 3rd, Blainey PC, Fan HC, Quake SR.

        Department of Bioengineering at Stanford University and Howard Hughes Medical Institute, Stanford, CA 94305, USA. [email protected]
        BACKGROUND: Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample-typically micrograms-are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. RESULTS: We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. CONCLUSION: The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.

        PMID: 19298667 [PubMed - indexed for MEDLINE]


        Nucleic Acids Res. 2008 Jan;36(1):e5. Epub 2007 Dec 15.
        From micrograms to picograms: quantitative PCR reduces the material demands of high-throughput sequencing.
        Meyer M, Briggs AW, Maricic T, Höber B, Höffner B, Krause J, Weihmann A, Pääbo S, Hofreiter M.

        Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany. [email protected]
        Current efforts to recover the Neandertal and mammoth genomes by 454 DNA sequencing demonstrate the sensitivity of this technology. However, routine 454 sequencing applications still require microgram quantities of initial material. This is due to a lack of effective methods for quantifying 454 sequencing libraries, necessitating expensive and labour-intensive procedures when sequencing ancient DNA and other poor DNA samples. Here we report a 454 sequencing library quantification method based on quantitative PCR that effectively eliminates these limitations. We estimated both the molecule numbers and the fragment size distributions in sequencing libraries derived from Neandertal DNA extracts, SAGE ditags and bonobo genomic DNA, obtaining optimal sequencing yields without performing any titration runs. Using this method, 454 sequencing can routinely be performed from as little as 50 pg of initial material without titration runs, thereby drastically reducing costs while increasing the scope of sample throughput and protocol development on the 454 platform. The method should also apply to Illumina/Solexa and ABI/SOLiD sequencing, and should therefore help to widen the accessibility of all three platforms.

        Comment


        • #5
          Yes, I de-emphasized this point in my original post, because it has received some attention and methods have been developed to address part of this particular issue. (Doing QC on the size distribution of a library you cannot see on a gel or lab chip would still be tricky.)

          Note, however, that both papers appear to suffer from the same dismal molecular yields of "input DNA" to "library molecules".

          In the Meyer paper, the Bonobo sample (Table 2) starts with 500 ng and a mean fragment size of 500 bases. Using the "1 ug of 1 kb DNA is about 1 trillion molecules" rule of thumb I suggested earlier, that equals 1 trillion 500-base molecules (double stranded). Meyer succeeds in isolating 50,000 beads after enrichment from the 1 trillion molecules he started with.

          Molecular yield: 5E+04/1E+12 = 5E-08
          that is, 0.000005%

          That yield is a 3x overestimate if you only count sequence-pass reads generated.

          Similarly, if you look at "additional file 2" in the White paper, the lowest input DNA amount used in a shotgun library is 0.7 ug of 550 bp mean size. Again, over 1 trillion molecules to start with. This yielded 7E+05 to 1E+06 ssDNA library molecules (depending on quantitation method). That is a comparatively excellent molecular yield: 0.0001%. I still would like to know where most of the 99.9999% of the molecules went, though.
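
          The two yields above, worked through with the same molecular-weight arithmetic (a sketch only; the input masses and fragment sizes are the figures quoted from the two papers):

          Code:
          AVOGADRO = 6.022e23
          MW_PER_BP = 650.0   # approximate g/mol per double-stranded base pair

          def molecules(mass_ug, fragment_bp):
              return mass_ug * 1e-6 / (fragment_bp * MW_PER_BP) * AVOGADRO

          # Meyer et al., bonobo library: 500 ng of ~500 base fragments -> ~5E+04 enriched beads
          print(f"Meyer yield: {5e4 / molecules(0.5, 500):.1e}")   # ~5.4e-08, the ~0.000005% above

          # White et al., additional file 2: 0.7 ug of ~550 bp fragments -> ~1E+06 library molecules
          print(f"White yield: {1e6 / molecules(0.7, 550):.1e}")   # ~8.5e-07, roughly 0.0001%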

          But both papers show that trillions of library molecules are not necessary to get a good emulsion PCR. That is pretty well accepted these days.

          White et al. http://www.biomedcentral.com/1471-2164/10/116 do at least point in the direction of the 500 pound gorilla:

          It is natural to expect that library preparation protocols developed with the capacity to handle up to five micrograms of input are far from optimal with respect to minimizing loss from nanogram or picogram samples. A procedure optimized for trace samples with reduced reaction volumes and media quantities, possibly formatted in a microfluidic chip, has the potential to dramatically improve the recovery of library molecules, allowing preparation of sequencing libraries from quantities of sample comparable to that actually required for the sequencing run, e.g. close to or less than one picogram.
          --
          Phillip

          Comment


          • #6
            How many mono-, di-, and trinucleotides are being generated by the fragmentation process (invisible to visualization), and what percent of DNA is lost in the fragment sizing step? Phillip, I get your line of inquiry and I am wondering if single-molecule sequencing improves on the abysmal efficiency.
            My 1.25 cent

            Comment


            • #7
              Originally posted by What_Da_Seq View Post
              How many mono-, di-, and trinucleotides are being generated by the fragmentation process (invisible to visualization), and what percent of DNA is lost in the fragment sizing step? Phillip, I get your line of inquiry and I am wondering if single-molecule sequencing improves on the abysmal efficiency.
              My 1.25 cent
              Depends on the method. Nebulization/Hydroshear probably produces very few oligomers. But size selection will drastically reduce the amount of DNA. Still, this usually is not much more than a 90% loss.

              My guess is that the majority of the subsequent loss is a result of (1)Unrepairable ends and (2)DNA damage that prevents DNA replication.

              --
              Phillip

              Comment


              • #8
                In Figure 2 of this paper you can see where your DNA goes and the efficiency of each step in the 454 library preparation process:

                The biggest loss seems to be at the NaOH melting step.

                Comment


                • #9
                  Originally posted by McTomo View Post
                  In Figure 2 of this paper you can see where your DNA goes and the efficiency of each step in the 454 library preparation process:

                  The biggest loss seems to be at the NaOH melting step.
                  Thanks, that is very interesting. I knew about the highly variable and generally extremely low yields from the library immobilization/ssDNA elution. Bruce Roe's lab, for example, discards that step altogether. But the Maricic and Paabo method does make it seem much more attractive.

                  A couple of notes. This paper only deals with post adaptor ligation DNA loss, because it starts with a PCR product/library molecule. Also, even the 99% potential loss of this step only explains 2 of the >6 orders of magnitude of DNA loss in the Roche protocol.

                  As I've mentioned before I think most of the rest is probably the result of un-repairable ends and DNA damage that has rendered a given strand un-replicatable.
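
                  As a rough accounting of those orders of magnitude (a minimal sketch, using the round numbers from the opening post):

                  Code:
                  import math

                  # Opening-post picture: ~10 trillion input molecules for ~10 million templated beads.
                  overall_orders = -math.log10(1e7 / 1e13)   # ~6 orders of magnitude lost overall
                  naoh_orders = -math.log10(0.01)            # a single 99%-loss step accounts for 2 orders
                  remaining = overall_orders - naoh_orders

                  print(f"total loss: ~{overall_orders:.0f} orders of magnitude")
                  print(f"a 99%-loss step explains {naoh_orders:.0f}, leaving ~{remaining:.0f} unexplained")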

                  --
                  Phillip

                  Comment


                  • #10
                    Originally posted by pmiguel View Post
                    A couple of notes. This paper only deals with post adaptor ligation DNA loss, because it starts with a PCR product/library molecule. Also, even the 99% potential loss of this step only explains 2 of the >6 orders of magnitude of DNA loss in the Roche protocol.
                    If you multiply the losses in each step of the library preparation process (starting at the blunting), you come to ~10% of the starting DNA ending up in the 454 library. Even though a PCR product was used, it has to be repaired: the overhanging A's have to be removed and the phosphates have to be added. However, I agree that there might be other types of damage that appear in sheared genomic DNA that can't be repaired.

                    Comment


                    • #11
                      Originally posted by McTomo View Post
                      If you multiply the losses in each step of the library preparation process (starting at the blunting), you come to ~10% of the starting DNA ending up in the 454 library. Even though a PCR product was used, it has to be repaired: the overhanging A's have to be removed and the phosphates have to be added. However, I agree that there might be other types of damage that appear in sheared genomic DNA that can't be repaired.
                      Yes, my guess is that non-enzymatic fragmentation methods produce some ends that cannot be repaired by the typical T4-polymerase/T4-PNK. I posted my speculation on this topic, based largely on a very old paper:

                      http://seqanswers.com/forums/showthread.php?t=2759


                      The upshot was that sonication predominantly broke C-O bonds. While these C-O breaks may proceed through solvolysis to C-OH ends, other outcomes are conceivable. Unclear what ends nebulization/hydroshearing produce.

                      While an unrepairable end, on either end, of a DNA fragment will prevent creation of a library amplicon from that fragment, there are other issues to consider. DNA damage may prevent replication of a DNA strand. How damaged is the typical DNA prep? I'm sure this has been considered in the literature. But a PCR reaction, lacking the support of a cellular environment, would be much more susceptible to chain-terminating DNA damage than an in vivo assay would detect.

                      I think this is why the SOLiD protocols invariably utilize a pre-ePCR, PCR step. That way amplifiable library molecules will predominate in a sample and assays of that pre-amplified sample will more accurately predict that sample's behavior in ePCR.

                      --
                      Phillip

                      Comment


                      • #12
                        How many genomes (human haploid) are in a ug of DNA?

                        Comment


                        • #13
                          Originally posted by happy View Post
                          How many genomes (human haploid) are in a ug of DNA?
                          How about the chicken haploid genome instead? That is 1 billion bp.

                          If 1 ug of 1 thousand bp fragments is 1 trillion molecules, that is the same as saying that a quadrillion bp genome (1 thousand x 1 trillion = 1E+03 x 1E+12 = 1E+15 = 1 quadrillion) is 1 ug.

                          So:

                          Code:
                          genome          genome
                          size (bp)       mass
                          ----------------------------
                          1 quadrillion   1 ug
                          1 trillion      1 ng
                          1 billion       1 pg
                          1 million       1 fg
                          So a haploid chicken genome is 1 pg. 1 million haploid chicken genomes are in a ug of chicken DNA.

                          That means a haploid human genome is 3 pg. So 1 ug of human DNA is roughly 333,333 human genomes.
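
                          The same rule of thumb, reduced to a couple of helpers (a sketch; the genome sizes are the approximate values used above):

                          Code:
                          BP_PER_UG = 1e15   # 1 ug of dsDNA is ~1 quadrillion bp (table above)

                          def genome_mass_pg(genome_bp):
                              # approximate mass of one haploid genome copy, in picograms
                              return genome_bp / BP_PER_UG * 1e6   # ug -> pg

                          def copies_per_ug(genome_bp):
                              return BP_PER_UG / genome_bp

                          print(genome_mass_pg(1e9))   # chicken: ~1 pg
                          print(genome_mass_pg(3e9))   # human: ~3 pg
                          print(copies_per_ug(3e9))    # ~333,333 haploid human genomes per ug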

                          --
                          Phillip

                          Comment


                          • #14
                            Great Thread,

                            Stepping back a bit to dissect this, we have been testing how far we can go with simpler fragment libraries as a first measure of this. Most circularized protocols have many inefficient steps, so we began by quantitating how much DNA we can get from just fragmenting DNA and adapting it, and then counting distinct molecules on the back end. We have gone as low as 750 pg of already-sheared DNA to generate 30-40M distinct 50mer human reads. I think this is a very key point. This is roughly 300 copies of the genome, but most importantly, we didn't Covaris this DNA. It came from maternal bloodstream samples, so it was enzymatically digested in situ or in vivo. It also, not surprisingly, has a very different GC content than Covaris'd DNA.
                            The reason I find this intriguing is that all methods eventually go through a final Frag adaptor ligation, so it's important to know the efficiency of this step, and it is, after all, the simplest to measure. We will be backing up into the various circularization protocols shortly, but we already know the SOLiD circles are 10-20% efficient at the lengths mentioned above.

                            In terms of Covaris'd DNA, I will look through our data, but we have performed 600M reads on 1 ug of buccal DNA Covaris'd from a patient and not saturated this library. We probably need to go deeper to understand whether the different shearing methods are having a damaging effect.

                            I found the Complete Genomics paper fairly well written in regard to exact pmols at each step. Lots of amplification along the way, but it's clear we need protocols that speak to these quantities at every step on the other platforms as well.

                            The final point I'd add to the discussion is that not all quantified DNA is amplifiable or makes it to a bead or a cluster. We're working with emPCR on SOLiD, and we assume 1/2 to 2/3 of our reactors have beads and no DNA. We lean on pushing the bead Poisson high and the template Poisson low, as 2 beads in a reactor don't kill us but 2 templates do.

                            Similar effects may exist on the Poisson curves for clusters, i.e., flow cells must be flooded at one concentration, where only a portion of the molecules can seed the flow cell surface but molecules exist throughout the whole volume. I'm still unclear whether both surfaces amplify and only one being imaged creates another factor-of-2 loss?

                            Comment


                            • #15
                              Originally posted by Nitrogen-DNE-sulfer View Post
                              Great Thread,

                              [...]
                              from just fragmenting DNA and adapting it, and then counting distinct molecules on the back end. We have gone as low as 750 pg of already-sheared DNA to generate 30-40M distinct 50mer human reads. I think this is a very key point. This is roughly 300 copies of the genome, but most importantly, we didn't Covaris this DNA. It came from maternal bloodstream samples, so it was enzymatically digested in situ or in vivo.
                              [...]
                              (30-40M unique starts, right? Two reads that map to the same start position in a genome may (or may not) derive from a single input DNA molecule. That is why using single-sided reads makes this process difficult to assess.)

                              40M reads implies 40M 80-130 bp insert amplicons. What yield does that represent?

                              1 pg of 100 bp DNA is roughly 10 million molecules. So you started with 7.5 billion molecules (presuming they were all ~100 bp -- which, obviously, they would not have been). That would imply that roughly 1 in 200 of the original molecules were successfully converted to templated beads.
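
                              The arithmetic behind that "1 in 200" (a minimal sketch; assuming, as above, that everything were ~100 bp):

                              Code:
                              # Rule of thumb from above: 1 ug of 1 kb dsDNA ~ 1E+12 molecules,
                              # so 1 pg of ~100 bp dsDNA ~ 1E+07 molecules.
                              input_molecules = 750 * 1e7   # 750 pg -> ~7.5 billion molecules
                              distinct_reads = 40e6         # upper end of the 30-40M quoted above

                              ratio = input_molecules / distinct_reads
                              print(f"roughly 1 templated bead per {ratio:.0f} input molecules")
                              # -> roughly 1 per 188, i.e. about 1 in 200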

                              Was this DNA already size selected when you measured it as 750 pg?

                              I would expect <10% of any smear of DNA to fall within that fairly tight size distribution (150-200 bp for the full amplicon length, i.e., 80-130 bp of insert). Even as low as 1% would not be surprising.

                              So, yes that result would be consistent with nearly all the molecules being ligatable on both ends and amplifiable. Or less than 10% of them being so. Hard to say.

                              Originally posted by Nitrogen-DNE-sulfer View Post
                              The reason I find this intriguing is that all methods eventually go through a final Frag adaptor ligation, so it's important to know the efficiency of this step, and it is, after all, the simplest to measure.
                              By qPCR? By sequencing, it is not so simple. The human genome is replete with repetitive DNA, so it is difficult to tell from single-end reads whether two reads derive from a unique chunk of your original sample DNA. This is because the pre-ePCR amplification step would make lots of copies of every amplifiable amplicon.


                              Originally posted by Nitrogen-DNE-sulfer View Post
                              In terms of Covaris'd DNA, I will look through our data, but we have performed 600M reads on 1 ug of buccal DNA Covaris'd from a patient and not saturated this library. We probably need to go deeper to understand whether the different shearing methods are having a damaging effect.
                              Yes, especially since minor changes in the shearing buffer may lead to different outcomes. I'm not a chemist, but if the C-O bond breakage that apparently predominates in sonication-mediated DNA fragmentation

                              http://seqanswers.com/forums/showthread.php?t=2759

                              can result in different fragment ends, then factors such as pH may influence which end-type does result. That is, a break between the C5' and O or C3' and O may result in the desired outcome: hydrolytic restoration of the end to a 5' or 3' OH. Or it could result in undesired outcomes such as ribose-sugar ring opening or maybe even loss of C5' entirely. (Again, I'm not a chemist, the above is rampant speculation.) Point being, mixtures of T4-polymerase/T4-PNK probably cannot repair the latter outcomes into something ligatable.

                              Originally posted by Nitrogen-DNE-sulfer View Post
                              [...]
                              The final point I'd add to the discussion is that not all quantified DNA is amplifiable or makes it to a bead or a cluster. We're working with emPCR on SOLiD, and we assume 1/2 to 2/3 of our reactors have beads and no DNA. We lean on pushing the bead Poisson high and the template Poisson low, as 2 beads in a reactor don't kill us but 2 templates do.
                              Then it does not seem you would lose many amplicons in ePCR. That is, bead Poisson high: nearly all reactors have beads. Template Poisson low: most of the reactors have a bead but no template, but where there is a template it will almost certainly have a bead to bind it.
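
                              To make the "bead Poisson high, template Poisson low" point concrete, a minimal sketch (the mean loadings of 2 beads and 0.1 templates per reactor are my own illustrative numbers, not figures from the thread):

                              Code:
                              import math

                              def poisson_pmf(k, lam):
                                  # P(X = k) for a Poisson-distributed count with mean lam
                                  return math.exp(-lam) * lam**k / math.factorial(k)

                              bead_mean = 2.0        # illustrative: bead loading pushed high
                              template_mean = 0.1    # illustrative: template loading kept low

                              p_no_bead = poisson_pmf(0, bead_mean)          # ~13.5%
                              p0 = poisson_pmf(0, template_mean)
                              p1 = poisson_pmf(1, template_mean)
                              p_multi_template = 1 - p0 - p1                 # ~0.47%, the "killer" case

                              print(f"reactors with no bead:     {p_no_bead:.1%}")
                              print(f"reactors with >1 template: {p_multi_template:.2%}")
                              # A template lands in a bead-less reactor only ~13.5% of the time, so few
                              # amplicons are lost, while mixed-template reactors stay rare.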

                              Originally posted by Nitrogen-DNE-sulfer View Post
                              Similar effects may exist on the Poisson curves for clusters, i.e., flow cells must be flooded at one concentration, where only a portion of the molecules can seed the flow cell surface but molecules exist throughout the whole volume. I'm still unclear whether both surfaces amplify and only one being imaged creates another factor-of-2 loss?
                              I don't have a Solexa, so I don't know. But I will note, tangentially, that it is interesting that after a couple of years of direct competition between Solexa and SOLiD, it now appears that the two platforms are veering into slightly different niches. Solexa, with paired-end 100 base reads seems poised to conquer the de novo sequencing niche. Whereas SOLiD appears to have abandoned longer reads to concentrate on increasing read numbers. Which, everything else being equal, would give them control of the resequencing niche (including digital gene expression). That said, everything else is not equal. Illumina had instruments out in the field at least a full year before AB did. And then there is the PacBio instrument looming...

                              --
                              Phillip

                              Comment
