
  • The Beginning of the End for Exome Sequencing

    By Radoje Drmanac | Oct 27, 2011 | 11:34 AM

    Today, researchers use two approaches to identifying disease-associated variants in the human genome: exome sequencing, which targets the protein-coding regions that make up approximately 1% of the genome, and whole genome sequencing, which investigates the vast majority of the genome and includes both coding and non-coding regions. Historically, researchers opted for exome sequencing because it cost less, protein-coding variants were more easily interpreted than non-coding ones, and it had been used successfully to identify disease-causing variants in several cases. However, targeted sequencing of only a specified list of protein-coding sequences means that DNA variations outside of those regions are missed. Moreover, exome capture by hybridization can introduce considerable coverage variability, affecting comparative analysis and limiting discovery efforts. Perhaps the greatest disadvantage of exome sequencing is its low sensitivity to copy number and structural variations, whereas whole genome sequencing can detect these variation events as well as many copy neutral events, such as uniparental disomies and inversions or translocations.

    Technical considerations also suggest that the most accurate and effective way to sequence the exome may be to sequence the whole human genome. Current commercially available exome targeting kits typically cover an incomplete portion of the exome. Kits from different vendors rely on different definitions of ‘exome’, and even kits from a single vendor are frequently updated. The net effect is that the general term ‘exome sequencing’ refers to the sequencing of different genome regions in experiments that use different kits. Even within each kit, not all of the desired exome is captured, as certain exons are excluded during design of the capture probes for reasons including size or hybridization thermodynamics. Furthermore, the selection step itself may introduce biases that prevent the detection of exonic variants.

    While a significant proportion of sequencing variants may have no discernible effect on phenotype, thousands of well-annotated and conserved elements implicated in disease exist outside of protein-coding regions. Haussler’s team at UCSC [1] recently discovered ~3M non-coding evolutionarily conserved sequences in the human genome at 10% detection sensitivity. At 28 bases in length, on average, such regulatory sequences can constitute >20% of the genome, compared to ~1% in coding sequences. I think this and other recent results open a new frontier in human genetics. In addition, recent studies now show that a far larger fraction of the human genome is systematically transcribed than previously thought, resulting in the discovery and characterization of new classes of non-protein-coding genes. Moreover, a sizable fraction of loci identified by genome-wide association studies lie within so-called “gene deserts”, i.e., genomic regions with no known protein-coding genes.
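The >20% figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is one reading of the numbers above, not a calculation taken from the cited paper: it assumes that 10% detection sensitivity means the ~3M detected elements are about one tenth of the true total, and it assumes a haploid genome size of ~3.1 Gb (a value not given in the post).

```python
# Rough check of the ">20% of the genome" estimate.
DETECTED_ELEMENTS = 3_000_000   # ~3M conserved non-coding elements (from the post)
DETECTION_SENSITIVITY = 0.10    # 10% detection sensitivity (from the post)
MEAN_ELEMENT_LENGTH = 28        # bases, on average (from the post)
GENOME_SIZE = 3_100_000_000     # ASSUMPTION: ~3.1 Gb haploid human genome

# ASSUMPTION: the detected elements are ~10% of all such elements.
estimated_elements = DETECTED_ELEMENTS / DETECTION_SENSITIVITY
estimated_bases = estimated_elements * MEAN_ELEMENT_LENGTH
genome_fraction = estimated_bases / GENOME_SIZE

print(f"Estimated conserved non-coding bases: {estimated_bases / 1e6:.0f} Mb")
print(f"Estimated fraction of genome: {genome_fraction:.1%}")
```

Under these assumptions the estimate comes out at roughly 840 Mb, or about 27% of the genome, which is consistent with the >20% quoted above.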

    The practical utility of whole human genome sequencing for identifying disease-associated variants was recently demonstrated in a study published in Nature [2]. Of 38 multiple myeloma patients, 23 tumor-normal pairs were investigated using whole genome sequencing, 16 were examined using exome sequencing, and one pair was sequenced by both methods. The results showed that the mutation frequency in coding regions was significantly lower than in the intronic and intergenic regions, owing to negative selection pressure against mutations disrupting the coding sequence. In addition, 18 statistically significant mutated non-coding regions were identified. While exome sequencing identified most of the significantly mutated genes, half of the total protein-coding mutations occurred in chromosomal aberrations such as translocations, most of which would have been missed by sequencing only the exome. Recurrent point mutations in non-coding regions would also have been missed. The paper concludes that whole genome sequencing offers the most comprehensive analysis of coding, non-coding, and other functional elements of the genome.

    Finally, cost has become much less of an issue. For approximately $4,000, just a little more than the cost of sequencing an exome, Complete Genomics offers whole human genome sequencing for projects with 50 samples or more, with all the benefits of this comprehensive genetic test. Data provided for each genome has on average 55x mapped coverage, and typically more than 95% of positions called on both alleles within the coding and non-coding regions of the genome. Our resulting genomic data includes files of detected and annotated coding and non-coding sequence variants (SNPs, small indels, CNVs, and SVs), data summary reports, and a full set of supporting data for these results. In addition, using genome variants is as easy as using exome variants, thanks to our included annotation of typical known regulatory elements. And the sequencing can be done quickly, with large studies comprising hundreds of whole genomes completed in just a few months with guaranteed quality. This is the reason that some projects planned for exome sequencing have already switched to whole genome sequencing. It’s mainly inertia and lack of awareness of the progress of whole genome sequencing that have prevented a faster switch.

    I strongly believe that whole human genome sequencing now offers researchers a much more informative, cost-effective, easy to use, rapid and comprehensive alternative to exome sequencing for identifying disease-associated variants in the human genome.

    References

    [1] Lowe et al., “Three Periods of Regulatory Innovation During Vertebrate Evolution,” Science 333:1019-1023 (2011)

    [2] Chapman et al., “Initial Genome Sequencing and Analysis of Multiple Myeloma,” Nature 471:467-472 (2011)

  • #2
    I definitely support the move from exome to complete genome.

    However, this article neglects that whole genome sequencing does not solve the exome sequencing problem immediately.

    If you look at "callability" the story is not so simple (see the Margulies paper for more details). Coverage is meaningless without considering callability. In fact, the recent paper by Margulies et al. showed that from whole genome sequencing with 50X coverage, only 80% of the exome was callable. This can easily translate into not having completely and sufficiently sequenced upwards of 70% of the genes!! (my guess from some experience with exome sequencing).
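To see how an exome-wide callable fraction can hide per-gene gaps, here is a toy illustration. The gene names and base counts are made up for the example and do not come from the paper mentioned above; the function name is likewise hypothetical.

```python
def summarize_callability(genes):
    """Return (exome-wide callable fraction, fraction of genes not fully callable).

    genes maps a gene name to (callable_bases, total_coding_bases).
    """
    total_bases = sum(total for _, total in genes.values())
    callable_bases = sum(called for called, _ in genes.values())
    # A gene counts as incomplete if any of its coding bases is uncallable.
    incomplete = [name for name, (called, total) in genes.items() if called < total]
    return callable_bases / total_bases, len(incomplete) / len(genes)


# HYPOTHETICAL toy data: four genes of 1,000 coding bases each.
toy_genes = {
    "GENE_A": (1000, 1000),
    "GENE_B": (800, 1000),
    "GENE_C": (700, 1000),
    "GENE_D": (700, 1000),
}

exome_fraction, incomplete_fraction = summarize_callability(toy_genes)
print(f"Exome callable: {exome_fraction:.0%}")
print(f"Genes with at least one uncallable base: {incomplete_fraction:.0%}")
```

In this toy set 80% of the coding bases are callable, yet three of the four genes have gaps. Real numbers depend on how the uncallable bases are distributed across genes, which is why a per-gene summary is more informative than a single exome-wide percentage.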

    The advantage of whole genome is that you no longer have to worry about bad baits... but you will need an enormous amount of coverage to achieve complete callability of the entire exome. Exome sequencing is still cheaper for the moment....

    • #3
      The author is from Complete Genomics. Thus it's not surprising that he is promoting whole genome sequencing.

      • #4
        Sheesh, what a load of bollocks in there. Clearly this is a 'smartly' written advert - unfortunately scientists aren't so dumb to fall for it.

        That aside - there is a certain truth to it, although discounting exome sequencing at this stage is a little premature - plus the pricing info is inaccurate to say the least.

        • #5
          I have a feeling exome sequencing or a variation of it will be around for a while. Check out the PR from BGI today (http://www.genomeweb.com//node/98888...q_v=4bbae1996c) for a different take on the matter. From my side it really breaks down to your perspective. If my daughter has cancer and I want the best result, I go for a 50x tumor-normal pair; right now that is around 8 lanes total on our HiSeq runs. Alternatively, if I'm a researcher trying to screen as many samples as possible, I probably go for an exon capture method targeting 100x average coverage, but now I run 32 samples (16 tumor-normal pairs). The dividing line will be when you start getting 1 genome per lane, I'd guess, as the cost of capture will start being the most costly line item per sample at a certain point.

          • #6
            Originally posted by NextGenSeq
            The author is from Complete Genomics. Thus it's not surprising that he is promoting whole genome sequencing.
            I agree with you

            • #7
              Originally posted by Jon_Keats
              I have a feeling exome sequencing or a variation of it will be around for a while. Check out the PR from BGI today (http://www.genomeweb.com//node/98888...q_v=4bbae1996c) for a different take on the matter. From my side it really breaks down to your perspective. If my daughter has cancer and I want the best result, I go for a 50x tumor-normal pair; right now that is around 8 lanes total on our HiSeq runs. Alternatively, if I'm a researcher trying to screen as many samples as possible, I probably go for an exon capture method targeting 100x average coverage, but now I run 32 samples (16 tumor-normal pairs). The dividing line will be when you start getting 1 genome per lane, I'd guess, as the cost of capture will start being the most costly line item per sample at a certain point.
              If your daughter has cancer, wouldn't an even better bet be to do RNA sequencing of both the tumor cells and normal cells of the same type?

              Of course, doing exome sequencing on both in addition to that would be even better.

              • #8
                For many tumor types, 50X coverage isn't nearly deep enough. Most tumor samples have admixed normal tissue, which dilutes the tumor sample. There's also tumor heterogeneity. I think most folks planning on tumor analysis by capture are shooting for more like 1000s of X.

                Complete's recent publication of their Long Fragment Read technology should open a new set of questions for using exome sequencing in cancer. If you have a high quality (not FFPE) sample, then LFR by reading out long-range information may provide valuable additional data -- particularly the question of whether multiple hits in tumor suppressors are in trans or in cis (same chromatid or different). On the other hand, the degree to which cellular and copy number heterogeneity will foul their algorithm was completely unaddressed in their publication.

                • #9
                  Originally posted by krobison
                  For many tumor types, 50X coverage isn't nearly deep enough. Most tumor samples have admixed normal tissue, which dilutes the tumor sample. There's also tumor heterogeneity. I think most folks planning on tumor analysis by capture are shooting for more like 1000s of X.

                  Complete's recent publication of their Long Fragment Read technology should open a new set of questions for using exome sequencing in cancer. If you have a high quality (not FFPE) sample, then LFR by reading out long-range information may provide valuable additional data -- particularly the question of whether multiple hits in tumor suppressors are in trans or in cis (same chromatid or different). On the other hand, the degree to which cellular and copy number heterogeneity will foul their algorithm was completely unaddressed in their publication.
                  Thanks for your input. So 100x on the tumor exome and 50x on the normal exome is enough?

                  Do you think for cancer analysis, RNA sequencing is more important than exome/genome sequencing because it is closer to the phenotype?

                  • #10
                    RNA sequencing is closer to the phenotype, but requires much more careful sample handling and preparation. Many oncogenes are expressed at very low levels and there is background expression from the normal tissue, so RNA-Seq may miss key mutations.

                    I've quit following the literature closely, but it did seem like whole genome, whole exome & RNA-Seq were all making major contributions to identifying oncogenes. The choice of technique seems to be lab-specific to some degree, though there are probably disease considerations as well. In the clinic, I generally think amplicon sequencing will probably succeed first, with targeted capture ("exome") being close behind. Routine whole genome sequencing for cancer might be a while -- the ability to get extremely high depth is a big advantage for capture & amplicon here.

                    Sufficient depth will depend on the degree of admixture, and in the end is a sensitivity question (deeper gets you greater sensitivity; how much is required depends on the degree of admixture).
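The depth-versus-admixture trade-off can be sketched with a simple binomial model. This is a minimal illustration under stated assumptions, not a validated caller model: it assumes a clonal, heterozygous somatic variant in a diploid region (so the expected variant allele fraction is half the tumor fraction), no sequencing error, exactly `depth` reads at the site, and a hypothetical calling threshold of 5 variant reads. The function name and threshold are made up for the example.

```python
from math import comb


def detection_probability(depth, tumor_fraction, min_alt_reads=5):
    """Probability of seeing at least min_alt_reads variant reads at one site.

    ASSUMPTIONS: clonal heterozygous variant, diploid locus, no sequencing
    error, reads sampled independently (binomial model).
    """
    vaf = tumor_fraction / 2.0  # expected variant allele fraction
    # Sum the probabilities of seeing fewer than min_alt_reads variant reads.
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# Compare a few depths for a sample that is only 20% tumor cells.
for depth in (50, 100, 500, 1000):
    p = detection_probability(depth, tumor_fraction=0.2)
    print(f"{depth:>5}x  tumor fraction 0.2  P(detect) = {p:.3f}")
```

With a 20% tumor fraction the expected variant allele fraction is only 10%, so at 50x roughly five variant reads are expected and detection is close to a coin flip under this model, while several hundred x makes it near certain. Subclonal variants would push the required depth higher still.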
