  • U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line

    Clark MJ, Homer N, O'Connor BD, Chen Z, Eskin A, et al. (2010) U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line. PLoS Genet 6(1): e1000832.

    BACKGROUND: U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
    The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.

    AUTHOR SUMMARY: Glioblastoma has a particularly dismal prognosis with median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 genes homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
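As a quick sanity check on the figures quoted above, the per-class counts of homozygously mutated genes can be summed, and the array concordance can be turned into an approximate SNP count. This is only a minimal sketch: the variable names are illustrative, and the numbers are copied from the abstract rather than recomputed from the variant files.

```python
# Homozygously mutated genes by variant class, as quoted in the abstract.
homozygous_genes = {
    "SNVs": 154,
    "small indels": 178,
    "large microdeletions": 145,
    "interchromosomal translocations": 35,
}
total_genes = sum(homozygous_genes.values())

# Heterozygous SNPs assayed on the genotyping array, and the fraction
# the abstract reports as reliably detected by sequencing.
array_het_snps = 219_187
detected_fraction = 0.9383
approx_detected = round(array_het_snps * detected_fraction)

print(f"homozygously mutated genes: {total_genes}")
print(f"approx. array SNPs detected: {approx_detected:,}")
```

The four classes add up to the 512 genes reported, which suggests each gene was counted under a single class.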

  • #2
    Thanks for posting this, Nils.

    I look forward to any comments about our work.
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]

    • #3
      Nils & Michael:

      I've been meaning to write something longer on my blog, but it seems to be stuck on the procrastination non-express.

      It would appear you were much more successful at finding short indels than the Sanger paper which used SOLiD -- in particular their automated pipeline failed to find a known oncogenic 2-nt deletion.

      How much of the credit do you think is due to BFAST and how much from longer read lengths (2x50 vs 2x25)? Any other factors?

      Keith R.

      • #4
        Hi Keith,

        Glad the paper piqued your interest. Both of the reasons you came up with increased our sensitivity to indels. I'd also suggest that our relatively high coverage deserves part of the credit for our success in identifying small indels.

        BFAST in particular is quite sensitive to indels. You may also have noticed that we were able to detect some relatively large indels (up to 21 bases in length). BFAST was able to correctly align over these events.

        You can read the BFAST paper (check the supplemental materials--it explains in great detail) for some enlightenment about why it's able to do this perhaps better than alternative aligners.

        Michael

        • #5
          Thanks for posting it - this is a very exciting paper for those of us working on cancer cell line sequencing! Especially since it is using BFAST, an aligner that we are very keen on using for somatic mutation detection.

          This paper has also gotten me interested in the SeqWare pipeline. I see that it is capable of automated annotation of gene mutations (e.g., calling SNVs and indels and assigning them descriptions such as "frameshift", "start-codon loss", etc.), and it does this with UCSC KnownGene tables! Looking forward to trying it.

          Lastly, I would like to practice my alignments and analysis on the exon-captured Illumina sequencing data, but I am a bit confused: which of the three datasets is the one with the Illumina exon capture?
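For readers unfamiliar with the consequence labels mentioned above, here is a minimal Python sketch of how "frameshift" and "start-codon loss" can be decided for a variant that lies entirely within coding sequence. It is an illustration only and not SeqWare's implementation; both function names are hypothetical, and real annotation against UCSC KnownGene models also has to handle strand, splicing, and multiple transcripts.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Label an indel that lies entirely inside a coding exon."""
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "not an indel"
    # A length change that is not a multiple of three shifts the reading frame.
    if length_change % 3 != 0:
        return "frameshift"
    return "in-frame insertion" if length_change > 0 else "in-frame deletion"


def loses_start_codon(ref_codon: str, alt_codon: str) -> bool:
    """True when a variant changes a canonical ATG start codon to anything else."""
    return ref_codon.upper() == "ATG" and alt_codon.upper() != "ATG"


if __name__ == "__main__":
    print(classify_coding_indel("ACG", "A"))   # 2-base deletion
    print(classify_coding_indel("A", "ACGT"))  # 3-base insertion
    print(loses_start_codon("ATG", "ACG"))
```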

          • #6
            Originally posted by NGSfan
            Thanks for posting it - this is a very exciting paper for those of us working on cancer cell line sequencing! Especially since it is using BFAST, an aligner that we are very keen on using for somatic mutation detection.

            This paper has also gotten me interested in the SeqWare pipeline. I see that it is capable of automated annotation of gene mutations (e.g., calling SNVs and indels and assigning them descriptions such as "frameshift", "start-codon loss", etc.), and it does this with UCSC KnownGene tables! Looking forward to trying it.

            Lastly, I would like to practice my alignments and analysis on the exon-captured Illumina sequencing data, but I am a bit confused: which of the three datasets is the one with the Illumina exon capture?

            http://www.ncbi.nlm.nih.gov/sites/en...57&report=full
            Yeah, the SeqWare database is really interesting and useful and I really encourage you to play around with it.

            As for the Illumina pull-down data, looks like it hasn't been uploaded to SRA yet! Those three sets are the SOLiD data.

            For resources from the paper, I strongly suggest anyone interested go look at http://genome.ucla.edu/U87 because there are many useful links including direct links to variant files.

            • #7
              Thanks for reminding me to check out the paper's webpage - that has a lot of useful info!

              Please let us know when the Illumina reads come out.

              Btw - I didn't catch it in the paper, but as part of the variant detection, did you guys try recalibrating the quality scores using the GATK software? They make some pretty convincing arguments for how this improves variant calls. If I get the Illumina data to practice on, I will try recalibration with GATK and see what happens out of curiosity.

              • #8
                Everything we did is described in the paper.

                Like you, I have heard that GATK is good for variant calling, though, and we are looking into it for current projects.

                • #9
                  Originally posted by NGSfan
                  Please let us know when the Illumina reads come out.
                  Just FYI, the Illumina Exon Pull-Down Data is now available on the U87MG page:



                  The BAM file was aligned as described in the paper (using BFAST). The raw FASTQ is also provided.

                  There's been a lot of demand for that data in particular.

                  • #10
                    Excellent!! Thank you kindly for the update! I'm looking forward to practicing paired-end alignments with the BFAST program! It will give me a head start on our own data.

                    • #11
                      Nils,

                      I am trying to figure out which BFAST settings you used and how the alignments were filtered. In the paper you write "We choose the “best scoring” alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment". Is this the same as the -A 2 or -A 3 option?
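The rule quoted from the paper (keep the best-scoring alignment only if it beats the runner-up by at least the equivalent of two color errors) can be sketched in Python as below. This is a minimal illustration, not BFAST's code and not a statement of what any `-A` value does. The function name, the `color_error_penalty` parameter, and the convention that higher scores are better are all assumptions made for the example.

```python
from typing import Optional, Sequence


def pick_unique_best(
    scores: Sequence[float],
    color_error_penalty: float,
    min_color_errors_apart: int = 2,
) -> Optional[int]:
    """Return the index of the best alignment, or None if it is not clearly best.

    `scores` holds one alignment score per candidate location (higher is better).
    `color_error_penalty` is the (hypothetical) score cost of one color error.
    """
    if not scores:
        return None
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best  # nothing to compete with
    gap = scores[best] - scores[ranked[1]]
    required = min_color_errors_apart * color_error_penalty
    return best if gap >= required else None


if __name__ == "__main__":
    print(pick_unique_best([100, 70], color_error_penalty=10))  # clear winner
    print(pick_unique_best([100, 95], color_error_penalty=10))  # too close to call
```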

                      • #12
                        Originally posted by Chipper
                        Nils,

                        I am trying to figure out which BFAST settings you used and how the alignments were filtered. In the paper you write "We choose the “best scoring” alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment". Is this the same as the -A 2 or -A 3 option?
                        We used "-A 3" in "bfast postprocess", then a minimum mapping quality of 20 assuming you left "-q" in "bfast localalign" as the default.

                        Nils
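The second step described above, keeping only alignments with a mapping quality of at least 20, can be sketched in plain Python on SAM text. This is a minimal sketch, assuming the alignments have been written as SAM, where MAPQ is the fifth tab-separated column. The function name and the sample records are made up for illustration.

```python
from typing import Iterable, Iterator


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield header lines unchanged and alignment lines with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record; skip it
        if int(fields[4]) >= min_mapq:
            yield line


if __name__ == "__main__":
    # Made-up records: only the MAPQ column (5th field) matters here.
    example = [
        "@HD\tVN:1.0\tSO:unsorted",
        "read1\t0\tchr10\t100\t37\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "read2\t0\tchr10\t200\t5\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    for kept in filter_by_mapq(example):
        print(kept)
```

For BAM files, the same threshold is usually applied with an existing tool rather than hand-written parsing.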
