  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line

    Clark MJ, Homer N, O'Connor BD, Chen Z, Eskin A, et al. (2010) U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line. PLoS Genet 6(1): e1000832.

    BACKGROUND: U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
    The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.

    AUTHOR SUMMARY: Glioblastoma has a particularly dismal prognosis with median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 genes homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
  • Michael.James.Clark
    Senior Member
    • Apr 2009
    • 207

    #2
    Thanks for posting this, Nils.

    I look forward to any comments about our work.
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]


    • krobison
      Senior Member
      • Nov 2007
      • 734

      #3
      Nils & Michael:

      I've been meaning to write something longer on my blog, but it seems to be stuck on the procrastination non-express.

      It would appear you were much more successful at finding short indels than the Sanger paper which used SOLiD -- in particular their automated pipeline failed to find a known oncogenic 2-nt deletion.

      How much of the credit do you think is due to BFAST and how much from longer read lengths (2x50 vs 2x25)? Any other factors?

      Keith R.


      • Michael.James.Clark
        Senior Member
        • Apr 2009
        • 207

        #4
        Hi Keith,

        Glad the paper piqued your interest. Both of the factors you mention increased our sensitivity to indels. I'd add that our relatively high coverage deserves part of the credit for our success in identifying small indels.

        BFAST in particular is quite sensitive to indels. You may also have noticed that we were able to detect some relatively large indels (up to 21 bases in length). BFAST was able to correctly align over these events.

        The BFAST paper, and especially its supplemental materials, explains in great detail why it is able to do this, perhaps better than alternative aligners.

        Michael
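As an illustration of how indels like those discussed above can be read back out of alignments, here is a minimal sketch. It assumes the alignments are available in SAM format, where insertions and deletions appear as `I` and `D` operations in the CIGAR field. The function name and the example CIGAR strings are hypothetical and are not taken from the paper or from BFAST.

```python
import re

# One CIGAR operation is a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_lengths(cigar):
    """Return (insertion_lengths, deletion_lengths) found in a SAM CIGAR string."""
    insertions, deletions = [], []
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            insertions.append(int(length))
        elif op == "D":
            deletions.append(int(length))
    return insertions, deletions


# Hypothetical 50-base read whose alignment spans a 21-base deletion.
ins, dels = indel_lengths("20M21D30M")
print(ins, dels)
```

A read carrying a large gap only shows up this way if the aligner was willing to open the gap in the first place, which is the sensitivity point made in the post above.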

        • NGSfan
          Senior Member
          • Apr 2009
          • 181

          #5
          Thanks for posting it - this is a very exciting paper for those of us working on cancer cell line sequencing! Especially since it is using BFAST, an aligner that we are very keen on using for somatic mutation detection.

          From this paper I have also gotten interested in the SeqWare pipeline. I see that the pipeline is capable of automated annotation of gene mutations (e.g., calling SNVs and indels and assigning them a description such as "frameshift" or "start-codon loss"). And this with UCSC KnownGene tables! Looking forward to trying it.

          Lastly, I would like to practice my alignments and analysis on the exon-captured Illumina sequencing data, but I am a bit confused about which of the three datasets is the one with the Illumina exon capture?
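To make the "frameshift" label mentioned above concrete, here is a minimal sketch of the underlying rule: an indel inside coding sequence shifts the reading frame unless its net length change is a multiple of three. This is not SeqWare's implementation. The function name is hypothetical, and the sketch ignores splice sites, start and stop codons, and transcript models.

```python
def classify_coding_indel(ref_allele, alt_allele):
    """Label a coding-sequence indel by its effect on the reading frame."""
    net_change = len(alt_allele) - len(ref_allele)
    if net_change == 0:
        return "not an indel"
    kind = "insertion" if net_change > 0 else "deletion"
    # A net change divisible by three leaves downstream codons intact.
    frame = "in-frame" if net_change % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


# Hypothetical alleles: a 2-base deletion and a 3-base insertion.
print(classify_coding_indel("ACT", "A"))
print(classify_coding_indel("A", "ACGT"))
```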


          • Michael.James.Clark
            Senior Member
            • Apr 2009
            • 207

            #6
            Originally posted by NGSfan:
            Thanks for posting it - this is a very exciting paper for those of us working on cancer cell line sequencing! Especially since it is using BFAST, an aligner that we are very keen on using for somatic mutation detection.

            From this paper I have also gotten interested in the SeqWare pipeline. I see that the pipeline is capable of automated annotation of gene mutations (e.g., calling SNVs and indels and assigning them a description such as "frameshift" or "start-codon loss"). And this with UCSC KnownGene tables! Looking forward to trying it.

            Lastly, I would like to practice my alignments and analysis on the exon-captured Illumina sequencing data, but I am a bit confused about which of the three datasets is the one with the Illumina exon capture?

            http://www.ncbi.nlm.nih.gov/sites/en...57&report=full
            Yeah, the SeqWare database is really interesting and useful and I really encourage you to play around with it.

            As for the Illumina pull-down data, looks like it hasn't been uploaded to SRA yet! Those three sets are the SOLiD data.

            For resources from the paper, I strongly suggest anyone interested go look at http://genome.ucla.edu/U87 because there are many useful links including direct links to variant files.

            • NGSfan
              Senior Member
              • Apr 2009
              • 181

              #7
              Thanks for reminding me to check out the paper's webpage - that has a lot of useful info!

              Please let us know when the Illumina reads come out.

              By the way, I didn't catch it in the paper, but as part of the variant detection, did you try recalibrating the quality scores with GATK? Its authors make some pretty convincing arguments that this improves variant calls. If I get the Illumina data to practice on, I will try recalibration with GATK and see what happens, out of curiosity.
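For readers unfamiliar with the idea behind quality score recalibration, here is a minimal sketch of its core step: compare each reported Phred score with the mismatch rate actually observed at bases carrying that score, using Q = -10 log10(p). This is not GATK's implementation, which to my understanding also conditions on further covariates and excludes known variant sites. The function name, the input layout, and the add-one smoothing are assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_qualities(observations):
    """Map each reported Phred score to the empirically observed Phred score.

    observations: iterable of (reported_quality, is_mismatch) pairs, one per base.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for reported, is_mismatch in observations:
        totals[reported] += 1
        if is_mismatch:
            errors[reported] += 1
    table = {}
    for reported, n in totals.items():
        # Add-one smoothing (an assumption) keeps the error rate above zero.
        error_rate = (errors[reported] + 1) / (n + 2)
        table[reported] = round(-10 * math.log10(error_rate))
    return table


# Hypothetical data: 998 bases reported as Q30, 9 of which mismatch the reference.
bases = [(30, True)] * 9 + [(30, False)] * 989
print(empirical_qualities(bases))
```

In this made-up case the bases reported as Q30 behave like Q20, which is the kind of gap recalibration is meant to correct before variants are called.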


              • Michael.James.Clark
                Senior Member
                • Apr 2009
                • 207

                #8
                Everything we did is described in the paper.

                Like you, I have heard that GATK is good for variant calling, though, and we are looking into it for current projects.

                • Michael.James.Clark
                  Senior Member
                  • Apr 2009
                  • 207

                  #9
                  Originally posted by NGSfan:
                  Please let us know when the Illumina reads come out.
                  Just FYI, the Illumina Exon Pull-Down Data is now available on the U87MG page:



                  The BAM file was aligned as described in the paper (using BFAST). The raw FASTQ is also provided.

                  There's been a lot of demand for that data in particular.

                  • NGSfan
                    Senior Member
                    • Apr 2009
                    • 181

                    #10
                    Excellent!! Thank you kindly for the update! I'm looking forward to practicing paired-end alignments with BFAST! It will give me a head start on our own data.


                    • Chipper
                      Senior Member
                      • Mar 2008
                      • 323

                      #11
                      Nils,

                      I am trying to figure out which BFAST settings you used and how the alignments were filtered. In the paper you write "We choose the “best scoring” alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment". Is this the same as the -A 2 or the -A 3 option?
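The criterion quoted from the paper can be sketched as follows: keep the best-scoring alignment for a read only when it beats the next-best alignment by a required margin, otherwise treat the read as ambiguous. This is a minimal illustration, not BFAST code. The function name, the scores, and the margin values are hypothetical, and the score equivalent of "two color errors" depends on the aligner's scoring scheme.

```python
def choose_best_unique(scores, min_gap):
    """Return the index of the best-scoring alignment, or None if it is ambiguous.

    scores: alignment scores for one read (higher is better).
    min_gap: required score margin between the best and next-best alignment.
    """
    if not scores:
        return None
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    # Reject the read when the runner-up is too close to the best alignment.
    if len(ranked) > 1 and scores[best] - scores[ranked[1]] < min_gap:
        return None
    return best


# Hypothetical scores for three candidate locations of one read.
candidates = [1900, 2000, 1500]
print(choose_best_unique(candidates, min_gap=250))
print(choose_best_unique(candidates, min_gap=100))
```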


                      • nilshomer
                        Nils Homer
                        • Nov 2008
                        • 1283

                        #12
                        Originally posted by Chipper:
                        Nils,

                        I am trying to figure out which BFAST settings you used and how the alignments were filtered. In the paper you write "We choose the “best scoring” alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment". Is this the same as the -A 2 or the -A 3 option?
                        We used "-A 3" in "bfast postprocess", then a minimum mapping quality of 20 assuming you left "-q" in "bfast localalign" as the default.

                        Nils
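The minimum mapping quality of 20 mentioned above can be applied to SAM-formatted output with a few lines of code. The sketch below assumes plain-text SAM input, where MAPQ is the fifth tab-separated field and header lines start with `@`. The function name and the sample records are hypothetical and do not come from the U87MG data.

```python
def filter_sam_by_mapq(lines, min_mapq=20):
    """Yield header lines unchanged and alignment lines with MAPQ >= min_mapq."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # MAPQ is the fifth mandatory SAM field.
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records: one header line and two alignments.
sample = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t37\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t5\t50M\t*\t0\t0\t*\t*",
]
for record in filter_sam_by_mapq(sample):
    print(record)
```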
