  • Available Personal Genomes

    Hi,

    We are resequencing segmental duplications of high sequence identity, and there is very little overlap with dbSNP in these regions. Are there any personal genomes available, other than Venter's, that were sequenced with clones or another long-read technology able to identify variation in these regions reliably? It would be great to have a couple of points of comparison in addition to Venter's genome, as a sanity check that SNP concordance agrees both genome-wide and within these regions. Any input would be greatly appreciated.

    Thanks!

  • #2
    Here are some Complete Genomics genomes from PGP: https://my.pgp-hms.org/public_geneti...&commit=Search



    • #3
      The best so far is the CHM1 PacBio assembly, but I don't know if it has been publicly released yet. NA12878 also has a PacBio assembly and public Moleculo data (on the 1000 Genomes FTP). These will be useful for investigating hard regions.



      • #4
        CHM1 PacBio data has been released: http://blog.pacificbiosciences.com/2...erage-for.html



        • #5
          I can't find the PacBio assembly of NA12878. Do you know where it is
          available?

          From what I can tell, all of the genomes from PGP are sequenced with
          Complete Genomics, which I thought had a relatively short read length. The
          personal genome VCFs for hg19 are on UCSC. I don't understand how they are
          able to call variants in repeat regions that standard whole-genome
          Illumina 100bp reads cannot disambiguate. Could these variants be
          the result of liftover errors from hg18 to hg19 in segmental duplications
          that were collapsed in the older assembly? Can these variants be trusted
          at all?

          As for the PacBio assembly of CHM1, I can only see the raw reads in the
          link given, so that doesn't help much. I found the supplementary material
          for the paper on bioRxiv:

          This folder contains all of the supplementary material for Meltz Steinberg et al "Single haplotype assembly of the human genome from a hydatidiform mole"


          and CHM1_to_GRCh37_lite_snvs.site_filtered.pass.vcf is the only file that
          looks relevant, but the het:hom ratio of that VCF is 0.04, which looks
          suspect. Is there a different resource available that doesn't show
          this issue?
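For anyone who wants to check a number like this themselves, here is a minimal Python sketch that counts heterozygous versus homozygous-alt genotypes in a single-sample VCF. It assumes a plain or gzipped VCF with a GT field in the FORMAT column and looks only at the first sample column; `open_vcf` and `het_hom_ratio` are hypothetical helper names, not part of any published tool. It applies no filters of its own (the file named above is already PASS-filtered), skips missing and hom-ref calls, and counts haploid calls such as `1` as homozygous.

```python
import gzip


def open_vcf(path):
    """Open a VCF as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def het_hom_ratio(lines):
    """Ratio of het to hom-alt genotypes in the first sample column."""
    het = hom = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        sample = fields[9].split(":")
        idx = fmt.index("GT")
        if idx >= len(sample):
            continue  # truncated sample field
        alleles = sample[idx].replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call
        if len(set(alleles)) > 1:
            het += 1  # e.g. 0/1 or 1/2
        elif alleles[0] != "0":
            hom += 1  # e.g. 1/1, or haploid 1
    return het / hom if hom else float("inf")
```

Usage would be something like `with open_vcf("CHM1_to_GRCh37_lite_snvs.site_filtered.pass.vcf") as fh: print(het_hom_ratio(fh))`.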

          Any other suggestions would be greatly appreciated. It would be great to
          have hg19 VCFs with trustworthy variant calls in these regions.

          Thanks!



          • #6
            PacBio assembly of CHM1 is here:



            It is different from the version I was looking at, but I believe it should be equally good. The NA12878 PacBio assembly has not been released yet.

            CHM1 is a haploid sample. Very low het:hom ratio is expected.

            EDIT: I should add that I am extremely impressed by the CHM1 assembly done by Jason Chin.
            Last edited by lh3; 09-25-2014, 11:29 AM.



            • #7
              Ah, thanks for the clarification; it is actually mentioned directly in their bioRxiv paper, but I overlooked it.

              Thanks!



              • #8
                I overlooked it, too... An author told me the link yesterday.



                • #9
                  Originally posted by ddoopus
                  From what I can tell, all of the genomes from PGP are sequenced with Complete Genomics which I thought had a relatively short read length. The personal genome *vcfs from hg19 are on UCSC. I don't understand how they able to call variants in repeat regions which standard whole genome Illumina 100bp reads can not disambiguate. Are these variants possibly the result of liftover errors from hg18 to hg19 for segmental duplications which were collapsed in the older version? Can these variants be trusted at all?
                  I have a lot of experience with Complete Genomics data (but a bad memory, so the details are slightly fuzzy). Their reads are super-short. IIRC each "read" consists of 2x10bp fragments and 2x15bp fragments, or something like that, with unknown normally-distributed distances between the pieces but ~50% of the time the distance is one specific value, like 2bp. So you get reads like:
                  10bp sequenced, 0-2 bp unsequenced, 15bp sequenced, ~10bp unsequenced, 15bp sequenced, 0-2bp unsequenced, 10 bp sequenced.
                  ...roughly. I think some of the "readlets" were 5bp. Anyway, they are nothing like other platforms.
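To make that layout a bit more concrete, here is a small Python sketch that builds a gapped read from segment and gap lengths and reports how many bases are actually sequenced versus how much genome the read spans. The 10/15/15/10 bp segments and 2/10/2 bp gaps are only the approximate figures recalled above, not a documented Complete Genomics specification, and `read_layout` is a hypothetical helper. With those numbers, a read sequences about 50 bases spread over roughly 64 bp, and no contiguous piece is longer than 15 bp.

```python
def read_layout(segments, gaps):
    """Build a gapped read from sequenced segment lengths and the unsequenced
    gap lengths between consecutive segments (illustrative values only)."""
    if len(gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap between each pair of segments")
    sequenced = sum(segments)
    span = sequenced + sum(gaps)
    parts = []
    for i, seg in enumerate(segments):
        parts.append("S" * seg)  # S = sequenced base
        if i < len(gaps):
            parts.append("-" * gaps[i])  # - = unsequenced base
    return sequenced, span, "".join(parts)


segments = [10, 15, 15, 10]
gaps = [2, 10, 2]
sequenced, span, diagram = read_layout(segments, gaps)
print(sequenced, span)  # 50 64
print(diagram)
```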

                  As a result, you cannot do de novo assembly with them, and I would never trust them in long repetitive regions. In my testing, they are quite accurate for calling SNPs (using CG's calls) but abysmal at indels, with almost no concordance with indels called from 2x100bp Illumina data, or with indels that could plausibly have been inherited when analyzing sequenced parent+child trios. And FYI, they call indels by de novo reassembling the regions around suspected indels using reads that map across them, not directly from the reads.
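One rough way to measure concordance between two call sets (CG versus Illumina, say) is to compare them as sets of (chromosome, position, ref, alt) keys. This is a minimal Python sketch assuming the calls are already parsed into tuples; `variant_key` and `concordance` are hypothetical helpers. Exact matching like this tends to understate indel agreement, because the same indel in a repeat can be written at different positions, so both call sets should ideally be left-aligned and normalized first (for example with `bcftools norm -f <reference.fa>`).

```python
def variant_key(chrom, pos, ref, alt):
    """Build a comparable key, treating 'chr1' and '1' as the same contig."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def concordance(calls_a, calls_b):
    """Compare two call sets given as iterables of (chrom, pos, ref, alt)."""
    a = {variant_key(*v) for v in calls_a}
    b = {variant_key(*v) for v in calls_b}
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # fraction of all distinct variants that appear in both sets
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


calls_a = [("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "T")]
calls_b = [("1", 100, "a", "g"), ("1", 201, "TT", "T"), ("2", 50, "C", "T"), ("3", 10, "G", "C")]
result = concordance(calls_a, calls_b)
```

If the reference there reads ATT, the two deletions at positions 200 and 201 in this toy example are the same event written two ways, yet exact matching counts them as discordant; that is exactly what normalization is meant to fix.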

                  I would not include CG genomes if you are studying 'difficult' parts of the genome that are low-complexity, repetitive, highly variable, or are interested in indels.
                  Last edited by Brian Bushnell; 09-25-2014, 05:58 PM.
