  • How many (human) sequences?

    Some time ago I posted a question about how many (high-speed) sequencers have been sold/installed, with interesting results.

    Now a related question: is there a resource where one can find a (constantly updated) figure for the number of full human genomes sequenced (of reasonable quality, i.e. at least 20-fold redundancy; published, unpublished or half-published)?

    Thanks for your input and best wishes to all

    Bertrand
    Bertrand Jordan
    Marseille-Nice Genopole
    Luminy Science Park
    13288 Marseille, FRANCE

  • #2
    Must be at least 40 published human genomes now, including cancer genomes.

    Unpublished but claimed by reputable labs is, I would guess, several-fold that. I haven't totaled the number of genomes claimed in AACR webcasts, but that would easily be a couple dozen (again, this includes tumor genomes).



    • #3
      A tweet from the recent Cold Spring Harbor Laboratory genomes meeting estimated >600 human genomes were discussed there (You'll find it somewhere in http://twitter.com/search?q=%23bg2010 )



      • #4
        If we are talking about the number of human individuals that have been resequenced, >600 is probably right. But I guess most of these come from the 1000 Genomes Project (g1k), and they are all <8X. My impression is that only 20-50 individuals have been sequenced at >20X, and many of those are unpublished. Of the published high-coverage data, much is from cancer sequencing and has been deposited in dbGaP, where use is restricted. I think from SRA/traceDB you can only get: venter, watson, a Chinese, two Koreans, NA18507 and the 6 individuals (2 trios) from g1k. I have not kept track of recent literature. Probably more are available by now. You know, with HiSeq, you can easily get 20X coverage from 8 lanes.
        Last edited by lh3; 05-19-2010, 06:30 PM.
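
As a rough illustration of the coverage arithmetic in the post above (mean coverage is total sequenced bases divided by genome size), here is a minimal sketch. The genome size of 3.1 Gb is an assumption, close to the 3077M chromosome-length sum quoted later in this thread; the function names are made up for this example, and the per-lane figure is only what the "20X from 8 lanes" claim would imply, not a vendor specification.

```python
# Assumed haploid human genome size in base pairs (an approximation).
GENOME_SIZE_BP = 3_100_000_000


def required_bases(coverage, genome_size=GENOME_SIZE_BP):
    """Total bases needed to reach a given mean coverage."""
    return coverage * genome_size


def bases_per_lane(coverage, lanes, genome_size=GENOME_SIZE_BP):
    """Bases each lane must yield if the load is spread evenly over lanes."""
    return required_bases(coverage, genome_size) / lanes


total = required_bases(20)
per_lane = bases_per_lane(20, 8)
print(f"20X needs about {total / 1e9:.1f} Gb in total")
print(f"that is about {per_lane / 1e9:.2f} Gb per lane over 8 lanes")
```

This ignores duplicates, unmapped reads and uneven coverage, so the usable depth in practice would be somewhat lower than the nominal figure.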



        • #5
          Originally posted by lh3
          I think from SRA/traceDB you can only get: venter, watson, a Chinese, two Koreans, NA18507 and the 6 individuals (2 trios) from g1k. I have not kept track of recent literature. Probably more are available by now.
          Shameless plug, but also U87MG (SRX015657). Maybe SRA ought to have a page listing the published/deposited human genomes, since I find their search function somewhat frustrating (maybe this has changed).

          The HiSeq and SOLiD 4 should help us get the raw data, but it sure will be fun once these data are well-aggregated and we can start querying over many genomes with ease.



          • #6
            Thanks to all for your helpful bits of information on this topic. Apparently Illumina is trying to set up a "World Genome Registry" that "will go live in the next few weeks", according to an interview with Jay Flatley (Illumina CEO) in the InSequence newsletter. This is supposed to be a general registry for data from all sources. Sure would be helpful, if they manage to do it (and to do it right)! By the way, they (Illumina) just reduced their price for a 30X genome from $48k to $19.5k ($9.5k in some cases).
            Cheers, Bertrand
            Bertrand Jordan
            Marseille-Nice Genopole
            Luminy Science Park
            13288 Marseille, FRANCE



            • #7
              > I think from SRA/traceDB you can only get:

              venter,
              watson,
              a Chinese,
              a Korean,
              a Korean,
              NA18507,
              individual from g1k
              individual from g1k
              individual from g1k
              individual from g1k
              individual from g1k
              individual from g1k

              > Probably more are available by now.
              -----------------------
              > also U87MG (SRX015657)

              > Maybe SRA to have a page listing the published/deposited
              > human genomes since I find their search function somewhat
              > frustrating (maybe this has changed).


              Is there one big file in computer-readable form containing
              all the mutations for analysis?



              • #8
                It's tedious, though, with these long files.
                So I think these files should be converted and offered in a uniform,
                standardized, computer-readable format somewhere.

                Is anyone else here working with these files?

                -------------
                First 3 lines of my file jw.d (differences between James Watson's genome and some (which?) reference):
                01, 41921,GC,BJW-1117373
                01, 42101,TG,rs2691277.1
                01, 45408,CT,rs28396308
                ...
                chromosome,position,old+new nucleotide,gene-name(?)

                ~2M lines, 54MB uncompressed, maybe genename isn't needed

                now make the same format for YH,...
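
A minimal sketch of reading the four-field format shown above, assuming each record is exactly `chromosome,position,old+new nucleotide,name` with the two alleles packed into two characters. The `Variant` type and function names are hypothetical, and the last field is treated only as an opaque label (the `rs...` values look like dbSNP identifiers rather than gene names, but that is a guess).

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome label exactly as written in the file, e.g. "01"
    pos: int    # position on the chromosome
    old: str    # nucleotide in the reference
    new: str    # nucleotide in the sequenced genome
    name: str   # last field, kept as an opaque label


def parse_line(line: str) -> Variant:
    """Parse one 'chrom,pos,old+new,name' record."""
    chrom, pos, alleles, name = (field.strip() for field in line.split(","))
    if len(alleles) != 2:
        raise ValueError(f"expected two nucleotides, got {alleles!r}")
    return Variant(chrom, int(pos), alleles[0], alleles[1], name)


def parse_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield a Variant per record, skipping blank lines and '...' markers."""
    for line in lines:
        line = line.strip()
        if line and line != "...":
            yield parse_line(line)


sample = [
    "01, 41921,GC,BJW-1117373",
    "01, 42101,TG,rs2691277.1",
    "01, 45408,CT,rs28396308",
    "...",
]
variants = list(parse_lines(sample))
for v in variants:
    print(v)
```

To process a whole file, the same generator can be fed an open file handle, e.g. `parse_lines(open("jw.d"))`, so the ~2M lines never need to be held in memory at once.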


                --------------------------

                Code:
                file JW.gff ,  field,name,counts
                1.) chr1,163479  chr2,174209  chr3,142245  chr4,148282  chr5,126170
                    chr6,129906  chr7,111889  chr8,106542  chr9,86346   chr10,105298
                    chr11,105511 chr12,96403  chr13,83250  chr14,64540  chr15,61959
                    chr16,62665  chr17,53075  chr18,59829  chr19,38679  chr20,43857
                    chr21,31727  chr22,24906  chrX,38031   chrY,1746
                2.) JW,2060544
                3.) genotype,1840461 
                     gt_novel,220083
                4.) 2049126 numbers   , chr1:<247M
                5.) same 2049126 numbers 
                    maximal values :
                    chr01:247175763 chr02:242691366 chr03:199365024 chr04:191262790 
                    chr05:180643418 chr06:170762121 chr07:158818812 chr08:146266634 
                    chr09:140218908 chr10:135374311 chr11:134451246 chr12:132288869 
                    chr13:114118283 chr14:106360250 chr15:100338311 chr16: 88690776 
                    chr17: 78643154 chr18: 76116029 chr19: 63789120 chr20: 62433614 
                    chr21: 46944097 chr22: 49570097 chr X:154874055 chr Y: 57442498
                    sum:3077M
                6.) .,2060544
                7.) +,2060544
                8.) .,2060544
                9a)  SNP rs: 1840461   SNP BJW: 220083    2060544 numbers in total
                9b)  alleles A/C;,86738
                     alleles A/G;,342479
                     alleles A/T;,73416
                     alleles C/A;,91673
                     alleles C/G;,86143
                     alleles C/T;,348865
                     alleles G/A;,349743
                     alleles G/C;,86305
                     alleles G/T;,92397
                     alleles T/A;,73611
                     alleles T/C;,342446
                     alleles T/G;,86728
                9c) ref_counts 0,382024
                    ref_counts 1,387528
                ...
                9d) oth_counts 1,602522
                ...

                trying YH now ...
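
The tallies above can be cross-checked against each other. The sketch below copies the per-chromosome counts (field 1), the rs/BJW split (9a) and the allele-pair counts (9b) from the listing, checks that each adds up to the 2060544 total, and derives a transition/transversion ratio. Treating A/G, G/A, C/T and T/C as transitions is the standard definition; the variable names are just for this example.

```python
# Counts copied from the JW.gff summary above.
per_chrom = {
    "chr1": 163479, "chr2": 174209, "chr3": 142245, "chr4": 148282,
    "chr5": 126170, "chr6": 129906, "chr7": 111889, "chr8": 106542,
    "chr9": 86346, "chr10": 105298, "chr11": 105511, "chr12": 96403,
    "chr13": 83250, "chr14": 64540, "chr15": 61959, "chr16": 62665,
    "chr17": 53075, "chr18": 59829, "chr19": 38679, "chr20": 43857,
    "chr21": 31727, "chr22": 24906, "chrX": 38031, "chrY": 1746,
}
allele_counts = {
    "A/C": 86738, "A/G": 342479, "A/T": 73416, "C/A": 91673,
    "C/G": 86143, "C/T": 348865, "G/A": 349743, "G/C": 86305,
    "G/T": 92397, "T/A": 73611, "T/C": 342446, "T/G": 86728,
}
TOTAL = 2060544                    # line 2.) JW,2060544
RS_COUNT, BJW_COUNT = 1840461, 220083  # line 9a)

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {"A/G", "G/A", "C/T", "T/C"}
ti = sum(n for pair, n in allele_counts.items() if pair in TRANSITIONS)
tv = sum(n for pair, n in allele_counts.items() if pair not in TRANSITIONS)

print("per-chromosome sum matches:", sum(per_chrom.values()) == TOTAL)
print("rs + BJW matches:", RS_COUNT + BJW_COUNT == TOTAL)
print("allele-pair sum matches:", ti + tv == TOTAL)
print(f"transitions={ti} transversions={tv} ratio={ti / tv:.2f}")
```

A ratio of roughly 2 is in the range commonly reported for genome-wide human SNP sets, so the numbers at least look plausible.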

