12-05-2012, 07:46 AM   #8
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

Currently I estimate (a wild guess) that you have ~500 complete human genomes (~1500 GB)
at ~10-fold coverage, but they are scattered across lots of different formats and directories.
It would take me ~10 hours to figure out how to find the data, decompress it and
convert it, and another ~5 hours just to download the compressed data.
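To check the ~1500 GB figure, here is a back-of-the-envelope sketch. It assumes (this is not stated in the post) ~3.1e9 bases per human genome, stored uncompressed at one byte per base, i.e. one assembled sequence per person rather than the raw overlapping reads.

```python
# Rough storage estimate for assembled human genome sequences.
# Assumptions (not from the post): ~3.1e9 bases per genome,
# 1 byte per base, no compression, one sequence per person.
GENOME_LENGTH_BP = 3.1e9
BYTES_PER_BASE = 1


def storage_gb(n_genomes: int,
               bases_per_genome: float = GENOME_LENGTH_BP,
               bytes_per_base: float = BYTES_PER_BASE) -> float:
    """Return storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_genomes * bases_per_genome * bytes_per_base / 1e9


print(f"{storage_gb(500):.0f} GB")
```

This gives ~1550 GB, in line with the guess above. If the raw reads at ~10-fold coverage were kept instead of one sequence per person, the figure would be roughly ten times larger.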

I'd like to see other people's estimates.

----------new estimates-------
They have all 1092 genomes (people, "samples") sequenced at 2-6-fold coverage.
I assume this means they have lots of small segments (~500 nucleotides
per segment?) from each genome; those segments may contain many errors, but they
overlap so that each position of the genome is covered ~2-6 times.
Critical positions, those where mutations are expected, are covered more often (50-100-fold).
So they have a total of ~2e13 overlapping nucleotides.
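The ~2e13 total follows from samples × genome length × coverage. A minimal sketch, assuming (not from the post) a genome length of ~3.1e9 bases and ignoring the small set of deeper-covered (50-100-fold) positions:

```python
# Total sequenced nucleotides = samples x genome length x mean coverage.
# Assumption (not from the post): ~3.1e9 bases per genome.
# The deeper-covered "critical" positions are ignored here.
N_SAMPLES = 1092
GENOME_LENGTH_BP = 3.1e9


def total_bases(coverage: float,
                n_samples: int = N_SAMPLES,
                genome_length: float = GENOME_LENGTH_BP) -> float:
    """Approximate number of sequenced bases across all samples."""
    return n_samples * genome_length * coverage


for cov in (2, 4, 6):
    print(f"{cov}x: {total_bases(cov):.1e} bases")
```

At the 6-fold end this comes to ~2.0e13 bases; at 2-fold it is about a third of that.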

The data is in "VCF" files, which have a complicated format, so I stand by my estimate
of ~10 hours of work to convert them into a workable format.
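The post does not say how the conversion would be done. As a rough sketch, assuming the files follow the standard VCF text layout (`##` meta lines, a `#CHROM` header line, then one tab-separated line per variant with a `FORMAT` column followed by one column per sample), the genotype calls can be pulled out with Python's standard library alone. The function names `open_vcf` and `iter_vcf` are hypothetical, not from any existing tool.

```python
import gzip
import io
import sys
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, int, str, str, Dict[str, str]]


def open_vcf(path: str) -> io.TextIOBase:
    """Open a plain or gzip/bgzip-compressed VCF file as text."""
    if path.endswith(".gz"):
        # bgzip output is multi-member gzip, which the gzip module reads.
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_vcf(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (chrom, pos, ref, alt, {sample: GT}) for each variant line."""
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # sites-only line: no per-sample columns
        chrom, pos, _vid, ref, alt = fields[:5]
        fmt = fields[8].split(":")
        if fmt[0] != "GT":
            continue  # GT comes first when present; skip lines without it
        gts = {name: call.split(":")[0]
               for name, call in zip(samples, fields[9:])}
        yield chrom, int(pos), ref, alt, gts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Write one tab-separated line per variant: chrom, pos, ref, alt, GTs.
    with open_vcf(sys.argv[1]) as fh:
        for chrom, pos, ref, alt, gts in iter_vcf(fh):
            print(chrom, pos, ref, alt, *gts.values(), sep="\t")
```

Streaming line by line keeps memory use flat, which matters when each line carries one genotype column per sample.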

The data could be only ~700 MB; the Y chromosome came as 2 compressed files of 29 MB.
-------------------------------------------------

Last edited by gsgs; 12-05-2012 at 08:52 PM.