Old 12-05-2012, 10:38 AM   #15
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

wait, I have a better idea.
You compute the genetic distance between every pair of samples: 1092^2 integers, a bit under 5 MB as 32-bit values.
The distance is just the number of set bits in the logical XOR of the two 37M-bit vectors.
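
A minimal sketch of that distance step, assuming each sample is held as one Python integer whose bits are the 37M-bit vector. The function names are made up for illustration, and for real data you would want a packed array and a faster popcount.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two bit vectors differ."""
    # XOR leaves a 1 exactly where the vectors disagree; count those 1s.
    return bin(a ^ b).count("1")


def distance_matrix(samples):
    """Symmetric matrix of pairwise distances (hypothetical helper)."""
    n = len(samples)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(samples[i], samples[j])
            dist[i][j] = d
            dist[j][i] = d  # the distance is symmetric, so fill both halves
    return dist
```

Since the matrix is symmetric with a zero diagonal, only about half of the 1092^2 entries actually need to be computed.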
Then you sort the 1092 samples into a circular order so that the sum of the distances between neighbors
is minimal (a traveling salesman problem, typically easy to solve well for n=1092).
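
To illustrate the ordering step, here is a sketch using the greedy nearest-neighbor heuristic. This is an assumption on my side, not an exact solver: it only gives a reasonable tour, and a proper TSP program would usually find a shorter one. `dist` is a square matrix of pairwise distances, and both function names are made up.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy tour: always move to the closest sample not yet visited."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Sum of distances between neighbors, including the closing edge."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
```

Comparing `tour_length` for the original sample order and for the computed tour shows how much the reordering gained.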
Then you compute the logical XOR of each pair of adjacent samples, which presumably is mostly zeros.
That gives 1092 binary vectors of length 37M again, but this time they compress much better
with gzip or similar because of the many zeros.
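
A rough sketch of the encode and decode steps, assuming each sample is a `bytes` object of equal length and `tour` is the sample order found before. It uses Python's `zlib` (the same deflate method as gzip). For decoding it keeps the first sample of the tour as is and stores only the XORs to each following neighbor, so the closing XOR of the circle is left out. The function names are made up.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(samples, tour):
    """First sample of the tour raw, then XOR deltas along the tour, deflated."""
    ordered = [samples[i] for i in tour]
    parts = [ordered[0]]
    parts += [xor_bytes(p, q) for p, q in zip(ordered, ordered[1:])]
    return zlib.compress(b"".join(parts), 9)


def decode(blob, tour, sample_len):
    """Invert encode(): undo the XOR chain and restore the original order."""
    raw = zlib.decompress(blob)
    parts = [raw[i:i + sample_len] for i in range(0, len(raw), sample_len)]
    ordered = [parts[0]]
    for delta in parts[1:]:
        # previous sample XOR delta gives the next sample on the tour
        ordered.append(xor_bytes(ordered[-1], delta))
    samples = [None] * len(tour)
    for position, index in enumerate(tour):
        samples[index] = ordered[position]
    return samples
```

The decoder needs the tour and the sample length as well, so a real file format would have to store those in a small header.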
I can write you the programs for encoding and decoding, if you want.
Self-extracting executable, easy to use, all automatic.
The size of that file would be a measure of the genetic variability of your set of 1092 samples.