
attilav 08-18-2011 05:07 AM

human genome short read data for "Do it yourself genetic testing"
Hi all!

I would really like to try a certain program, because in some ways it's similar to a project I'm working on. My problem is that I can't find the right data to try it on. It needs the genome of an individual (so not the reference genome) as raw short read data. I'm not really familiar with these things, so I would appreciate it if someone could tell me where to find DNA sequences that fit this category.
The program uses the bowtie short read aligner, so in other words, I need short reads from a human genome that bowtie can align, I guess.
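For anyone in the same position, raw short reads are usually distributed as FASTQ files, which is one of the input formats bowtie accepts. Below is a minimal sketch for sanity-checking a downloaded file before aligning it. It assumes the plain four-lines-per-record FASTQ layout; the function names `read_fastq` and `summarize` and the two example reads are made up for illustration.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), seq, qual


def summarize(handle):
    """Return (number of reads, {read length: count})."""
    lengths = Counter()
    n = 0
    for _, seq, _ in read_fastq(handle):
        n += 1
        lengths[len(seq)] += 1
    return n, dict(lengths)


# Two invented reads standing in for a real file opened with open(path).
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGCA\n+\nIIIII\n"
)
count, length_hist = summarize(example)
print(count, length_hist)
```

If the read count and length distribution look like what the data source advertises, the file is probably intact and worth handing to the aligner.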


krobison 08-18-2011 07:11 AM

You can find tons of datasets in the NCBI Sequence Read Archive (SRA, formerly the Short Read Archive) and the European Nucleotide Archive.

You can also find already-aligned short read datasets (in BAM format) in many places. The 1000 Genomes Project is sequencing many individuals at low coverage, so its data might not suit you, but Watson's genome is available, as are many others (Venter's is available in long reads). Complete Genomics has made a large number of human genomes available on their website. The Personal Genome Project should have files up as well.

(aside: perhaps there should be a wiki section on repositories of human and other genome alignments)

scordes 08-22-2011 01:59 PM

Unfortunately Bowtie is not optimized for Complete Genomics data. Specifically, Complete Genomics reads have sub-read gaps that Bowtie will interpret as mismatches. The high mismatch frequency will prevent Bowtie from successfully aligning many reads to the reference.
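To illustrate the point about sub-read gaps, here is a toy sketch, not a model of Bowtie or of the real Complete Genomics read layout. It assumes a made-up 20-base reference and a read built from two 8-base sub-reads separated by a 2-base gap; all names and sizes are invented. Comparing the read to the reference without allowing for the gap puts the second sub-read out of register, so most of its bases look like mismatches.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


reference = "ACGTTGCAAGCTTAGGCTAC"  # invented 20-base reference

# A gapped read: two sub-reads with 2 unsequenced reference bases between them.
sub1_len, gap, sub2_len = 8, 2, 8
sub1 = reference[0:sub1_len]
sub2 = reference[sub1_len + gap : sub1_len + gap + sub2_len]
read = sub1 + sub2  # what an aligner sees: one contiguous 16-base string

# Ungapped comparison: treat the read as contiguous reference sequence.
ungapped = count_mismatches(read, reference[0 : len(read)])

# Gap-aware comparison: place each sub-read at its own offset.
gap_aware = (
    count_mismatches(sub1, reference[0:sub1_len])
    + count_mismatches(sub2, reference[sub1_len + gap : sub1_len + gap + sub2_len])
)

print("ungapped mismatches:", ungapped)
print("gap-aware mismatches:", gap_aware)
```

The read is error-free in this example, yet the ungapped comparison still reports many mismatches, which is the effect that keeps such reads from aligning.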

If you are interested in working with SNP genotypes and Complete Genomics data, we strongly recommend using the Complete Genomics-developed snpdiff command in our open source CGA Tools package. This tool is specifically designed to extract SNP genotypes from Complete Genomics data, and to compare Complete Genomics genotype calls with SNP genotypes generated on other platforms.
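As a rough idea of what comparing genotype calls across platforms involves, here is a small sketch. It is not snpdiff and does not use its file formats or options. It assumes each call set has already been loaded into a dictionary keyed by (chromosome, position) with an unphased two-letter genotype as the value; the function name and all data are hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Return (sites called in both sets, sites where the genotypes agree).

    Genotypes are treated as unphased, so "AG" and "GA" count as a match.
    """
    shared = calls_a.keys() & calls_b.keys()
    matches = sum(
        1 for site in shared if sorted(calls_a[site]) == sorted(calls_b[site])
    )
    return len(shared), matches


# Invented example calls from two hypothetical platforms.
platform_a = {("chr1", 1000): "AG", ("chr1", 2000): "CC", ("chr2", 500): "TT"}
platform_b = {("chr1", 1000): "GA", ("chr1", 2000): "CT", ("chr3", 42): "AA"}

n_shared, n_match = genotype_concordance(platform_a, platform_b)
print(f"{n_match} of {n_shared} shared sites agree")
```

Sites called on only one platform are left out of the comparison rather than counted as disagreements, which is a design choice of this sketch.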
