SEQanswers

08-18-2011, 05:07 AM   #1
attilav
Junior Member
 
Location: Hungary

Join Date: Mar 2011
Posts: 7
Human genome short read data for "Do it yourself genetic testing"

Hi all!

I would really like to try a certain program (http://www.cbcb.umd.edu/software/BRCA-diagnostic/), because in some ways it's similar to a project I'm working on. My problem is that I just can't find the right data to try this program on. It needs the genome of an individual (so not the reference genome) as raw short-read data. I'm not really familiar with these things, so I would really appreciate it if someone could tell me where to find DNA sequences that fit this category.
The program uses the Bowtie short-read aligner, so in other words, I guess I need short reads from a human genome that Bowtie can align.
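Bowtie reads FASTQ input by default (FASTA with its `-f` option), so a quick sanity check on any downloaded raw-read file is simply to parse it as FASTQ. The sketch below is a minimal, illustrative reader that assumes unwrapped four-line records; `read_fastq` and `FastqRecord` are hypothetical names, not part of Bowtie.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # read identifier (header line without the leading "@")
    seq: str   # bases, e.g. "ACGT..."
    qual: str  # one Phred-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ file (four lines per read)."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].strip(), seq, qual)


if __name__ == "__main__":
    # Two tiny made-up reads, just to exercise the parser.
    sample = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCA\n+\nIIII\n"
    for rec in read_fastq(io.StringIO(sample)):
        print(rec.name, len(rec.seq))  # → "read1 8", then "read2 4"
```

If a file parses cleanly this way, it is at least in the input format Bowtie expects.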

Yours,
Attila

Last edited by attilav; 08-18-2011 at 05:13 AM.
08-18-2011, 07:11 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

You can find tons of datasets in the NCBI Short Read Archive and the European Nucleotide Archive.
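As an illustration of pulling raw reads out of the European Nucleotide Archive, the sketch below builds a query for what is assumed to be ENA's Portal API `filereport` endpoint and parses its tab-separated reply into FASTQ download links. The endpoint, the `fastq_ftp` field, and its `;`-separated, scheme-less path format are assumptions to verify against ENA's current documentation; the helper names are hypothetical, and no network call is made here.

```python
import urllib.parse
from typing import Dict, List

# Assumption: ENA Portal API "filereport" endpoint and its "fastq_ftp" field;
# verify against ENA's current documentation before relying on this.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a URL asking ENA which FASTQ files belong to a run/study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_fastq_ftp(tsv_text: str) -> Dict[str, List[str]]:
    """Map each run accession to its FASTQ download URLs (paired runs list two)."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    run_col = header.index("run_accession")
    ftp_col = header.index("fastq_ftp")
    runs: Dict[str, List[str]] = {}
    for line in lines[1:]:
        cols = line.split("\t")
        field = cols[ftp_col] if ftp_col < len(cols) else ""
        # Assumed format: ';'-separated paths without a scheme, so prepend one.
        runs[cols[run_col]] = ["ftp://" + p for p in field.split(";") if p]
    return runs


if __name__ == "__main__":
    # Placeholder accession: substitute a real one, then fetch the URL with
    # urllib.request.urlopen (needs network access) to obtain the TSV text.
    print(filereport_url("SRR_PLACEHOLDER"))
```

The resulting `.fastq.gz` files are the kind of raw short-read data Bowtie takes as input.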

You can also find already-aligned short-read datasets (in BAM format) all over the place. The 1000 Genomes Project is sequencing many individuals at low coverage, so those might not be good for you, but Watson's genome is available, as are many others (Venter's is available as long reads). Complete Genomics has made a large number of human genomes available on their website. The Personal Genome Project should have files up as well.
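To feed an already-aligned BAM file to Bowtie, the reads have to be turned back into FASTQ. BAM is binary, so it would first be decoded to SAM text (for example with `samtools view`), and in practice `samtools fastq` performs the whole conversion. The sketch below only illustrates the core logic for single-end data; `sam_line_to_fastq` is a hypothetical helper, while the FLAG bits and column layout follow the SAM specification.

```python
from typing import Optional

# FLAG bits defined in the SAM format specification.
FLAG_REVERSE = 0x10         # SEQ is stored reverse-complemented
FLAG_SECONDARY = 0x100      # secondary alignment of a read already listed
FLAG_SUPPLEMENTARY = 0x800  # supplementary (chimeric) piece of a read

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line: str) -> Optional[str]:
    """Convert one SAM text line back to a FASTQ record; None means skip it."""
    if line.startswith("@"):
        return None  # header line, not an alignment
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None  # keep only the primary record so each read appears once
    if seq == "*" or qual == "*":
        return None  # sequence or qualities were not stored
    if flag & FLAG_REVERSE:
        # Undo the reverse-complementing so the read is in sequencer orientation.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{qname}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    line = "r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tABCD"
    print(sam_line_to_fastq(line), end="")  # → @r1 / CGTT / + / DCBA
```

For paired-end data, both mates share a QNAME and would need to be split into two files using the 0x40/0x80 flags before re-aligning.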

(aside: perhaps there should be a wiki section on repositories of human and other genome alignments)
08-22-2011, 01:59 PM   #3
scordes
Junior Member
 
Location: San Francisco, CA

Join Date: Jun 2011
Posts: 9

Unfortunately Bowtie is not optimized for Complete Genomics data. Specifically, Complete Genomics reads have sub-read gaps that Bowtie will interpret as mismatches. The high mismatch frequency will prevent Bowtie from successfully aligning many reads to the reference.
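A toy model shows why a gap inside a read looks like a run of mismatches to an ungapped aligner such as Bowtie 1. Here a hypothetical read is built from two 10-base arms with 2 reference bases skipped between them; this is a deliberate simplification, not Complete Genomics' actual read structure or alignment method, and the helper names are made up for illustration.

```python
from typing import Optional, Tuple


def mismatches(read: str, ref: str, start: int) -> int:
    """Mismatches when read is laid on ref at start with no gaps.

    Bases that run past the end of ref count as mismatches.
    """
    window = ref[start:start + len(read)]
    return sum(a != b for a, b in zip(read, window)) + (len(read) - len(window))


def best_gap(left: str, right: str, ref: str, start: int, max_gap: int) -> Tuple[int, int]:
    """Try gap sizes 0..max_gap between two read arms; return (gap, mismatches)."""
    best: Optional[Tuple[int, int]] = None
    for gap in range(max_gap + 1):
        mm = mismatches(left, ref, start) + mismatches(right, ref, start + len(left) + gap)
        if best is None or mm < best[1]:
            best = (gap, mm)
    assert best is not None
    return best


if __name__ == "__main__":
    ref = "ACGTTGCAAGCTTACGGATCCGTAGCTAGG"
    # Toy read: two 10-base arms separated by 2 reference bases not in the read.
    left, right = ref[0:10], ref[12:22]
    print("ungapped:", mismatches(left + right, ref, 0))        # → ungapped: 10
    print("gapped:", best_gap(left, right, ref, 0, max_gap=4))  # → gapped: (2, 0)
```

Every base after the gap is shifted, so the ungapped placement accumulates mismatches that would exceed a typical mismatch limit, whereas a gap-aware placement fits perfectly.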

If you are interested in working with SNP genotypes and Complete Genomics data, we strongly recommend using the Complete Genomics-developed snpdiff command in our open source CGA Tools package (http://cgatools.sourceforge.net/). This tool is specifically designed to extract SNP genotypes from Complete Genomics data, and to compare Complete Genomics genotype calls with SNP genotypes generated on other platforms.
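To make "comparing genotype calls across platforms" concrete, here is a minimal concordance sketch. It is not how `snpdiff` works internally and uses none of its options; it assumes simple unphased, biallelic calls keyed by a hypothetical site ID, and the function names are illustrative only.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # two alleles, e.g. ("A", "G")


def same_genotype(a: Genotype, b: Genotype) -> bool:
    """Unphased comparison: A/G and G/A are the same genotype."""
    return sorted(a) == sorted(b)


def concordance(calls_a: Dict[str, Genotype], calls_b: Dict[str, Genotype]) -> Tuple[int, int]:
    """Return (agreeing sites, sites called by both platforms)."""
    shared = calls_a.keys() & calls_b.keys()
    agree = sum(same_genotype(calls_a[s], calls_b[s]) for s in shared)
    return agree, len(shared)


if __name__ == "__main__":
    sequencing = {"site1": ("A", "G"), "site2": ("C", "C"), "site3": ("T", "T")}
    array = {"site1": ("G", "A"), "site2": ("C", "T"), "site4": ("A", "A")}
    agree, total = concordance(sequencing, array)
    print(f"{agree}/{total} shared sites concordant")  # → 1/2 shared sites concordant
```

Only sites called on both platforms are compared, so the denominator reflects overlap rather than either platform's full call set.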
__________________
Shaun Cordes, PhD | Customer Support Scientist | Complete Genomics, Inc.
Toll-free: (855) 267-5358 | Direct: (650) 943-2651
scordes@completegenomics.com
All times are GMT -8.