
**cariaso** · 05-14-2009, 05:57 AM

You may also be interested in this Korean genome, which does have a GFF file for its affy6 data.

http://www.snpedia.com/index.php/User:Kim_Seong-jin

ftp://ftp.kobic.re.kr/pub/PersonalGe..._Q40d4D100.gff

http://www.ncbi.nlm.nih.gov/books/bv...ion.ch5.ch5-s6 explains how to create a local mirror of dbSNP.

From there you'll need to run a SELECT statement to pull out the rs#s.
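As a rough sketch of that SELECT step, assuming a made-up table layout: the table name `snp_positions` and its columns are hypothetical and are not the real dbSNP schema, the rows are toy values, and an in-memory SQLite database stands in for the local MySQL mirror so the example runs on its own.

```python
import sqlite3

# Hypothetical stand-in for a local dbSNP mirror. The table and column names
# are invented for illustration; check the real schema of your mirror.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snp_positions (snp_id INTEGER, chrom TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO snp_positions VALUES (?, ?, ?)",
    [(30, "chr2", 500), (10, "chr1", 100), (20, "chr1", 250), (10, "chr1", 100)],  # toy rows
)


def rs_numbers(connection):
    """Return the distinct rs numbers, sorted numerically, as 'rs<N>' strings."""
    rows = connection.execute(
        "SELECT DISTINCT snp_id FROM snp_positions ORDER BY snp_id"
    )
    return [f"rs{snp_id}" for (snp_id,) in rows]


rs_list = rs_numbers(conn)
```

Storing the rs number as an integer column keeps `ORDER BY` numeric rather than alphabetical, which matters if the output is later compared against another sorted list.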

Comparing non-rs# SNPs is not simple. If both SNPs are described against the same reference assembly it will be less painful, but that's unlikely to be true in the general case.

The best workflow depends on what questions you want to answer. That also greatly affects your next question, CPU or memory? When programming you can usually tune for more memory and less CPU, or vice versa. In your case I expect the simplest approach is to slurp everything from the GFF into memory, and then run queries against your MySQL database. That will be memory intensive. An alternative is to:

1. Presort the GFF into numeric rs# order.
2. Export dbSNP in numeric rs# order.
3. Process the two files sequentially, stepping forward in either file1 or file2 so that the rs#s stay in sync. This has low memory requirements once the two lists are in order, and it simplifies the code by keeping the heavy lifting in well-optimized sorting routines.

This will be doable on a PC, but it'll take a while. If it were me, I'd be crunching it in the Amazon cloud on a small machine during dev, then switching to one of the beefy machines for the real run. Getting started in the Amazon cloud may be more trouble than it's worth, in which case my slightly neglected, but soon to be resurrected, www.runblast.com might be of some interest.
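The sorted, sequential alternative described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up, rs#s are assumed to have already been extracted as integers, and both inputs are assumed to be sorted in ascending numeric order.

```python
from typing import Iterable, Iterator, Tuple


def parse_rs(text: str) -> int:
    """Turn 'rs123' (or '123') into the integer 123. Hypothetical helper."""
    text = text.strip()
    if text.lower().startswith("rs"):
        text = text[2:]
    return int(text)


def flag_known_snps(
    gff_ids: Iterable[int], dbsnp_ids: Iterable[int]
) -> Iterator[Tuple[int, bool]]:
    """Yield (rs_number, found_in_dbsnp) for each id from the GFF side.

    Both inputs must be sorted ascending. Only one id from each side is held
    in memory at a time, so the inputs can be lazy iterators over large files.
    """
    db_iter = iter(dbsnp_ids)
    current = next(db_iter, None)
    for rs in gff_ids:
        # Step the dbSNP side forward until it catches up with the GFF side.
        while current is not None and current < rs:
            current = next(db_iter, None)
        yield rs, current == rs


# Toy ids, not real data.
result = list(flag_known_snps([3, 7, 10, 42], [1, 3, 4, 10, 50]))
# result == [(3, True), (7, False), (10, True), (42, False)]
```

Note that the sort has to be numeric: a plain text sort would put rs10 before rs9 and the two streams would drift out of sync.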

**salturki** · 05-17-2009, 09:40 PM

cariaso,

I appreciate your help.

Thank you

SNPs Comparsion (Watson vs. YH vs dbSNPs vs X genome)

Comment

Comment

Latest Articles

ad_right_rmr

News