SEQanswers




06-09-2010, 09:28 AM   #1
michelle.lupton
Junior Member
 
Location: Brisbane

Join Date: Jun 2010
Posts: 5
Need 1000 Genomes data for just one gene

Dear all,

I am a bit stuck trying to access the 1000 Genomes data. I have just one candidate gene that I have sequenced, in which I identified some novel SNPs. I want to check whether these SNPs have already been identified in the 1000 Genomes data.

Is the data shown on the browser just from the pilot 1 release? If so, is there a way of viewing the allele frequencies from pilot 2 for identified SNPs without downloading the raw data?

I'm sure this would be very quick; I'm afraid I am a bit lost looking at the website.

Thanks for your help,
Michelle
06-09-2010, 09:47 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

You could try using the SeattleSeq database. It takes a list of variant coordinates as input and reports back whether any of them were found in (some freeze of) the 1000 Genomes results.
__________________
--
bioinfosm
06-09-2010, 11:56 AM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

samtools view -bo sample1.bam ftp://ftp-trace.nih.gov/....../sample1.bam 10:10,000-20,000

By using a URL as the file name, you can "download" just the alignments around a target gene without downloading the full alignments, which amount to >20TB for the main project.

It would be better if someone were willing to write a web service.
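A minimal Python sketch of running the command above over many BAM files (the situation Firebird describes further down): it reads one BAM path or URL per line from a text file and builds one `samtools view` call per file for the same region. The helper names, the list file (e.g. a hypothetical `bam_urls.txt`) and the `.region.bam` output naming are all illustrative assumptions; it only assumes `samtools` is on your PATH and that each BAM has its `.bai` index alongside it, which region queries need.

```python
import shlex
import subprocess
from pathlib import Path


def read_bam_list(path):
    """Read one BAM path or URL per line, skipping blank lines and # comments."""
    with open(path) as handle:
        return [line.strip() for line in handle
                if line.strip() and not line.strip().startswith("#")]


def build_commands(bams, region):
    """Return one `samtools view` argument list per BAM path or URL."""
    commands = []
    for bam in bams:
        # sample1.bam (local or remote) -> sample1.region.bam in the working directory
        stem = Path(bam.rstrip("/").rsplit("/", 1)[-1]).stem
        commands.append(["samtools", "view", "-b", "-o", f"{stem}.region.bam",
                         bam, region])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; also execute it unless dry_run is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Typical use would be `run_commands(build_commands(read_bam_list("bam_urls.txt"), "10:10,000-20,000"), dry_run=False)`; leaving `dry_run=True` first lets you check the generated command lines before anything is fetched.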
06-11-2010, 03:09 AM   #4
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

Is this data available from the UCSC Table Browser? In the SNPs track of the Variations and Repeats group, I can see a "valid does/does not include by-1000genomes" option when I click on "Create Filter".
06-14-2010, 10:39 PM   #5
Firebird
Member
 
Location: Germany

Join Date: Jun 2010
Posts: 18

Hello,

I want to do the same as michelle.lupton with the SNPs I found. I used samtools and its view command.
My problem is that I need an automated way to check the data from all individuals, because I can't type the path for every individual into samtools.
Can I give samtools a list of paths?

Thanks!
06-15-2010, 05:27 AM   #6
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

If you want SNPs, go to the 1KG FTP site. There are VCFs for each population, containing the SNP calls.
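Once you have a population VCF, checking a single SNP is a matter of scanning its records. A minimal Python sketch, assuming the standard tab-separated VCF layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...) and that the file's chromosome names match yours (e.g. `1` rather than `chr1`). Whether the INFO column carries an allele-frequency tag such as `AF` depends on the release, so check the `##INFO` header lines; the function names are illustrative.

```python
def parse_info(info):
    """Split a VCF INFO column such as 'AC=3;AF=0.05;DB' into a dict.

    Flag entries without a value (e.g. 'DB') map to True.
    """
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True
    return fields


def find_variant(vcf_lines, chrom, pos):
    """Return (REF, ALT, INFO dict) for the first record at chrom:pos, or None."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip ## meta-information lines and the #CHROM header
        cols = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO [FORMAT samples...]
        if cols[0] == chrom and int(cols[1]) == pos:
            return cols[3], cols[4], parse_info(cols[7])
    return None
```

For a gzipped VCF, pass `gzip.open(path, "rt")` as `vcf_lines`; a `None` result means the position does not appear in that file.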
06-16-2010, 10:45 AM   #7
Gustavo
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 4

I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

You're welcome to give it a try:
http://db.systemsbiology.net/gestalt/cgi-pub/Kaviar.pl

Suggestions for improvement are welcome.
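The post doesn't describe Kaviar's actual data structure, so the following is only a plain Python sketch of the same question ("has this variant been seen, and where?"), using a hash map keyed by (chromosome, position, alternate allele) that records which sources reported each variant. The class name and source labels are hypothetical, and an index over tens of millions of variants would need a far more compact encoding than Python tuples.

```python
from collections import defaultdict


class KnownVariants:
    """Map (chrom, pos, alt) to the set of sources that reported that variant."""

    def __init__(self):
        self._seen = defaultdict(set)

    def add(self, chrom, pos, alt, source):
        # Normalise allele case so 'a' and 'A' are treated as the same variant
        self._seen[(chrom, pos, alt.upper())].add(source)

    def lookup(self, chrom, pos, alt):
        """Return the sorted list of sources for this variant, or [] if unseen."""
        # .get() avoids inserting empty entries into the defaultdict
        return sorted(self._seen.get((chrom, pos, alt.upper()), ()))
```

An empty list from `lookup` is the "novel variant" answer; a non-empty one lists every genome or population that carried it.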
07-07-2010, 12:05 PM   #8
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48

Quote:
Originally Posted by Gustavo
I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

You're welcome to give it a try:
http://db.systemsbiology.net/gestalt/cgi-pub/Kaviar.pl

Suggestions for improvement are welcome.
Your tool looks very useful. Now that dbSNP 131 is out, is it possible to update your site with the new version of dbSNP? Is there an easy way to do a batch query or to install it locally?

Thanks
07-20-2010, 11:49 PM   #9
BetterPrimate
Member
 
Location: NSW

Join Date: May 2010
Posts: 15

Quote:
Originally Posted by Gustavo
I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

You're welcome to give it a try:
http://db.systemsbiology.net/gestalt/cgi-pub/Kaviar.pl
That looks very interesting. I see that the webpage says there are 26,520,897 variants; over at HapMart there are 26,291,751. Is your database a superset of HapMart, or are there extra SNPs to be found in HapMart?
07-30-2010, 05:09 PM   #10
Gustavo
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 4

Quote:
Originally Posted by xguo
Is it possible to update your site with the new version of dbSNP?
I just updated it to v. 131 of dbSNP.
The updated data structure includes ~29.4 million variants.

Quote:
Originally Posted by BetterPrimate
Is your database a superset of HapMart, or are there extra SNPs to be found in HapMart?
I haven't compared it to HapMart's data, sorry.
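Comparing the totals (26,520,897 vs 26,291,751) can't settle the superset question, because two sets of similar size can still each contain variants the other lacks. A minimal Python sketch of the comparison, assuming both variant lists have first been normalised to the same keys (same genome build, same chromosome naming, e.g. `(chrom, pos, alt)` tuples); the function name is hypothetical.

```python
def compare_variant_sets(ours, theirs):
    """Summarise the overlap between two collections of variant keys."""
    a, b = set(ours), set(theirs)
    return {
        "shared": len(a & b),
        "only_ours": len(a - b),
        "only_theirs": len(b - a),
        # True only if every variant in `theirs` also appears in `ours`
        "ours_is_superset": a >= b,
    }
```

A non-zero `only_theirs` count is exactly the "extra SNPs to be found in HapMart" case.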
08-09-2010, 02:14 AM   #11
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151

Quote:
Originally Posted by michelle.lupton
Dear all,

I am a bit stuck trying to access the 1000 Genomes data. I have just one candidate gene that I have sequenced, in which I identified some novel SNPs. I want to check whether these SNPs have already been identified in the 1000 Genomes data.

Is the data shown on the browser just from the pilot 1 release? If so, is there a way of viewing the allele frequencies from pilot 2 for identified SNPs without downloading the raw data?

I'm sure this would be very quick; I'm afraid I am a bit lost looking at the website.

Thanks for your help,
Michelle
What data are you after?

If you want the SNP calls, you should look at the recent release made in July 2010:

ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/2010_07/

The README explains what this release contains:

ftp://ftp.1000genomes.ebi.ac.uk/vol1...010_07_release

Once you have downloaded the VCF files, which contain the variant calls, VCFtools

http://vcftools.sourceforge.net/

should help you extract the data you need from these files.

If you only want variants for a specific region of the genome, you don't even have to download the whole file: you can use the tabix program, which you can get from https://sourceforge.net/projects/samtools/files/, to download a subsection of the file like this:

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz 1:233411980-245804116
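Region arguments to tabix and samtools take the form `chr:beg-end`, with 1-based, inclusive coordinates. A small Python sketch of a validator for that syntax, which also strips thousands separators like those in lh3's `10:10,000-20,000` example above; `parse_region` is a hypothetical helper, not part of either tool.

```python
import re

# chromosome name, ':', start, '-', end; the numbers may contain commas
_REGION = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def parse_region(region):
    """Parse 'chr:beg-end' into (chrom, beg, end); numbers are returned as given."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"expected chr:beg-end, got {region!r}")
    chrom, beg, end = match.groups()
    beg, end = int(beg.replace(",", "")), int(end.replace(",", ""))
    if beg < 1 or beg > end:
        raise ValueError(f"invalid interval {beg}-{end} in {region!r}")
    return chrom, beg, end
```

Raising `ValueError` on a malformed region catches typos before a remote query is sent.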
08-09-2010, 01:00 PM   #12
tumorim
Junior Member
 
Location: US

Join Date: Aug 2010
Posts: 2

Download the three 1000G files from http://www.openbioinformatics.org/an..._download.html

Then, supposing your gene is in region chr1:1000-2000, just do

perl -ne 'm/^(\w+)\t(\d+)/ and $1 eq "1" and $2>=1000 and $2<=2000 and print' < hg18_CEU.sites.2010_03.txt

You'll get all variants in the CEU population. Do the same for YRI/ASN.
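For anyone without Perl handy, here is a minimal Python equivalent of the one-liner above. Like the Perl version, it assumes the chromosome is in the first tab-separated column (written as `1`, not `chr1`) and the position in the second; the function name is hypothetical.

```python
def filter_sites(lines, chrom, start, end):
    """Yield lines whose first column equals `chrom` and whose second column is
    a position within [start, end], inclusive, like the Perl one-liner above."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or not cols[1].isdigit():
            continue  # skip header or malformed lines
        if cols[0] == chrom and start <= int(cols[1]) <= end:
            yield line
```

To filter a file piped in on standard input, call `sys.stdout.writelines(filter_sites(sys.stdin, "1", 1000, 2000))` after `import sys`.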

Of course, you can also download the whole ANNOVAR program and run it on your variants, with much more functionality, such as filtering against dbSNP 130 or dbSNP 131.

All times are GMT -8.