SEQanswers

03-06-2019, 09:52 AM   #1
lre1234
Senior Member
Location: US
Join Date: Aug 2011
Posts: 105
SNP Allele frequency data

Hi All,
I have a list of a few thousand SNPs for which I am trying to get population allele frequency data. Ideally, I would like the frequencies for all 26 populations in the 1000 Genomes for these. I can't seem to find this information anywhere. Does anyone know where I might be able to get it?
Thanks for your help
03-07-2019, 12:04 AM   #2
Emily_Ensembl
Member
Location: Cambridge UK
Join Date: Dec 2013
Posts: 12

You could try the Ensembl REST API variation POST endpoint. You'd need to chunk your list of variants into 200s but it would be relatively easy.
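
The chunking step could look roughly like the sketch below. It assumes the variant IDs are already in a Python list; the `chunked` helper and the placeholder rsIDs are made up for illustration and are not part of the Ensembl API.

```python
from typing import Iterator, List


def chunked(ids: List[str], size: int = 200) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` IDs (200 per the advice in this thread)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder IDs for illustration only; substitute the real SNP list.
rsids = [f"rs{i}" for i in range(1, 451)]
batches = list(chunked(rsids))
print([len(batch) for batch in batches])  # prints [200, 200, 50]
```

Each batch can then be sent as the `ids` array of one POST request.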
03-08-2019, 04:59 AM   #3
lre1234
Senior Member
Location: US
Join Date: Aug 2011
Posts: 105

Thanks Emily_Ensembl.
This API does work, but I am still looking for the allele frequencies for all 26 subpopulations of the 1000 Genomes. The API seems to give only the main continental ancestries. I know that I could probably download all of the genotype information and calculate these myself, but I would have thought there would be a simple way to download the information from somewhere.
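
For the calculate-it-yourself route, the arithmetic is just counting alleles per population. The sketch below is only an illustration under stated assumptions: genotypes are VCF-style strings such as `0|1`, every non-reference allele is counted together as "alternate", and the sample and population names (`S1`, `POP_A`, ...) are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def allele_frequencies(genotypes: Dict[str, str],
                       sample_pop: Dict[str, str]) -> Dict[str, float]:
    """Return the non-reference allele frequency per population for one SNP."""
    alt = defaultdict(int)    # non-reference allele count per population
    total = defaultdict(int)  # called allele count per population
    for sample, gt in genotypes.items():
        pop = sample_pop[sample]
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":  # skip missing calls
                continue
            total[pop] += 1
            alt[pop] += allele != "0"
    return {pop: alt[pop] / total[pop] for pop in total}


# Hypothetical genotypes and sample-to-population map, for illustration only.
genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": "0|1"}
sample_pop = {"S1": "POP_A", "S2": "POP_A", "S3": "POP_B", "S4": "POP_B"}
print(allele_frequencies(genotypes, sample_pop))  # prints {'POP_A': 0.75, 'POP_B': 0.25}
```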
03-08-2019, 06:08 AM   #4
Emily_Ensembl
Member
Location: Cambridge UK
Join Date: Dec 2013
Posts: 12

pops=1 should get you all 26 populations
03-08-2019, 07:03 AM   #5
lre1234
Senior Member
Location: US
Join Date: Aug 2011
Posts: 105

Then I must be doing something wrong (highly probable). I'm trying it with the wget example:

Code:
wget -q --header='Content-type:application/json' --header='Accept:application/json' \
--post-data='{ "ids" : ["rs56116432" ] }' \
'http://rest.ensembl.org/variation/homo_sapiens' -O temp.out pops=1
And I get:

Quote:
{"rs56116432":{"ambiguity":"Y","ancestral_allele":null,"minor_allele":"T","mappings":[{"allele_string":"C/T","start":133256042,"coord_system":"chromosome","assembly_name":"GRCh38","end":133256042,"strand":1,"seq_region_name":"9","location":"9:133256042-133256042"},{"strand":1,"seq_region_name":"CHR_HG2030_PATCH","end":133256189,"assembly_name":"GRCh38","location":"CHR_HG2030_PATCH:133256189-133256189","allele_string":"C/T","coord_system":"chromosome","start":133256189}],"MAF":0.00259585,"most_severe_consequence":"missense_variant","synonyms":["NM_020469.2:c.689G>A","NP_065202.2.Gly230Asp"],"evidence":["Frequency","1000Genomes","ESP","ExAC","TOPMed","gnomAD"],"source":"Variants (including SNPs and indels) imported from dbSNP","var_class":"SNP","name":"rs56116432"}}
Unless I'm missing something in the output, all I see is the total MAF and not broken down by populations. I've also tried this with other SNPs and get similar results.
03-08-2019, 07:17 AM   #6
Emily_Ensembl
Member
Location: Cambridge UK
Join Date: Dec 2013
Posts: 12

Try adding pops=1 to the URL as a query-string parameter (in your command it was passed to wget as a separate argument), like:

Code:
wget -q --header='Content-type:application/json' --header='Accept:application/json' --post-data='{ "ids" : ["rs56116432" ] }' 'http://rest.ensembl.org/variation/homo_sapiens?pops=1' -O temp.out
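
For scripting many requests, the same call might be written in Python along these lines. This is a sketch using only the standard library; the URL, headers, and `pops=1` parameter are taken from the wget command in this thread, while the `build_request` and `fetch` helper names are made up for illustration.

```python
import json
import urllib.request
from typing import Iterable

# URL copied from the wget example in this thread.
BASE_URL = "http://rest.ensembl.org/variation/homo_sapiens"


def build_request(ids: Iterable[str], pops: bool = True) -> urllib.request.Request:
    """Build the POST request; pops=1 goes in the URL query string, not the body."""
    url = BASE_URL + ("?pops=1" if pops else "")
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch(ids: Iterable[str]) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(ids)) as response:
        return json.load(response)


request = build_request(["rs56116432"])
print(request.get_method(), request.full_url)
```

Calling `fetch(["rs56116432"])` would perform the actual request; it is left uncalled here so the snippet runs offline.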
03-08-2019, 08:40 AM   #7
lre1234
Senior Member
Location: US
Join Date: Aug 2011
Posts: 105

That works, thanks. I have a couple thousand SNPs to get frequencies for, so I can write some sort of wrapper to run through them all.

One last question: is there a way to limit the output to just the 1000 Genomes populations?
03-12-2019, 12:43 AM   #8
Emily_Ensembl
Member
Location: Cambridge UK
Join Date: Dec 2013
Posts: 12

No, you would need to parse your query response to limit the data in that way.
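
Such parsing might look like the sketch below. The thread does not show the `pops=1` output, so the response shape is an assumption: each variant record is taken to carry a `populations` list whose entries have a `population` name, and 1000 Genomes entries are assumed to be named with a `1000GENOMES:` prefix. The `filter_1000g` helper and the sample values are invented for illustration; check them against a real response before relying on this.

```python
from typing import Dict, List


def filter_1000g(response: Dict[str, dict],
                 prefix: str = "1000GENOMES:") -> Dict[str, List[dict]]:
    """Keep only population entries whose name starts with `prefix` (assumed naming)."""
    filtered = {}
    for variant_id, record in response.items():
        populations = record.get("populations", [])
        filtered[variant_id] = [
            entry for entry in populations
            if entry.get("population", "").startswith(prefix)
        ]
    return filtered


# Made-up response fragment in the assumed shape; not real frequencies.
sample = {
    "rsEXAMPLE": {
        "populations": [
            {"population": "1000GENOMES:phase_3:GBR", "allele": "C", "frequency": 0.5},
            {"population": "gnomAD:ALL", "allele": "C", "frequency": 0.4},
        ]
    }
}
for entry in filter_1000g(sample)["rsEXAMPLE"]:
    print(entry["population"], entry["allele"], entry["frequency"])
```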
All times are GMT -8.