  • SNP Allele frequency data

    Hi All,
    I have a list of a few thousand SNPs for which I am trying to get population allele frequency data. Ideally, I would like the frequencies in all 26 of the 1000 Genomes populations. I can't seem to find the information anywhere. Does anyone know where I may be able to get this?
    Thanks for your help

  • #2
    You could try the Ensembl REST API variation POST endpoint. You'd need to chunk your list of variants into 200s but it would be relatively easy.
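The batching suggested here could be sketched in Python roughly as follows. This is a minimal sketch, not an official Ensembl client: the helper names `chunked`, `build_post_request` and `fetch_batch` are hypothetical, the limit of 200 IDs per request is taken from the post above, and the endpoint is the one from the wget example further down the thread (written with `https` here, on the assumption that the server accepts it).

```python
import json
import urllib.request

# Endpoint from the wget example in this thread (https is an assumption).
VARIATION_URL = "https://rest.ensembl.org/variation/homo_sapiens"
# Batch size taken from the advice above (200 IDs per POST).
MAX_IDS_PER_POST = 200


def chunked(ids, size=MAX_IDS_PER_POST):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_post_request(batch, url=VARIATION_URL):
    """Build (but do not send) a JSON POST request for one batch of rsIDs."""
    body = json.dumps({"ids": list(batch)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_batch(batch, url=VARIATION_URL):
    """Send one batch and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_post_request(batch, url)) as response:
        return json.load(response)
```

Looping over `chunked(all_ids)` and calling `fetch_batch` on each batch would then cover a list of a few thousand IDs in a handful of requests.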

    • #3
      Thanks Emily_Ensembl.
      This API does work, but I am still looking to get the allele frequencies for all 26 subpopulations of the 1000 Genomes. This API seems to give only the main continental ancestries. I know that I can probably download all of the genotype information and calculate these myself, but I would have thought there would be a simple way to download the information from somewhere.

      • #4
        pops=1 should get you all 26 populations

        • #5
          Then I must be doing something wrong (quite likely). I'm trying it with the wget example:

          Code:
          wget -q --header='Content-type:application/json' --header='Accept:application/json' \
          --post-data='{ "ids" : ["rs56116432" ] }' \
          'http://rest.ensembl.org/variation/homo_sapiens' -O temp.out pops=1
          And I only get this as the output:

          {"rs56116432":{"ambiguity":"Y","ancestral_allele":null,"minor_allele":"T","mappings":[{"allele_string":"C/T","start":133256042,"coord_system":"chromosome","assembly_name":"GRCh38","end":133256042,"strand":1,"seq_region_name":"9","location":"9:133256042-133256042"},{"strand":1,"seq_region_name":"CHR_HG2030_PATCH","end":133256189,"assembly_name":"GRCh38","location":"CHR_HG2030_PATCH:133256189-133256189","allele_string":"C/T","coord_system":"chromosome","start":133256189}],"MAF":0.00259585,"most_severe_consequence":"missense_variant","synonyms":["NM_020469.2:c.689G>A","NP_065202.2.Gly230Asp"],"evidence":["Frequency","1000Genomes","ESP","ExAC","TOPMed","gnomAD"],"source":"Variants (including SNPs and indels) imported from dbSNP","var_class":"SNP","name":"rs56116432"}}
          I've also tried it with other SNPs just to see if it was a SNP specific thing, but get similar outputs.

          • #6
            Then I must be doing something wrong (highly probable). I'm trying it with the wget example:

            Code:
            wget -q --header='Content-type:application/json' --header='Accept:application/json' \
            --post-data='{ "ids" : ["rs56116432" ] }' \
            'http://rest.ensembl.org/variation/homo_sapiens' -O temp.out pops=1
            And I get:

            {"rs56116432":{"ambiguity":"Y","ancestral_allele":null,"minor_allele":"T","mappings":[{"allele_string":"C/T","start":133256042,"coord_system":"chromosome","assembly_name":"GRCh38","end":133256042,"strand":1,"seq_region_name":"9","location":"9:133256042-133256042"},{"strand":1,"seq_region_name":"CHR_HG2030_PATCH","end":133256189,"assembly_name":"GRCh38","location":"CHR_HG2030_PATCH:133256189-133256189","allele_string":"C/T","coord_system":"chromosome","start":133256189}],"MAF":0.00259585,"most_severe_consequence":"missense_variant","synonyms":["NM_020469.2:c.689G>A","NP_065202.2.Gly230Asp"],"evidence":["Frequency","1000Genomes","ESP","ExAC","TOPMed","gnomAD"],"source":"Variants (including SNPs and indels) imported from dbSNP","var_class":"SNP","name":"rs56116432"}}
            Unless I'm missing something in the output, all I see is the total MAF and not broken down by populations. I've also tried this with other SNPs and get similar results.

            • #7
              Try adding pops=1 to the URL, like:

              Code:
              wget -q --header='Content-type:application/json' --header='Accept:application/json' --post-data='{ "ids" : ["rs56116432" ] }' 'http://rest.ensembl.org/variation/homo_sapiens?pops=1' -O temp.out
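The difference from the earlier command is that `pops=1` has to be part of the URL's query string; given as a trailing argument, it is not passed to the server as a parameter. A minimal Python sketch of building such a URL with the standard library (the helper name `with_query` is hypothetical):

```python
from urllib.parse import urlencode


def with_query(base_url, **params):
    """Return base_url with params appended as a URL query string."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


url = with_query("http://rest.ensembl.org/variation/homo_sapiens", pops=1)
print(url)  # http://rest.ensembl.org/variation/homo_sapiens?pops=1
```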

              • #8
                That works. Thanks. I have a couple of thousand SNPs to get frequencies for, so I can write some sort of wrapper to run through them all.

                One last question: is there a way to limit the output to just the 1000 Genomes populations?

                • #9
                  No, you would need to parse your query response to limit the data in that way.
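Parsing the response to keep only the 1000 Genomes entries could look like the sketch below. It assumes that each variant in the `pops=1` response carries a `populations` list whose entries have a `population` name, and that 1000 Genomes names begin with `1000GENOMES:`; both should be checked against a real response before relying on this. The function name and the sample record are illustrative only.

```python
def keep_1000_genomes(response, prefix="1000GENOMES:"):
    """Return {variant_id: [population entries whose name starts with prefix]}.

    Assumes `response` is the decoded JSON from the variation endpoint with
    pops=1, i.e. a dict keyed by variant ID.
    """
    filtered = {}
    for variant_id, record in response.items():
        populations = record.get("populations", [])
        filtered[variant_id] = [
            entry for entry in populations
            if entry.get("population", "").startswith(prefix)
        ]
    return filtered


# Made-up record for illustration only; the frequencies are not real data.
sample = {
    "rsEXAMPLE": {
        "populations": [
            {"population": "1000GENOMES:phase_3:GBR", "allele": "T", "frequency": 0.1},
            {"population": "OTHER_PROJECT:ALL", "allele": "T", "frequency": 0.2},
        ]
    }
}
print(keep_1000_genomes(sample))
```

Variants without a `populations` key come back with an empty list rather than raising an error, which makes missing data easy to spot afterwards.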
