  • michelle.lupton
    Junior Member
    • Jun 2010
    • 5

    need 1000 genomes data for just one gene

    Dear all,

    I am a bit stuck trying to access the 1000 Genomes data. I have sequenced one candidate gene and identified some novel SNPs, and I want to check whether these SNPs have already been identified in the 1000 Genomes data.

    Is the data shown on the browser just from the pilot 1 release? If so, is there a way of viewing the allele frequencies in pilot 2 for identified SNPs without downloading the raw data?

    I'm sure this would be very quick; I'm afraid I am a bit lost looking at the web site.

    Thanks for your help,
    Michelle
  • bioinfosm
    Senior Member
    • Jan 2008
    • 483

    #2
    You could try using the SeattleSeq database. It takes as input a list of variant coordinates, and reports back if some of them were found in (some freeze of) 1000genomes results.
    --
    bioinfosm

    • lh3
      Senior Member
      • Feb 2008
      • 686

      #3
      samtools view -bo sample1.bam ftp://ftp-trace.nih.gov/....../sample1.bam 10:10,000-20,000

      Using a URL as the file name, you can "download" the alignment around a target gene without downloading the full alignments, which amount to >20TB for the main project.

      It would be better if someone is willing to write a web service.
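
To run the region query above over many samples without typing each path by hand, the command can be generated from a list of URLs. The following is a minimal Python sketch, not a tested pipeline. It assumes a plain-text file with one BAM URL per line, that `samtools` is on the PATH, and that each remote BAM has its index available next to it. The function names and the input file are hypothetical; only the `samtools view -bo OUT URL REGION` form comes from the post above.

```python
import subprocess
from pathlib import PurePosixPath
from urllib.parse import urlparse


def build_commands(urls, region, out_dir="."):
    """Return one samtools command (as an argument list) per BAM URL."""
    commands = []
    for url in urls:
        url = url.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        # Name the local slice after the remote file, e.g. sample1.bam
        name = PurePosixPath(urlparse(url).path).name
        out_path = f"{out_dir}/{name}"
        commands.append(["samtools", "view", "-bo", out_path, url, region])
    return commands


def fetch_all(url_file, region, out_dir="."):
    """Run samtools once per URL listed in url_file (hypothetical input file)."""
    with open(url_file) as handle:
        for cmd in build_commands(handle, region, out_dir):
            subprocess.run(cmd, check=True)
```

Calling `fetch_all("bam_urls.txt", "10:10,000-20,000")` would then write one small BAM per listed sample into the current directory. Samples whose remote files share a name would overwrite each other, so a real script would need a naming scheme that keeps them apart.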

      • steven
        Senior Member
        • Aug 2009
        • 269

        #4
        Is this data available from the UCSC Table Browser? In the SNPs track of the Variations and Repeats group, I can see a "valid does/does not include by-1000genomes" option when I click on "Create Filter".

        • Firebird
          Member
          • Jun 2010
          • 18

          #5
          Hello,

          I want to do the same as michelle.lupton with SNPs I found. I used samtools and the view command.
          My problem is that I need an automated way to check the data from all individuals, because I can't type the path for every individual in samtools.
          Can I give samtools a list of paths?

          Thanks!

          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            If you want SNPs, go to the 1KG FTP site. There are VCFs of SNP calls for each population.

            • gglusman
              Occasional visitor
              • Jun 2010
              • 4

              #7
              I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

              You're welcome to give it a try:


              Suggestions for improvement are welcome.

              • xguo
                Member
                • Jul 2008
                • 48

                #8
                Your tool looks very useful. Now dbSNP131 is out. Is it possible to update your site with the new version of dbSNP? Is there an easy way to do a batch query or install it locally?

                thanks

                • BetterPrimate
                  Member
                  • May 2010
                  • 15

                  #9
                  That looks very interesting. I see that the webpage says there are 26,520,897 variants. Over at HapMart there are 26,291,751. Is your database a superset of HapMart, or are there extra SNPs to be found in HapMart?

                  • gglusman
                    Occasional visitor
                    • Jun 2010
                    • 4

                    #10
                    Originally posted by xguo:
                    Is it possible to update your site with the new version of dbSNP?
                    I just updated it to v. 131 of dbSNP.
                    The updated data structure includes ~29.4 million variants.

                    Originally posted by BetterPrimate:
                    Is your database a superset of HapMart, or are there extra SNPs to be found in HapMart?
                    I haven't compared it to HapMart's data, sorry.

                    • laura
                      Senior Member
                      • Sep 2008
                      • 151

                      #11
                      What data are you after?

                      If you want the SNP calls, you should look at the recent release made in July:

                      ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/2010_07/

                      The readme explains what this release contains

                      ftp://ftp.1000genomes.ebi.ac.uk/vol1...010_07_release

                      Once you download the vcf files which contain the variant calls then



                      should help you to extract the data you need from these files

                      If you only want variants for a specific region of the genome, you don't even have to download the whole file. You could use the tabix program, which comes from https://sourceforge.net/projects/samtools/files/, to download a subsection of the files like this:

                      tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz 1:233411980-245804116
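
Once the tabix output for the gene region has been saved to a file, checking a handful of candidate SNPs against it takes only a few lines. The following is a rough Python sketch, assuming the usual VCF layout in which the first two tab-separated columns are chromosome and position. The function names are hypothetical, and the comparison is by position only, so alleles would still need to be checked by eye or with extra code.

```python
def known_positions(vcf_lines):
    """Collect (chrom, pos) pairs from VCF data lines, skipping '#' headers."""
    seen = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        seen.add((chrom, int(pos)))
    return seen


def split_known_novel(candidates, vcf_lines):
    """Partition candidate (chrom, pos) pairs into already-called and not found."""
    seen = known_positions(vcf_lines)
    known = [c for c in candidates if c in seen]
    novel = [c for c in candidates if c not in seen]
    return known, novel
```

For example, with the tabix output redirected to a file such as `region.vcf` (a made-up name), `split_known_novel(my_snps, open("region.vcf"))` returns the candidates that coincide with a called site and those that do not.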

                      • tumorim
                        Junior Member
                        • Aug 2010
                        • 2

                        #12
                        Download the three 1000G files from http://www.openbioinformatics.org/an..._download.html

                        Then, supposing your gene is in the region chr1:1000-2000, just do

                        perl -ne 'm/(\d+)\t(\d+)/ and $1 eq "1" and $2>=1000 and $2<=2000 and print' < hg18_CEU.sites.2010_03.txt

                        You'll get all variants in CEU population. Do the same for YRI/ASN.

                        Of course, you can also download the whole ANNOVAR program and run it on your variants for much more functionality, such as filtering against dbSNP 130 or dbSNP 131.
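
For anyone who prefers not to use Perl, the same region filter can be sketched in Python. This assumes, as the one-liner above implies, that each line of the sites file is tab-delimited with the chromosome in the first column and the position in the second. The function names are hypothetical. Unlike the unanchored regular expression, this version looks strictly at the first two columns.

```python
def in_region(line, chrom, start, end):
    """True if column 1 equals chrom and start <= column 2 <= end."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[1].isdigit():
        return False  # header, comment, or malformed line
    return fields[0] == chrom and start <= int(fields[1]) <= end


def filter_sites(lines, chrom, start, end):
    """Return the lines of a sites file that fall inside chrom:start-end."""
    return [line for line in lines if in_region(line, chrom, start, end)]
```

With the CEU file open as `handle`, `filter_sites(handle, "1", 1000, 2000)` would return the same kind of lines as the one-liner; repeating the call on the YRI and ASN files covers the other populations.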
