  • need 1000 genomes data for just one gene

    Dear all,

    I am a bit stuck trying to access the 1000 Genomes data. I have just one candidate gene, which I have sequenced, and I have identified some novel SNPs in it. I want to check whether these SNPs have already been identified in the 1000 Genomes data.

    Is the data shown on the browser just from the pilot 1 release? If so, is there a way of viewing the allele frequencies in pilot 2 for identified SNPs without downloading the raw data?

    I'm sure this would be very quick, but I'm afraid I am a bit lost looking at the web site.

    Thanks for your help,
    Michelle

  • #2
    You could try using the SeattleSeq database. It takes as input a list of variant coordinates, and reports back if some of them were found in (some freeze of) 1000genomes results.
    --
    bioinfosm


    • #3
      samtools view -bo sample1.bam ftp://ftp-trace.nih.gov/....../sample1.bam 10:10,000-20,000

      Using a URL as the file name, you can "download" the alignment around a target gene without downloading the full alignments, which amount to >20TB for the main project.

      It would be better if someone is willing to write a web service.
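
The same remote slice can be repeated over many alignment files with a short script. This is only a sketch: the helper names `build_view_command` and `fetch_region` are made up for illustration, the URLs must be supplied by you (for example, one per line in a text file), and it assumes `samtools` is on the PATH and that each remote BAM has an index next to it.

```python
import subprocess


def build_view_command(url, region, out_dir="."):
    """Return the samtools argument list that slices `region` out of a remote BAM.

    The output file keeps the remote file's base name (hypothetical convention).
    """
    out_name = url.rsplit("/", 1)[-1]
    return ["samtools", "view", "-b", "-o", f"{out_dir}/{out_name}", url, region]


def fetch_region(urls, region, out_dir="."):
    """Run one `samtools view` per URL; stops at the first failure."""
    for url in urls:
        subprocess.run(build_view_command(url, region, out_dir), check=True)
```

Calling `fetch_region(urls, "10:10000-20000")` with a list of BAM URLs would then write one small BAM per individual into the current directory. Files that share a base name would overwrite each other, so a different naming scheme may be needed.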


      • #4
        Is this data available from the UCSC Table Browser? In the SNPs track of the Variations and Repeats group, I can see a "valid does/does not include by-1000genomes" option when I click on "Create Filter".


        • #5
          Hello,

          I want to do the same as michelle.lupton with the SNPs I found. I used samtools with the view command.
          My problem is that I need an automated way to check the data from all individuals, because I can't type the path for every individual into samtools.
          Can I give samtools a list of paths?

          Thanks!


          • #6
            If you want SNPs, go to the 1KG FTP site. There are VCF files of SNP calls for each population.


            • #7
              I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

              You're welcome to give it a try:


              Suggestions for improvement are welcome.


              • #8
                Originally posted by Gustavo
                I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

                You're welcome to give it a try:


                Suggestions for improvement are welcome.

                Your tool looks very useful. dbSNP 131 is now out. Is it possible to update your site with the new version of dbSNP? Also, is there an easy way to run a batch query or install it locally?

                Thanks


                • #9
                  Originally posted by Gustavo
                  I've recently built a very condensed data structure encoding known single nucleotide variations from a number of sources, and then wrote a very simple tool for querying this and getting answers to the question: has this variant been seen already (and if so, in which personal genome or population).

                  You're welcome to give it a try:

                  That looks very interesting. I see that the webpage says there are 26,520,897 variants. Over at HapMart there are 26,291,751. Is your database a superset of HapMart, or are there extra SNPs to be found in HapMart?


                  • #10
                    Originally posted by xguo
                    Is it possible to update your site with the new version of dbSNP?

                    I just updated it to v. 131 of dbSNP.
                    The updated data structure includes ~29.4 million variants.

                    Originally posted by BetterPrimate
                    Is your database a superset of HapMart or are there extra SNPs to be found in HapMart?

                    I haven't compared it to HapMart's data, sorry.


                    • #11
                      Originally posted by michelle.lupton
                      Dear all,

                      I am a bit stuck trying to access the 1000 Genomes data. I have just one candidate gene, which I have sequenced, and I have identified some novel SNPs in it. I want to check whether these SNPs have already been identified in the 1000 Genomes data.

                      Is the data shown on the browser just from the pilot 1 release? If so, is there a way of viewing the allele frequencies in pilot 2 for identified SNPs without downloading the raw data?

                      I'm sure this would be very quick, but I'm afraid I am a bit lost looking at the web site.

                      Thanks for your help,
                      Michelle

                      What data are you after?

                      If you want the SNP calls, you should look at the recent release made in July:

                      ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/2010_07/

                      The readme explains what this release contains

                      ftp://ftp.1000genomes.ebi.ac.uk/vol1...010_07_release

                      Once you download the VCF files, which contain the variant calls, then



                      should help you to extract the data you need from these files

                      If you only want variants for a specific region of the genome, you don't even have to download the whole file. You can use the tabix program, available from https://sourceforge.net/projects/samtools/files/, to download a subsection of the file like this:

                      tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz 1:233411980-245804116
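
Once tabix has returned the VCF lines for the region, checking a handful of candidate positions against them can be sketched as below. The function names are made up for illustration, and the `AF` INFO tag is an assumption: check the release readme for the tag that actually carries allele frequency in these files.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AF=0.12;DP=30' into a dict.

    Flag-style entries without '=' are stored with the value True.
    """
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True
    return info


def lookup_variants(vcf_lines, candidates):
    """Report which candidate (chrom, pos) pairs appear in the VCF lines.

    Returns {(chrom, pos): (ref, alt, af)}; af is None when the record
    has no AF tag (assumed tag name, see the note above).
    """
    wanted = set(candidates)
    found = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        key = (chrom, int(pos))
        if key in wanted:
            found[key] = (ref, alt, parse_info(info).get("AF"))
    return found
```

The lines can come from a saved tabix output file opened with `open(...)`, so nothing beyond the region of interest needs to be stored. Any candidate missing from the returned dict was not called in that file, which on its own does not prove the variant is novel.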


                      • #12
                        Download the three 1000G files from http://www.openbioinformatics.org/an..._download.html

                        Then, supposing your gene is in the region chr1:1000-2000, just do

                        perl -ne 'm/(\d+)\t(\d+)/ and $1 eq "1" and $2>=1000 and $2<=2000 and print' < hg18_CEU.sites.2010_03.txt

                        You'll get all variants in CEU population. Do the same for YRI/ASN.

                        Of course, you can also download the whole ANNOVAR program and run it on your variants, which offers much more functionality, such as filtering against dbSNP 130 or dbSNP 131.
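
For anyone who prefers Python to Perl, the region filter above can be sketched as follows. It assumes, as the one-liner implicitly does, that the sites file is tab-delimited with the chromosome in the first column and the position in the second. The function name is made up for illustration. Unlike the unanchored Perl regex, this version only ever looks at those first two columns.

```python
def variants_in_region(lines, chrom, start, end):
    """Yield lines whose first two tab-separated columns fall in chrom:start-end.

    Assumes column 1 is the chromosome (without a 'chr' prefix) and
    column 2 is the position; the bounds are inclusive.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip headers or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line
```

Passing an open file handle for `hg18_CEU.sites.2010_03.txt` as `lines`, with `chrom="1"`, `start=1000` and `end=2000`, should give the same records as the Perl command for a file laid out that way.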
