  • 1000 Genomes Data / Exon targeted

    Hi,

    I have a question concerning the 1000 Genomes data. On the FTP site there are low-coverage and exon-targeted data.
    I assume that the exon-targeted files only contain sequence information for the exons, but at a higher coverage. Is that correct?
    But why do the file sizes differ so much between individuals (exon-targeted, same chromosome)?

    Thanks

  • #2
    hi,

    I think the 1000 Genomes Project has enriched and sequenced only 1000 genes in the pilot data. I am trying to find out which 1000 genes were enriched, but this simple piece of information is frustratingly hard to find.

    Can anyone else help?



    • #3
      ftp://ftp.1000genomes.ebi.ac.uk/vol1...cal/reference/

      There is a BED file of the targeted regions and a gene list, both labeled P3.



      • #4
        There are three pilots

        Also, there are three pilot projects:

        P1 is low-coverage whole-genome sequencing
        P2 is sequencing of parent/child trios
        P3 is sequence capture of the coding exons of 1000 genes

        1000genomes.org



        • #5
          OOhh,

          Wonderful. That's just what I wanted.

          Thanks Adamdeluca



          • #6
            OK summarising...

            Pilot 1 = 2-4X coverage, 180 samples, whole-genome sequencing
            Pilot 2 = 20-60X coverage, 6 samples (2 trios), whole-genome sequencing
            Pilot 3 = 50X coverage, 900 samples, 1000 genes sequenced
            Main project = 4X coverage, 2000 samples, whole-genome sequencing

            But the FTP data is most unwieldy, with separate VCF files per population listing every genotype for every individual. Which raises a question:

            Is there somewhere that summarises the allele frequencies for SNPs across all the 1KG pilots and combines the populations?
            E.g., in the Pilot 3 data for the CEU population, SNP rs61733845 has 122 alleles called, but if you look up that SNP in dbSNP there is no frequency data.
            Last edited by BetterPrimate; 08-03-2010, 11:41 PM.



            • #7
              @BetterPrimate

              I was also looking for an overall VCF file, but I could only find genotypes per population per pilot study.

              An overall file for the whole project would be good.



              • #8
                At the moment the project FTP site doesn't provide overall files for all the variant calls.

                You can get the VCF files for each subpopulation used in each pilot from ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/2010_07/

                low coverage represents 180 individuals sequenced to 2-4x
                trios represents 2 family trios sequenced to 30x+
                exon represents ~700 individuals sequenced for 1000 genes

                You could use the vcftools sourceforge package to get your frequencies for the whole set

                The perl code that is part of this package will merge vcf files for you



                and the c++ code will provide frequency reports
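
To illustrate what combining per-population frequencies amounts to, here is a minimal Python sketch that pools allele counts across populations instead of averaging the per-population frequencies. It assumes each per-population VCF record carries `AC` (alternate allele count) and `AN` (total called alleles) in its INFO column, which may not hold for every file on the FTP site. The function names and the counts are invented for illustration, and this is not how vcftools itself is implemented.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=3;AN=122;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries such as 'DB' keep an empty value
    return fields


def pooled_frequency(info_strings):
    """Pool AC/AN over several populations for one biallelic site."""
    total_ac = 0
    total_an = 0
    for info in info_strings:
        fields = parse_info(info)
        total_ac += int(fields["AC"])
        total_an += int(fields["AN"])
    if total_an == 0:
        return None  # no called alleles in any population
    return total_ac / total_an


# Invented counts for one site in three populations (not real 1KG values).
example = ["AC=3;AN=122", "AC=0;AN=180;DB", "AC=5;AN=98"]
print(pooled_frequency(example))
```

Summing counts weights each population by the number of alleles actually called, so a population with few calls does not dominate the combined frequency.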



                • #9
                  Hello,

                  How can I access the data from the 2000 individuals sequenced at 4x coverage?

                  Thanks



                  • #10
                    Not all 2500 individuals have been sequenced yet.

                    So far we have sequence data for 653 samples; 552 have more than 10GB of sequence data available in FASTQ format.

                    We have alignments for 539 individuals in BAM format.

                    You can get all this data from our ftp site

                    Our website explains how our ftp site is structured

                    1000genomes.org



                    • #11
                      Did you also call variants from these 653 samples?

                      By the way, I have a question about how you called variants in the Pilot 1 study. Did I understand correctly that you pooled all the low-coverage sequence data and called the variants from this new data set? Don't you lose very rare variants by doing this?



                      • #12
                        There aren't any variants released yet for the main project data.

                        We had a release of variants on the pilot data in July, which you can find here:

                        ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/2010_07/

                        As far as the variant calling goes, most of the low-coverage individuals only have between 2x and 4x coverage, so there is insufficient data to call most variants from a single individual; pooling the data gains us power. The low-coverage approach is less powerful for rare variants.
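
A back-of-the-envelope Python sketch of why pooling helps, under strong simplifying assumptions that are not taken from the project's actual calling method: read depth at a site is Poisson distributed, every carrier is heterozygous, sequencing and mapping errors are ignored, and a variant counts as detectable once at least `min_reads` reads carry it (the threshold of 2 is arbitrary).

```python
import math


def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def detection_probability(coverage, carriers, min_reads=2):
    """Chance of seeing >= min_reads reads carrying the variant allele.

    Assumes every carrier is heterozygous, so about half of that
    individual's reads at the site carry the variant allele.
    """
    expected_variant_reads = carriers * coverage / 2.0
    return prob_at_least(min_reads, expected_variant_reads)


# Hypothetical scenario: 4x coverage per individual, varying carrier counts.
for carriers in (1, 2, 5, 10):
    print(carriers, round(detection_probability(4, carriers), 3))
```

In this toy model a single carrier at 4x often yields too few variant reads on its own, while several carriers pooled together almost always do. It also shows the flip side mentioned above: a variant carried by only one or two individuals stays hard to detect.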



                        • #13
                          Can you please tell me how many individuals are included in the last release?

                          So with this approach you are only able to call common variants? But isn't it a goal of the project to detect variants with a frequency of less than 1%?



                          • #14
                            If you look at the alignment index and sequence index files on the ftp site you can see how many individuals are in each release.

                            ftp://ftp.1000genomes.ebi.ac.uk/vol1....sequence_data
                            ftp://ftp.1000genomes.ebi.ac.uk/vol1...alignment_data

                            With 2500 individuals we can get 95% of alleles at 1% MAF in the accessible genome. We will find some variants with lower MAF but we won't find all of them.

                            This project is designed to find all shared variation within the population rather than very rare variants.

                            Another phase of the project will do exome sequencing of the 2500 individuals, and this will hopefully get variants down to 0.1% in those regions, as we will have higher coverage there.



                            • #15
                              Allele frequencies in subpopulations 628 individuals

                              Hi,

                              I am aware that there is a VCF file "ALL.2of4intersection.20100804.sites.vcf.gz" on the FTP site from which you can retrieve allele frequencies for SNPs from the low-coverage data of 628 individuals. These are pooled across all subpopulations.

                              Is there a way I can get the allele frequencies for the same SNPs in subpopulations?
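
One possible approach, sketched in Python: if a VCF with per-sample genotypes is available (a sites-only file has no genotype columns, so this would not work on it directly), alleles can be tallied per population using a sample-to-population table. The sample IDs, population labels, and function name below are hypothetical, and the sketch handles biallelic records only.

```python
from collections import defaultdict


def subpopulation_frequencies(header_line, record_line, sample_to_pop):
    """Alternate-allele frequency per population for one biallelic VCF record."""
    samples = header_line.rstrip("\n").split("\t")[9:]
    columns = record_line.rstrip("\n").split("\t")
    alt_count = defaultdict(int)
    called = defaultdict(int)
    for sample, field in zip(samples, columns[9:]):
        pop = sample_to_pop.get(sample)
        if pop is None:
            continue  # sample not listed in the population table
        genotype = field.split(":")[0]  # GT is the first sub-field
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            called[pop] += 1
            if allele != "0":
                alt_count[pop] += 1
    return {pop: alt_count[pop] / called[pop] for pop in called}


# Hypothetical four-sample example with made-up IDs and populations.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4"
record = "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1|1\t./."
panel = {"S1": "POP_A", "S2": "POP_A", "S3": "POP_B", "S4": "POP_B"}
print(subpopulation_frequencies(header, record, panel))
```

Missing genotypes are skipped, so each population's denominator is the number of alleles actually called in that population.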
