05-09-2011, 04:42 AM   #1
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151
New Resources for 1000 Genomes

General Info

As well as posting new announcements on the front page of http://www.1000genomes.org, we now have both an RSS feed (http://www.1000genomes.org/announcements/rss.xml) and a Twitter account (http://twitter.com/1000genomes).

You can also subscribe to an announcements list we have set up: http://listserver.1000genomes.org/ma...o/1000announce (1000announce@1000genomes.org).

We have started an FAQ (http://www.1000genomes.org/faq) which explains where to find particular 1000 Genomes Project data sets and answers other similar questions.

Data Search

You can now search both our website and our ftp site.

To search the main website you can use the search box which appears in the top right hand corner of each page on http://www.1000genomes.org.

Our ftp search (http://www.1000genomes.org/ftpsearch) is linked from the menu bar at the top of each page. It uses an index file on the ftp site, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree, which is updated every night to reflect the contents of the ftp site.

The search itself will look for strings in the names of files and directories on the ftp site. This means the search can be used to find all vcf files or files associated with a particular release date or particular individual.
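As a rough offline equivalent of that search, the sketch below filters lines of the current.tree index by substring. It assumes each index line is tab-separated with the file path in the first column. That layout, the function name search_tree and the example paths are assumptions for illustration, not a description of how the website search is implemented (for instance, the real search may not be case-sensitive).

```python
from typing import Iterable, List


def search_tree(lines: Iterable[str], term: str) -> List[str]:
    """Return paths from current.tree-style lines whose path contains `term`.

    Assumption: each line is tab-separated and the path is the first field.
    """
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        path = line.split("\t", 1)[0]
        if term in path:
            matches.append(path)
    return matches


# Made-up index lines, only to show the shape of the input.
example = [
    "ftp/release/00000000/example.genotypes.vcf.gz\tfile\t123\n",
    "ftp/data/NA00000/alignment/example.bam\tfile\t456\n",
    "ftp/data/NA00000\tdirectory\t0\n",
]
print(search_tree(example, "NA00000"))
```

To run this against the real index you would first download current.tree and pass the open file object as lines.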

The search options allow you to include md5s in the output and to have the ftp paths point to either the NCBI or the EBI ftp site. Because of the volume of results they would produce, fastq and bam files are excluded by default, but you can choose to include them. Currently the search only returns the first 1000 results, due to the large number of files on the ftp site.
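The options described above can be mimicked locally along these lines. This is only a sketch under assumptions: index lines are taken to be tab-separated with the path first and the md5 last, paths are taken to be relative to the EBI base URL shown in the code (inferred from the current.tree location), and the function and parameter names are made up. The NCBI base URL is not given in this post, so you would need to pass it in yourself as base_url.

```python
from typing import Iterable, List

# Assumed base: current.tree lives at .../vol1/ftp/current.tree and index
# paths are assumed to start with "ftp/".
EBI_BASE = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/"

# Simple suffix check standing in for "fastq and bam files".
READ_SUFFIXES = (".bam", ".fastq", ".fastq.gz")


def search_with_options(
    lines: Iterable[str],
    term: str,
    base_url: str = EBI_BASE,
    include_md5: bool = False,
    include_reads: bool = False,
    limit: int = 1000,
) -> List[str]:
    """Substring search over index lines with the options from the post."""
    results: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        path = fields[0]
        if not path or term not in path:
            continue
        if not include_reads and path.endswith(READ_SUFFIXES):
            continue  # fastq and bam are excluded unless asked for
        entry = base_url + path
        if include_md5:
            entry += "\t" + fields[-1]  # assumption: md5 is the last column
        results.append(entry)
        if len(results) >= limit:
            break  # stop after the first `limit` results
    return results


# Made-up index lines for illustration.
example = [
    "ftp/data/NA00000/example.bam\tfile\t10\tdate\tabc",
    "ftp/data/NA00000/example.vcf.gz\tfile\t10\tdate\tdef",
]
print(search_with_options(example, "NA00000"))
```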

Accessibility

Many of our releases contain very large files which can be challenging to download in their entirety. Both bam and vcf files have indexes which allow subsections to be downloaded using samtools or tabix respectively. There are descriptions of how to do this in our FAQ. We also now have a web-based tool within our Ensembl browser which allows you to request a 10KB subsection of these files.
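For the command-line route, the sketch below builds the samtools or tabix command for fetching one region from a remote indexed file. The helper name slice_command and the example.org URL are hypothetical; the command shapes ("samtools view -b URL REGION" and "tabix -h URL REGION") are the standard ones, but check the FAQ for the project's own instructions.

```python
from typing import List


def region_string(chrom: str, start: int, end: int) -> str:
    """Format a region the way samtools and tabix expect, e.g. 2:1000-2000."""
    return f"{chrom}:{start}-{end}"


def slice_command(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Build the argv list for fetching one region of a remote indexed file."""
    region = region_string(chrom, start, end)
    if url.endswith(".bam"):
        # -b writes the selected reads as BAM
        return ["samtools", "view", "-b", url, region]
    if url.endswith(".vcf.gz"):
        # -h keeps the VCF header in the output
        return ["tabix", "-h", url, region]
    raise ValueError("expected a .bam or .vcf.gz URL: " + url)


# Placeholder URL, not a real 1000 Genomes file.
cmd = slice_command("http://example.org/sample.vcf.gz", "2", 1000, 2000)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run with stdout redirected to a file if you want to run it from Python instead of pasting it into a shell.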

The Data Slicer (http://browser.1000genomes.org/tools.html) needs the URL of an indexed bam or vcf file. It will then present a view of the requested section and a bam or vcf file to download. The Data Slicer can be accessed from the tools link in the top right hand corner of all browser pages. It should work for any remotely accessible tabix-indexed vcf file. It will work for any indexed bam over http but may only work for ftp bams within the EBI.

You can also attach bam or vcf files from our ftp site to the browser. To do this, click on the "Manage your data" link in the left hand menu of a page. This is best done from Location view. The section of the menu you need to click on is labeled "Attach remote file". Only bam files from the EBI ftp site will be visible, but any remotely accessible vcf which is accompanied by a tabix index can be attached. Once your file is loaded you should be able to see the snps or aligned reads displayed, and you can also share these links with others. This is described with screenshots in our Ensembl tutorial: http://www.1000genomes.org/sites/100...l_20110506.doc

The browser also has a variant effect predictor tool (http://browser.1000genomes.org/tools.html) which will take up to 750 snps and indels in VCF format or an Ensembl-specific format. This tool provides functional consequences with respect to the current gene and regulatory annotation, including SIFT and PolyPhen predictions for any non-synonymous snps. You can also download
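If your VCF has more than 750 variants, one way to stay under that limit is to split it into several smaller files before uploading. The following is a minimal sketch, assuming an uncompressed VCF whose header lines all start with "#"; the function name chunk_vcf is made up for illustration and is not part of the browser or of any Ensembl tool.

```python
from typing import Iterable, Iterator, List

MAX_VARIANTS = 750  # upload limit stated in the post


def chunk_vcf(
    lines: Iterable[str], max_variants: int = MAX_VARIANTS
) -> Iterator[List[str]]:
    """Yield lists of VCF lines, each with the full header and at most
    `max_variants` variant records."""
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)  # header is repeated in every chunk
            continue
        if not line.strip():
            continue  # ignore blank lines
        records.append(line)
        if len(records) == max_variants:
            yield header + records
            records = []
    if records:
        yield header + records  # final, possibly shorter chunk


# Made-up VCF content: 2 header lines and 1600 records.
vcf = ["##fileformat=VCFv4.0\n", "#CHROM\tPOS\tID\tREF\tALT\n"]
vcf += [f"1\t{pos}\t.\tA\tG\n" for pos in range(1, 1601)]
chunks = list(chunk_vcf(vcf))
print([len(c) for c in chunks])
```

Each chunk can then be written to its own file and submitted separately.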

If you have any questions about these new features or any other aspect of the project, please email info@1000genomes.org