1k genomes and VCF: slow processing of individual genome

simon_seq

Member

Join Date: Aug 2012

Posts: 13
#1


02-10-2014, 04:11 AM

Hi all,
I'm currently having trouble extracting a single genome's VCF file from the 1000 Genomes repository. In principle it works using tabix and vcftools, but it is incredibly slow. For instance, I started the extraction about 14 hours ago and so far have VCF files for only 4 chromosomes. This is my code:

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 X
do
  tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr${i}.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz ${i} \
    | vcf-subset -a -e -c HG00096 \
    | bgzip -c > HG00096/HG00096_chr${i}.vcf.gz
done

I'd be grateful for any advice on this. As a test, I also downloaded the VCF file of a full chromosome for all samples to see whether working from a local copy accelerated the process, but I haven't noticed a real difference.

Thanks
Simon
Tags: 1000 genomes, vcf, vcf-tools
TiborNagy

Senior Member

Join Date: Mar 2010

Posts: 329
#2

02-10-2014, 05:07 AM

Your script looks OK. Try using multiple machines to speed up the process.
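
Since each chromosome is processed independently, the same idea can also be tried on one machine by running several of the pipelines at once. Below is a minimal sketch of that, not a tested recipe: the `JOBS` and `DRY_RUN` settings and the helper names `list_cmds` and `run_parallel` are made up for illustration, and whether it helps at all depends on whether the network transfer or the vcf-subset step is the bottleneck, which I have not measured.

```shell
#!/bin/sh
# Sketch only: run the per-chromosome pipelines from the original loop
# in parallel batches instead of one after another.

SAMPLE="HG00096"
BASE_URL="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521"
JOBS="${JOBS:-4}"        # hypothetical setting: pipelines started per batch
DRY_RUN="${DRY_RUN:-1}"  # hypothetical setting: 1 = only print the commands

# Chromosome list copied from the original loop (note that it has no 21).
CHROMS="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 X"

# Print one shell command line per chromosome (same pipeline as the loop).
list_cmds() {
  for chr in $CHROMS; do
    printf '%s\n' "tabix -h $BASE_URL/ALL.chr$chr.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz $chr | vcf-subset -a -e -c $SAMPLE | bgzip -c > $SAMPLE/${SAMPLE}_chr$chr.vcf.gz"
  done
}

# Read command lines from stdin and run them in batches of $JOBS.
run_parallel() {
  n=0
  while IFS= read -r cmd; do
    sh -c "$cmd" &
    n=$((n + 1))
    if [ "$n" -ge "$JOBS" ]; then
      wait   # let the current batch finish before starting the next
      n=0
    fi
  done
  wait       # wait for the last, possibly incomplete, batch
}

if [ "$DRY_RUN" = "1" ]; then
  list_cmds
else
  mkdir -p "$SAMPLE"
  list_cmds | run_parallel
fi
```

With the default `DRY_RUN=1` this only prints the 22 command lines; setting `DRY_RUN=0` would actually run them. On a cluster, each printed line could instead be submitted as its own job.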
simon_seq

Member

Join Date: Aug 2012

Posts: 13
#3

02-10-2014, 07:24 AM

Hi TiborNagy,
Thanks for your quick reply. This is good to know. In that case I will indeed move on to the cluster...

Best
Simon