Hi all,
I'm having trouble extracting a single genome's VCF file from the 1000 Genomes repository. In principle it works with tabix and vcftools, but it is extremely slow: I started the download about 14 hours ago and so far have VCF files for only 4 chromosomes. This is my code:
mkdir -p HG00096
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
do
    # stream chromosome ${i} (with header) from the FTP server, keep only sample HG00096, recompress with bgzip
    tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr${i}.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz ${i} | vcf-subset -a -e -c HG00096 | bgzip -c > HG00096/HG00096_chr${i}.vcf.gz
done
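For comparison, the sample selection that `vcf-subset -c` performs amounts to keeping the eight fixed VCF columns plus FORMAT (columns 1-9) and the one sample column. Below is a minimal sketch of that step using plain awk, assuming an uncompressed, tab-delimited VCF on stdin. The function name `subset_sample` is hypothetical, and unlike `vcf-subset -a -e` it does not trim unused ALT alleles or drop sites where the sample carries only the reference allele.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcftools): keep VCF columns 1-9 plus one sample column.
# Assumes a well-formed, uncompressed VCF on stdin. Does NOT replicate vcf-subset's
# -a (trim ALT alleles) or -e (exclude reference-only sites) behaviour.
subset_sample() {
  local sample="$1"
  awk -v s="$sample" 'BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }                      # meta-information lines pass through unchanged
    /^#CHROM/ {                                # find the requested sample among columns 10..NF
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      if (!col) { print "sample not found: " s > "/dev/stderr"; exit 1 }
    }
    { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }'   # #CHROM line and data lines
}
```

It could be dropped into the loop in place of `vcf-subset`, e.g. `tabix -h <URL> ${i} | subset_sample HG00096 | bgzip -c > HG00096/HG00096_chr${i}.vcf.gz`; whether that is actually faster on these files is untested here.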
I'd be grateful for any advice on this. As a test, I also downloaded the full VCF file for one chromosome (all samples) and ran the extraction on the local copy, but I didn't notice a real difference in speed.
Thanks
Simon