
yl01 11-03-2012 03:33 PM

Splitting multiple sample vcf file
A question came up while working with a VCF file produced by UnifiedGenotyper on multiple samples. It is of course better to work with single-sample VCF files, so I tried to split the multi-sample VCF file. I used vcf-subset from vcftools, but the problem is that the resulting single-sample VCF files still contain homozygous reference calls. Does anyone have an easy solution to this problem?
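To make the problem concrete: a multi-sample VCF keeps a row whenever any sample is variant at that site, so after subsetting to one sample, that sample may simply be 0/0 (or uncalled) on the row. A minimal Python sketch of how such calls could be recognised is below. The function name `carries_alt` is made up for illustration, and it assumes GT is the first subfield of the sample column, as the VCF format requires when GT is present.

```python
import re

def carries_alt(sample_field):
    """Return True if the genotype has at least one non-reference, called allele.

    Assumes the sample column looks like "GT:..." with GT first, e.g. "0/1:35:99".
    """
    genotype = sample_field.split(":")[0]
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", genotype)
    # "0" is the reference allele, "." is a missing call.
    return any(allele not in ("0", ".") for allele in alleles)

# Illustrative sample columns (not taken from a real file).
for field in ("0/0:12:30", "0/1:20:99", "1|1:18:54", "./."):
    print(field, carries_alt(field))
```

Rows where this returns False for the one remaining sample are the ones that look redundant in a single-sample file.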

Gig77 11-04-2012 07:58 AM

Have you tried the -e parameter? I suppose -a should also be used to get rid of alternate alleles not found in the subset.

Here is the usage of vcf-subset:

Usage: vcf-subset [OPTIONS] in.vcf.gz > out.vcf
    -a, --trim-alt-alleles     Remove alternate alleles if not found in the subset
    -c, --columns <string>     File or comma-separated list of columns to keep in the vcf file. If file, one column per row
    -e, --exclude-ref          Exclude rows not containing variants.
    -f, --force                Proceed anyway even if VCF does not contain some of the samples.
    -p, --private              Print only rows where only the subset columns carry an alternate allele.
    -r, --replace-with-ref     Replace the excluded types with reference allele instead of dot.
    -t, --type <list>          Comma-separated list of variant types to include: SNPs,indels.
    -u, --keep-uncalled        Do not exclude rows without calls.
    -h, -?, --help             This help message.
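
Putting -c, -e and -a together, a per-sample split could be scripted roughly as follows. This is only a sketch: it assumes vcf-subset is on the PATH and that the input is a gzipped VCF. The helper names (`sample_names`, `subset_command`, `split_by_sample`), the `<sample>.vcf` output naming and the file name in the last comment are made up for illustration.

```python
import gzip
import subprocess

def sample_names(vcf_gz):
    """Return the sample column names from the #CHROM header line."""
    with gzip.open(vcf_gz, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed (CHROM through FORMAT);
                # everything after them is a sample name.
                return line.rstrip("\n").split("\t")[9:]
    return []

def subset_command(vcf_gz, sample):
    """Build the vcf-subset argument list for one sample.

    -c keeps only this sample's column, -e excludes rows not containing
    variants, -a trims alternate alleles not found in the subset.
    """
    return ["vcf-subset", "-c", sample, "-e", "-a", vcf_gz]

def split_by_sample(vcf_gz):
    """Write one <sample>.vcf per sample found in the header."""
    for sample in sample_names(vcf_gz):
        with open(sample + ".vcf", "w") as out:
            # check=True raises CalledProcessError if vcf-subset exits non-zero.
            subprocess.run(subset_command(vcf_gz, sample), stdout=out, check=True)

# Hypothetical usage:
# split_by_sample("multi_sample.vcf.gz")
```

Building the argument list in its own function makes it easy to print and check the command before anything is actually run.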

yl01 11-05-2012 03:32 AM

Thanks Gig77 for the answer! I tested with -e and it gave what I wanted.
