
ronton 10-10-2014 02:29 PM

How to view only the variants that are present in multiple VCF files?
Suppose I have 10 .vcf files and want to generate a single .vcf file containing only the variants that are present in all 10. Can I use GATK CombineVariants followed by SelectVariants (selecting the intersection) to achieve this? Is there another way?

Thank you

"If you want to extract just the records in common between two VCFs, you would first run CombineVariants on the two files to generate a single VCF and then run SelectVariants to extract the common records with -select 'set == "Intersection"', as worked out in the detailed example in the documentation guide."

GenoMax 10-11-2014 05:02 AM


ronton 10-15-2014 03:18 PM

I tried vcf-isec and it did not seem to work.

I was eventually able to install and set up vcftools, and to sort, compress, and index the vcf files with bgzip and tabix.

The vcf-isec command gave a warning that the column names do not match (i.e. 1-Normal and 1-Tumor). The command ran, but the output vcf file was 28 bytes of unreadable characters. Each of the 11 input files is around 80 KB.
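The column-name warning appears to refer to the sample columns in each file's #CHROM header line. Here is a minimal Python sketch, assuming standard tab-delimited VCF headers, that lists those names for one file so the mismatch can be inspected. `sample_names` is a hypothetical helper written for illustration, not part of vcftools.

```python
import gzip


def sample_names(path):
    """Return the sample column names from the #CHROM header line of a VCF."""
    # gzip.open also reads bgzip-compressed files, which are multi-member gzip.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed (CHROM through FORMAT);
                # everything after them is a sample column.
                return line.rstrip("\n").split("\t")[9:]
    return []  # no #CHROM header line found
```

Calling `sample_names` on each input file and printing the results should show which files carry different sample names.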

These are vcf files generated using MuTect (for comparing tumor to normal samples).

I am not sure if vcf-isec will work with MuTect vcf files or if there is something I am doing wrong. Maybe I can process the files ahead of time to get them to work.

The idea is that MuTect gives a list of somatic mutations in cancer samples by comparing them to matched normal samples. What I am trying to do is take several MuTect vcf files and see which variants are present in multiple vcf files, i.e. the 'intersection'.
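One way to sketch this intersection without relying on matching sample columns is to compare only the site fields. The Python below is an illustration under stated assumptions, not a replacement for vcf-isec or GATK. It assumes tab-delimited VCF records, treats two records as the same variant when CHROM, POS, REF and ALT match exactly, and does no normalization or splitting of multi-allelic ALT values. The names `variant_keys` and `shared_variants` are hypothetical.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def variant_keys(path):
    """Return the set of (CHROM, POS, REF, ALT) keys found in one VCF file."""
    keys = set()
    with open_vcf(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header lines
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            keys.add((chrom, int(pos), ref, alt))
    return keys


def shared_variants(paths, min_files=None):
    """Return sorted keys seen in at least min_files files (default: all files)."""
    if min_files is None:
        min_files = len(paths)
    counts = Counter()
    for path in paths:
        # variant_keys returns a set, so each file counts a key at most once.
        counts.update(variant_keys(path))
    return sorted(key for key, n in counts.items() if n >= min_files)
```

With the default `min_files`, the result is the strict intersection across all input files. Passing a smaller value, such as `min_files=2`, returns variants present in at least that many files.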
