SEQanswers

10-10-2014, 03:29 PM #1
ronton
Member
Location: US
Join Date: Jun 2014
Posts: 34
How to view only the variants that are present in multiple VCF files?

Suppose I have 10 .vcf files and I want to generate a .vcf file containing only the variants that are present in all 10 files. Can I use GATK CombineVariants followed by SelectVariants (selecting the intersection) to achieve this? Is there another way?

Thank you

"If you want to extract just the records in common between two VCFs, you would first run CombineVariants on the two files to generate a single VCF and then run SelectVariants to extract the common records with -select 'set == "Intersection"', as worked out in the detailed example in the documentation guide."

https://www.broadinstitute.org/gatk/...neVariants.php
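Outside GATK, the same "present in all files" intersection can be sketched in plain Python. This is a minimal illustration, not what CombineVariants/SelectVariants or vcf-isec do internally. It assumes two records match only when CHROM, POS, REF and ALT are identical strings (no normalization of indel representation, no splitting of multi-allelic ALT fields), and it writes the shared records as they appear in the first file, sample columns included. The function names are hypothetical.

```python
import gzip
from functools import reduce


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def record_key(line):
    """Hypothetical match key: (CHROM, POS, REF, ALT) taken verbatim from a data line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]), fields[3], fields[4])


def variant_keys(path):
    """Set of match keys for every data (non-#, non-blank) line in one VCF."""
    with open_vcf(path) as handle:
        return {record_key(line) for line in handle
                if line.strip() and not line.startswith("#")}


def common_variants(paths):
    """Keys present in every input file: a plain set intersection."""
    return reduce(set.intersection, (variant_keys(p) for p in paths))


def write_common(paths, out_path):
    """Copy the header and the shared records of the first file to out_path."""
    shared = common_variants(paths)
    with open_vcf(paths[0]) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#") or (line.strip() and record_key(line) in shared):
                dst.write(line)
```

Because the match is on exact strings, the same indel written differently in two files would not be counted as shared.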
10-11-2014, 06:02 AM #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,852

VCFtools: http://vcftools.sourceforge.net/perl....html#vcf-isec
10-15-2014, 04:18 PM #3
ronton
Member
Location: US
Join Date: Jun 2014
Posts: 34

I tried vcf-isec and it did not seem to work.

I was eventually able to install and set up VCFtools, and I sorted the VCF files, compressed them with bgzip, and indexed them with tabix.
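tabix expects records grouped by chromosome (each chromosome in one contiguous block) and sorted by position within each chromosome. A minimal check of that ordering before running bgzip and tabix might look like this; it only reports the first problem and does not sort anything. The helper name check_sorted is hypothetical, and the input is assumed to be plain text or gzip-readable.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def check_sorted(path):
    """Return None if records are grouped by CHROM and sorted by POS, else a message."""
    finished = set()   # chromosomes whose block of records has already ended
    current = None     # chromosome of the previous data line
    last_pos = 0
    with open_vcf(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos_text = line.split("\t", 2)[:2]
            pos = int(pos_text)
            if chrom != current:
                if chrom in finished:
                    return f"line {lineno}: {chrom} appears in more than one block"
                if current is not None:
                    finished.add(current)
                current, last_pos = chrom, 0
            if pos < last_pos:
                return f"line {lineno}: position {pos} follows {last_pos} on {chrom}"
            last_pos = pos
    return None
```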

vcf-isec gave a warning that the column names do not match (e.g., 1-Normal and 1-Tumor). The command ran, but the output VCF file was 28 bytes of unreadable characters. Each of the 11 input files is around 80 KB.
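A 28-byte output is consistent with an empty compressed file: the BGZF end-of-file marker block defined in the SAM/BAM specification is exactly 28 bytes, and compressing empty input with bgzip typically yields just that marker. bgzip output is binary gzip data, so it always looks like unreadable characters until it is decompressed (for example with zcat). A sketch that tells "only the EOF marker" apart from a file with real records; the helper names are hypothetical.

```python
import gzip

# The 28-byte BGZF end-of-file marker: an empty gzip member carrying the BGZF
# "BC" extra field, as given in the SAM/BAM format specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200" "1b00"
    "0300" "00000000" "00000000"
)


def is_eof_marker_only(path):
    """True if the file consists of nothing but the BGZF EOF marker block."""
    with open(path, "rb") as fh:
        return fh.read() == BGZF_EOF


def count_records(path):
    """Count non-header lines after decompressing (bgzip output is valid gzip)."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))
```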

These are VCF files generated by MuTect (which compares tumor samples to matched normal samples).

I am not sure whether vcf-isec works with MuTect VCF files or whether I am doing something wrong. Maybe I can preprocess the files to get them to work.
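One possible preprocessing step, assuming only site-level information (CHROM through INFO) is needed for the comparison, is to drop the FORMAT and per-sample columns so the files no longer differ in sample column names such as 1-Normal and 1-Tumor. This is a hypothetical helper, not part of VCFtools or MuTect. Its output is uncompressed, so it would still need bgzip and tabix before vcf-isec. The optional pass_only flag, also an assumption, keeps only records whose FILTER column is PASS.

```python
import gzip

FIXED_COLUMNS = 8  # CHROM POS ID REF ALT QUAL FILTER INFO


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def to_sites_only(in_path, out_path, pass_only=False):
    """Copy a VCF keeping only the 8 fixed columns; returns the number of records written."""
    written = 0
    with open_vcf(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("##"):
                dst.write(line)  # meta-information lines are copied unchanged
                continue
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")[:FIXED_COLUMNS]
            if line.startswith("#CHROM"):
                dst.write("\t".join(fields) + "\n")  # header without sample names
                continue
            if pass_only and fields[6] != "PASS":
                continue
            dst.write("\t".join(fields) + "\n")
            written += 1
    return written
```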

The idea is that MuTect reports somatic mutations in cancer samples by comparing them to matched normal samples. What I am trying to do is take several MuTect VCF files and see which variants are present in multiple files (the 'intersection').
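For the "present in multiple files" question, counting which files contain each variant generalizes the all-files intersection. A minimal sketch, assuming variants match only on identical CHROM, POS, REF and ALT strings; the helper names and the min_files threshold are illustrative, not from any existing tool.

```python
import gzip
from collections import defaultdict


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def variant_keys(path):
    """Set of (CHROM, POS, REF, ALT) keys in one VCF; a set, so duplicates count once."""
    keys = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            f = line.rstrip("\n").split("\t")
            keys.add((f[0], int(f[1]), f[3], f[4]))
    return keys


def files_per_variant(paths):
    """Map each variant key to the list of input files that contain it."""
    found = defaultdict(list)
    for path in paths:
        for key in variant_keys(path):
            found[key].append(path)
    return found


def shared_variants(paths, min_files=2):
    """Variants present in at least min_files inputs, with the files that contain them."""
    return {key: files for key, files in files_per_variant(paths).items()
            if len(files) >= min_files}
```

Setting min_files to the number of input files reduces this to the strict intersection.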
All times are GMT -8.