SEQanswers




Similar Threads
Thread Thread Starter Forum Replies Last Post
vcf file - filter based on DP4 (not total) and removal alternative alleles clarissaboschi Bioinformatics 4 03-03-2015 03:15 AM
How to filter out SNPs with Minor allele frequency less than 5% in VCF file jdpr_100 Bioinformatics 1 10-01-2014 02:14 PM
what is indelFS and indelQD in the filter column of VCF file seraphin De novo discovery 0 07-05-2013 07:46 AM
How and what to use to filter out any known SNPs (with rs#) from the vcf file? BhariD Bioinformatics 0 05-16-2013 01:55 PM
rare variant missing in the vcf file crang Bioinformatics 10 04-04-2013 08:27 AM

05-09-2018, 11:41 AM   #1
auzzie599
Junior Member
 
Location: Pennsylvania

Join Date: Sep 2017
Posts: 6
Filter VCF file to have just one variant per contig

Hello, I have a multi-sample VCF file with about 4000 contigs/loci represented. Many of these contigs contain multiple SNPs, and in this case SNPs on the same contig are linked to one another. However, some of my downstream analyses do not handle linked markers well, so I would like to filter my VCF file so that I am left with a single SNP per contig. Ideally, one SNP would be selected at random from each contig. Any ideas about how to achieve this? As far as I can tell, VCFtools does not offer this option.
05-09-2018, 12:39 PM   #2
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 450

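vcftools does have a --thin option, which thins sites so that no two remaining sites are within the specified distance of one another. If you set that distance to at least the length of your longest contig, only one SNP per contig will be left (--thin 100000, for example).

It will probably keep the first SNP in each contig rather than a random one, though.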
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
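To see why a large --thin value leaves one SNP per contig, here is a small Python sketch of greedy distance thinning over position-sorted (CHROM, POS) pairs. It mimics the documented --thin rule (no two kept sites closer than the given distance) and is only an illustration, not vcftools' own code; the name thin_sites is hypothetical, and the input is assumed to be sorted like a sorted VCF. Because the scan keeps the first site it meets on each contig, any distance longer than every contig leaves exactly the first SNP per contig.

```python
from typing import List, Tuple

Site = Tuple[str, int]  # (CHROM, POS)


def thin_sites(sites: List[Site], min_distance: int) -> List[Site]:
    """Greedily drop sites closer than min_distance to the last kept site.

    Assumes sites are sorted by contig and position, as in a sorted VCF.
    Illustrates the documented effect of `vcftools --thin`; not its source code.
    """
    kept: List[Site] = []
    for chrom, pos in sites:
        same_contig = bool(kept) and kept[-1][0] == chrom
        if same_contig and pos - kept[-1][1] < min_distance:
            continue  # too close to the previously kept SNP on this contig
        kept.append((chrom, pos))
    return kept


sites = [("c1", 10), ("c1", 40), ("c1", 300), ("c2", 5), ("c2", 80)]
print(thin_sites(sites, 100))     # [('c1', 10), ('c1', 300), ('c2', 5)]
print(thin_sites(sites, 100000))  # [('c1', 10), ('c2', 5)]
```

With a distance of 100, contig c1 still keeps two SNPs because they are 290 bp apart; with 100000, only the first SNP on each contig survives.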
05-09-2018, 01:51 PM   #3
auzzie599
Junior Member
 
Location: Pennsylvania

Join Date: Sep 2017
Posts: 6

Great, thanks SNPsaurus, this is working nicely. I would still like to try a 'random SNP per locus' option and compare the results with what I'm getting from this 'first SNP per locus' method, but that might require more advanced programming skills.
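For the 'random SNP per locus' option, a minimal sketch in Python using only the standard library. It reads an uncompressed VCF, keeps every header line unchanged, and for each CHROM keeps one data record chosen uniformly at random by single-slot reservoir sampling. Assumptions: the file is plain text (a .vcf.gz could be opened with gzip.open(path, "rt") instead), each data line counts as one site, and the function name one_random_snp_per_contig and the command-line usage are hypothetical. Unlike the --thin approach, it does not rely on the VCF being sorted.

```python
import random
import sys
from typing import Dict, Iterable, List, Optional


def one_random_snp_per_contig(lines: Iterable[str], seed: Optional[int] = None) -> List[str]:
    """Return all VCF header lines plus one randomly chosen record per CHROM."""
    rng = random.Random(seed)
    header: List[str] = []
    chosen: Dict[str, str] = {}  # CHROM -> currently selected record
    counts: Dict[str, int] = {}  # CHROM -> records seen so far
    for line in lines:
        if line.startswith("#"):
            header.append(line)  # keep ## meta lines and the #CHROM line
            continue
        if not line.strip():
            continue  # ignore blank lines
        chrom = line.split("\t", 1)[0]
        counts[chrom] = counts.get(chrom, 0) + 1
        # Reservoir sampling with one slot: the n-th record on a contig
        # replaces the current pick with probability 1/n, which leaves every
        # record on that contig equally likely to be the final choice.
        if rng.randrange(counts[chrom]) == 0:
            chosen[chrom] = line
    # The first record of each contig is always stored (randrange(1) == 0),
    # so dict insertion order gives contigs in order of first appearance.
    return header + list(chosen.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python one_snp_per_contig.py input.vcf [seed] > out.vcf
    seed = int(sys.argv[2]) if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(one_random_snp_per_contig(vcf, seed))
```

Passing a fixed seed makes a selection reproducible, and running the script with several different seeds gives replicate one-SNP-per-contig files whose downstream results can be set against the 'first SNP per locus' output from --thin.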


All times are GMT -8.