SEQanswers

05-27-2015, 06:07 AM   #1
Rhewter
Member
 
Location: Goiás, Brazil

Join Date: Sep 2014
Posts: 10
SNPs that are at least "N" bp away from each other

Hello,

I have a set of SNPs in a VCF file and would like to discard SNPs that are too close to one another. The idea is to keep only SNPs that are at least 200 bp away from each other. How can I do this?

Thanks in advance.
05-27-2015, 06:13 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

You could do this with awk, assuming the VCF file is sorted in a useful way (by chromosome, then position). The general idea is to keep track of two variables: the chromosome of the last entry and the position of the last entry. If the chromosome of an entry differs from that of the previous entry, or the distance between them is above 200 bp, then print $0 (the variables should be updated in any case). Have a go at writing such a script in awk (or Python, or Perl, or whatever else you know) and post a question when you run into problems.
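A minimal Python sketch of this idea, assuming a tab-separated VCF whose data lines are sorted by chromosome and then position. It makes one deliberate variation on the description above: the reference chromosome/position is updated only when a site is kept, so each site is compared with the last kept site rather than with the immediately preceding line. Updating on every line also guarantees that kept sites are far enough apart, but it can discard more sites than necessary. The function name thin_vcf_lines is illustrative, not from any library.

```python
from typing import Iterable, Iterator, List, Optional


def thin_vcf_lines(lines: Iterable[str], min_dist: int = 200) -> Iterator[str]:
    """Yield VCF lines, dropping data lines that lie fewer than min_dist bp
    from the last *kept* site on the same chromosome.

    Assumes the data lines are sorted by chromosome and then position.
    Header lines (starting with '#') are passed through unchanged.
    """
    last_chrom: Optional[str] = None  # chromosome of the last kept site
    last_pos = 0                      # position of the last kept site
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue  # skip stray blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != last_chrom or pos - last_pos >= min_dist:
            # New chromosome, or far enough from the last kept site:
            # keep this site and make it the new reference point.
            last_chrom, last_pos = chrom, pos
            yield line
        # Otherwise the site is dropped and the reference point is unchanged.


if __name__ == "__main__":
    example: List[str] = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
        "chr1\t250\t.\tC\tT\t.\tPASS\t.\n",
        "chr1\t300\t.\tG\tA\t.\tPASS\t.\n",
        "chr2\t120\t.\tT\tC\t.\tPASS\t.\n",
    ]
    for out in thin_vcf_lines(example, min_dist=200):
        print(out, end="")
```

With min_dist=200 the example keeps chr1:100, chr1:300 and chr2:120 and drops chr1:250. Updating the reference on every line instead would also drop chr1:300, because it is only 50 bp from the discarded chr1:250. For a real file, iterating over thin_vcf_lines(open("input.vcf"), 200) streams it line by line.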
05-27-2015, 06:31 AM   #3
Rhewter
Member
 
Location: Goiás, Brazil

Join Date: Sep 2014
Posts: 10

Quote:
Originally Posted by dpryan
You could do this with awk, assuming the VCF file is sorted in a useful way (by chromosome, then position). The general idea is to keep track of two variables: the chromosome of the last entry and the position of the last entry. If the chromosome of an entry differs from that of the previous entry, or the distance between them is above 200 bp, then print $0 (the variables should be updated in any case). Have a go at writing such a script in awk (or Python, or Perl, or whatever else you know) and post a question when you run into problems.
Hi...

Thank you for your answer. I had thought of that possibility before, but I wanted to know whether there was a ready-made option. Still, I think your suggestion is the best option.

Thank you!
05-27-2015, 08:42 AM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 451

vcftools has a --thin option

--thin <int>

Thin sites so that no two sites are within the specified distance from one another.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
05-27-2015, 09:16 AM   #5
Rhewter
Member
 
Location: Goiás, Brazil

Join Date: Sep 2014
Posts: 10

Quote:
Originally Posted by SNPsaurus
vcftools has a --thin option

--thin <int>

Thin sites so that no two sites are within the specified distance from one another.

Hi SNPsaurus,

I tried using this option, but it only generates a log file (with the number of variants kept) and not a new VCF. Is that right?


Thank you for your answer!
05-27-2015, 09:17 AM   #6
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 451

I think you need to add the --recode option, which tells vcftools to actually make a new vcf based on the filtering.
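Once vcftools has written the new file (for example with vcftools --vcf input.vcf --thin 200 --recode --out thinned, which should produce thinned.recode.vcf; the file names are placeholders), a quick independent check can confirm that no two remaining sites on the same chromosome are closer than the requested distance. The Python sketch below is hypothetical (close_pairs is not part of vcftools). It treats sites exactly min_dist apart as acceptable, which may differ from how vcftools handles that boundary case.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def close_pairs(lines: Iterable[str], min_dist: int) -> List[Tuple[str, int, int]]:
    """Return (chrom, pos_a, pos_b) for neighbouring sites closer than min_dist.

    Header lines ('#...') and blank lines are skipped. Positions are sorted
    per chromosome, so the check does not rely on the input being sorted.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]  # first two VCF columns
        positions[chrom].append(int(pos))

    violations: List[Tuple[str, int, int]] = []
    for chrom, pos_list in positions.items():
        pos_list.sort()
        # Comparing adjacent positions is enough: if any pair is too close,
        # some adjacent pair between them is at least as close.
        for a, b in zip(pos_list, pos_list[1:]):
            if b - a < min_dist:
                violations.append((chrom, a, b))
    return violations
```

An empty list, e.g. from close_pairs(open("thinned.recode.vcf"), 200), means the file satisfies the distance constraint; otherwise each tuple names a pair of sites that are still too close.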
05-27-2015, 09:21 AM   #7
Rhewter
Member
 
Location: Goiás, Brazil

Join Date: Sep 2014
Posts: 10

Quote:
Originally Posted by SNPsaurus
I think you need to add the --recode option, which tells vcftools to actually make a new vcf based on the filtering.

That was it. Thank you very much once again.
05-31-2018, 09:50 PM   #8
drpbt1692@gmail.com
Junior Member
 
Location: India

Join Date: May 2018
Posts: 1

Hi SNPsaurus,
I have applied various filters with the VCFfilter tool, e.g. DP>10, AF>0.1, etc.
This filter information is also shown at the top of the VCF file. However, when I used VCFtools to set a minimum distance between SNPs (--thin 10000), this information does not appear in the VCF file.
Can you help me understand why?

Regards
Prakash Thakor
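This question was left without a reply in the thread. Going by the observation above that the --thin step leaves no trace in the header, one workaround (an assumption, not a vcftools feature) is to record the step yourself as an extra "##" meta-information line placed just before the #CHROM line. The key name "thinning" and the function add_meta_line below are hypothetical; VCF allows custom "##key=value" lines, but such a line is purely informational.

```python
from typing import Iterable, Iterator


def add_meta_line(lines: Iterable[str], meta: str) -> Iterator[str]:
    """Yield VCF lines with an extra '##' meta-information line inserted
    just before the '#CHROM' column-header line.

    `meta` is the text after '##', e.g. 'thinning=vcftools --thin 10000'
    (a hypothetical key chosen for illustration, not a standard field).
    If the input has no '#CHROM' line, nothing is inserted.
    """
    inserted = False
    for line in lines:
        if not inserted and line.startswith("#CHROM"):
            yield f"##{meta}\n"
            inserted = True
        yield line
```

Applied to the recoded output, e.g. writing "".join(add_meta_line(open("thinned.recode.vcf"), "thinning=vcftools --thin 10000")) to a new file, this leaves all data lines untouched and only adds one header line.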

All times are GMT -8.