SEQanswers Calculating r^2 in vcftools for only one SNP?
 Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

 Similar Threads Thread Thread Starter Forum Replies Last Post willMD Bioinformatics 8 02-01-2018 01:24 AM Rcucullatus Bioinformatics 5 09-17-2012 10:32 AM tnguyen Bioinformatics 0 08-10-2012 03:05 AM rururara Bioinformatics 0 03-31-2011 06:47 AM ShwenHo Bioinformatics 0 03-08-2010 03:25 PM

 03-19-2014, 08:39 AM #1 nk Member   Location: UK Join Date: Apr 2012 Posts: 11 Calculating r^2 in vcftools for only one SNP? I would like to check for a set of SNPs in a given region how strongly they correlate with one particular SNP. I know that vcftools has the --geno-r2 option which will give me this information, however this takes very long to compute because it actually calculates all possible r^2s between all SNPs. I only need it to compute it for one of them against all others. For example plink has the `--ld-snp` option which allows you to specify a single SNP to calculate against. Is there any way I can do this with vcftools?
 03-20-2014, 05:50 AM #2 TiborNagy Senior Member   Location: Budapest Join Date: Mar 2010 Posts: 329 Yes, you can. Using --chr, --from-bp, --to_bp you can specify the region of interest.
 03-20-2014, 03:05 PM #3 nk Member   Location: UK Join Date: Apr 2012 Posts: 11 I know that, but this will still calculate the pairwise r^2 for all SNPs in this area. What I want is just the r^2 from one SNP in this region to all others. I just ended up making a tped file and running this though PLINK now, although this seems unnecessarily complicated.
 03-20-2014, 03:47 PM #4 gringer David Eccles (gringer)   Location: Wellington, New Zealand Join Date: May 2011 Posts: 836 I would expect that most of the ways to do this would be complicated because your use case strays a bit from the beaten path. I find it a little odd to talk about r^2 within a "given region", and be concerned about the time it takes to do a full pairwise calculation, because if it takes less than 2h to calculate it's probably a waste of time to look for other solutions. What is the region size? How many SNPs? How many individuals?
 03-20-2014, 03:54 PM #5 nk Member   Location: UK Join Date: Apr 2012 Posts: 11 Well, the idea is to use this in local Manhattan plots from a GWAS, looking at the r^2 between the most significantly associated SNP and other SNPs around it. As far as I am aware this is not too exotic of an application, but maybe I am wrong. And since I want to do this for many cases and more or less on demand 2h is quite a long time. For comparison, the PLINK method takes maybe a minute or so now.
03-20-2014, 04:48 PM   #6
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
 And since I want to do this for many cases and more or less on demand 2h is quite a long time. For comparison, the PLINK method takes maybe a minute or so now.
Ah, okay. In that case, write a script to do the conversion to TPED and run PLINK. Then while you're collecting money (or saving time) due to the use of your script, hunt around for more efficient solutions.

I vaguely recall diagrams with the most significant SNP identified and r^2 values for surrounding SNPs, but unfortunately can't remember how it was done.

 03-20-2014, 11:27 PM #7 chrchang Member   Location: Mountain View, CA Join Date: Jun 2013 Posts: 15 Do you need to account for phase in your r^2 calculation? If not, you can just use PLINK 1.9's VCF import function: plink --vcf [vcf filename] --out [new fileset prefix] Then you can use --r2 as usual. r^2 with the entire rest of the genome: plink --bfile [plink fileset prefix] --r2 --inter-chr --ld-snp [snp id] r^2 with the rest of the chromosome: plink --bfile [plink fileset prefix] --r2 --inter-chr --ld-snp [snp id] --chr [chromosome number] r^2 with a limited window: plink --bfile [plink fileset prefix] --r2 --ld-snp [snp id] --ld-window [max snps + 1] --ld-window-kb [max kbs]
 03-21-2014, 01:11 AM #8 gringer David Eccles (gringer)   Location: Wellington, New Zealand Join Date: May 2011 Posts: 836 Oh wow, PLINK continues on at BGI. I'll have to spread the word to my colleagues (who use PLINK a lot)....
 08-01-2014, 01:55 AM #9 jimineep Member   Location: UK Join Date: Sep 2011 Posts: 10 Did anyone find a solution for this using vcftools? I ask because vcftools allows you to use the hap-r2 option which takes phase into account when calculating R2, as well as the geno-r2 - so with vcftools this would be more flexible.