SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Calculate the proportion of missing data per sample from VCF file (http://seqanswers.com/forums/showthread.php?t=70929)

LeonDK 08-18-2016 03:19 AM

Calculate the proportion of missing data per sample from VCF file
 
I have a VCFv4.2 (Sanger-imputation-service genotype data) file and I need to calculate the proportion of missing data per sample.

Each sample is encoded like so:
Code:

GT:ADS:DS:GP
0|0:0.25,0.15:0.4:0.6375,0.325,0.0375

I am uncertain how 'missing' is identified in this encoding. Any suggestions?

dpryan 08-18-2016 04:00 AM

Missing data will have ".|." or "./." as the genotype. However, since this is imputed I'm not sure how much there will be in the way of missing values (they should have been largely imputed).

LeonDK 08-18-2016 05:11 AM

Quote:

Originally Posted by dpryan (Post 197922)
Missing data will have ".|." or "./." as the genotype. However, since this is imputed I'm not sure how much there will be in the way of missing values (they should have been largely imputed).

Running the following on the VCF files:
Code:

# Count the lines containing a phased missing genotype (".|.") in each gzipped file
for f in *; do gunzip -c "$f" | grep -c '\.|\.'; done
Yields nothing but zeros, so it looks like you are correct.
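
Note that grep -c counts matching lines per file, not genotypes per sample, and the pattern above only covers the phased form ".|.". For the per-sample proportion asked about in the first post, a minimal awk sketch could look like the following. The function name missing_per_sample is hypothetical, and the sketch assumes that GT is the first FORMAT subfield, that sample columns start at column 10, and that any genotype containing a "." allele (including half-missing calls such as "0/.") should count as missing.

```shell
# Hypothetical helper: per-sample fraction of missing genotypes in a VCF stream.
# Assumes GT is the first colon-separated subfield of every sample column.
missing_per_sample() {
  awk -F'\t' '
    /^##/ { next }                          # skip meta-information lines
    /^#CHROM/ {                             # header line: sample names start at column 10
      for (i = 10; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      total++                               # one more variant record
      for (i = 10; i <= ncol; i++) {
        split($i, field, ":")               # field[1] is the GT value
        if (field[1] ~ /\./) miss[i]++      # any "." allele counts as missing
      }
    }
    END {
      # Output columns: sample, missing count, total records, proportion missing
      for (i = 10; i <= ncol; i++)
        printf "%s\t%d\t%d\t%.4f\n", name[i], miss[i], total, (total ? miss[i] / total : 0)
    }
  '
}

# Usage, assuming gzipped VCF files in the current directory:
# for f in *.vcf.gz; do gunzip -c "$f" | missing_per_sample; done
```

Each output row is tab-separated: sample name, number of missing genotypes, number of variant records, and the proportion missing. vcftools also documents a --missing-indv option that reports per-individual missingness, which may be preferable to a hand-rolled script if the tool is available.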

