  • LeonDK
    Member
    • Sep 2014
    • 69

    Calculate the proportion of missing data per sample from VCF file

    I have a VCFv4.2 file (genotype data from the Sanger Imputation Service) and I need to calculate the proportion of missing data per sample.

    Each sample is encoded like so:
    Code:
    GT:ADS:DS:GP
    0|0:0.25,0.15:0.4:0.6375,0.325,0.0375
    I am not sure how 'missing' is identified. Any suggestions?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Missing data will have ".|." or "./." as the genotype. However, since this is imputed I'm not sure how much there will be in the way of missing values (they should have been largely imputed).
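
    If it helps, here is a minimal Python sketch of the per-sample calculation, not a tested pipeline. It assumes every record has GT in its FORMAT column, and it counts a genotype as missing if any allele is "." (so "0/." counts too). The function names are made up for illustration.

```python
import gzip
from collections import Counter


def missing_fraction_per_sample(lines):
    """Return {sample: fraction of records whose GT has a missing allele}."""
    samples = []
    missing = Counter()
    n_records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        n_records += 1
        # Assumption: GT is present in FORMAT for every record.
        gt_idx = fields[8].split(":").index("GT")
        for name, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_idx]
            # Treat phased (|) and unphased (/) separators alike.
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                missing[name] += 1
    if n_records == 0:
        return {name: 0.0 for name in samples}
    return {name: missing[name] / n_records for name in samples}


def missing_fraction_from_file(path):
    """Same calculation for a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return missing_fraction_per_sample(handle)
```

    Calling missing_fraction_from_file("chr1.vcf.gz") (a hypothetical file name) should give one fraction per sample; for fully imputed data I would expect these to all be zero.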


    • LeonDK
      Member
      • Sep 2014
      • 69

      #3
      Originally posted by dpryan View Post
      Missing data will have ".|." or "./." as the genotype. However, since this is imputed I'm not sure how much there will be in the way of missing values (they should have been largely imputed).
      Running the following on the VCF files:
      Code:
      for f in *; do gunzip -c "$f" | grep -c '\.|\.'; done
      yields nothing but zeros, so it looks like you are correct.
