  • BCFtools will not filter fixed sites

    Hello all,

    I'm encountering a surprising problem while filtering a VCF file with bcftools (v1.1-52-ga2e5b56). It doesn't look like a proper bug, but it leaves me bewildered.

    I'm trying to filter out all variant sites and retain only fixed sites. So, logically enough, I apply a filter to discard sites where the minor allele count is above zero, with:

    bcftools view --max-ac 0:minor

    It mostly works, but not entirely: some sites with a single heterozygous individual are retained (for example, 0/1:2,1:3:30:30,0,65 in an otherwise 0/0 line). This is not a tie between two alleles of equal frequency (and thus no minor allele), since that lone "1" sits alongside 100+ "0"s.

    So I believe there must be some kind of hidden tolerance threshold somewhere. Does anyone have a workaround?

    Thanks a lot for any insight!

    Robin

  • #2
    A quick clarification: this also applies to VCFtools. Here's an example:

    First, just scanning the VCF file manually:

    bash-4.1$ gunzip -c data.vcf.gz | grep '0/0' | grep '0/1' -v | grep '1/1' -v | grep '0/0' -c
    2408 ##these are ref:ref homozygous

    bash-4.1$ gunzip -c data.vcf.gz | grep '1/1' | grep '0/1' -v | grep '0/0' -v | grep '1/1' -c
    46180 ##these are alt:alt homozygous

    bash-4.1$ vcftools --gzvcf data.vcf.gz --max-mac 0 --recode --stdout | grep -c '0/1'
    0 #that's what I want, but then:

    bash-4.1$ vcftools --gzvcf data.vcf.gz --max-mac 0 --recode --stdout | grep -c '0/0'
    0 ##none of my 2408 ref:ref homozygotes have been retained!

    bash-4.1$ vcftools --gzvcf data.vcf.gz --max-mac 0 --recode --stdout | grep -c '1/1'
    46122 ##and some of the alt:alt have been dropped too (46180 before filtering).

    This is surprising, as both BCFtools and VCFtools explicitly distinguish the notions of "minor allele" and "non-reference allele". So this should NOT be an issue.
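To cross-check the grep counts above independently of either tool, one option is to tally alleles per site straight from the GT sub-fields. The following is only a rough sketch: `count_alleles` is a hypothetical helper name, and it assumes GT is the first FORMAT sub-field and that alleles are separated by `/` or `|`. It says nothing about how BCFtools or VCFtools compute their own counts.

```shell
# Hypothetical helper (not part of bcftools or vcftools): reads an
# uncompressed VCF on stdin and prints, per site:
#   CHROM  POS  ref_allele_count  non_ref_allele_count  minor_count
# Assumption: GT is the first colon-separated sub-field of each sample column.
count_alleles() {
  awk -F'\t' '
    /^#/ { next }                          # skip meta and header lines
    {
      ref = 0; alt = 0
      for (i = 10; i <= NF; i++) {         # sample columns start at field 10
        split($i, f, ":")                  # f[1] holds the genotype, e.g. 0/1
        n = split(f[1], a, "[/|]")         # unphased (/) or phased (|) alleles
        for (j = 1; j <= n; j++) {
          if (a[j] == "0") ref++           # reference allele
          else if (a[j] != ".") alt++      # any called non-reference allele
        }
      }
      minor = (ref < alt) ? ref : alt      # smaller of the two counts
      printf "%s\t%s\t%d\t%d\t%d\n", $1, $2, ref, alt, minor
    }'
}
```

For instance, `gunzip -c data.vcf.gz | count_alleles | awk '$5 == 0' | wc -l` would count sites whose minor count is zero by this definition, which can then be compared with what the filters retain. Missing calls (`.`) are ignored here, and all non-reference alleles are lumped together, so multiallelic sites would need separate handling.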

    All ideas are very welcome!

    Best

    Robin
