Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BCFTOOLS 0.2 Filter problems

    Hi all,
    First post here, so I hope I am posting in the right place. I am trying to use BCFTOOLS (version: 0.2.0-rc8-10-g7a1041c) to filter a bcf file but I am running into a couple of problems and I am not sure if I am mis-specifying my filter expressions or if there's a couple of bugs to be ironed out.

    UPDATE: I opened this as Issue #51 on the Github page. I will update this if the issue gets updated and vice versa.

    e.g. trying to filter based on the genotype filed (FORMAT/GT) I try:

    Code:
    bcftools filter -i 'FORMAT/GT=="1/1"' ./mysnps.bcf | less
    I return the bcf header (indicating that bcftools was able to parse the filter expression just fine) but zero variants.

    When I also try to use a subscripted FORMAT field (e.g. PL for arguments sake) like this:
    Code:
    bcftools filter -i 'FORMAT/PL[0]==1' ./mysnps.bcf | less
    I get the error
    Code:
    Error: Arrays must be subscripted, e.g. PL[0]
    Anyone know what I am doing wrong or what is happening? Some filters work just fine, e.g.
    Code:
    bcftools filter -i "%QUAL>30" ./mysnps.bcf | less [COLOR="Silver"][I]# this filter works fine[/I][/COLOR]
    bcftools_0.2 filter -i "FORMAT/SP==1" JEL423.raw.snps.3.bcf | less [COLOR="Silver"][I]# using a FORMAT field here works[/I][/COLOR]
    FYI here are the relevant parts of my bcf header:
    Code:
    ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
    ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
    ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
    ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version=3>
    ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
    ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
    ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
    ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
    ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
    ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
    ##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">
    ##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
    ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
    ##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callVersion=0.2.0-rc8-10-g7a1041c+htslib-0.2.0-rc8-6-gd49dfa6
    ##bcftools_callCommand=call -vm -O b -
    ##bcftools_viewVersion=0.2.0-rc8-10-g7a1041c+htslib-0.2.0-rc8-6-gd49dfa6
    Last edited by simono101; 05-27-2014, 08:27 AM.

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin


    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
    Yesterday, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
55 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
52 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
45 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
55 views
0 likes
Last Post seqadmin  
Working...
X