Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BCFTOOLS 0.2 Filter problems

    Hi all,
    First post here, so I hope I am posting in the right place. I am trying to use BCFTOOLS (version: 0.2.0-rc8-10-g7a1041c) to filter a bcf file but I am running into a couple of problems and I am not sure if I am mis-specifying my filter expressions or if there's a couple of bugs to be ironed out.

    UPDATE: I opened this as Issue #51 on the Github page. I will update this if the issue gets updated and vice versa.

    e.g. trying to filter based on the genotype filed (FORMAT/GT) I try:

    Code:
    bcftools filter -i 'FORMAT/GT=="1/1"' ./mysnps.bcf | less
    I return the bcf header (indicating that bcftools was able to parse the filter expression just fine) but zero variants.

    When I also try to use a subscripted FORMAT field (e.g. PL for arguments sake) like this:
    Code:
    bcftools filter -i 'FORMAT/PL[0]==1' ./mysnps.bcf | less
    I get the error
    Code:
    Error: Arrays must be subscripted, e.g. PL[0]
    Anyone know what I am doing wrong or what is happening? Some filters work just fine, e.g.
    Code:
    bcftools filter -i "%QUAL>30" ./mysnps.bcf | less [COLOR="Silver"][I]# this filter works fine[/I][/COLOR]
    bcftools_0.2 filter -i "FORMAT/SP==1" JEL423.raw.snps.3.bcf | less [COLOR="Silver"][I]# using a FORMAT field here works[/I][/COLOR]
    FYI here are the relevant parts of my bcf header:
    Code:
    ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
    ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
    ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
    ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version=3>
    ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
    ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
    ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
    ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
    ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
    ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
    ##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">
    ##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
    ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
    ##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callVersion=0.2.0-rc8-10-g7a1041c+htslib-0.2.0-rc8-6-gd49dfa6
    ##bcftools_callCommand=call -vm -O b -
    ##bcftools_viewVersion=0.2.0-rc8-10-g7a1041c+htslib-0.2.0-rc8-6-gd49dfa6
    Last edited by simono101; 05-27-2014, 08:27 AM.

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 03-27-2024, 06:37 PM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-27-2024, 06:07 PM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
56 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
70 views
0 likes
Last Post seqadmin  
Working...
X