  • DeNovoGear BCF parsing error

    Hi,

    I am attempting to use denovogear to identify de novo variants from a parent-child trio, and I am getting an error message when it parses the BCF file:

    # START ERROR MESSAGE

    XD Model
    PED file : trio.ped, BCF file : trio.bcf
    The number of trios in the ped file : 1
    The number of paired samples in the ped file : 0


    Created SNP lookup table - XD
    First mrate 1 Last 1
    First code 6 Last 6
    First tgt AA/AA/AA Last TT/TT/TT
    First tref 0.0001791 Last 0.744757

    Created indel lookup table - XD
    First code 6 Last 6
    First tgt RR/RR/RR Last DD/DD/DD
    First prior 0.0375 Last 0.0855

    BCF PARSING ERROR ! -7
    Exiting !

    # END ERROR MESSAGE


    The guide to using denovogear is here:
    http://sourceforge.net/p/denovogear/wiki/Home/

    It requires a bcf of called variants for the trio, and ped file to describe the trio.

    I'm running it with the following command:
    denovogear dnm XD --bcf trio.bcf --ped trio.ped

    My ped file looks like this
    FAM001 child dad mum 2 2
    FAM001 mum 0 0 1 0
    FAM001 dad 0 0 2 0
    The affected child is female and the parents are unaffected. Is this correctly formatted?
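A quick sanity check of the ped file can be sketched in shell. This assumes the common six-column PED layout (family, individual, father, mother, sex, phenotype) with sex coded 1 = male and 2 = female; the thread does not confirm that DeNovoGear reads the last two columns this way, and `check_ped` is a hypothetical helper name.

```shell
# check_ped FILE: report lines that break the six-column PED layout and
# parents whose sex code contradicts their role.
# Assumption: sex is coded 1 = male, 2 = female (the usual PED convention).
check_ped() {
    awk '
        NF == 0 { next }                       # skip blank lines
        NF != 6 { printf "line %d: expected 6 columns, found %d\n", NR, NF; next }
        { sex[$2] = $5 }                       # remember each individual sex code
        $3 != "0" { father[$2] = $3 }          # "0" means no parent listed
        $4 != "0" { mother[$2] = $4 }
        END {
            for (c in father)
                if ((father[c] in sex) && sex[father[c]] != "1")
                    printf "%s: father %s has sex code %s, expected 1\n", c, father[c], sex[father[c]]
            for (c in mother)
                if ((mother[c] in sex) && sex[mother[c]] != "2")
                    printf "%s: mother %s has sex code %s, expected 2\n", c, mother[c], sex[mother[c]]
        }
    ' "$1"
}
```

Run as `check_ped trio.ped`. On the three lines posted above it reports both parents, because dad carries sex code 2 and mum carries sex code 1. Under the same convention a phenotype of 0 usually means missing rather than unaffected (1).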

    I created the bcf from a multi-sample vcf containing the same individuals as the trio, like so:
    bcftools view -S -b -D ucsc_hg19.dict trio.vcf > trio.bcf

    I generated the sequence dictionary with Picard:

    java -jar /usr/local/lib/picard_tools/CreateSequenceDictionary.jar \
    REFERENCE=$ref_genome \
    OUTPUT=genome.dict

    Any ideas what I'm doing wrong?

    Cheers,

    Chris

  • #2
    I'm planning to try out this tool and wanted to know whether you created a consensus pileup for the entire trio, or only output the sites where there is evidence of an alternate allele in at least one of the samples?

    The pileup for a whole genome dataset would be pretty large I assume.


    • #3
      I was able to reproduce the error in the OP and the reason seems to be that the BCF produced was in a compressed VCF-like format when I viewed the file with bcftools view.

      The tool expects an mpileup-like format.


      • #4
        Hi swNGS,
        Could you make sure that the sample names in the BCF file are the same as specified in the PED file ?
        Avinash
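One way to check the sample names is to compare column 2 of the PED file against the sample columns of the `#CHROM` header line. This is a minimal sketch that works on VCF text, so a BCF would first need to be converted to text; `ped_missing_from_vcf` is a hypothetical helper name.

```shell
# ped_missing_from_vcf PED VCF: print every PED individual ID (column 2)
# that does not appear among the sample columns of the VCF #CHROM line.
# VCF may be "-" to read the VCF text from standard input.
ped_missing_from_vcf() {
    awk '
        # First file (the VCF): collect sample names from columns 10 onward.
        NR == FNR {
            if ($1 == "#CHROM")
                for (i = 10; i <= NF; i++) seen[$i] = 1
            next
        }
        # Second file (the PED): print IDs not seen in the VCF header.
        NF && !($2 in seen) { print $2 }
    ' "$2" "$1"
}
```

For a BCF, something like `bcftools view trio.bcf | ped_missing_from_vcf trio.ped -` should work with the bcftools version used in this thread, where `view` prints VCF text. No output means every PED individual was found.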


        • #5
          You could also try something like

          "samtools mpileup -gDf hg19.fa s1.bam s2.bam s3.bam | ./denovogear dnm auto --ped t.ped --bcf -"


          • #6
            Hi,

            Many thanks for your responses. I didn't mention that the original vcf was generated using GATK UnifiedGenotyper, and phased using GATK PhaseByTransmission.

            The whole trio is called at once. It's not a massive file since it's exome rather than whole genome.

            I'm keen to use the vcf produced by GATK, since my pipeline is set up that way.

            So is the problem that DeNovoGear won't parse a vcf produced by GATK?
            Or is the error related to the compression settings of the vcf -> bcf process?

            Also, the sample names are identical in the vcf and the ped file (I generic'ified them for illustrative purposes here)

            I'll look into alternative methods of doing the vcf-> bcf conversion and let you know if that helps

            Thanks


            • #7
              I'm a bit uncomfortable using PhaseByTransmission since my concordance values with array data and the GTs produced by PBT were rather low.


              • #8
                Hi Chris,
                DeNovoGear takes BCF files as input, so you have to find some way of converting the GATK VCFs to BCF. There was some talk that the VCF-to-BCF converter in bcftools was not perfect. I'm not sure if they've fixed the bugs yet.
                Do you know whether GATK can produce a BCF file?
                The reason we settled on BCF as a format for our input was because it seemed to be emerging as the consensus format for storing variant calls.
                Cheers,
                Avinash


                • #9
                  I'm sure Avinash can correct me if I'm wrong, but I think one of the attractive advantages of DeNovoGear is that it works directly off the BAM files rather than calling SNPs first and then filtering based on the (possibly incorrect) calls. The mpileup command is used to generate the genotype likelihoods at each position, which are fed as input into the Bayesian framework of DeNovoGear. So mpileup isn't being used to call the SNPs, but rather to generate the genotype likelihoods (calculated from the I16 values in the samtools bcf/vcf file).

                  Justin
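If the likelihoods do come from the I16 values, a rough way to tell a samtools mpileup file from a VCF produced by another caller is to look for `I16=` in the INFO column of the first record. The sketch below assumes VCF text on standard input; `has_i16` is a hypothetical helper name, and the thread does not confirm that a missing I16 tag is what triggers the parsing error.

```shell
# has_i16: read VCF text on stdin and succeed (exit 0) only if the first
# data record carries an I16 tag in its INFO column (column 8).
has_i16() {
    awk -F'\t' '
        !/^#/ { found = ($8 ~ /(^|;)I16=/); exit }   # inspect first record only
        END   { exit !found }                        # exit 0 when the tag was seen
    '
}
```

Usage might look like `bcftools view trio.bcf | has_i16 || echo "no I16 tag in first record"`, again assuming a bcftools version whose `view` prints VCF text.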


                  • #10
                    You're absolutely right Justin ! I was trying to offer Chris a solution based on the fact that he wants to use the VCF from GATK.


                    • #11
                      It would appear that the bcftools output may not be the current version (see 1000genomes.org).


                      ...and GATK apparently can combine multiple vcfs and convert to bcf:


                      ...took a bit of finding


                      • #12
                        Ah, okay that's clearer. I'll try the way you suggest.


                        • #13
                          Thanks to the pointers, I now have DNG running. However, I can only get it to print its output to the terminal window, using the following command:

                          samtools mpileup -gDf $ref_genome $bam1 $bam2 $bam3 | denovogear dnm XD --ped trio.ped --bcf -

                          I realise this is a basic question, but how do I get it to send the output to a bcf or vcf ?

                          as I understand it, in the command:
                          ./denovogear dnm auto --ped paired.ped --bcf sample.bcf

                          "--bcf sample.bcf" specifies the input bcf file, which as per the previous suggestions I am piping directly from samtools mpileup. I can't see how to specify the output file though.

                          Would it be more efficient to generate the bcf from samtools mpileup first, then use that as an input bcf for DNG?

                          Thanks,

                          Chris


                          • #14
                            Hi Chris,
                            To get VCF output, try this:

                            samtools mpileup -gDf $ref_genome $bam1 $bam2 $bam3 | denovogear dnm XD --ped trio.ped --bcf - --output o1.vcf

                            I will update the README to show this. Also, let me know if you have any issues with the VCF output; I've coded it based on the VCF specs page.

                            As for your second question: if you make a BCF file first, every subsequent run of denovogear only needs that BCF file. Creating the BCF (i.e. calculating the GLs) is the more time-intensive step, so your DNG runs will be much quicker. I'd recommend piping if you have disk space constraints.
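The two-step alternative to piping can be sketched as a small shell function. It uses only the flags quoted in this thread (`samtools mpileup -gDf`, `--ped`, `--bcf`, `--output`); newer samtools releases may not accept `-g` and `-D`. The name `call_trio` and the choice to store the BCF next to the output VCF are illustrative assumptions.

```shell
# call_trio MODEL REF PED OUT_VCF BAM...
# Builds the BCF once (the slow step), then runs DeNovoGear on the stored file.
# MODEL is the dnm model name used in the thread, e.g. "auto" or "XD".
call_trio() {
    model=$1 ref=$2 ped=$3 out=$4
    shift 4
    bcf=${out%.vcf}.bcf        # assumption: keep the BCF beside the output VCF

    # Step 1 (slow, done once): compute genotype likelihoods, skipped when
    # a non-empty BCF already exists from an earlier run.
    if [ ! -s "$bcf" ]; then
        samtools mpileup -gDf "$ref" "$@" > "$bcf" || { rm -f "$bcf"; return 1; }
    fi

    # Step 2 (fast, repeatable): call de novo mutations from the stored BCF.
    denovogear dnm "$model" --ped "$ped" --bcf "$bcf" --output "$out"
}
```

For example, `call_trio XD "$ref_genome" trio.ped o1.vcf "$bam1" "$bam2" "$bam3"` writes `o1.bcf` on the first run and reuses it afterwards, at the cost of the disk space that piping would avoid.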


                            • #15
                              I am getting a similar error:
                              Unable to find pair, exiting Denovogear! ( 2, 2)
                              BCF PARSING ERROR - Paired Sample! -3
                              Exiting !
                              I used duplicate-marked, sorted BAM files to create BCF files with samtools mpileup, then ran each BCF file through denovogear and got this error for all the samples.
