SEQanswers

10-08-2012, 01:27 PM #1
swNGS
Member
Location: SW UK
Join Date: Nov 2011
Posts: 83

DeNovoGear BCF parsing error

Hi,

I am attempting to use DeNovoGear to identify de novo variants in a parent-child trio, and I am getting an error message when it parses the BCF file:

# START ERROR MESSAGE

XD Model
PED file : trio.ped, BCF file : trio.bcf
The number of trios in the ped file : 1
The number of paired samples in the ped file : 0


Created SNP lookup table - XD
First mrate 1 Last 1
First code 6 Last 6
First tgt AA/AA/AA Last TT/TT/TT
First tref 0.0001791 Last 0.744757

Created indel lookup table - XD
First code 6 Last 6
First tgt RR/RR/RR Last DD/DD/DD
First prior 0.0375 Last 0.0855

BCF PARSING ERROR ! -7
Exiting !

# END ERROR MESSAGE


The guide to using DeNovoGear is here:
http://sourceforge.net/p/denovogear/wiki/Home/

It requires a BCF of called variants for the trio, and a PED file describing the trio.

I'm running it with the following command:
denovogear dnm XD --bcf output.bcf --ped trio.ped

My PED file looks like this:
FAM001 child dad mum 2 2
FAM001 mum 0 0 1 0
FAM001 dad 0 0 2 0
The affected child is female and both parents are unaffected. Is this correctly formatted?
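For reference, here is a minimal Python sketch that sanity-checks a trio PED file. It assumes the standard six-column linkage/PLINK layout (family ID, individual ID, father ID, mother ID, sex with 1 = male and 2 = female, phenotype with 1 = unaffected, 2 = affected and 0 = missing); whether DeNovoGear reads the sex and phenotype columns the same way is an assumption, and the helper names are hypothetical. Under that convention the file above codes mum as male and dad as female, and gives both parents a missing rather than unaffected phenotype.

```python
from typing import Dict, List, NamedTuple


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str
    mother: str
    sex: str        # assumed convention: "1" = male, "2" = female
    phenotype: str  # assumed convention: "1" = unaffected, "2" = affected, "0"/"-9" = missing


def parse_ped(text: str) -> List[PedRecord]:
    """Parse whitespace-separated six-column PED lines, skipping blanks and '#' comments."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        records.append(PedRecord(*fields))
    return records


def check_trio_ped(text: str) -> List[str]:
    """Return warnings for common trio PED mistakes (hypothetical helper)."""
    records = parse_ped(text)
    by_id: Dict[str, PedRecord] = {r.individual: r for r in records}
    warnings = []
    for rec in records:
        if rec.phenotype not in {"1", "2"}:
            warnings.append(f"{rec.individual}: phenotype {rec.phenotype} is coded as missing")
        if rec.father == "0" and rec.mother == "0":
            continue  # founder: no parents to cross-check
        father = by_id.get(rec.father)
        mother = by_id.get(rec.mother)
        if father is None or mother is None:
            warnings.append(f"{rec.individual}: parent ID not found in file")
            continue
        if father.sex != "1":
            warnings.append(f"{father.individual}: listed as father but sex is {father.sex}")
        if mother.sex != "2":
            warnings.append(f"{mother.individual}: listed as mother but sex is {mother.sex}")
    return warnings


if __name__ == "__main__":
    example = "FAM001 child dad mum 2 2\nFAM001 mum 0 0 1 0\nFAM001 dad 0 0 2 0"
    for warning in check_trio_ped(example):
        print(warning)
    # dad: listed as father but sex is 2
    # mum: listed as mother but sex is 1
    # mum: phenotype 0 is coded as missing
    # dad: phenotype 0 is coded as missing
```

Under the same assumed convention, swapping the sex codes and setting the parents' phenotype to 1 (e.g. "FAM001 mum 0 0 2 1" and "FAM001 dad 0 0 1 1") produces no warnings.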

I created the BCF from a multi-sample VCF containing the same trio individuals, as follows:
bcftools view -S -b -D ucsc_hg19.dict trio.vcf > trio.bcf

I generated the sequence dictionary with Picard:

java -jar /usr/local/lib/picard_tools/CreateSequenceDictionary.jar \
REFERENCE=$ref_genome \
OUTPUT=genome.dict

Any ideas what I'm doing wrong?

Cheers,

Chris
10-09-2012, 02:14 PM #2
vivek_
PhD Student
Location: Denmark
Join Date: Jul 2012
Posts: 164

I'm planning to try out this tool and wanted to know whether you created a consensus pileup for the entire trio or just output the sites where there is evidence of an alternate allele in at least one of the samples.

The pileup for a whole genome dataset would be pretty large I assume.
10-11-2012, 08:56 AM #3
vivek_
PhD Student
Location: Denmark
Join Date: Jul 2012
Posts: 164

I was able to reproduce the error in the OP. The reason seems to be that the BCF produced was in a compressed VCF-like format when I viewed the file with bcftools view.

The tool expects an mpileup-like format.
10-11-2012, 10:58 AM #4
trackavinash
Member
Location: St Louis, USA
Join Date: Nov 2011
Posts: 14

Hi swNGS,
Could you make sure that the sample names in the BCF file are the same as those specified in the PED file?
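A quick way to follow up on this is to compare the individual IDs in column 2 of the PED file with the sample columns of the variant file's #CHROM header line. The sketch below uses hypothetical helper names and works on text VCF lines (for example the GATK VCF, or a BCF viewed as VCF with bcftools view); it does not decode binary BCF itself. Names are compared exactly, so case and whitespace differences count as mismatches.

```python
import gzip
import sys
from typing import Iterable, List, Set, Tuple


def vcf_sample_names(lines: Iterable[str]) -> List[str]:
    """Return the sample names from the #CHROM header line (columns 10 onward)."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def ped_individuals(lines: Iterable[str]) -> Set[str]:
    """Collect individual IDs (column 2) from PED lines, skipping blanks and comments."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            ids.add(fields[1])
    return ids


def compare_samples(vcf_lines, ped_lines) -> Tuple[List[str], List[str]]:
    """Return (IDs only in the PED file, samples only in the VCF)."""
    in_vcf = set(vcf_sample_names(vcf_lines))
    in_ped = ped_individuals(ped_lines)
    return sorted(in_ped - in_vcf), sorted(in_vcf - in_ped)


def open_text(path: str):
    """Open plain or gzip/BGZF-compressed text files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare_samples.py trio.vcf[.gz] trio.ped
    with open_text(sys.argv[1]) as vcf, open_text(sys.argv[2]) as ped:
        only_ped, only_vcf = compare_samples(vcf, ped)
    print("in PED but not VCF:", only_ped)
    print("in VCF but not PED:", only_vcf)
```

Two empty lists mean every PED individual has a matching sample column.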
Avinash
10-11-2012, 11:02 AM #5
trackavinash
Member
Location: St Louis, USA
Join Date: Nov 2011
Posts: 14

You could also try something like

samtools mpileup -gDf hg19.fa s1.bam s2.bam s3.bam | ./denovogear dnm auto --ped t.ped --bcf -
10-12-2012, 09:59 AM #6
swNGS
Member
Location: SW UK
Join Date: Nov 2011
Posts: 83

Hi,

Many thanks for your responses. I didn't mention that the original VCF was generated using GATK UnifiedGenotyper and phased using GATK PhaseByTransmission.

The whole trio is called at once. It's not a massive file, since it's exome rather than whole genome.

I'm keen to use the VCF produced by GATK, since my pipeline is set up that way.

So is the problem that DeNovoGear won't parse a VCF produced by GATK?
Or is the error related to the compression settings of the VCF-to-BCF conversion?

Also, the sample names are identical in the VCF and the PED file (I made them generic for illustrative purposes here).

I'll look into alternative methods of doing the VCF-to-BCF conversion and let you know if that helps.

Thanks
10-12-2012, 12:11 PM #7
vivek_
PhD Student
Location: Denmark
Join Date: Jul 2012
Posts: 164

I'm a bit uncomfortable using PhaseByTransmission, since the concordance between my array data and the genotypes (GTs) produced by PBT was rather low.
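To put a number on agreement like this, one simple measure is genotype concordance: the fraction of sites called in both sources whose genotypes agree. The sketch below assumes the genotypes have already been extracted into dictionaries keyed by a site ID such as "chr1:12345" (a hypothetical layout), and it ignores phase so that a phased call written as 1|0 matches an unphased array call of 0/1. Overall concordance is dominated by homozygous-reference sites, so heterozygous sites are worth checking separately; this is not necessarily how the values above were computed.

```python
from typing import Dict, Optional, Tuple

Genotype = str  # e.g. "0/1", "1|0", "./."


def normalise_gt(gt: Genotype) -> Optional[Tuple[str, ...]]:
    """Drop phase and allele order so '1|0' equals '0/1'; return None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return tuple(sorted(alleles))


def concordance(calls_a: Dict[str, Genotype],
                calls_b: Dict[str, Genotype]) -> Tuple[int, int, float]:
    """Compare genotypes at sites present and non-missing in both call sets.

    Returns (matching sites, compared sites, matching / compared).
    """
    matching = compared = 0
    for site, gt_a in calls_a.items():
        gt_b = calls_b.get(site)
        if gt_b is None:
            continue  # site absent from the second call set
        a, b = normalise_gt(gt_a), normalise_gt(gt_b)
        if a is None or b is None:
            continue  # no-call in either source
        compared += 1
        matching += a == b
    fraction = matching / compared if compared else float("nan")
    return matching, compared, fraction
```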
10-12-2012, 12:49 PM #8
trackavinash
Member
Location: St Louis, USA
Join Date: Nov 2011
Posts: 14

Hi Chris,
DeNovoGear takes BCF files as input, so you have to find some way of converting the GATK VCFs to BCF. There was some talk that the VCF-to-BCF converter in bcftools was not perfect; I'm not sure if they've fixed the bugs yet.
Do you know whether GATK produces a BCF file?
The reason we settled on BCF as a format for our input was because it seemed to be emerging as the consensus format for storing variant calls.
Cheers,
Avinash
10-12-2012, 01:26 PM #9
BAMseek
Senior Member
Location: St. Louis, MO, USA
Join Date: Apr 2011
Posts: 124

I'm sure Avinash can correct me if I'm wrong, but I think one of the attractive advantages of DeNovoGear is that it works directly off the BAM files rather than calling SNPs first and then filtering based on the (possibly incorrect) calls. The mpileup command is used to generate the genotype likelihoods at each position, which is fed as input into the Bayesian framework of DeNovoGear. So mpileup isn't being used to call the SNPs but rather used to generate the genotype likelihoods (calculated from the I16 values in the samtools bcf/vcf file).
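As an illustration of the likelihood step described here: samtools mpileup -g stores each sample's genotype likelihoods in the PL field as Phred-scaled values, PL = -10 log10 L, rescaled so the most likely genotype has PL 0. The sketch below converts PL back to relative likelihoods and combines them with a genotype prior by Bayes' rule for a single sample. It is a simplified illustration only: DeNovoGear's actual model considers the trio jointly and includes a mutation rate, and any prior you pass in is your own assumption.

```python
from typing import List, Sequence


def pl_to_likelihoods(pl: Sequence[int]) -> List[float]:
    """Undo Phred scaling: relative likelihood = 10^(-PL/10)."""
    return [10 ** (-p / 10) for p in pl]


def pl_to_posteriors(pl: Sequence[int], prior: Sequence[float]) -> List[float]:
    """Multiply likelihoods by a genotype prior and normalise (single-sample Bayes' rule)."""
    if len(pl) != len(prior):
        raise ValueError("PL and prior must have the same length")
    unnormalised = [lik * p for lik, p in zip(pl_to_likelihoods(pl), prior)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]
```

For a biallelic site the three PL values are ordered 0/0, 0/1, 1/1, so PL = 0,30,255 means 0/0 is 1000 times more likely than 0/1.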

Justin
10-12-2012, 01:34 PM #10
trackavinash
Member
Location: St Louis, USA
Join Date: Nov 2011
Posts: 14

You're absolutely right, Justin! I was trying to offer Chris a solution based on the fact that he wants to use the VCF from GATK.
10-12-2012, 01:35 PM #11
swNGS
Member
Location: SW UK
Join Date: Nov 2011
Posts: 83

It would appear that bcftools may not output the current version of the BCF format:
http://www.1000genomes.org/wiki/anal...-vcf-version-2

...and GATK apparently can combine multiple vcfs and convert to bcf:
http://www.ohloh.net/p/gatk/commits/206543775

...took a bit of finding
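If you are unsure which BCF flavour a tool has produced, the first bytes of the decompressed file identify it. As far as I know, BCF files written by samtools/bcftools 0.1.x (BCF1) start with the magic string BCF\4, BCF2 files start with BCF\2 followed by a minor-version byte (BCF\2\1 or BCF\2\2), and a VCF starts with ##fileformat=VCF. The sketch below (hypothetical helper names) reads those bytes with Python's gzip module, which can read BGZF because BGZF is a series of gzip members, and falls back to reading the raw bytes for uncompressed files. It only tells you which format you have; which one a given DeNovoGear build expects is not stated in this thread.

```python
import gzip


def classify_header(head: bytes) -> str:
    """Map the first bytes of a (decompressed) variant file to a format label."""
    if head.startswith(b"BCF\x04"):
        return "BCF1"  # samtools/bcftools 0.1.x
    if head.startswith(b"BCF\x02") and len(head) >= 5:
        return f"BCF2.{head[4]}"  # fifth byte is the minor version, e.g. 1 or 2
    if head.startswith(b"##fileformat=VCF"):
        return "VCF"
    return "unknown"


def sniff_variant_file(path: str) -> str:
    """Classify a file, trying gzip/BGZF decompression first, then raw bytes."""
    try:
        with gzip.open(path, "rb") as handle:
            head = handle.read(16)  # 16 bytes covers b"##fileformat=VCF"
        compressed = True
    except OSError:  # gzip.BadGzipFile is a subclass of OSError
        with open(path, "rb") as handle:
            head = handle.read(16)
        compressed = False
    state = "compressed" if compressed else "uncompressed"
    return f"{classify_header(head)} ({state})"


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, sniff_variant_file(path))
```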
10-12-2012, 01:37 PM #12
swNGS
Member
Location: SW UK
Join Date: Nov 2011
Posts: 83

Ah, okay that's clearer. I'll try the way you suggest.
10-13-2012, 03:44 AM #13
swNGS
Member
Location: SW UK
Join Date: Nov 2011
Posts: 83

Thanks to the pointers, I now have DNG running. However, it only seems to write its output to the terminal window with the following command:

samtools mpileup -gDf $ref_genome $bam1 $bam2 $bam3 | denovogear dnm XD --ped trio.ped --bcf -

I realise this is a basic question, but how do I get it to write the output to a BCF or VCF file?

As I understand it, in the command:
./denovogear dnm auto --ped paired.ped --bcf sample.bcf

"--bcf sample.bcf" specifies the input BCF file, which, as per the previous suggestions, I am piping directly from samtools mpileup. I can't see how to specify the output file, though.

Would it be more efficient to generate the BCF with samtools mpileup first, then use that as the input BCF for DNG?

Thanks,

Chris
10-13-2012, 09:22 AM #14
trackavinash
Member
Location: St Louis, USA
Join Date: Nov 2011
Posts: 14

Hi Chris,
To get VCF output, try this:

samtools mpileup -gDf $ref_genome $bam1 $bam2 $bam3 | denovogear dnm XD --ped trio.ped --bcf - --output o1.vcf

I will update the README to show this. Also, let me know if you have any issues with the VCF output; I've coded it based on the VCF specs page.

As for your second question: if you make a BCF file first, then every subsequent run of DeNovoGear only has to read that BCF file. Creating the BCF (i.e. calculating the GLs) is the more time-intensive step, so your DNG runs will be much quicker. I'd recommend piping if you have disk space constraints.
03-27-2013, 07:39 AM #15
akul
Junior Member
Location: St.Louis
Join Date: Feb 2013
Posts: 2

I am getting a similar error:
Unable to find pair, exiting Denovogear! ( 2, 2)
BCF PARSING ERROR - Paired Sample! -3
Exiting !
I used sorted, duplicate-marked BAM files to create BCF files with samtools mpileup, then ran the BCF files through DeNovoGear and got this error for all the samples.
Tags
denovogear bcf trio

All times are GMT -8.