SEQanswers > Bioinformatics > Bioinformatics

07-13-2012, 07:36 AM   #1
Location: Bonn

Join Date: Feb 2010
Posts: 30
How to include coverage information in VCF

Hi folks

When we report variants in VCF format, a common question is: what is
the coverage profile of the target region? Combining the VCF with coverage
information also lets us deduce the genotypes at the non-variant positions of the
target region: where coverage is sufficient and no variant is reported, the genotype
is the reference. This is much more efficient than listing a homozygous-reference
genotype (0/0) at every non-variant position in the VCF.

For this purpose one could generate a BED file listing intervals with, e.g., >10x and >20x coverage. Alternatively, a WIG file would be an option.
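As a minimal sketch of the BED approach, assuming per-base depths are already available as sorted (chrom, 1-based position, depth) tuples, for example parsed from `samtools depth` output, the function below merges consecutive positions at or above a threshold into BED intervals. The name `depth_to_bed` is hypothetical; running it once per threshold (10 and 20 here) gives one BED file per coverage level.

```python
from typing import Iterable, Iterator, Optional, Tuple


def depth_to_bed(
    depths: Iterable[Tuple[str, int, int]], min_depth: int
) -> Iterator[Tuple[str, int, int]]:
    """Yield BED intervals (chrom, start, end) covered by >= min_depth reads.

    `depths` holds (chrom, 1-based position, depth) tuples sorted by chrom
    and position. BED intervals are 0-based and half-open, so a run covering
    1-based positions s..e is emitted as (chrom, s - 1, e).
    """
    run_chrom: Optional[str] = None
    run_start = run_end = 0
    for chrom, pos, depth in depths:
        passes = depth >= min_depth
        contiguous = chrom == run_chrom and pos == run_end + 1
        if passes and contiguous:
            run_end = pos  # extend the current run by one base
            continue
        if run_chrom is not None:
            # Run broken by low depth, a gap, or a new chromosome: emit it.
            yield (run_chrom, run_start - 1, run_end)
            run_chrom = None
        if passes:
            run_chrom, run_start, run_end = chrom, pos, pos
    if run_chrom is not None:
        yield (run_chrom, run_start - 1, run_end)


if __name__ == "__main__":
    example = [("chr1", 1, 12), ("chr1", 2, 15), ("chr1", 3, 8),
               ("chr1", 4, 25), ("chr1", 5, 30), ("chr2", 1, 11)]
    for interval in depth_to_bed(example, 10):
        print("\t".join(map(str, interval)))
```

Because a gap in the input breaks a run, positions missing from the depth table (`samtools depth` skips zero-depth positions unless asked otherwise) are treated as uncovered.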

I was wondering whether there is also a way to include this
information in the VCF itself, maybe in the header? Does anyone have an idea?

Posted by krawitz
07-13-2012, 08:09 AM   #2
Richard Finney
Senior Member
Location: bethesda

Join Date: Feb 2009
Posts: 700

Yes, you can do it. Our group has done it in haplotype discovery.

The assumption you are making is:

IF (there is enough coverage at a location) AND (the location is not reported in a VCF that only reports non-reference calls)
THEN the position was interrogated, and you can assume that the call is "reference".

Of course if there is little or no coverage, you will have trouble making a reliable call.

This is doable, as you suggest, by providing an additional WIG (or bigWig) file.
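A fixedStep WIG file is a compact way to ship per-base depth alongside the VCF. The helper below is an illustrative sketch with a hypothetical name, assuming the depths for one contiguous stretch of a chromosome are in a list whose first element corresponds to the 1-based position `start`.

```python
from typing import List


def depths_to_fixedstep_wig(chrom: str, start: int, depths: List[int]) -> str:
    """Render per-base depths as one fixedStep WIG block.

    `start` is the 1-based position of depths[0]; with step=1 each
    following value describes the next base.
    """
    lines = [f"fixedStep chrom={chrom} start={start} step=1"]
    lines.extend(str(d) for d in depths)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(depths_to_fixedstep_wig("chr1", 1001, [12, 15, 8]), end="")
```

The resulting WIG can then be converted to bigWig with UCSC's `wigToBigWig`, which also needs a chromosome-sizes file.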

There is a tricky situation: other samples are heterozygous at a locus, but the sample we are trying to call has a coverage of only 4, with 2 reads non-reference and 2 reference. Coverage is only 4, but the evidence for "het" is good. Of course, making a call at coverage 4 when all reads are "reference" would be bad (right?). In practice, I've just used a coverage cut-off of 10.
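The arithmetic behind the cut-off can be sketched with a simple binomial model (an assumption, not something stated in the post): if a true heterozygote's reads come from either haplotype with probability 0.5 and sequencing errors are ignored, the chance that every read shows the reference allele is 0.5^n at depth n. That is about 6% at depth 4 but under 0.1% at depth 10, which is why an all-reference call needs more coverage than a het call that already sees both alleles.

```python
def prob_all_ref_given_het(depth: int) -> float:
    """P(every read carries the reference allele | true genotype is het).

    Assumes each read samples either haplotype independently with
    probability 0.5 and ignores sequencing error.
    """
    return 0.5 ** depth


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, prob_all_ref_given_het(n))
```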

NB: VCF generation programs can sometimes be forced to output calls for every location, not just the non-reference calls.

Last edited by Richard Finney; 07-13-2012 at 10:12 AM. Reason: added "tricky" paragraph
07-13-2012, 09:37 AM   #3
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

The DP and DP4 keys in the INFO column give coverage information at that position. But by its nature, a VCF only tells you what is going on at the variant. Use BEDTools to get coverage statistics across multiple regions.
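To pull those per-site numbers out of a VCF line, the INFO column can be split on `;` and `=`. This is a minimal sketch assuming samtools/bcftools-style keys, where DP is the raw read depth and DP4 lists high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases; the function names are hypothetical, and the DP4 total can be smaller than DP because low-quality bases are excluded.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into key -> value; flags map to None."""
    fields: Dict[str, Optional[str]] = {}
    if info == ".":
        return fields  # empty INFO column
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def depth_summary(info: str) -> Dict[str, int]:
    """Extract DP and the DP4 ref/alt totals from an INFO string."""
    fields = parse_info(info)
    summary: Dict[str, int] = {}
    dp = fields.get("DP")
    if dp is not None:
        summary["DP"] = int(dp)
    dp4 = fields.get("DP4")
    if dp4 is not None:
        ref_f, ref_r, alt_f, alt_r = (int(x) for x in dp4.split(","))
        summary["ref_reads"] = ref_f + ref_r  # reference-supporting bases
        summary["alt_reads"] = alt_f + alt_r  # alternate-supporting bases
    return summary


if __name__ == "__main__":
    print(depth_summary("DP=24;AF1=0.5;DP4=7,5,6,4;MQ=60"))
```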
Posted by swbarnes2


All times are GMT -8.
