SEQanswers

07-25-2011, 07:27 PM   #1
zxyeo
Location: Singapore
Overlap number discrepancy between VCFTools and BEDTools

--------------------------------VCFTools-----------------------------------------------------
$ ./vcf-compare in.vcf.gz dbsnp_132.hg18.vcf.gz > in.snp-dbsnp.summary
$ less in.snp-dbsnp.summary

Number of sites found only in
115 in.vcf.gz (0.8%)
15101 dbsnp_132.hg18.vcf.gz (0.1%) in.vcf.gz (99.2%)
26694223 dbsnp_132.hg18.vcf.gz (99.9%)

Number of REF matches: 15052
Number of ALT matches: 14449
Number of REF mismatches: 49
Number of ALT mismatches: 603
Number of sites lost due to grouping (e.g. duplicate sites)
2024327 (7.0%) .. read 28733651, reported 26709324 dbsnp_132.hg18.vcf.gz

--------------------------------BEDTools-----------------------------------------------------
$ intersectBed -u -f 1 -a in.vcf -b dbsnp_132.hg18.vcf > in.snp-dbsnp.u.bed
$ wc -l in.snp-dbsnp.u.bed
15092 in.snp-dbsnp.u.bed
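To make the BEDTools side concrete, here is a minimal sketch of the reporting rule behind `intersectBed -u -f F`. It assumes 0-based half-open intervals and that `-f` is the minimum fraction of each A interval that a single B interval must cover. `overlap_bp` and `intersect_u` are hypothetical helper names. The code uses a naive loop for clarity rather than the indexed lookup a real tool would use.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open, as in BED
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def intersect_u(a_list: List[Interval], b_list: List[Interval],
                f: float = 1e-9) -> List[Interval]:
    """Report each A interval once if some B covers at least f of A's length.

    Assumption: f is measured against A only, so a long A overlapping a
    short B can fail f=1 even though both start at the same base.
    """
    hits = []
    for a in a_list:
        a_len = a[2] - a[1]
        for b in b_list:
            shared = overlap_bp(a, b)
            if shared > 0 and shared >= f * a_len:
                hits.append(a)  # -u: report A once, then move on
                break
    return hits


a = [("chr1", 100, 101), ("chr1", 200, 204)]
b = [("chr1", 100, 101), ("chr1", 200, 201)]
print(intersect_u(a, b, f=1.0))  # [('chr1', 100, 101)]
```

With the default tiny fraction both A intervals would be reported. With `f=1.0` the 4 bp interval is dropped because only 1 of its 4 bases is covered.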

***********************************************************************************************
As you can see, the number of overlapping sites differs: VCFTools reports 15101 but BEDTools reports 15092.

I also used vcf-isec to get the VCFTools version of the overlap VCF. Then I used intersectBed to overlap this VCF (VCFTools version) with in.snp-dbsnp.u.bed (BEDTools version) and recovered all 15092 overlaps. So every BEDTools overlap is also in the VCFTools set, and the discrepancy is 9 sites that only VCFTools reports.
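One way to pull out those extra sites is a set difference on (CHROM, POS) keys. This is a minimal sketch that assumes both result files contain VCF-formatted records (intersectBed `-u` is assumed here to write the original `-a` lines, despite the `.bed` extension). `site_keys` and `missing_from` are hypothetical helpers, and the records in the usage example are toy lines, not real data.

```python
from typing import Iterable, List, Set, Tuple


def site_keys(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) keys from VCF-formatted lines, skipping headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or VCF header
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], int(fields[1])))
    return keys


def missing_from(reference: Iterable[str],
                 subset: Iterable[str]) -> List[Tuple[str, int]]:
    """Sites present in `reference` but absent from `subset`, sorted."""
    return sorted(site_keys(reference) - site_keys(subset))


# Toy records standing in for the vcf-isec and intersectBed outputs
vcftools_set = [
    "##fileformat=VCFv4.0",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t79990299\t.\tGAGC\tG\t408.26\tPASS\t.",
]
bedtools_set = ["chr1\t100\t.\tA\tG\t50\tPASS\t."]
print(missing_from(vcftools_set, bedtools_set))  # [('chr2', 79990299)]
```

In practice the two lists would come from reading the files line by line, e.g. `open(path)` passed straight into `missing_from`.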

AFAIK, BEDTools computes overlaps from position information only (i.e. it reports an overlap even when the bases differ). But I am not sure what VCFTools does to produce the additional 9 overlaps (or why BEDTools misses those 9).

It would be great if someone could explain what VCFTools' vcf-isec does and give me some advice on how to interpret the discrepancy above. Many thanks!

Last edited by zxyeo; 07-25-2011 at 07:48 PM.
07-25-2011, 08:42 PM   #2
zxyeo
Location: Singapore
Update: is BEDTools doing more than simply comparing the genomic position?

I just looked at the additional 9 entries and I would like to update my interpretation:

Take one of the 9 entries as an example (the other entries show the same pattern):
chr2 79990299 . GAGC G 408.26 PASS AC=2;AF=1.00;AN=2;DP=12;FS=0.000;HRun=0;HaplotypeScore=63.0941;MQ=41.32;MQ0=0;QD=34.02;SB=-158.86;SF=0,1,1

In 'dbsnp_132.hg18.vcf', a similar entry was found:
chr2 79990299 rs10578220 G GAGC . PASS G5;G5A;GNO;NSF;REF;RSPOS=80136791;SAO=0;SCS=0;SLO;SSR=0;VC=INDEL;VP=050100001201030100000200;WGT=1;dbSNPBuildID=119

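i.e. somehow my REF and ALT alleles are swapped relative to dbSNP: my call is a deletion (GAGC -> G), while the dbSNP entry is an insertion (G -> GAGC) at the same position.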
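A check for this kind of swap can be written directly from the two records above. This is a minimal sketch that assumes single-allele ALT fields (multi-allelic, comma-separated ALTs are not handled). `VcfSite` and `is_allele_swap` are hypothetical names, not part of VCFTools or BEDTools.

```python
from typing import NamedTuple


class VcfSite(NamedTuple):
    chrom: str
    pos: int   # 1-based VCF POS
    ref: str
    alt: str   # assumed to hold a single allele


def is_allele_swap(a: VcfSite, b: VcfSite) -> bool:
    """True if a and b share CHROM/POS but have REF and ALT exchanged."""
    same_site = (a.chrom, a.pos) == (b.chrom, b.pos)
    return same_site and a.ref == b.alt and a.alt == b.ref and a.ref != a.alt


mine = VcfSite("chr2", 79990299, "GAGC", "G")   # deletion in my call set
dbsnp = VcfSite("chr2", 79990299, "G", "GAGC")  # insertion in dbSNP 132
print(is_allele_swap(mine, dbsnp))  # True
```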

Interestingly, BEDTools distinguished them, which suggests that BEDTools might use the bases in addition to the position. I guess VCFTools considers only the position, which is why this pair is counted as an overlap. Correct me if this is not true.
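An alternative explanation worth checking involves interval lengths rather than base identity. Suppose BEDTools converts each VCF record to the half-open interval [POS-1, POS-1+len(REF)). That conversion is an assumption in this sketch, so check it against the documentation of your BEDTools version. Under that assumption, the deletion spans 4 bp while the dbSNP insertion spans 1 bp. With `-f 1`, only a quarter of the A interval would then be covered. The helper names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def vcf_to_interval(chrom: str, pos: int, ref: str) -> Interval:
    """ASSUMED conversion: the interval length is taken from the REF allele."""
    start = pos - 1
    return chrom, start, start + len(ref)


def fraction_of_a_covered(a: Interval, b: Interval) -> float:
    """Fraction of A's bases that B overlaps (the quantity -f thresholds)."""
    if a[0] != b[0]:
        return 0.0
    shared = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return shared / (a[2] - a[1])


mine = vcf_to_interval("chr2", 79990299, "GAGC")  # ('chr2', 79990298, 79990302)
dbsnp = vcf_to_interval("chr2", 79990299, "G")    # ('chr2', 79990298, 79990299)
print(fraction_of_a_covered(mine, dbsnp))  # 0.25
```

If this assumption holds, the 9 sites fail `-f 1` because of REF length, not allele identity. Rerunning intersectBed without `-f 1` would be a quick way to test it.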
12-27-2011, 04:36 PM   #3
wanguan2000
Location: Shanghai

vcf-compare

Compares positions in two or more VCF files and outputs the numbers of positions contained in one but not the other files; two but not the other files, etc, which comes in handy when generating Venn diagrams. The script also computes numbers such as nonreference discordance rates (including multiallelic sites), compares actual sequence (useful when comparing indels), etc.

So vcf-compare considers only the position information when counting shared sites.
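A position-only comparison of this kind reduces to set arithmetic on (CHROM, POS) keys. The sketch below is an illustration under that assumption, not vcf-compare's actual code. It also shows why an allele swap at the same position still counts as shared. `venn_counts` is a hypothetical helper. Because positions are stored in sets, duplicate positions collapse into one key, which is loosely analogous to the "sites lost due to grouping" line in the vcf-compare summary.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (CHROM, POS); alleles are deliberately ignored


def venn_counts(a: Iterable[Site], b: Iterable[Site]) -> Dict[str, int]:
    """Count sites only in A, in both, and only in B, by position alone."""
    sa: Set[Site] = set(a)
    sb: Set[Site] = set(b)
    return {"only_a": len(sa - sb), "both": len(sa & sb), "only_b": len(sb - sa)}


# Toy positions; chr2:79990299 is shared even though its REF/ALT are swapped
mine = [("chr1", 100), ("chr2", 79990299)]
dbsnp = [("chr2", 79990299), ("chr3", 500)]
print(venn_counts(mine, dbsnp))  # {'only_a': 1, 'both': 1, 'only_b': 1}
```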
All times are GMT -8.