--------------------------------VCFTools-----------------------------------------------------
$./vcf-compare in.vcf.gz dbsnp_132.hg18.vcf.gz > in.snp-dbsnp.summary
$less in.snp-dbsnp.summary
Number of sites found only in
115 in.vcf.gz (0.8%)
15101 dbsnp_132.hg18.vcf.gz (0.1%) in.vcf.gz (99.2%)
26694223 dbsnp_132.hg18.vcf.gz (99.9%)
Number of REF matches: 15052
Number of ALT matches: 14449
Number of REF mismatches: 49
Number of ALT mismatches: 603
Number of sites lost due to grouping (e.g. duplicate sites)
2024327 (7.0%) .. read 28733651, reported 26709324 dbsnp_132.hg18.vcf.gz
--------------------------------BEDTools-----------------------------------------------------
$intersectBed -u -f 1 -a in.vcf -b dbsnp_132.hg18.vcf > in.snp-dbsnp.u.bed
$wc -l in.snp-dbsnp.u.bed
15092 in.snp-dbsnp.u.bed
***********************************************************************************************
As you can see, the number of overlapping sites differs: 15101 from VCFTools but 15092 from BEDTools.
I also used vcf-isec to produce the VCFTools version of the overlap as a VCF. I then used intersectBed to intersect that VCF (VCFTools version) with in.snp-dbsnp.u.bed (BEDTools version) and recovered all 15092 overlaps, so every site BEDTools reports is also in the VCFTools set.
AFAIK, BEDTools computes the overlap using position information only (i.e. it counts an overlap even when the base(s) differ). But I am not sure what VCFTools does to come up with the additional 9 overlaps (or why BEDTools misses 9 overlaps).
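To pin down exactly which 9 sites differ, something like the following Python sketch could be used. It is only an illustration, not part of VCFTools or BEDTools: the function names `site_keys` and `only_in_first` are made up here, the example records are invented, and it assumes both inputs are tab-separated with CHROM and POS in the first two columns (which should hold for a vcf-isec output and, as far as I can tell, for the intersectBed output when -a is a VCF).

```python
def site_keys(lines):
    """Collect (CHROM, POS) pairs from VCF-style lines, skipping '#' headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or VCF header/meta line
        chrom, pos = line.rstrip("\n").split("\t")[:2]
        keys.add((chrom, int(pos)))
    return keys


def only_in_first(first_lines, second_lines):
    """Return the sorted sites present in the first input but not the second."""
    return sorted(site_keys(first_lines) - site_keys(second_lines))


# Invented records, only to show the idea (not taken from the real files).
vcftools_overlap = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr1\t250\t.\tCTT\tC\n",
]
bedtools_overlap = [
    "chr1\t100\t.\tA\tG\n",
]

print(only_in_first(vcftools_overlap, bedtools_overlap))
```

On the real data, the two lists would be replaced by open file handles, e.g. `only_in_first(open("isec.vcf"), open("in.snp-dbsnp.u.bed"))`, where `isec.vcf` is a hypothetical name for the uncompressed vcf-isec output. Looking at the REF/ALT columns of the sites it prints might show what they have in common.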
It would be great if someone could explain what vcf-isec (VCFTools) is doing and give me some advice on how to interpret the discrepancy described above. Many thanks!