Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • epigen
    Senior Member
    • May 2010
    • 101

    Bedtools intersectBed with vcf format

    Since I could not find any documentation on how Bedtools intersectBed works with vcf format, I'm posting my question here:

    I'm intersecting a vcf file from mpileup and bcftools with dbSNP132 that is in vcf format:

    intersectBed -a mutations_chr8.vcf -b dbSNP132_chr8.vcf -wao > mutations_chr8_dbSNP132.txt

    In my example below I expected to see only dbSNP entries that have the exactly same coordinates, ie.e chr8 112454855, but I also get hits for chr8 112454865 and chr8 112454876:

    Code:
    chr8	112454855	.	Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75	chr8	112454855	rs76365113	T	A	.	.	dbSNPBuildID=131;VP=050000000001070008000100;WGT=1;VC=SNP;VLD;G5A;G5;KGPilot1	1
    chr8	112454855	.	Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75	chr8	112454855	rs67322031	A	AAC	.	.	dbSNPBuildID=130;VP=050000000001000000000200;WGT=1;VC=INDEL	1
    chr8	112454855	.	Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75	chr8	112454855	rs71851070	TAC	T	.	.	dbSNPBuildID=130;VP=050000000001000000000200;WGT=1;VC=INDEL	3
    chr8	112454855	.	Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75	chr8	112454865	rs34424884	CAC	C	.	.	dbSNPBuildID=126;VP=050000000001000000000200;WGT=1;VC=INDEL	3
    chr8	112454855	.	Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75	chr8	112454876	rs36066513	ACA	A	.	.	dbSNPBuildID=126;VP=050100000001030100000200;WGT=1;VC=INDEL;SLO;G5A;G5;GNO	3
    This is correct and useful considering that indel coordinates are often a bit shifted. However, I'd like to how Bedtools determines the interval in which to look for the intersection. I guess it determines the length of the reference allele the -a file, but does it also do the same for the -b file? And how could I restrict the intersection to the exact position instead of the interval?

    Thanks in advance for any help - and for Bedtools in general

    Barbara
  • mpiro
    Member
    • Nov 2009
    • 10

    #2
    Hi Barbara

    If you generate a bed file from your VCF file you can define intervals for one base like this:

    Code:
    chr8	112454855 112454856 Tacacacacacacacacacacaca	TACacacacacacacacacacacaca	35.5	.	INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1	PL:GT:GQ	73,0,78:0/1:75
    Using shell this can be done by:

    Code:
    cat mutations_chr8.vcf | awk '{print $1"\t"$2"\t"$2+1"\t"$0}' | cut -f -3,6- > mutations_chr8.bed
    
    cat dbSNP132_chr8.vcf  | awk '{print $1"\t"$2"\t"$2+1"\t"$0}' | cut -f -3,6- > dbSNP132_chr8.bed
    Then using intersectBed will match exact position.

    Comment

    • jerryliu
      Junior Member
      • Aug 2008
      • 4

      #3
      Good trick. Just a minor correction, the VCF pos is 1-based and BED pos is 0-based, so the awk part should be:
      awk '{print $1"\t"$2-1"\t"$2"\t"$0}'

      Comment

      Latest Articles

      Collapse

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, Yesterday, 11:58 AM
      0 responses
      10 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-05-2026, 10:09 AM
      0 responses
      25 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-04-2026, 08:59 AM
      0 responses
      35 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 12:03 PM
      0 responses
      58 views
      0 reactions
      Last Post SEQadmin2  
      Working...