rudi283 03-02-2011 05:46 AM

need help with converting VCF to GTF/GFF format
I need to convert my VCF file to GFF/GTF format so that I can use it to annotate my RefSeq with SNPs.
Is there a tool I could use to convert that file?

andreitudor 03-02-2011 06:05 AM

These sorts of conversions are rather tricky because it is difficult to construct a set of standard conversion decisions that will please everyone's specific demands. VCF to BED/GFF is doable with an awk script. BED and GFF are essentially interchangeable with awk as well. GFF/BED to VCF is not really doable unless the necessary VCF info is already tracked in the BED or GFF.

This is what I found on another forum. I do not think there is a tool that does this for you. I have searched for scripts but did not come across any. As that post says, the best way would be to build your own script.
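
For what it is worth, here is a minimal sketch of such a script, written in Python rather than awk. It is only one possible set of conversion decisions, not a standard: the function names, the `vcf2gff` source label, the choice of `SNV` versus `sequence_alteration` as the feature type, and the lowercase `ref`/`alt` attributes are all my own assumptions. It also does not escape special characters in attribute values, so treat it as a starting point.

```python
import sys


def vcf_line_to_gff3(line, source="vcf2gff"):
    """Convert one VCF data line to one GFF3 feature line.

    Returns None for header lines (starting with '#') and blank lines.
    The 'source' label and the attribute names are illustrative choices.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual = fields[:6]

    # VCF POS is 1-based, like GFF3; the feature spans the REF allele.
    start = int(pos)
    end = start + len(ref) - 1

    # Assumption: call it an SNV only if REF and every ALT are single bases.
    bases = {"A", "C", "G", "T", "N"}
    alts = alt.split(",")
    is_snv = len(ref) == 1 and all(a.upper() in bases for a in alts)
    ftype = "SNV" if is_snv else "sequence_alteration"

    attrs = []
    if vid != ".":
        # VCF may list several IDs separated by ';'; keep only the first,
        # because ';' separates attributes in GFF3.
        attrs.append("ID=" + vid.split(";")[0])
    attrs.append("ref=" + ref)
    attrs.append("alt=" + alt)

    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join(
        [chrom, source, ftype, str(start), str(end), qual, "+", ".", ";".join(attrs)]
    )


def convert(lines):
    """Yield a GFF3 version pragma followed by one feature line per VCF record."""
    yield "##gff-version 3"
    for line in lines:
        gff = vcf_line_to_gff3(line)
        if gff is not None:
            yield gff


if __name__ == "__main__":
    # Usage: python vcf2gff.py < input.vcf > output.gff
    for out in convert(sys.stdin):
        print(out)
```

The VCF QUAL value is passed through as the GFF score, and everything in the INFO column is dropped. If you need particular INFO fields for annotation, you would have to add them to the attributes yourself.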


cow_girl 03-02-2011 09:37 AM

I have also been searching for a simple script or tool to convert to GFF format for use in SNP annotation, although I want to convert XML to GFF. If you are working with the human genome, I'm pretty sure you can download dbSNP in GFF format from UCSC. My problem is that I am working with the bovine genome! Here is a link to a sequence converter. Unfortunately VCF isn't one of the input formats, but it may be useful.

ketan_bnf 03-02-2011 08:11 PM

If you have a VCF file and want to annotate SNPs, you can use

1) EnsEMBL Variant effect predictor
2) snpEff software

Both work with cow, accept a VCF file as input, and give the gene ID, transcript ID, and protein name.

rudi283 03-05-2011 10:49 AM

Thanks for the answers!
I was going to use UCSC and download a file with SNPs in GTF format, but it looks like the latest release, dbSNP132, has not been added to UCSC yet :(
I'll try:
1) EnsEMBL Variant effect predictor
2) snpEff software

