SEQanswers

Old 09-09-2011, 01:14 AM   #1
ashwatha
Member
 
Location: India

Join Date: Jul 2011
Posts: 14
BAM to VCF conversion

Hi,

Are there any tools to convert BAM files to VCF (Variant Call Format)? Alternatively, are there any tools to convert pileup files to VCF?

Looking at the VCF specs, VCF is fairly similar to pileup format, so I can probably write a script for this conversion. I am just wondering if something already exists.

Any pointers are much appreciated!

thanks,
Ashwatha.
Old 09-09-2011, 01:48 AM   #2
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219

I can't see why you would want to convert a BAM file to a VCF file. A BAM file stores alignments, whereas a VCF file stores variants. To generate a VCF file you need to do proper SNP calling (e.g. GATK, VarScan, ...). A direct conversion makes no sense to me...
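To illustrate why this is a calling step rather than a format conversion, here is a toy sketch in Python. It assumes a samtools-style pileup line (chromosome, position, reference base, depth, read bases, qualities) and emits a VCF-like record only when enough reads disagree with the reference. The function names and the thresholds `min_alt_fraction` and `min_depth` are made up for illustration; a real caller such as GATK or VarScan also models base qualities, mapping qualities and genotype likelihoods, which this does not.

```python
import re
from collections import Counter


def count_bases(ref, read_bases):
    """Count base observations in a pileup read-bases string."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read-start marker: the next character is a mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            length = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if c in ".,":
            counts[ref.upper()] += 1      # match to the reference
        elif c.upper() in ("A", "C", "G", "T"):
            counts[c.upper()] += 1        # mismatch on either strand
        i += 1                            # '$', '*', 'N' etc. are ignored
    return counts


def naive_call(pileup_line, min_alt_fraction=0.2, min_depth=4):
    """Return a minimal VCF-like record, or None if no variant is called."""
    chrom, pos, ref, _depth, bases = pileup_line.split("\t")[:5]
    counts = count_bases(ref, bases)
    total = sum(counts.values())
    alts = {b: n for b, n in counts.items() if b != ref.upper()}
    if total < min_depth or not alts:
        return None
    alt, n_alt = max(alts.items(), key=lambda kv: kv[1])
    if n_alt / total < min_alt_fraction:
        return None
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([chrom, pos, ".", ref.upper(), alt, ".", ".", f"DP={total}"])


if __name__ == "__main__":
    print(naive_call("chr1\t100\tA\t7\t..,G^FgG.$\tIIIIIII"))
    print(naive_call("chr1\t101\tC\t5\t.,.,.\tIIIII"))
```

The second position produces no record at all, which is the point: most positions in a BAM or pileup file have nothing to report in a VCF, and deciding which ones do is a statistical decision rather than a reformatting.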
Old 09-11-2011, 10:22 PM   #3
ashwatha
Member
 
Location: India

Join Date: Jul 2011
Posts: 14

Hi Peter,

I see what you mean - my question was not worded correctly. What I am looking for is a way to take a BAM file, call variants on it, and generate a VCF file, the way "samtools pileup" generates a pileup file from a BAM file.

If such a tool doesn't exist, I could also use something that can convert a pileup file generated using samtools pileup to a VCF file, considering that pileup files and VCF files contain similar data (at least for coordinates where there is a SNP or other variant).

thanks,
Ashwatha.
Old 09-11-2011, 10:33 PM   #4
ketan_bnf
Member
 
Location: India

Join Date: Oct 2010
Posts: 59

You can use the samtools mpileup command; see this link:

http://samtools.sourceforge.net/mpileup.shtml

You can also use GATK for variant calling.
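
As a rough sketch of the mpileup route, the Python snippet below only builds the command lines and prints them as a shell pipeline; it does not run anything. It assumes a bcftools 1.x installation, where the pileup step is `bcftools mpileup` and calling is `bcftools call` (older releases used `samtools mpileup` piped into `bcftools view`, as on the linked page). The file names `ref.fa`, `sample.bam` and `calls.vcf` are placeholders.

```python
import shlex


def mpileup_call_commands(reference, bam, out_vcf):
    """Build the two commands of a pileup-then-call pipeline (assumes bcftools 1.x)."""
    # -O u: uncompressed BCF on stdout, suited to piping; -f: indexed reference FASTA
    mpileup = ["bcftools", "mpileup", "-O", "u", "-f", reference, bam]
    # -m: multiallelic caller; -v: variant sites only; -O v: plain-text VCF output
    call = ["bcftools", "call", "-m", "-v", "-O", "v", "-o", out_vcf]
    return mpileup, call


def as_shell_pipeline(*commands):
    """Render the commands as one shell pipeline string, quoting each argument."""
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in commands
    )


if __name__ == "__main__":
    cmds = mpileup_call_commands("ref.fa", "sample.bam", "calls.vcf")
    print(as_shell_pipeline(*cmds))
```

The BAM file needs to be coordinate-sorted and the reference indexed before running the printed pipeline.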
Old 09-11-2011, 10:53 PM   #5
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219

I'd recommend GATK as well:
http://www.broadinstitute.org/gsa/wi...alysis_Toolkit

As mentioned in my post above, VarScan would be another option: http://varscan.sourceforge.net/
Old 09-12-2011, 02:36 AM   #6
ashwatha
Member
 
Location: India

Join Date: Jul 2011
Posts: 14

Thanks, Ketan and Peter!
Old 10-07-2013, 07:03 AM   #7
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40

Thank you for the valuable thread. I have some further questions and would appreciate suggestions. I am new to GATK and want to use it to analyse my exome sequencing data, but I have gotten a bit lost reading all the blogs, comments and technical forums. Here is my understanding; please correct me and guide me through the procedure.

I have downloaded the hg19 files from the UCSC browser and created the reference genome. Do I need to use the reference from the GATK repository instead and align my samples against it for downstream analysis?

I also want to run GATK on my institute's cluster. If I am not wrong, I should create a directory for the latest GATK version and transfer all the necessary files via FileZilla into a cluster directory with the same name. I have already done this.

The next step is to download the bundle from the repository, where I see two versions. Which one should I download, 2.5 or 2.3? Once I have the bundle, do I have to download anything else? As I understand it, I need the following on the cluster: the jar file and the resource folder with the .java files, and then, in the main directory of the GATK version folder, the bundle (2.5 or 2.3) with all the files in the bundle directory unzipped. Is that right? Please let me know. I should then be ready to use GATK for the downstream steps listed below:

Identify target regions for realignment (Genome Analysis Toolkit) -> Realign BAM to get better indel calling (Genome Analysis Toolkit) -> Reindex the realigned BAM (SAMtools) -> Call indels (Genome Analysis Toolkit) -> Call SNPs (Genome Analysis Toolkit) -> View aligned reads in BAM/BAI (Integrative Genomics Viewer)

Please let me know whether this looks correct. The VCF files from 1kG and dbSNP are already available in compressed form in the bundle repository on the GATK website; I am currently downloading them and can use them directly after unzipping them.
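
For reference, the steps listed above can be sketched as command lines. The Python snippet below only assembles and prints them; it runs nothing. It assumes a GATK 2.x-style jar invoked with `-T <walker>`, using the walkers RealignerTargetCreator, IndelRealigner and UnifiedGenotyper, plus `samtools index`. The jar path, the helper names and every file name are placeholders, and options such as known-sites files from the bundle are left out.

```python
def gatk(walker, reference, *args, jar="GenomeAnalysisTK.jar"):
    """Build one GATK 2.x-style command (jar path is a placeholder)."""
    return ["java", "-jar", jar, "-T", walker, "-R", reference, *args]


def realign_and_call(reference, bam, prefix):
    """Return the command lists for the realign -> index -> call workflow."""
    intervals = f"{prefix}.intervals"
    realigned = f"{prefix}.realigned.bam"
    return [
        # 1. Identify target regions for realignment
        gatk("RealignerTargetCreator", reference, "-I", bam, "-o", intervals),
        # 2. Realign reads around indels in those regions
        gatk("IndelRealigner", reference, "-I", bam,
             "-targetIntervals", intervals, "-o", realigned),
        # 3. Reindex the realigned BAM
        ["samtools", "index", realigned],
        # 4. Call indels
        gatk("UnifiedGenotyper", reference, "-I", realigned,
             "-glm", "INDEL", "-o", f"{prefix}.indels.vcf"),
        # 5. Call SNPs
        gatk("UnifiedGenotyper", reference, "-I", realigned,
             "-glm", "SNP", "-o", f"{prefix}.snps.vcf"),
    ]


if __name__ == "__main__":
    for cmd in realign_and_call("hg19.fa", "sample.bam", "sample"):
        print(" ".join(cmd))
```

UnifiedGenotyper also accepts `-glm BOTH`, which would merge steps 4 and 5 into a single run if separate indel and SNP files are not needed.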