Hi guys,
I need to find SNPs and indels in my sequencing data from Candida albicans SC5314. I'm using C. albicans SC5314 Assembly 21 as the reference. Since the sequencing was carried out on IonTorrent, I have to be particularly careful (especially with indels), so I'm trying to follow the GATK "best practices" pipeline. In more than one step it is strongly advised to supply a 'known sites' SNP file, to help GATK realign reads and recalibrate base quality scores.
I searched a lot, and the only SNP file I could obtain for Assembly 21 was from the Broad Institute (http://genome.tbdb.org/annotation/ge...Downloads.html, second-to-last link); other resources (dbSNP etc.) provide SNPs for Assembly 19, which is organized into supercontigs rather than into chromosomes like Assembly 21.
Now, here's the problem: I don't know what format this file is in; it is certainly not VCF (see attached example). Do you have any idea? Is there any way to extract all the info needed to write a VCF from it?
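To make clear what I'm aiming for: as far as I understand, a sites-only VCF with chromosome, 1-based position, reference allele and alternate allele should be enough for a 'known sites' file. Below is a minimal sketch of the writer I have in mind, assuming I can somehow pull those four fields out of the Broad file. The function name, the contig names and the example records are all hypothetical placeholders; the real contig names would have to match the Assembly 21 reference FASTA.

```python
import io

def write_sites_vcf(records, out, contig_order):
    """Write a minimal sites-only VCF from (chrom, pos, ref, alt) tuples.

    records      -- iterable of (chrom, 1-based pos, ref allele, alt allele)
    out          -- any writable text stream
    contig_order -- contig names in the same order as the reference
                    (hypothetical names here, not the real Assembly 21 ones)
    """
    rank = {name: i for i, name in enumerate(contig_order)}

    # Header: file format line, one line per contig, then the column names.
    out.write("##fileformat=VCFv4.2\n")
    for name in contig_order:
        out.write(f"##contig=<ID={name}>\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")

    # Sort by reference contig order, then by position within each contig.
    for chrom, pos, ref, alt in sorted(records, key=lambda r: (rank[r[0]], r[1])):
        # ID, QUAL, FILTER and INFO are unknown, so they are left as "."
        out.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")

# Made-up example records, only to show the output layout.
example = [("chr2", 5, "A", "G"), ("chr1", 10, "C", "T")]
buf = io.StringIO()
write_sites_vcf(example, buf, ["chr1", "chr2"])
vcf_text = buf.getvalue()
```

The part I cannot write is the parser that would produce `records`, since I don't know how the fields are laid out in the Broad file.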
Thank you very much.
Mauro