Hello,
I'm getting a chromosome_not_found error when annotating my bacterial SNPs with snpEff. I called the SNPs with kSNP3.0.
Here are a few lines from my VCF file:
===
##fileformat=VCFv4.0
##Reference genome=GCA_000008865_1_ASM886v1_genomic
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GCA_000006665_1_ASM666v1_genomic GCA_000008865_1_ASM886v1_genomic
1 494154 AAAAAAACCG.AGCGCAAATA_R G A . . NS=205;AF=0.003 GT 0 0
1 40998 AAAAAAAGCC.TCTTCGTCGC_F T C . . NS=196;AF=0.003 GT 0 0
1 4531974 AAAAAAAGCG.AAATCTGGCA_F C T . . NS=212;AF=0.003 GT 0 0
1 18983 AAAAAAATAG.GCTTCCAGGG_R G T . . NS=197;AF=0.003 GT 0 0
1 1477420 AAAAAAATCC.GCTGCCGATA_R T C . . NS=200;AF=0.006 GT 0 0
1 1013276 AAAAAACCCA.CAACCTTGAA_F T C . . NS=200;AF=0.003 GT 0 0
1 254058 AAAAAACCCA.GGCGGGCGTT_R T C . . NS=206;AF=0.003 GT 0 0
1 461873 AAAAAACCGG.AAACCGGACT_F A C . . NS=167;AF=0.003 GT 0 0
1 2363022 AAAAAACCGG.CAGTTTGAGC_R T C . . NS=181;AF=0.006 GT 0 0
1 494140 AAAAAACCGT.GCGTATTTGC_F G A . . NS=204;AF=0.003 GT 0 0
===
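Every record in this excerpt has CHROM set to 1, and neither the reference FASTA nor the snpEff database uses that name. A minimal Python sketch (the helper name count_chroms is illustrative, not part of any tool) that tallies the CHROM column over the whole file, to check whether kSNP3 used any other value, for example for the plasmids:

```python
import collections
import sys


def count_chroms(lines):
    """Count how many variant records use each CHROM value.

    Header lines (starting with '#') and blank lines are skipped. Fields are
    split on any whitespace, so both tab- and space-separated lines work.
    """
    counts = collections.Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split()[0]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_chroms.py my.vcf  (script and file names are hypothetical)
    with open(sys.argv[1]) as vcf:
        for chrom, n in count_chroms(vcf).most_common():
            print(f"{chrom}\t{n}")
```

If the only value reported is 1, the whole file can be fixed with a single CHROM rename.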
Here are the FASTA headers of the reference I used for SNP calling (GCA_000008865_1_ASM886):
===
[login-node04 databaseWorkingO17]$ grep ">" GCA_000008865.1_ASM886v1_genomic.fna
>BA000007.2 Escherichia coli O157:H7 str. Sakai DNA, complete genome
>AB011549.2 Escherichia coli O157:H7 str. Sakai plasmid pO157 DNA, complete sequence
>AB011548.2 Escherichia coli O157:H7 str. Sakai plasmid pOSAK1 DNA, complete sequence
===
Here is the beginning of the snpEff database file snpEffectPredictor.bin (downloaded from the snpEff database):
===
[login-node03 Escherichia_coli_o157_h7_str_sakai]$ gunzip -c snpEffectPredictor.bin|head -20
SnpEff 4.3
CHROMOSOME 2 1 0 5498449 Chromosome false
CHROMOSOME 3 1 0 92720 pO157 false
CHROMOSOME 4 1 0 3305 pOSAK1 false
GENOME 1 -1 0 2147483647 Escherichia_coli_o157_h7_str_sakai false Escherichia_coli_o157_h7_str_sakai Escherichia_coli_o157_h7_str_sakai 2,3,4
EXON 7 6 725366 725474 EBG00001089957-1 false cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa -1 1 cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa RETAINED
TRANSCRIPT 6 5 725366 725474 EBT00001692105 false 7 lincRNA false false false false false 1 -1 -1
GENE 5 2 725366 725474 EBG00001089957 false 6 rnk_leader lincRNA
EXON 10 9 2143588 2143965 BAB35560-1 false atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa 0 1 atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa RETAINED
CDS 11 9 2143588 2143965 CDS_Chromosome_2143589_2143963 false 0
TRANSCRIPT 9 8 2143588 2143965 BAB35560 false 10 protein_coding true false true false false 1 -1 -1 11
GENE 8 2 2143588 2143965 BAB35560 false 9 ECs2137 protein_coding
===
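The CHROMOSOME records at the top of this dump give the names snpEff expects in the CHROM column: Chromosome, pO157 and pOSAK1. These are neither the GenBank accessions from the FASTA headers (BA000007.2, AB011549.2, AB011548.2) nor 1. A minimal sketch that pulls those names out of the gzip-compressed file; it assumes the record layout seen in the gunzip output above (name in the sixth whitespace-separated field), which is inferred from this dump rather than from a documented format:

```python
import gzip
import sys


def snpeff_chromosomes(lines):
    """Return chromosome names from CHROMOSOME records of a snpEff predictor dump.

    Assumes lines like 'CHROMOSOME 2 1 0 5498449 Chromosome false', where the
    name is the sixth whitespace-separated field (layout inferred, not documented).
    """
    names = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "CHROMOSOME":
            names.append(fields[5])
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # snpEffectPredictor.bin is gzip-compressed text, as the gunzip -c output shows.
    # The whole file is scanned, which may take a while for large databases.
    with gzip.open(sys.argv[1], "rt") as fh:
        print("\n".join(snpeff_chromosomes(fh)))
```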
An online thread mentioned that this error occurs when the chromosome names in the SNP-calling reference do not match those in the snpEff database. Is there an easy way to modify the .vcf file to get around it?
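One straightforward workaround is to rewrite the CHROM column so it uses the snpEff names. The sketch below rests on two assumptions: every record labelled 1 lies on the main chromosome BA000007.2, and kSNP3 positions are relative to that sequence, which appears to be the same ASM886v1 assembly the Escherichia_coli_o157_h7_str_sakai database was built from. If plasmid SNPs are also labelled 1, a single mapping cannot separate them. CHROM_MAP is a hypothetical example to adjust.

```python
import re
import sys

# Hypothetical mapping: assumes kSNP3's "1" is the main chromosome (BA000007.2),
# which the snpEff database names "Chromosome". Add plasmid entries if needed.
CHROM_MAP = {"1": "Chromosome"}

# First field (CHROM) plus everything after it, keeping the original separator.
FIRST_FIELD = re.compile(r"(\S+)(\s.*)", re.DOTALL)


def rename_chroms(lines, chrom_map):
    """Yield VCF lines with CHROM renamed via chrom_map.

    Header lines (starting with '#') and blank lines pass through unchanged;
    CHROM values missing from chrom_map are left as they are.
    """
    for line in lines:
        match = FIRST_FIELD.match(line)
        if line.startswith("#") or match is None:
            yield line
            continue
        chrom, rest = match.groups()
        yield chrom_map.get(chrom, chrom) + rest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rename_chroms.py in.vcf > renamed.vcf  (names are hypothetical)
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(rename_chroms(vcf, CHROM_MAP))
```

The renamed file can then be passed to snpEff with the Escherichia_coli_o157_h7_str_sakai database. If bcftools is available, bcftools annotate --rename-chrs with a two-column old/new mapping file is an alternative way to do the same rename.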
Thank you very much for your time.
C.