SEQanswers

03-13-2017, 09:11 AM   #1
capricy
Senior Member
 
Location: 63130

Join Date: Apr 2012
Posts: 125
snpEff: ERROR_CHROMOSOME_NOT_FOUND

Hello,

I have an issue with my bacterial SNP annotation: snpEff reports an ERROR_CHROMOSOME_NOT_FOUND error. I called the SNPs with kSNP3.0.


Here are several lines of my vcf file:
====
##fileformat=VCFv4.0
##Reference genome=GCA_000008865_1_ASM886v1_genomic
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GCA_000006665_1_ASM666v1_genomic GCA_000008865_1_ASM886v1_genomic
1 494154 AAAAAAACCG.AGCGCAAATA_R G A . . NS=205;AF=0.003 GT 0 0
1 40998 AAAAAAAGCC.TCTTCGTCGC_F T C . . NS=196;AF=0.003 GT 0 0
1 4531974 AAAAAAAGCG.AAATCTGGCA_F C T . . NS=212;AF=0.003 GT 0 0
1 18983 AAAAAAATAG.GCTTCCAGGG_R G T . . NS=197;AF=0.003 GT 0 0
1 1477420 AAAAAAATCC.GCTGCCGATA_R T C . . NS=200;AF=0.006 GT 0 0
1 1013276 AAAAAACCCA.CAACCTTGAA_F T C . . NS=200;AF=0.003 GT 0 0
1 254058 AAAAAACCCA.GGCGGGCGTT_R T C . . NS=206;AF=0.003 GT 0 0
1 461873 AAAAAACCGG.AAACCGGACT_F A C . . NS=167;AF=0.003 GT 0 0
1 2363022 AAAAAACCGG.CAGTTTGAGC_R T C . . NS=181;AF=0.006 GT 0 0
1 494140 AAAAAACCGT.GCGTATTTGC_F G A . . NS=204;AF=0.003 GT 0 0
===

Here are the sequence headers of my SNP-calling reference (GCA_000008865_1_ASM886):
===
[login-node04 databaseWorkingO17]$ grep ">" GCA_000008865.1_ASM886v1_genomic.fna
>BA000007.2 Escherichia coli O157:H7 str. Sakai DNA, complete genome
>AB011549.2 Escherichia coli O157:H7 str. Sakai plasmid pO157 DNA, complete sequence
>AB011548.2 Escherichia coli O157:H7 str. Sakai plasmid pOSAK1 DNA, complete sequence
===

Here is the start of the snpEff database file (snpEffectPredictor.bin, downloaded from the snpEff database):
===
[login-node03 Escherichia_coli_o157_h7_str_sakai]$ gunzip -c snpEffectPredictor.bin|head -20
SnpEff 4.3
CHROMOSOME 2 1 0 5498449 Chromosome false
CHROMOSOME 3 1 0 92720 pO157 false
CHROMOSOME 4 1 0 3305 pOSAK1 false
GENOME 1 -1 0 2147483647 Escherichia_coli_o157_h7_str_sakai false Escherichia_coli_o157_h7_str_sakai Escherichia_coli_o157_h7_str_sakai 2,3,4
EXON 7 6 725366 725474 EBG00001089957-1 false cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa -1 1 cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa RETAINED
TRANSCRIPT 6 5 725366 725474 EBT00001692105 false 7 lincRNA false false false false false 1 -1 -1
GENE 5 2 725366 725474 EBG00001089957 false 6 rnk_leader lincRNA
EXON 10 9 2143588 2143965 BAB35560-1 false atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa 0 1 atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa RETAINED
CDS 11 9 2143588 2143965 CDS_Chromosome_2143589_2143963 false 0
TRANSCRIPT 9 8 2143588 2143965 BAB35560 false 10 protein_coding true false true false false 1 -1 -1 11
GENE 8 2 2143588 2143965 BAB35560 false 9 ECs2137 protein_coding
===

An online thread mentioned that this error occurs when the SNP-calling reference is inconsistent with the snpEff database. Is there an easy way to modify the .vcf file to get around it?


Thank you very much for your time.

C.
03-13-2017, 10:25 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

A warning flag for me is that the VCF file contains a chromosome name of "1" rather than the SNP call header names ("BA000007.2", etc).
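
One possible way to get around the mismatch is to rewrite the first (CHROM) column of the VCF so that it uses the names stored in the snpEff database ("Chromosome", "pO157", "pOSAK1" in the dump above). The following is only a minimal sketch. The mapping of "1" to "Chromosome" is an assumption (the posted positions all fall within the 5498449 bp chromosome, but that does not prove the correspondence), and the function name `rename_chroms` is made up for illustration. It also assumes the VCF is tab-delimited, as the format requires.

```python
# Hypothetical mapping from the contig names used in the VCF to the names
# stored in the snpEff database. "1" -> "Chromosome" is an assumption and
# should be checked against your own data before use.
CHROM_MAP = {"1": "Chromosome"}


def rename_chroms(lines, mapping=CHROM_MAP):
    """Yield VCF lines with the CHROM column rewritten according to `mapping`.

    Header lines (starting with '#') are passed through unchanged, as are
    data lines whose CHROM value is not a key of `mapping`.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        # Split off only the first tab-delimited column (CHROM).
        chrom, sep, rest = line.partition("\t")
        yield mapping.get(chrom, chrom) + sep + rest


def rename_vcf(in_path, out_path, mapping=CHROM_MAP):
    """Read the VCF at `in_path` and write a renamed copy to `out_path`."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_chroms(src, mapping))
```

Calling `rename_vcf("input.vcf", "renamed.vcf")` (both file names are placeholders) would write a copy that can then be given to snpEff. If the VCF also contains plasmid SNPs under other contig names, those names would need their own entries in the mapping.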
03-13-2017, 10:36 AM   #3
capricy
Senior Member
 
Location: 63130

Join Date: Apr 2012
Posts: 125

Do you mean the first column?

This is my first time handling a .vcf file. I did read the documentation, though, and all the examples I have found have a chromosome number rather than a specific name in the first column.

Is something wrong with my understanding?

I am also confused about the snpEff database/Ensembl record: it starts with chromosome 2, not chromosome 1...
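
On the "chromosome 2" point, the dump itself suggests that the number after CHROMOSOME is an internal record ID rather than a chromosome number: the GENOME record has ID 1 and lists 2,3,4 as its children, and the names ("Chromosome", "pO157", "pOSAK1") appear a few columns later. This reading is inferred from the posted lines, not taken from snpEff documentation. Under that assumption, a small sketch like the one below (the function name `chromosome_names` is hypothetical) would list the names the database expects in the VCF's first column.

```python
def chromosome_names(dump_lines):
    """Return {record_id: name} for CHROMOSOME records in a text dump of
    snpEffectPredictor.bin (e.g. the output of `gunzip -c`).

    Assumed column layout, read off the posted dump:
    CHROMOSOME <id> <parent_id> <start> <end> <name> <flag>
    """
    names = {}
    for line in dump_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "CHROMOSOME":
            names[int(fields[1])] = fields[5]
    return names


# The three CHROMOSOME lines from the dump posted above.
example = [
    "CHROMOSOME 2 1 0 5498449 Chromosome false",
    "CHROMOSOME 3 1 0 92720 pO157 false",
    "CHROMOSOME 4 1 0 3305 pOSAK1 false",
]
print(chromosome_names(example))
# {2: 'Chromosome', 3: 'pO157', 4: 'pOSAK1'}
```

None of these names is "1", which is consistent with the ERROR_CHROMOSOME_NOT_FOUND message for a VCF whose first column contains "1".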