
  • snpeff : ERROR_CHROMOSOME_NOT_FOUND

    Hello,

    I am getting an ERROR_CHROMOSOME_NOT_FOUND error when annotating my bacterial SNPs with snpEff. The SNPs were called with kSNP3.0.


    Here are several lines of my vcf file:
    ===
    ##fileformat=VCFv4.0
    ##Reference genome=GCA_000008865_1_ASM886v1_genomic
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
    ##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GCA_000006665_1_ASM666v1_genomic GCA_000008865_1_ASM886v1_genomic
    1 494154 AAAAAAACCG.AGCGCAAATA_R G A . . NS=205;AF=0.003 GT 0 0
    1 40998 AAAAAAAGCC.TCTTCGTCGC_F T C . . NS=196;AF=0.003 GT 0 0
    1 4531974 AAAAAAAGCG.AAATCTGGCA_F C T . . NS=212;AF=0.003 GT 0 0
    1 18983 AAAAAAATAG.GCTTCCAGGG_R G T . . NS=197;AF=0.003 GT 0 0
    1 1477420 AAAAAAATCC.GCTGCCGATA_R T C . . NS=200;AF=0.006 GT 0 0
    1 1013276 AAAAAACCCA.CAACCTTGAA_F T C . . NS=200;AF=0.003 GT 0 0
    1 254058 AAAAAACCCA.GGCGGGCGTT_R T C . . NS=206;AF=0.003 GT 0 0
    1 461873 AAAAAACCGG.AAACCGGACT_F A C . . NS=167;AF=0.003 GT 0 0
    1 2363022 AAAAAACCGG.CAGTTTGAGC_R T C . . NS=181;AF=0.006 GT 0 0
    1 494140 AAAAAACCGT.GCGTATTTGC_F G A . . NS=204;AF=0.003 GT 0 0
    ===

    Here are the sequence headers of the reference I used for SNP calling (GCA_000008865_1_ASM886):
    ===
    [login-node04 databaseWorkingO17]$ grep ">" GCA_000008865.1_ASM886v1_genomic.fna
    >BA000007.2 Escherichia coli O157:H7 str. Sakai DNA, complete genome
    >AB011549.2 Escherichia coli O157:H7 str. Sakai plasmid pO157 DNA, complete sequence
    >AB011548.2 Escherichia coli O157:H7 str. Sakai plasmid pOSAK1 DNA, complete sequence
    ===

    Here is the start of the snpEff database .bin file (downloaded from the snpEff database):
    ===
    [login-node03 Escherichia_coli_o157_h7_str_sakai]$ gunzip -c snpEffectPredictor.bin|head -20
    SnpEff 4.3
    CHROMOSOME 2 1 0 5498449 Chromosome false
    CHROMOSOME 3 1 0 92720 pO157 false
    CHROMOSOME 4 1 0 3305 pOSAK1 false
    GENOME 1 -1 0 2147483647 Escherichia_coli_o157_h7_str_sakai false Escherichia_coli_o157_h7_str_sakai Escherichia_coli_o157_h7_str_sakai 2,3,4
    EXON 7 6 725366 725474 EBG00001089957-1 false cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa -1 1 cccaaaagaaaaccctcaccgtcaggcggcgagggtttaactcacatgatgatactgactgttgctcactctttgaagtgatttgcgtcacattcagggaattcctcaa RETAINED
    TRANSCRIPT 6 5 725366 725474 EBT00001692105 false 7 lincRNA false false false false false 1 -1 -1
    GENE 5 2 725366 725474 EBG00001089957 false 6 rnk_leader lincRNA
    EXON 10 9 2143588 2143965 BAB35560-1 false atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa 0 1 atggttaatcagaagaaagatcgtctgcttaacgagtatctgtctccgctggatattaccgcggcacagtttaaggtgctctgctctatccgctgcgcggcgtgtattactccggttgaactgaaaaaagtgttgtcggtcgacctgggagcactgacccgtatgctggatcgcctggtctgtaaaggctgggtagaaaggttgccgaacccgaatgataagcgcggcgtactggtaaaacttaccaccagcggcgcggcaatatgtgaacaatgccatcaattagttggccaggacctgcatcaagaattaacaaaaaacctgacggcggacgaagtggcaacacttgagcatttgcttaagaaagtcctgccgtaa RETAINED
    CDS 11 9 2143588 2143965 CDS_Chromosome_2143589_2143963 false 0
    TRANSCRIPT 9 8 2143588 2143965 BAB35560 false 10 protein_coding true false true false false 1 -1 -1 11
    GENE 8 2 2143588 2143965 BAB35560 false 9 ECs2137 protein_coding
    ===

    Some online threads mention that this error occurs when the reference used for SNP calling is inconsistent with the snpEff database. Is there an easy way to modify the .vcf file to get around it?


    Thank you very much for your time.

    C.

  • #2
    A warning flag for me is that the VCF file uses a chromosome name of "1" rather than the sequence names from the headers of the SNP calling reference ("BA000007.2", etc.).
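One way to act on this is to rewrite the first column of the VCF so that it uses a name the snpEff database knows. Below is a minimal Python sketch, not a tested recipe for kSNP output. The function name `rename_contigs` and the mapping `{"1": "Chromosome"}` are illustrative assumptions; in particular, it assumes every record labelled "1" really lies on the main chromosome, which would mislabel any SNP that actually sits on a plasmid.

```python
def rename_contigs(lines, mapping):
    """Yield VCF lines with the CHROM (first) column rewritten via `mapping`.

    Header lines (starting with '#') and blank lines pass through unchanged.
    Names that are not in `mapping` are left as they are.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # VCF data lines are tab-delimited; only the first field is touched.
        chrom, tab, rest = line.partition("\t")
        yield mapping.get(chrom, chrom) + tab + rest


if __name__ == "__main__":
    # Two records shaped like those in post #1 (tabs restored).
    sample = [
        "##fileformat=VCFv4.0\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t494154\tAAAAAAACCG.AGCGCAAATA_R\tG\tA\t.\t.\tNS=205;AF=0.003\n",
        "1\t40998\tAAAAAAAGCC.TCTTCGTCGC_F\tT\tC\t.\t.\tNS=196;AF=0.003\n",
    ]
    # Hypothetical mapping: "1" in the VCF -> "Chromosome" in the database dump.
    for out_line in rename_contigs(sample, {"1": "Chromosome"}):
        print(out_line, end="")
```

For a real file, the same generator can be fed an open file handle and its output written to a new .vcf, leaving the original untouched.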



    • #3
      Do you mean the first column?

      This is indeed my first time handling a .vcf file. But I did read the documentation, and all the examples I have found have a chromosome number rather than a specific name in the first column.

      Is something wrong with my understanding?

      I am actually confused about the snpEff database/Ensembl record: it starts with chromosome 2, not chromosome 1...
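On the "chromosome 2" question: judging only from the dump in post #1, the number right after CHROMOSOME looks like an internal record ID rather than a chromosome name. The GENOME record has ID 1 and lists "2,3,4" as its members, and the GENE records carry 2 as their second number, which reads like a parent ID. Under that reading the sequence names are the sixth field: Chromosome, pO157 and pOSAK1. The Python sketch below pulls those names out of the text dump; the field layout is an assumption inferred from the dump, not taken from snpEff documentation, and `chromosome_names` is a hypothetical helper.

```python
def chromosome_names(dump_lines):
    """Return {record_id: (name, end)} for CHROMOSOME records in the dump.

    Assumed field layout (inferred from the dump in post #1):
    TYPE, record id, parent id, start, end, name, flag
    """
    names = {}
    for line in dump_lines:
        fields = line.split()
        if fields and fields[0] == "CHROMOSOME":
            names[int(fields[1])] = (fields[5], int(fields[4]))
    return names


if __name__ == "__main__":
    # Lines copied from the gunzip output in post #1.
    dump = [
        "SnpEff 4.3",
        "CHROMOSOME 2 1 0 5498449 Chromosome false",
        "CHROMOSOME 3 1 0 92720 pO157 false",
        "CHROMOSOME 4 1 0 3305 pOSAK1 false",
    ]
    for record_id, (name, end) in chromosome_names(dump).items():
        print(record_id, name, end)
```

If this reading is right, the names the first VCF column has to match are "Chromosome", "pO157" and "pOSAK1", not the numbers 2, 3 and 4.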
