Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Annotating VCF files from GFF or .gbk files

    Hello all,

    I am trying to annotate VCF files using snpEff. I'm using a virus that doesn't have good gene annotations available so I had to build a custom database. That seemed to work well but when I actually try to do the annotation I'm not getting the amino acid changes called. Here is the output
    Code:
    weger@IDANGS01:~/snpEff$ java -jar snpEff.jar dengue_DR59_2 /data/ebel_lab/Weger/Experiment_27/36_Exp/test/323-minority.vcf > 323-minority_ann.vcf -noGenome
    Error: Error while processing VCF entry (line 59) :
            gb:AB122022|Organism:Dengue     3152    .       C       CA      36.83   PASS    SN=0;STA=3152;STO=3152;TYP=INS;R1P=198;R1M=195;R2P=0;R2M=0;PPC=0;LS=54650;MQS=15536;MQM=42;BQS=13814;BQM=36;EDS=15658;EDM=75;IDS=385516;IDM=992;COV=18847;MCOV=-1;CED=601;HMP=0;DP=18847;AF=0.0209;RAF=0.0209;SB=1.0000;DP4=9226,9228,198,195      GT:DP:AD:AF:RAF:SB:SC:PF        1:18847:393:0.0209:0.0209:1.0000:36.83:PASS
    java.lang.NullPointerException
    Error: Error while processing VCF entry (line 65) :
            gb:AB122022|Organism:Dengue     5915    .       C       CA      37.11   PASS    SN=0;STA=5915;STO=5915;TYP=INS;R1P=228;R1M=211;R2P=0;R2M=0;PPC=0;LS=61291;MQS=17579;MQM=42;BQS=15431;BQM=36;EDS=16893;EDM=75;IDS=431762;IDM=992;COV=7479;MCOV=-1;CED=601;HMP=0;DP=7479;AF=0.0587;RAF=0.0587;SB=0.9999;DP4=3512,3528,228,211        GT:DP:AD:AF:RAF:SB:SC:PF        1:7479:439:0.0587:0.0587:0.9999:37.11:PASS
    java.lang.NullPointerException
    Error: Error while processing VCF entry (line 75) :
            gb:AB122022|Organism:Dengue     8469    .       C       CA      36.46   PASS    SN=0;STA=8469;STO=8469;TYP=INS;R1P=59;R1M=56;R2P=0;R2M=0;PPC=0;LS=16215;MQS=4113;MQM=41;BQS=4059;BQM=36;EDS=5218;EDM=75;IDS=111026;IDM=992;COV=5471;MCOV=-1;CED=601;HMP=0;DP=5471;AF=0.0210;RAF=0.0210;SB=0.9999;DP4=2677,2679,59,56       GT:DP:AD:AF:RAF:SB:SC:PF        1:5471:115:0.0210:0.0210:0.9999:36.46:PASS
    java.lang.NullPointerException
    3 errors.
    The problem seems to be java.lang.NullPointerException. However, I'm not sure what this means.

    I have tried using different variant callers (lofreq and callvariants.sh). I have also tried annotating both a .gbk and a .gff3 file with the same result.

    Any thoughts?
    Last edited by GenoMax; 05-31-2017, 10:24 AM. Reason: Adding [CODE] tags

  • #2
    Additional info:

    I reran using the -v option

    The program is not detecting the coding region which is clearly in the .gbk file.
    #-----------------------------------------------
    # Genome name : 'Dengue virus'
    # Genome version : 'dengue_DR59'
    # Genome ID : 'dengue_DR59[0]'
    # Has protein coding info : false
    # Has Tr. Support Level info : false
    # Genes : 0
    # Protein coding genes : 0
    #-----------------------------------------------
    # Transcripts : 0
    # Avg. transcripts per gene : NaN
    #-----------------------------------------------
    # Checked transcripts :
    #-----------------------------------------------
    # Protein coding transcripts : 0
    #-----------------------------------------------
    # Cds : 0
    # Exons : 0
    # Exons with sequence : 0
    # Exons without sequence : 0
    # Avg. exons per transcript : NaN
    #-----------------------------------------------
    # Number of chromosomes : 0
    # Chromosomes : Format 'chromo_name size codon_table'

    # WARNING! : No protein coding transcripts found.
    #-----------------------------------------------

    Comment


    • #3
      Have you built a snpEff database from your genbank file (option 4 on this page)?

      Comment


      • #4
        @GenoMax, yep, from both genbank and gff3, they give the same error.

        Comment


        • #5
          Were any errors produced when you built the database using GBK files? Is your "chromosome name" matching in your alignments/VCF/annotation files?

          Comment


          • #6
            So I rebuilt the database, here is the output

            weger@IDANGS01:~/snpEff$ java -jar snpEff.jar build -genbank -v dengue_DR59_2
            00:00:00 SnpEff version SnpEff 4.3o (build 2017-05-21 09:40), by Pablo Cingolani
            00:00:00 Command: 'build'
            00:00:00 Building database for 'dengue_DR59_2'
            00:00:00 Reading configuration file 'snpEff.config'. Genome: 'dengue_DR59_2'
            00:00:00 Reading config file: /home/weger/snpEff/snpEff.config
            00:00:00 done
            Chromosome: 'AB122022' length: 10723

            Create exons from CDS (if needed): .
            Exons created for 1 transcripts.

            Deleting redundant exons (if needed):
            Total transcripts with deleted exons: 0

            Collapsing zero length introns (if needed):
            Total collapsed transcripts: 0
            Adding genomic sequences to exons: Done (1 sequences added, 0 ignored).

            Adjusting transcripts:
            Adjusting genes:
            Adjusting chromosomes lengths:
            Ranking exons:
            Create UTRs from CDS (if needed):
            Remove empty chromosomes:

            Marking as 'coding' from CDS information:
            Done: 0 transcripts marked
            00:00:00 Caracterizing exons by splicing (stage 1) :

            00:00:00 Caracterizing exons by splicing (stage 2) :
            00:00:00 done.
            00:00:00 [Optional] Rare amino acid annotations
            00:00:00 Warning: Cannot read optional protein sequence file '/home/weger/snpEff/./data/dengue_DR59_2/protein.fa', nothing done.
            00:00:00 Protein check file: '/home/weger/snpEff/./data/dengue_DR59_2/genes.gbk'

            00:00:00 Checking database using protein sequences
            00:00:00 Reading proteins from file '/home/weger/snpEff/./data/dengue_DR59_2/genes.gbk'...
            00:00:00 done (1 Proteins).
            00:00:00 Comparing Proteins...
            Labels:
            '+' : OK
            '.' : Missing
            '*' : Error
            +

            Protein check: dengue_DR59_2 OK: 1 Not found: 0 Errors: 0 Error percentage: 0.0%
            00:00:00 Saving database
            00:00:00 [Optional] Reading regulation elements: GFF
            00:00:00 Warning: Cannot read optional regulation file '/home/weger/snpEff/./data/dengue_DR59_2/regulation.gff', nothing done.
            00:00:00 [Optional] Reading regulation elements: BED
            00:00:00 Cannot find optional regulation dir '/home/weger/snpEff/./data/dengue_DR59_2/regulation.bed/', nothing done.
            00:00:00 [Optional] Reading motifs: GFF
            00:00:00 Warning: Cannot open PWMs file /home/weger/snpEff/./data/dengue_DR59_2/pwms.bin. Nothing done
            00:00:00 Done
            00:00:00 Logging
            00:00:01 Checking for updates...
            00:00:02 Done.

            At first it says it finds one transcript and then it says 0 are marked.

            Have you seen this before?

            Comment


            • #7
              I have not used snpEff in this particular fashion.

              I still have a feeling that there is a mismatch between your files (gb:AB122022|Organism: Dengue seems to be in your VCF (and BAM?) where as snpEff probably is using just AB122022.

              Comment


              • #8
                @GenoMax, (gb:AB122022|Organism: Dengue is the header for the fasta file. I will try to do a new alignment and variant calling with a fasta that has the exact same header as the .gbk. I'll let you know what happens. Thanks!

                Comment


                • #9
                  Unfortunately it does not change the result.
                  Code:
                  weger@IDANGS01:~/snpEff$ java -Xmx4g -jar snpEff.jar dengue_DR59_2 /data/ebel_lab/Weger/Experiment_27/36_Exp/test/323.vcf > 323_ann.vcf -noGenome
                  Error: Error while processing VCF entry (line 93) :
                          AB122022        3152    .       C       CA      36.96   PASS    SN=0;STA=3152;STO=3152;TYP=INS;R1P=126;R1M=101;R2P=0;R2M=0;PPC=0;LS=31259;MQS=8723;MQM=42;BQS=7920;BQM=36;EDS=8773;EDM=75;IDS=221837;IDM=992;COV=3602;MCOV=-1;CED=601;HMP=0;DP=3602;AF=0.0630;RAF=0.0630;SB=0.9996;DP4=1675,1700,126,101    GT:DP:AD:AF:RAF:SB:SC:PF        1:3602:227:0.0630:0.0630:0.9996:36.96:PASS
                  java.lang.NullPointerException
                  Error: Error while processing VCF entry (line 138) :
                          AB122022        5915    .       C       CA      37.37   PASS    SN=0;STA=5915;STO=5915;TYP=INS;R1P=128;R1M=115;R2P=0;R2M=0;PPC=0;LS=33701;MQS=9517;MQM=42;BQS=8444;BQM=36;EDS=9499;EDM=75;IDS=238270;IDM=992;COV=2101;MCOV=-1;CED=601;HMP=0;DP=2101;AF=0.1157;RAF=0.1157;SB=0.9998;DP4=923,935,128,115      GT:DP:AD:AF:RAF:SB:SC:PF        1:2101:243:0.1157:0.1157:0.9998:37.37:PASS
                  java.lang.NullPointerException
                  Error: Error while processing VCF entry (line 157) :
                          AB122022        6711    .       G       GT      36.58   PASS    SN=0;STA=6711;STO=6711;TYP=INS;R1P=73;R1M=102;R2P=0;R2M=0;PPC=0;LS=24461;MQS=6085;MQM=42;BQS=5861;BQM=36;EDS=7838;EDM=75;IDS=167900;IDM=992;COV=3686;MCOV=-1;CED=601;HMP=0;DP=3686;AF=0.0475;RAF=0.0475;SB=0.9994;DP4=1770,1741,73,102      GT:DP:AD:AF:RAF:SB:SC:PF        1:3686:175:0.0475:0.0475:0.9994:36.58:PASS
                  java.lang.NullPointerException
                  Error: Error while processing VCF entry (line 197) :
                          AB122022        10389   .       C       CA      43.47   PASS    SN=0;STA=10389;STO=10389;TYP=INS;R1P=207;R1M=213;R2P=0;R2M=0;PPC=0;LS=59099;MQS=13515;MQM=42;BQS=14722;BQM=36;EDS=17184;EDM=75;IDS=399426;IDM=992;COV=526;MCOV=-1;CED=334;HMP=0;DP=526;AF=0.7985;RAF=0.7985;SB=0.9999;DP4=56,50,207,213     GT:DP:AD:AF:RAF:SB:SC:PF        1:526:420:0.7985:0.7985:0.9999:43.47:PASS
                  java.lang.NullPointerException
                  4 errors.
                  Last edited by GenoMax; 06-01-2017, 04:49 AM.

                  Comment


                  • #10
                    Are you using Java 1.8?

                    Comment


                    • #11
                      Hmm, nope its 1.7.0_25. I wasn't aware that snpEff needed 1.8. I will update and get back to you with results. Thanks!

                      Comment


                      • #12
                        @Genomax, awesome, with those changes it worked! I really appreciate your help.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        29 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        31 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        28 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        52 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X