  • snpEff eff with built genome database

    Hello,

    I am having trouble annotating my vcf file using a database that I built. It seems as though snpEff wants to go to sourceforge.net to retrieve a prebuilt database instead. Can someone help me identify what my problem is?

    Here is my code
    java -Xmx4g -jar /pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/snpEff.jar eff -c /pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/snpEff.config Sorghum EMSmutboth.vcf > var.ann.vcf

    Here is the error I continue to receive:
    ERROR while connecting to http://downloads.sourceforge.net/pro..._3_Sorghum.zip
    java.lang.RuntimeException: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:178)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
    at org.snpeff.SnpEff.run(SnpEff.java:1221)
    at org.snpeff.SnpEff.loadDb(SnpEff.java:515)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:998)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:981)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
    Caused by: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:127)
    ... 9 more
    java.lang.RuntimeException: Genome download failed!
    at org.snpeff.SnpEff.loadDb(SnpEff.java:516)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:998)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:981)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)

    I built a database that seemed to work, so I am not sure what information I am missing. When I built the database I used the following command:
    java -jar location/to/snpEff.jar build -gffs -v Sorghum

    Thank you for any help

  • #2
    Hello,

    What does your config file look like? I guess snpEff cannot find your database and tries to download it.

    Be sure to link to your local database in the config file or use the -dataDir flag to specify the folder where the database is located.

    fin swimmer
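
A minimal sketch of the second option, written as a small shell function. The function name `annotate_with_local_db` and its arguments are hypothetical; `-c` and `-dataDir` are the snpEff options already mentioned in this thread. It assumes `-dataDir` should point at the folder that contains the `Sorghum/` subfolder, not at the genome folder itself.

```shell
# Hypothetical wrapper around "snpEff eff" that passes the data folder explicitly.
#   $1 = folder containing snpEff.jar and snpEff.config
#   $2 = absolute path to the data folder (the parent of the genome folder)
#   $3 = genome name as written in snpEff.config
#   $4 = input VCF file
annotate_with_local_db() {
  local snpeff_dir=$1 data_dir=$2 genome=$3 vcf=$4
  java -Xmx4g -jar "$snpeff_dir/snpEff.jar" eff \
    -c "$snpeff_dir/snpEff.config" \
    -dataDir "$data_dir" \
    "$genome" "$vcf"
}
```

Example call (all paths are placeholders): `annotate_with_local_db /path/to/snpeff /path/to/snpeff/data Sorghum EMSmutboth.vcf > var.ann.vcf`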

    • #3
      fin swimmer,

      Thank you for your reply.

      This is what I have in my config file:

      #Sorghum bicolor genome, version 3.0.1
      Sorghum.genome : Sorghum

      And this is the error I am getting when running snpEff eff:
      Reading configuration file '/home/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/snpEff.config'. Genome: 'Sorghum'
      00:00:00 Reading config file: /home/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/snpEff.config
      00:00:01 done
      00:00:01 Reading database for genome version 'Sorghum' from file '/home/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/./data/Sorghum/snpEffectPredictor.bin' (this might take a while)
      00:00:01 Database not installed

      It seems that the snpEffectPredictor.bin file was never created, which leads me to believe my database wasn't built correctly. However, when I build the database I do not receive any major errors indicating that the build failed.

      Are you familiar with this problem?
      Thanks for replying to the original thread. I hope you know what is causing this error.
      htetre

      • #4
        Hello htetre,

        snpEff tries to find your database in this path:
        /home/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/./data/Sorghum/snpEffectPredictor.bin
        As you can see, there is a '.' in the path. You can define an absolute path for data.dir in the config file, or use the -dataDir flag, as I suggested in my first post, to point to the correct directory.

        fin swimmer
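
To see which folder the config file actually points to, something like the following could help. This is only a sketch: `resolve_data_dir` is a hypothetical helper, and it assumes, based on the log in post #3, that a relative `data.dir` value is interpreted relative to the folder containing snpEff.config.

```shell
# Hypothetical helper: print the data folder that a snpEff.config refers to.
#   $1 = path to snpEff.config
resolve_data_dir() {
  local config=$1 value
  # Take the last uncommented "data.dir = ..." (or "data.dir : ...") line
  # and strip the key, the separator, and the surrounding spaces.
  value=$(sed -n 's/^[[:space:]]*data\.dir[[:space:]]*[=:][[:space:]]*//p' "$config" | tail -n 1)
  if [ -z "$value" ]; then
    echo "no data.dir entry found in $config" >&2
    return 1
  fi
  case "$value" in
    /*) printf '%s\n' "$value" ;;   # already an absolute path
    *)  printf '%s/%s\n' "$(cd "$(dirname "$config")" && pwd)" "$value" ;;   # assumed relative to the config folder
  esac
}
```

Example call (the path is a placeholder): `resolve_data_dir /path/to/snpEff.config`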

        • #5
          Thank you fin swimmer,

          I have changed that now, thanks to you, and snpEff is definitely looking in the correct path. However, I have determined that my database was never built correctly: the snpEffectPredictor.bin file does not exist. When I search for it with 'find', it is nowhere on the cluster.

          Have you heard of that problem? Executing the following:
          java -jar snpEff.jar build -gff3 -v Sorghum
          starts the program, and it finds my genes.gff file, but it never goes on to create a predictor file.

          Thank you
          htetre
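
One way to narrow this down is to check which files are present in the genome folder before and after the build. The sketch below uses a hypothetical helper, `check_snpeff_db`. It assumes the usual layout in which `build -gff3` reads `genes.gff` and `sequences.fa` from `<data dir>/<genome>/` and writes `snpEffectPredictor.bin` to the same folder; compressed inputs or a FASTA file kept elsewhere would be reported as missing here.

```shell
# Hypothetical helper: report which expected files exist and are non-empty.
#   $1 = data folder (the parent of the genome folder)
#   $2 = genome name as written in snpEff.config
check_snpeff_db() {
  local data_dir=$1 genome=$2 f
  for f in genes.gff sequences.fa snpEffectPredictor.bin; do
    if [ -s "$data_dir/$genome/$f" ]; then
      echo "found    $data_dir/$genome/$f"
    else
      echo "missing  $data_dir/$genome/$f"
    fi
  done
}
```

Example call (the path is a placeholder): `check_snpeff_db /path/to/snpeff/data Sorghum`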

          • #6
            Hello,

            I've never built a new database, so I cannot help here. Do you really need to build your own database? snpEff seems to have a prebuilt database for Sorghum.

            fin swimmer

            • #7
              Well, my vcf file is based on the most recent Sorghum genome assembly, and that assembly is not among the databases available in snpEff. But given the difficulty I am having building the genome database, I'm thinking of redoing the analysis with an older assembly version that snpEff does provide. I was trying to avoid that, but the problem I am having seems to be uncommon or unfamiliar, because I haven't been able to find any information about it.

              Thanks
              Hannah

              • #8
                Hello,

                I have found my problem: it was the gff file I had been using. It contained only the gene information, not the gene_exon information. Once I used the correct gff file, the genome database was built.
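
A quick way to spot this kind of problem is to count the feature types in column 3 of the gff file before building. This is a minimal sketch, assuming a tab-separated, uncompressed GFF3 file; `gff_feature_counts` is a hypothetical helper name. If the output lists only `gene` and no `mRNA`, `exon`, or `CDS` rows, the file probably lacks the structure the build needs.

```shell
# Hypothetical helper: count how often each feature type occurs in a GFF file.
#   $1 = path to an uncompressed, tab-separated GFF file
gff_feature_counts() {
  # Skip comment and directive lines (starting with '#'), keep only full
  # 9-column records, tally column 3, and print "type<TAB>count" sorted by type.
  awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

Example call (the path is a placeholder): `gff_feature_counts /path/to/snpeff/data/Sorghum/genes.gff`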
