  • blast "returned non-zero exit status 137" error, memory problems

    Hi everyone!

    I'm having trouble running blast locally against a large database, and I hope someone can help me with this.

    I'm running blastp through Biopython's NcbiblastpCommandline wrapper and I get a Bio.Application.ApplicationError: "Command 'blastp -out ../Results/blastDBs/all_genomes_prot_db_blast_results.xml -outfmt 5 -query ../faa_files/all_genomes_prot.fasta -db ../Results/blastDBs/all_genomes_prot_db -evalue 0.001' returned non-zero exit status 137, 'Killed'".
    The same code worked with a smaller database and query (25 genomes against the same 25), but now I want to run it on 74 genomes, which takes a long time (40 h before the error appeared). From what I found when searching for the error, it seems to mean that the system ran out of memory.
    I thought the cause might be that blastp is not writing to the XML file while it runs. It only writes the results at the end, so all the results stay in memory until then. I'm running it on a server over a VPN connection, and this only happens there. When I run the same code on my PC, I can see the XML file growing while blast runs. The blast versions differ (2.2.26+ on the server and 2.2.28+ on my PC), so I updated the server to 2.2.30+, but it still behaves the same way. It's also not caused by Python or Biopython, because the same thing happens when I run blast directly from the command line (the XML file stays at size 0 while blast is running).

    Does anyone have a clue why this happens? Why doesn't the XML file grow on the server when it does on my PC? Could this be what makes the system run out of memory and causes the error? Does anyone know if BLAST has a setting to start writing to the output file during execution instead of keeping the data in memory until the end?

    Please help me!
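
For context on the error code in the post above: on Linux, a shell reports an exit status above 128 when a process is terminated by a signal, and 137 is 128 + 9, which is SIGKILL. That matches the 'Killed' text in the message, but the code alone does not say who sent the signal or why (the out-of-memory killer, an administrator, or a resource limit are all possible). A minimal Python sketch of that decoding follows; the helper name `describe_exit_status` is hypothetical and is not part of Biopython or BLAST.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status (hypothetical helper for illustration)."""
    if status == 0:
        return "success"
    if status < 0:
        # Python's subprocess module reports death-by-signal as -signum.
        signum = -status
    elif status > 128:
        # POSIX shells report death-by-signal as 128 + signum.
        signum = status - 128
    else:
        return f"exited with error code {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = f"signal {signum}"
    return f"terminated by {name}"


print(describe_exit_status(137))
```

The same helper accepts the negative return codes that `subprocess` produces when no shell is involved, so it can be used with either way of launching blastp.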

  • #2
    Can you provide some additional details about the hardware configuration and the sizes of the query and database files?



    • #3
      If you are searching with a multi-fasta file, blast should be writing the output file as it goes through the queries. If you are using some sort of job scheduler on the server, then it may be buffering the output, which is why you don't see it, and it may never get written to a file once the job is killed.
      Last edited by GenoMax; 05-20-2015, 10:33 AM.



      • #4
        It is a .fasta file made by concatenating the individual .faa files of the genomes. This file is used both to create the blast database and to query it. For 74 genomes, it is about 55 Mb. But whatever its size, when I run it on the server the .xml file stays empty until blast finishes, while running the same thing on my PC I can see the .xml file growing. With 25 genomes it works fine on the server, although the XML file is still only written after blast finishes (in that case it's not a problem, because the process isn't killed).
        I'm running it on the server because on my PC it becomes impractical for a large number of genomes. I can't understand why the same code writes the output while blast is running on my PC but not on the server.



        • #5
          Based on your last post it sounds like the first 24 or so genomes are ok but the 25th or 26th must be causing the problem? Ideally if you have access to a cluster, running these genomes in parallel as independent jobs would be the most efficient (or as separate jobs on the server, so you will have the output from those genome jobs that do finish on hand). You can combine the data later.

          It sounds like you are not running the job under a scheduler (e.g. SGE, LSF etc) on the server.
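
The suggestion above to run the genomes as independent jobs and combine the data later can be sketched as splitting the concatenated multi-FASTA query into smaller batches, each of which is then searched separately against the same full database. This is only an illustration under stated assumptions: the function name `split_fasta` and the batch size are hypothetical, and the sketch works on text in memory rather than on the poster's actual files.

```python
from typing import Iterator, List


def split_fasta(text: str, records_per_chunk: int) -> Iterator[str]:
    """Yield FASTA text in batches of at most `records_per_chunk` records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")

    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current))

    for start in range(0, len(records), records_per_chunk):
        yield "\n".join(records[start:start + records_per_chunk]) + "\n"


example = ">a\nMKV\n>b\nMLL\nAAA\n>c\nMQQ\n"
chunks = list(split_fasta(example, 2))
print(len(chunks))
```

Each batch could be written to its own query file and given its own output file, so results from batches that finish are kept even if a later batch is killed.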



          • #6
            It's not caused by one particular genome, but by the large number of genomes. I need to run all vs. all genomes because I want to find all the similar sequences between the genomes (and not only the best match for each query sequence). But I found that I don't have enough space in the directory on the server. Maybe that is what made the server kill the process: when it tried to write to the XML file, there wasn't enough space and it crashed. I will try to run it in a new directory with more space and see if I can get my results.
            Still, the issue of the XML file not being written while blast runs remains (I'm not sure it's really a problem, but it doesn't happen on my PC)... And it doesn't depend on which genomes I use, because it happens with every file I run, whether it has 5 genomes, 10, 25 or 75...

            I hope it's just a space problem and I can get my results soon. I'll report back! Thanks anyway for your contribution!
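
Since the post above suspects a full output directory, one way to rule that out before a multi-hour run is to check the free space where the XML file will be written. The sketch below uses Python's standard `shutil.disk_usage`; the helper name `has_free_space` and the 1 GiB threshold are assumptions chosen for illustration, not a measured requirement for this dataset. Note that per-user quotas may not be reflected in the reported figure.

```python
import shutil


def has_free_space(directory: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `directory` has enough free bytes."""
    usage = shutil.disk_usage(directory)  # named tuple: total, used, free
    return usage.free >= required_bytes


# Hypothetical threshold: require at least 1 GiB free in the current directory.
ONE_GIB = 1024 ** 3
print(has_free_space(".", ONE_GIB))
```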



            • #7
              Well... It seems that in the new directory on the server, the XML file grows while blast is running! I don't know why it works now when it didn't in the old directory... So I hope I can now run my scripts on the huge 74-genome dataset!

              Thanks a lot!



              • #8
                Even though you are doing an all vs. all comparison, blast only searches one query sequence at a time against the genome pool (unless I am misunderstanding something). So submitting those jobs in parallel or in series is going to give you the same result.

                If you know for sure that the problem was disk space then you have a path forward.

                BTW: If these are bacterial genomes (related) then you may want to look at Mauve as an alternative.



                • #9
                  Yeah, I can submit serial jobs, but the size of the database has to stay the same anyway. Maybe it's a good alternative if it's still not working... And yes, these are bacterial genomes, but I want to get the genes they have in common, not align the complete genomes. I'm not sure what Mauve does exactly, but I'll take a look!

                  Thanks!



                  • #10
                    Just to confirm: my scripts are running well, and I already have results for the 74-genome dataset! It seems the cause was the lack of disk space, not a memory problem!
