  • blast "returned non-zero exit status 137" error, memory problems

    Hi everyone!

    I'm having trouble running BLAST locally against a big database, and I hope someone can help me with this.

    I'm running blastp through Biopython's NcbiblastpCommandline wrapper and I get a Bio.Application.ApplicationError: "Command 'blastp -out ../Results/blastDBs/all_genomes_prot_db_blast_results.xml -outfmt 5 -query ../faa_files/all_genomes_prot.fasta -db ../Results/blastDBs/all_genomes_prot_db -evalue 0.001' returned non-zero exit status 137, 'Killed'".
    The same code worked with a smaller database and query (25 genomes against the same 25), but now I want to run it on 74 genomes, which takes a long time (40 h until the error). From what I found when searching for the error, it seems the system ran out of memory.
    I thought it could be because blastp is not writing to the XML file while it runs. It only writes the results at the end, so all the results stay in memory. I'm running it on a server over a VPN connection, and this only happens there. When I run the same code on my PC, I can see the XML file growing while BLAST runs. The BLAST versions are different (2.2.26+ on the server and 2.2.28+ on my PC), so I updated the server to 2.2.30+, but it still does the same. And it's not because of Python or Biopython, because the same happens if I run BLAST directly from the command line (the XML file stays at size 0 while BLAST is running).

    Does anyone have a clue why this happens? Why doesn't the XML file grow on the server when it does on my PC? Is that why the system runs out of memory and throws the error? Does anyone know if there's a BLAST setting that makes it write to the output file during execution instead of keeping the data in memory until the end?

    Please help me!
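
For readers hitting the same message: by shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which matches the 'Killed' text in the error. The sketch below only decodes the number; `describe_exit_status` is a hypothetical helper name, and the status alone does not say who sent the signal or why (an out-of-memory killer is one common cause, but not the only one).

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status such as the 137 in the error above."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

On a Linux server this reports SIGKILL for 137. To find out what actually killed the job, the system logs on the server are the place to look.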

  • #2
    Can you provide some additional details about the hardware configuration and the sizes of the query and database files?


    • #3
      If you are searching with a multi-FASTA file, BLAST should be writing the output file as it goes through the queries. If you are using some sort of job scheduler on the server, it may be buffering the output, which is why you don't see it, and the output may never get written to a file once the job is killed.
      Last edited by GenoMax; 05-20-2015, 10:33 AM.


      • #4
        It is a .fasta file made by concatenating the individual .faa files of the genomes. This file is used both to create the BLAST database and to query it. For 74 genomes, it is about 55 Mb. But no matter what size it is, on the server the .xml file remains empty until BLAST finishes, while on my PC the same run shows the .xml file growing. With 25 genomes it works well on the server, although the output is still only written to the XML file after BLAST finishes (in that case it's not a problem, because the process is not killed).
        I'm running it on the server because on my PC it becomes impractical for a large number of genomes. I can't understand why the same code doesn't write the output while BLAST is running on the server but does on my PC.


        • #5
          Based on your last post, it sounds like the first 24 or so genomes are OK but the 25th or 26th must be causing the problem? Ideally, if you have access to a cluster, running these genomes in parallel as independent jobs would be the most efficient (or as separate jobs on the server, so that you have the output from the genome jobs that do finish on hand). You can combine the data later.

          It sounds like you are not running the job under a scheduler (e.g. SGE, LSF etc) on the server.
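
The suggestion of one independent job per genome could be sketched roughly as follows. This only builds one blastp command per .faa file, reusing the flags from the original command (`-query`, `-db`, `-out`, `-outfmt 5`, `-evalue 0.001`); the function name, the file names, and the `_vs_db.xml` output naming are made up for illustration, and how the jobs are launched (a scheduler, a loop, a process pool) is left open.

```python
from pathlib import Path


def build_blastp_commands(faa_files, db, out_dir, evalue="0.001"):
    """Return one blastp argument list per genome file (hypothetical helper)."""
    commands = []
    for faa in faa_files:
        faa = Path(faa)
        # One XML result file per genome, so finished jobs keep their output.
        out = Path(out_dir) / f"{faa.stem}_vs_db.xml"
        commands.append([
            "blastp",
            "-query", str(faa),
            "-db", str(db),
            "-out", str(out),
            "-outfmt", "5",
            "-evalue", evalue,
        ])
    return commands


# Illustrative paths only; they are not taken from the thread.
for cmd in build_blastp_commands(["faa/genome1.faa", "faa/genome2.faa"],
                                 "blastDBs/all_genomes_prot_db", "results"):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` or written into a scheduler job script. Because every job writes its own XML file, a job that gets killed does not take the results of the others with it.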


          • #6
            It's not because of one particular genome, but because of the high number of genomes. I need to run all genomes vs. all genomes because I want to find all the similar sequences between the genomes (and not only the best match for each query sequence). But I found that I don't have enough space in the directory on the server. Maybe that is what made the server kill the process: when it tried to write to the XML file, there wasn't enough space and it crashed. I will try running it in a new directory with more space and see if I can get my results.
            Still, the problem of not writing to the XML file while BLAST is running remains (I'm not sure it's a problem, but it doesn't happen on my PC)... And it doesn't depend on which genomes I use, because it happens with every file I run, whether it has 5 genomes, 10, 25 or 75...

            I hope it's just a space problem and I can get my results soon. I'll give news! Thanks anyway for your contribution!
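
A quick way to check the free space in an output directory before starting a long run is Python's standard `shutil.disk_usage`. This is a minimal sketch: `free_gib` and `has_room` are hypothetical helper names, the threshold is something you would have to estimate yourself, and per-user quotas on a shared server may not be reflected in these numbers.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def has_room(path, needed_gib):
    """True if at least `needed_gib` GiB are free where `path` lives."""
    return free_gib(path) >= needed_gib


print(f"Free space here: {free_gib('.'):.1f} GiB")
```

Calling `has_room(output_dir, estimate)` just before launching blastp gives an early warning instead of a failure many hours into the run.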


            • #7
              Well... It seems that in the new directory on the server, the XML file is growing while BLAST is running! I don't know why it works now and didn't in the old directory... So I hope I can run my scripts on the huge 74-genome dataset!

              Thanks a lot!


              • #8
                Even though you are doing an all vs. all comparison, BLAST is only using one query sequence at a time against the genome pool (unless I am misunderstanding something). So submitting those jobs in parallel or in series is going to give you the same result.

                If you know for sure that the problem was disk space then you have a path forward.

                BTW: If these are bacterial genomes (related) then you may want to look at Mauve as an alternative.
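
Since BLAST handles one query sequence at a time, the concatenated query file can be cut into smaller multi-FASTA chunks that are each searched against the same full database. The sketch below is plain Python with no Biopython dependency; `split_fasta` is a hypothetical helper, and it assumes ordinary FASTA text in which every record starts with a `>` header line.

```python
def split_fasta(text, records_per_chunk):
    """Split multi-FASTA text into chunks of at most `records_per_chunk` records."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            # A new header closes the previous record.
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + records_per_chunk]) + "\n"
        for i in range(0, len(records), records_per_chunk)
    ]


# Toy sequences for illustration only.
example = ">a\nMK\n>b\nMV\nLL\n>c\nMA\n"
for n, chunk in enumerate(split_fasta(example, 2), start=1):
    print(f"chunk {n}:\n{chunk}")
```

Each chunk can be written to its own file and used as a separate `-query`. As long as the database stays the same, the per-query hits should match a single large run.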


                • #9
                  Yeah, I can submit serial jobs, but the size of the database has to remain the same anyway. Maybe it's a good alternative if it's still not working... And yes, these are bacterial genomes, but I want to get the genes in common, not align the complete genomes. I'm not sure what Mauve does exactly, but I'll take a look!

                  Thanks!


                  • #10
                    Just to confirm that my scripts are running well, and I already have results for the 74-genome dataset! It seems it was because of disk space, not a memory problem!
