**GenoMax** · 05-20-2015, 10:03 AM

Can you provide some additional details about the hardware configuration and the sizes of the query and database files?

**GenoMax** · 05-20-2015, 10:22 AM

If you are searching with a multi-fasta file, blast should be writing the output file as it goes through the queries. If you are using some sort of job scheduler on the server, it may be buffering the output, which is why you don't see it, and the output may never get written to a file once the job is killed.

**adpolicarpo** · 05-21-2015, 03:30 AM

It is a .fasta file made by concatenating the individual .faa files of the genomes. This file is used both to create the blast database and to query it. For 74 genomes, it is about 55 MB. But no matter what size it is, when I run it on the server the .xml file remains empty until blast ends, whereas when I run the same thing on my PC I can see the .xml file growing. With 25 genomes it works well on the server, although the xml file is still only written after blast finishes (in this case that's not a problem, because the job isn't killed).
I'm running it on the server because on my PC it becomes impractical for a large number of genomes. I can't understand why, running the same code, the server doesn't write the output while blast is running and my PC does.

**GenoMax** · 05-21-2015, 04:19 AM

Based on your last post it sounds like the first 24 or so genomes are ok but the 25th or 26th must be causing the problem? Ideally if you have access to a cluster, running these genomes in parallel as independent jobs would be the most efficient (or as separate jobs on the server, so you will have the output from those genome jobs that do finish on hand). You can combine the data later.

It sounds like you are not running the job under a scheduler (e.g. SGE, LSF etc) on the server.
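
To make the "independent jobs, combine later" idea concrete, here is a minimal Python sketch that splits a multi-FASTA query into smaller chunk files. The function name `split_fasta`, the chunk naming scheme, and the chunk size are assumptions for illustration, not anything from the original scripts. Each chunk would then be run as its own blast job against the same full database.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, records_per_chunk):
    """Split a multi-FASTA file into files of at most records_per_chunk records.

    Hypothetical helper: the names and the chunk file pattern are illustrative.
    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    handle = None
    n_records = 0
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # Open a new chunk file every records_per_chunk headers.
                if n_records % records_per_chunk == 0:
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
                    handle = open(path, "w")
                    chunk_paths.append(path)
                n_records += 1
            # Lines before the first header (if any) are skipped.
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return chunk_paths
```

Because each chunk writes its own output file, the results from chunks that finished are still on disk if a later job is killed.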

**adpolicarpo** · 05-21-2015, 05:31 AM

It's not because of one particular genome, but because of the high number of genomes. I need to run all genomes vs. all genomes because I want to find all the similar sequences between the genomes (and not only the best match for each query sequence). But I found that I don't have enough space in my directory on the server. Maybe that is what made the server kill the process: when blast tried to write to the xml file, there wasn't enough space and it crashed. I will try to run it in a new directory with more space and see if I can get my results.
Still, the problem of the xml file not being written while blast is running remains (I'm not sure it's really a problem, but it doesn't happen on my PC)... And it doesn't depend on which genomes are included, because it happens with every file I run, whether it has 5 genomes, 10, 25 or 75...

I hope it's just a space problem and I can get my results soon. I'll give news! Thanks anyway for your contribution!
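
A quick way to test the disk space theory before launching a long job is to check the free space of the output directory. This is only a sketch: the helper name `has_enough_space` and the size threshold are assumptions, and `shutil.disk_usage` reports the filesystem's free space, which may not reflect a per-user quota on a shared server.

```python
import shutil


def has_enough_space(directory, required_bytes):
    """Return True if the filesystem holding `directory` has at least
    required_bytes free. Hypothetical helper; per-user quotas are not
    visible through this check.
    """
    usage = shutil.disk_usage(directory)
    return usage.free >= required_bytes


if __name__ == "__main__":
    # Example: require roughly 10 GB free before starting (illustrative value).
    print(has_enough_space(".", 10 * 1024**3))
```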

**adpolicarpo** · 05-21-2015, 05:38 AM

Well... It seems that in the new directory on the server, the size of the xml file grows while blast is running! I don't know why it works now and didn't in the old directory... So I hope that I can run my scripts on the huge 74-genome dataset!

Thanks a lot!

**GenoMax** · 05-21-2015, 05:39 AM

Even though you are doing an all vs. all comparison, blast only uses one query sequence at a time against the genome pool (unless I am misunderstanding something). So submitting those jobs in parallel or in serial is going to give you the same result.

If you know for sure that the problem was disk space then you have a path forward.

BTW: If these are (related) bacterial genomes, then you may want to look at Mauve as an alternative.

**adpolicarpo** · 05-21-2015, 06:37 AM

Yeah, I can submit serial jobs, but the size of the database has to remain the same anyway. Maybe it's a good alternative if it's still not working... And yes, these are bacterial genomes, but I want to get the genes in common, not align the complete genomes. I'm not sure what Mauve does exactly, but I'll take a look!

Thanks!

**adpolicarpo** · 05-26-2015, 05:11 AM

Just to confirm that my scripts are running well, and I already have results for the 74-genome dataset! It seems that the cause was disk space, not a memory problem!
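
For anyone who lands here with the same error: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). The status says the process was killed, not who killed it or why. A small sketch for decoding such a status, with the helper name `describe_exit_status` being an assumption for illustration:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Hypothetical helper: values above 128 are treated as
    'terminated by signal (status - 128)', following shell convention.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```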

blast "returned non-zero exit status 137" error, memory problems
