You wonder, then, why the program isn't doing this splitting by itself.
-
There are four main steps in blastn:
1. Prepare the hash table with mask data.
2. Scan the database for hits. The -num_threads option only helps in this step.
3. Trace back the results in the database.
4. Print the results.
The -num_threads option (multi-threading in step 2) is better than running multiple processes, because each process loads the database and the mask database into RAM separately.
Our G-BLASTN speeds up the scan step on the GPU and the trace-back step with SSE, and changes the framework into a pipeline so that the steps can overlap.
You can find the source code and release 1.0 on
and
G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. It can produce exactly the same results as NCBI-BLAST, and it has very similar user commands.
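To illustrate what "each step can be overlapped" means, here is a minimal sketch of a staged pipeline in Python. It is not G-BLASTN's code: the `scan`, `trace_back` and `format_result` functions are hypothetical stand-ins for the real steps, and the sketch only shows the structure (one thread per stage, connected by queues), so that one query can be in the trace-back stage while the next is already being scanned.

```python
import queue
import threading

SENTINEL = object()  # unique marker for the end of the work stream


def stage(worker, inbox, outbox):
    """Apply `worker` to every item from `inbox` and pass results to `outbox`."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to stop as well
            return
        outbox.put(worker(item))


def run_pipeline(items, workers):
    """Push `items` through `workers`, one thread per stage, keeping input order."""
    queues = [queue.Queue() for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the scan, trace-back and print steps.
def scan(query):
    return (query, len(query))


def trace_back(hit):
    return f"{hit[0]}:{hit[1]}"


def format_result(text):
    return text.upper()
```

Calling `run_pipeline(queries, [scan, trace_back, format_result])` returns the formatted results in the order the queries were submitted, because each stage handles its queue first-in, first-out.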
-
I had a speed-up problem with blast+, too. I analysed the CPU usage of the multithreading option in the "old" blast and in blast+, and I saw that the multithreading option was not efficient for my dataset. So I parallelized it with a Perl script to increase the speed. Maybe that's an option to speed up blast+ runs for you.
The original reply from the blast team was:
"The overall total CPU time was about 160 minutes for both runs, but the blastall application did finish in less time than the BLAST+ application. We will work on improving the parallelization of BLAST+ for this case. I've also looked at some test cases against our nt database, but blastall and the BLAST+ application did equally well on the parallelization."
Maybe this helps... If someone is interested in the script, just ask me...
-
It's not the most elegant script, but it works fine for an all-vs-all blast. It includes making the databases. You need Bio::SeqIO, Parallel::ForkManager and Time::Local. Hope it helps:
Code:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Parallel::ForkManager;
use Time::Local;
#############################
#USAGE: perl script.pl blastmethod directory_of_fasta_files evalue outfmt outdir number_of_cpus
#e.g. perl blastplus_parallel.pl blastn SampleDIR 1e-10 6 outdir 10
#############################
# author: Ulrike Loeber ([email protected])
#This script works around the imperfect parallelization of blast+. It splits each
#query file into subsets and runs one smaller blast job per subset on its own CPU.
my $blast_method = $ARGV[0];
my $seq_dir = $ARGV[1];
my $evalue = $ARGV[2];
my $outfmt = $ARGV[3];
my $outdir = $ARGV[4];
my $cpus = $ARGV[5];
die "need at least 2 cpus\n" unless defined $cpus && $cpus >= 2;
opendir (INDIR, $seq_dir) or die $!;
my @files=grep /\.fasta$/ , readdir (INDIR); #every file in the given directory whose name ends with ".fasta"
closedir INDIR;
#one cpu is used for perl, so the number of cpus left is $cpus-1
my $numberOfProcesses=($cpus-1);
my $subsets=$numberOfProcesses; #build as many subsets as free cpus
my $manager = Parallel::ForkManager->new( $numberOfProcesses );
foreach my $file(@files){
    my $time=localtime();
    print "#########PROCESSING##########\n $file\t $time\n";
    my $input= Bio::SeqIO->new( -file => "$seq_dir/$file", -format => "fasta");
    my @seq_array;
    while( my $seq = $input->next_seq() ) {
        push(@seq_array,$seq);
    }
    my $numberofsequences=@seq_array;
    #the database is built from the same file, so every subset is searched against all sequences
    system "makeblastdb -dbtype nucl -in $seq_dir/$file";
    for (my $j=0;$j<$subsets;$j++){
        #integer bounds, so every sequence lands in exactly one subset
        my $first=int($j*$numberofsequences/$subsets);
        my $last=int(($j+1)*$numberofsequences/$subsets)-1;
        next if $last<$first; #fewer sequences than subsets: skip the empty subset
        #the subset file is named like the infile with subset_<j>_ in front of it
        open (OUTFILE , ">$seq_dir/subset_${j}_$file") or die $!;
        for my $i ($first..$last){
            my $seq=$seq_array[$i];
            my $id=$seq->id();
            my $sequence=$seq->seq();
            print OUTFILE ">$id\n$sequence\n";
        }
        close OUTFILE;
        $manager->start and next; #the parent moves on to the next subset, the child runs blast
        system "$blast_method -query $seq_dir/subset_${j}_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir/subset_${j}_$file.blast";
        $manager->finish;
    }
    print "#########END##########\n $file\t $time\n";
}
#wait for all blast jobs, then clean up the directories
$manager->wait_all_children;
foreach my $file(@files){
    my $outfile=$file;
    $outfile=~s/\.fasta$/.blast/;
    system "touch $outdir/$outfile"; #creates one outfile per fasta file
    for (my $j=0;$j<$subsets;$j++){
        my $part="$outdir/subset_${j}_$file.blast";
        if (-e $part){
            system "cat $part >>$outdir/$outfile"; #concatenates subfile results to one blast result
            unlink $part; #removes subset blast results
        }
        unlink "$seq_dir/subset_${j}_$file"; #removes data subsets
    }
    #the database files were written next to the input file
    unlink "$seq_dir/$file.nhr", "$seq_dir/$file.nin", "$seq_dir/$file.nsq";
    #print "#########COMPLETE##########\n $file\t $time\n";
}
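The splitting step of the script can be sketched on its own. The Python below is an illustration under stated assumptions, not a port of the script: `split_into_subsets` and `build_blast_command` are hypothetical helper names, and only the command-line options that already appear in the script (`-query`, `-db`, `-evalue`, `-outfmt`, `-out`) are used. The point is the integer chunk bounds, which put every sequence into exactly one subset and never produce an empty query file.

```python
def split_into_subsets(records, n_subsets):
    """Split `records` into at most `n_subsets` contiguous, non-empty chunks."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    total = len(records)
    chunks = []
    for j in range(n_subsets):
        first = j * total // n_subsets
        last = (j + 1) * total // n_subsets  # exclusive upper bound
        if last > first:  # skip empty chunks when there are few records
            chunks.append(records[first:last])
    return chunks


def build_blast_command(method, query, db, evalue, outfmt, out):
    """Build the argument list for one blast job (nothing is executed here)."""
    return [method, "-query", query, "-db", db,
            "-evalue", evalue, "-outfmt", outfmt, "-out", out]
```

For ten sequences and three subsets this gives chunks of sizes 3, 3 and 4; for two sequences and four subsets it gives two chunks of one sequence each instead of two empty files.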
-
I am using this script to speed up my tblastn query, but it shows the following error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 42, <GEN0> line 42132.
-
Just copying the files to a local directory is not enough if that directory is not in your PATH. Extend your PATH to include the directory in question.
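A quick way to check whether a program such as makeblastdb can actually be found is to search the PATH for it. The sketch below uses Python's standard `shutil.which`; the helper names `find_tool` and `env_with_dirs` are made up for this example, and the directory you pass in would be wherever your BLAST+ binaries live.

```python
import os
import shutil


def _path_dirs(extra_dirs):
    """Return `extra_dirs` followed by the non-empty entries of the current PATH."""
    current = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    return [*extra_dirs, *current]


def find_tool(name, extra_dirs=()):
    """Return the full path of `name` found on PATH plus `extra_dirs`, or None."""
    return shutil.which(name, path=os.pathsep.join(_path_dirs(extra_dirs)))


def env_with_dirs(extra_dirs):
    """Return a copy of the environment whose PATH starts with `extra_dirs`."""
    env = dict(os.environ)
    env["PATH"] = os.pathsep.join(_path_dirs(extra_dirs))
    return env
```

If `find_tool("makeblastdb")` returns `None`, a script that calls makeblastdb will fail in the same way as above. The dictionary from `env_with_dirs` can be passed as `env=` to `subprocess.run` so that child processes see the extended PATH.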
-
Actually there are a few prerequisites, like an executable makeblastdb wherever you run the script. Did you add NCBI BLAST to your PATH, if you are a non-root user? Be aware of the higher memory usage. But it still might be faster. You can contact me if you have any more questions. Best, Ulrike
-
blast+ programs
I am running it from the same directory where the blast+ programs are. You can see my path below:
/Downloads/bp272/ncbi-blast-2.2.31+/bin$ perl blast.pl tblastn /home/sekhwalm/Oryza/blast/ e-5 6 /home/sekhwalm/Oryza/blast/ 5
-
speed up tblastn query
Hi, I am using this script to speed up my tblastn query, but it shows the following error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 41, <GEN0> line 42132.
-
Until you fix the PATH this is not going to work for any blast type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.
If you are serious about learning this then spend a bit of time here understanding some basic unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1
-
Thanks...
I have now fixed the PATH issue and blast is running. However, after running for some time it shows the following errors:
Warning: [tblastn] Query is Empty!
cat: /home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast: No such file or directory
rm: cannot remove ‘/home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast’: No such file or directory
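The cat and rm messages above show that the merge step looked for a subset result file that was never written. As a minimal sketch, assuming hypothetical file names, a merge step can report missing parts instead of failing on them, which makes it easier to see which subset job went wrong:

```python
import os
import shutil


def merge_parts(part_paths, merged_path):
    """Concatenate the existing part files into `merged_path`.

    Returns the list of part files that were missing, in input order.
    """
    missing = []
    with open(merged_path, "wb") as merged:
        for part in part_paths:
            if not os.path.isfile(part):
                missing.append(part)  # remember it, but keep merging the rest
                continue
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, merged)
    return missing
```

Any path returned in the list points at a subset whose blast run produced no output, for example because its query file was empty.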