You wonder, then, why the program isn't doing this splitting by itself.
-
There are four main steps in blastn:
1. Prepare the hash table with the mask data.
2. Scan the database for hits. The -num_threads option is only useful in this step.
3. Trace back the results in the database.
4. Print the results.
The -num_threads option (multi-threading in step 2) is better than running multiple processes, because each process loads the database and the mask database into RAM separately.
Our G-BLASTN speeds up the scan step on the GPU and the trace-back step with SSE, and changes the framework into a pipeline so that the steps can overlap.
You can find the source code and release 1.0 on
and
Download GBLASTN for free. G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. G-BLASTN can produce exactly the same results as NCBI-BLAST, and it also has very similar user commands.
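The pipeline idea described in this post can be illustrated with a toy sketch. This is not G-BLASTN's code: the stage functions (`scan`, `trace_back`, `print_result`) and the `run_pipeline` helper are made-up names, and plain Python threads stand in for the GPU and SSE stages. Each stage runs in its own thread and passes its result to the next stage through a queue, so the scan of one query batch can overlap with the trace-back of the previous one.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run items through funcs, one thread per stage, preserving order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the scan, trace-back and print steps.
def scan(batch):
    return batch + ":hits"


def trace_back(hits):
    return hits + ":aligned"


def print_result(aligned):
    return aligned + ":printed"


if __name__ == "__main__":
    print(run_pipeline(["q1", "q2"], [scan, trace_back, print_result]))
```

Because every stage is a single thread reading from a FIFO queue, results come out in the same order the batches went in.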
-
I had a speed-up problem with blast+, too. I analysed the CPU usage of the multithreading option in the "old" blast and in blast+, and I saw that the multithreading option was not efficient for my dataset. So I parallelized it with a Perl script to increase the speed. Maybe that's an option to speed up blast+ runs for you.
The original reply from the blast team was:
...The overall total CPU time was about 160 minutes for both runs, but the blastall application did finish in less time than the BLAST+ application. We will work on improving the parallelization of BLAST+ for this case. I've also looked at some test cases against our nt database, but blastall and the BLAST+ application did equally well on the parallelization.
-
It's not the most elegant script, but it works fine for an all-vs-all blast. It includes making the databases. You need Bio::SeqIO, Parallel::ForkManager and Time::Local. Hope it helps:
Code:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Parallel::ForkManager;
use Time::Local;
#############################
# USAGE: perl script.pl blastmethod directory_of_fasta_files eval outfmt outdir number_of_cpus
# e.g.   perl blastplus_parallel.pl blastn SampleDIR e-10 6 outdir 10
#############################
# author: Ulrike Loeber ([email protected])
# This script takes care of imperfect parallelization of blast+. It splits the files
# to do smaller jobs on more than one CPU to improve the performance of BLAST+.
my ($blast_method, $seq_dir, $evalue, $outfmt, $outdir, $cpus) = @ARGV;
die "USAGE: perl script.pl blastmethod seq_dir eval outfmt outdir cpus\n" unless defined $cpus;

opendir(my $indir, $seq_dir) or die "Cannot open $seq_dir: $!";
# every file in the directory that ends with ".fasta", except leftover subset files
my @files = grep { /\.fasta$/ && !/^subset_/ } readdir($indir);
closedir($indir);

# one cpu is used for perl, so the number of cpus left is $cpus-1
my $numberOfProcesses = $cpus - 1;
die "Need at least 2 cpus\n" if $numberOfProcesses < 1;
my $subsets = $numberOfProcesses;    # build as many subsets as free cpus
my $manager = Parallel::ForkManager->new($numberOfProcesses);

foreach my $file (@files) {
    my $time = localtime();
    print "#########PROCESSING##########\n $file\t $time\n";
    my $input = Bio::SeqIO->new(-file => "$seq_dir/$file", -format => "fasta");
    my @seq_array;
    while (my $seq = $input->next_seq()) {
        push(@seq_array, $seq);
    }
    my $numberofsequences = @seq_array;
    # -dbtype nucl is hard-coded; protein FASTA files need -dbtype prot instead
    system("makeblastdb -dbtype nucl -in $seq_dir/$file") == 0
        or die "makeblastdb failed for $file (is it in your PATH?)\n";
    for (my $j = 0; $j < $subsets; $j++) {
        # integer boundaries, so that no sequence is skipped or written twice
        my $first = int($j * $numberofsequences / $subsets);
        my $last  = int(($j + 1) * $numberofsequences / $subsets) - 1;
        next if $last < $first;    # fewer sequences than subsets: no empty query file
        # the subset file is named like the infile with subset_<j>_ in front of it
        open(my $out, ">", "$seq_dir/subset_${j}_$file") or die $!;
        for my $i ($first .. $last) {
            my $seq = $seq_array[$i];
            print {$out} ">" . $seq->id() . "\n" . $seq->seq() . "\n";
        }
        close $out;
        $manager->start and next;
        system "$blast_method -query $seq_dir/subset_${j}_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir/subset_${j}_$file.blast";
        $manager->finish;
    }
    print "#########END##########\n $file\t $time\n";
}

# cleaning up directory
$manager->wait_all_children;
foreach my $file (@files) {
    (my $outfile = $file) =~ s/\.fasta$/.blast/;
    open(my $merged, ">", "$outdir/$outfile") or die $!;    # one outfile per fasta file
    for (my $j = 0; $j < $subsets; $j++) {
        my $part = "$outdir/subset_${j}_$file.blast";
        if (open(my $in, "<", $part)) {    # concatenates subset results into one blast result
            print {$merged} $_ while <$in>;
            close $in;
        }
        unlink $part;                           # removes subset blast results
        unlink "$seq_dir/subset_${j}_$file";    # removes data subsets
    }
    close $merged;
    unlink map { "$seq_dir/$file.$_" } qw(nhr nin nsq);    # removes the database files
}
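The splitting step is where a script like this most easily goes wrong. If the chunk size is a fraction (10 sequences over 4 subsets gives 2.5), looping with fractional indices can skip records, and a file with fewer sequences than subsets produces empty query files. A minimal Python sketch of a safer partition, assuming the records are already loaded into a list; the name `split_evenly` is made up for illustration:

```python
def split_evenly(records, n_subsets):
    """Partition records into at most n_subsets contiguous, non-empty chunks.

    Boundaries are computed with integer arithmetic, so every record
    lands in exactly one chunk and chunk sizes differ by at most one.
    """
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    total = len(records)
    chunks = []
    for j in range(n_subsets):
        first = j * total // n_subsets
        last = (j + 1) * total // n_subsets  # exclusive upper bound
        if last > first:  # skip chunks that would be empty
            chunks.append(records[first:last])
    return chunks


if __name__ == "__main__":
    for chunk in split_evenly(list(range(10)), 4):
        print(chunk)
```

Each chunk would then be written to its own query file and handed to one BLAST process.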
-
I am using this script to speed up my tblastn query, but it shows the following error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 42, <GEN0> line 42132.
-
Just copying the files to a local directory is not enough if that directory is not in your PATH. Add the directory in question to your PATH.
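A quick way to check this before launching a long run is to ask whether the tool can actually be found. The sketch below uses Python's standard `shutil.which`; the helper names `find_tool` and `prepend_to_path` are made up for illustration, and any directory you pass in is your own install location.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path of an executable, searching extra_dirs before PATH."""
    search = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search)


def prepend_to_path(directory, env=None):
    """Return a copy of the environment with directory placed first in PATH."""
    env = dict(os.environ if env is None else env)
    old = env.get("PATH", "")
    env["PATH"] = directory + (os.pathsep + old if old else "")
    return env


if __name__ == "__main__":
    location = find_tool("makeblastdb")
    print(location or "makeblastdb is not on PATH")
```

The environment returned by `prepend_to_path` can be passed as `env=` to `subprocess.run`, so child processes find the BLAST+ programs without changing the shell configuration.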
-
Actually there are a few prerequisites, like an executable makeblastdb wherever you run the script. Did you add NCBI's BLAST to your PATH if you are a non-root user? Be aware of the higher memory usage. But it still might be faster. You can contact me if you have any more questions. Best, Ulrike
-
blast+ programs
I am in the same directory as the blast+ programs. You can see my path below:
/Downloads/bp272/ncbi-blast-2.2.31+/bin$ perl blast.pl tblastn /home/sekhwalm/Oryza/blast/ e-5 6 /home/sekhwalm/Oryza/blast/ 5
-
speed up tblastn query
Hi, I am using this script to speed up my tblastn query, but it shows an error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 41, <GEN0> line 42132.
-
Until you fix the PATH this is not going to work for any blast type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.
If you are serious about learning this, then spend a bit of time here understanding some basic Unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1
-
Thanks...
I have now fixed the PATH issue and blast is running. However, after running for some time it shows the following errors:
Warning: [tblastn] Query is Empty!
cat: /home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast: No such file or directory
rm: cannot remove ‘/home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast’: No such file or directory
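The `cat` and `rm` messages are consistent with a path mismatch in the script as originally posted: the BLAST output was written with `-out $outdir.subset_...` (a dot) but read back from `$outdir/subset_...` (a slash). The "Query is Empty" warning is what tblastn prints for a query file with no sequences; one plausible cause, though not confirmed in this thread, is a subset that received no records. The Python sketch below only reproduces the two path patterns to show that they differ; the function names are made up for illustration.

```python
import os


def path_as_written(outdir, j, fasta):
    # Mirrors: -out $outdir.subset_$j\_$file.blast  (a "." where "/" was meant)
    return f"{outdir}.subset_{j}_{fasta}.blast"


def path_as_read(outdir, j, fasta):
    # Mirrors: cat $outdir/subset_$j\_$file.blast
    return f"{outdir}/subset_{j}_{fasta}.blast"


def subset_result_path(outdir, j, fasta):
    """Build the path in one place so writer and reader always agree."""
    return os.path.join(outdir, f"subset_{j}_{fasta}.blast")


if __name__ == "__main__":
    outdir = "/tmp/out/"  # hypothetical output directory
    print(path_as_written(outdir, 0, "pep.fasta"))
    print(path_as_read(outdir, 0, "pep.fasta"))
    print(subset_result_path(outdir, 0, "pep.fasta"))
```

With a trailing slash on the output directory, the written file ends up as a hidden `.subset_0_pep.fasta.blast`, which is why the later `cat` cannot find it.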