**gsgs** · 08-29-2013, 05:33 PM

You wonder, then, why the program isn't doing this splitting by itself.

**OpenHero** · 08-29-2013, 06:20 PM

There are 4 main steps in blastn:
1. Prepare the hash table with the mask data.
2. Scan the database for hits. The -num_threads option is only useful in this step.
3. Trace back the results in the database.
4. Print the results.

The -num_threads option (the multi-threaded version of step 2) is better than running multiple processes, because with multiple processes each process loads the database and the mask database into RAM.
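To make the two strategies concrete, here is a small dry-run sketch in shell. It only prints the command lines and does not run BLAST. The database name `nt_db`, the file names and the CPU count are placeholders rather than values from this thread; `-query`, `-db`, `-out` and `-num_threads` are standard BLAST+ options.

```shell
#!/bin/sh
# Dry run: print the command lines for both strategies without executing them.
# db, query and cpus are illustrative placeholders.
print_plans() {
    db=$1; query=$2; cpus=$3

    echo "# one process, $cpus threads: the database is loaded once"
    echo "blastn -query $query -db $db -num_threads $cpus -out all.blast"

    echo "# $cpus processes, one thread each: every process loads the database"
    i=0
    while [ "$i" -lt "$cpus" ]; do
        echo "blastn -query part_$i.fasta -db $db -num_threads 1 -out part_$i.blast"
        i=$(( i + 1 ))
    done
}

print_plans nt_db queries.fasta 4
```

Which variant is faster depends on the dataset. The second one multiplies the memory needed for the database by the number of processes.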

Our G-BLASTN speeds up the scan step on the GPU and the trace-back step with SSE, and changes the framework into a pipeline so that the steps can overlap.

You can find the source code and release 1.0 at

http://www.comp.hkbu.edu.hk/~chxw/software/G-BLASTN.html

and

https://sourceforge.net/projects/gblastn/

https://github.com/OpenHero/gblastn

G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. G-BLASTN can produce exactly the same results as NCBI-BLAST, and it also has very similar user commands.

**uloeber** · 11-12-2013, 03:16 AM

I had a speed problem with blast+, too. I analysed the CPU usage of the multithreading option in the "old" blast and in blast+, and I saw that multithreading was not efficient for my dataset. So I parallelized it with a Perl script to increase the speed. Maybe that's an option to speed up blast+ runs for you.
The original reply from the blast team was:

...The overall total CPU time was about 160 minutes for both runs, but the blastall application did finish in less time than the BLAST+ application. We will work on improving the parallelization of BLAST+ for this case. I've also looked at some test cases against our nt database, but blastall and the BLAST+ application did equally well on the parallelization.

Maybe this helps. If anyone is interested in the script, just ask me.

**yaximik** · 11-12-2013, 04:32 AM

I'd certainly try it. Can you post it?

**uloeber** · 11-12-2013, 10:28 AM

It's not the most elegant script, but it works fine for an all-vs-all blast. It includes making the databases. You need Bio::SeqIO, Parallel::ForkManager and Time::Local. Hope it helps:

Code:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Parallel::ForkManager;
use Time::Local;
#############################
#USAGE: perl script.pl blastmethod directory_of_fasta_files evalue outfmt outdir number_of_cpus
#e.g. perl blastplus_parallel.pl blastn SampleDIR e-10 6 outdir 10
#############################
# author: Ulrike Loeber ([email protected])
#This script works around the imperfect parallelization of BLAST+. It splits each FASTA file into subsets and runs them as smaller jobs on more than one CPU.
#Each FASTA file is searched against a nucleotide database built from itself (all-vs-all). makeblastdb and the BLAST program must be in your PATH.

die "USAGE: perl $0 blastmethod directory_of_fasta_files evalue outfmt outdir number_of_cpus\n" unless @ARGV==6;
my $blast_method	=	$ARGV[0];
my $seq_dir			=	$ARGV[1];
my $evalue			=	$ARGV[2];
my $outfmt			=	$ARGV[3];
my $outdir			=	$ARGV[4];
my $cpus			=	$ARGV[5];
die "number_of_cpus must be an integer of at least 2\n" unless $cpus=~/^\d+$/ && $cpus>=2;

opendir (INDIR, $seq_dir) or die "Cannot open $seq_dir: $!";
my @files=grep /\.fasta$/ , readdir (INDIR);    #every file in the given directory whose name ends with ".fasta"
closedir INDIR;
#one cpu is used for perl, so the number of cpus left is $cpus-1
my $numberOfProcesses=($cpus-1);
my $subsets=$numberOfProcesses; 	#build as many subsets as free cpus
my $manager = Parallel::ForkManager->new( $numberOfProcesses );
foreach my $file(@files){
	my $time=localtime();
	print "#########PROCESSING##########\n $file\t $time\n";
	my $input= Bio::SeqIO->new( -file => "$seq_dir/$file",
			-format => "fasta");

	my @seq_array;
	while( my $seq = $input->next_seq() ) {
		push(@seq_array,$seq);
	}

	my $numberofsequences=@seq_array;
	system("makeblastdb -dbtype nucl -in $seq_dir/$file")==0 or die "makeblastdb failed for $file";
	for (my $j=0;$j<$subsets;$j++){			#one subset file and one blast job per free cpu
		#integer boundaries, so that every sequence ends up in exactly one subset
		my $first=int($j*$numberofsequences/$subsets);
		my $last=int(($j+1)*$numberofsequences/$subsets)-1;
		next if $last<$first;				#fewer sequences than subsets: skip, because an empty query file makes blast warn "Query is Empty!"
		open (OUTFILE , ">", "$seq_dir/subset_${j}_$file") or die $!;	#named like the infile with subset_<j>_ in front of it
		for (my $i=$first;$i<=$last;$i++){
			my $seq=$seq_array[$i];
			my $id=$seq->id();
			my $sequence=$seq->seq();
			print OUTFILE ">$id\n$sequence\n";
		}
		close OUTFILE;
		$manager->start and next;   #the parent moves on to the next subset, the child runs blast
		system "$blast_method -query $seq_dir/subset_${j}_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir/subset_${j}_$file.blast";
		$manager->finish;
	}
	$time=localtime();
	print "#########END##########\n $file\t $time\n";
}
$manager->wait_all_children;
#merging the results and cleaning up the directories
foreach my $file(@files){
	(my $outfile=$file)=~s/\.fasta$/.blast/;
	system "touch $outdir/$outfile";						#creates one outfile per fasta file
	for (my $j=0;$j<$subsets;$j++){
		next unless -e "$outdir/subset_${j}_$file.blast";	#skipped (empty) subsets have no result file
		system "cat $outdir/subset_${j}_$file.blast >>$outdir/$outfile";	#concatenates subfile results to one blast result
		unlink "$outdir/subset_${j}_$file.blast";	#removes subset blast results
		unlink "$seq_dir/subset_${j}_$file";		#removes data subsets
	}
	unlink "$seq_dir/$file.nhr", "$seq_dir/$file.nin", "$seq_dir/$file.nsq";	#removes the database files
	#print "#########COMPLETE##########\n $file\t $time\n";
}
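
The splitting step of the script can be looked at in isolation. The following is a minimal shell sketch, not part of the posted script, of one way to cut `n` sequences into `k` subsets with integer boundaries, so that every sequence index from 0 to n-1 falls into exactly one subset and empty subsets are skipped. The function name `chunk_bounds` is made up for this illustration.

```shell
#!/bin/sh
# Print "subset first last" for every non-empty subset when n sequences
# (indices 0..n-1) are divided into k subsets. Shell arithmetic is
# integer-only, so the boundaries never become fractional.
chunk_bounds() {
    n=$1; k=$2; j=0
    while [ "$j" -lt "$k" ]; do
        first=$(( j * n / k ))
        last=$(( (j + 1) * n / k - 1 ))
        if [ "$last" -ge "$first" ]; then
            echo "$j $first $last"
        fi
        j=$(( j + 1 ))
    done
}

chunk_bounds 10 4
# prints:
# 0 0 1
# 1 2 4
# 2 5 6
# 3 7 9
```

With fewer sequences than subsets, for example `chunk_bounds 2 4`, only two lines are printed. The other subsets would be empty, and an empty query file is what makes BLAST+ print "Query is Empty!".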

**yaximik** · 11-15-2013, 12:32 PM

Thanks! I'll give it a try.

**sekhwal** · 09-24-2015, 01:09 PM

I am using the script that uloeber posted above to speed up my tblastn query, but it shows the following error:

Can't exec "makeblastdb": No such file or directory at blast.pl line 42, <GEN0> line 42132.

**GenoMax** · 09-24-2015, 01:11 PM

Do you have blast+ installed? The makeblastdb program is part of the blast+ package, which you can download from NCBI.

**sekhwal** · 09-24-2015, 01:16 PM

blast+ programs

I copied all the blast+ programs into the same directory as my input files, but it still shows the same error.

**GenoMax** · 09-24-2015, 01:25 PM

Just copying the files to a local directory is not enough if that directory is not in your PATH. Append the directory in question to your PATH.
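
As a rough sketch of what appending to PATH looks like in a POSIX shell: the helper name `add_to_path` and the default value of `BLAST_BIN` are assumptions for illustration, so substitute the `bin` directory of your own BLAST+ installation. Note that the current working directory is not searched unless it is itself listed in PATH, so running the script from inside the BLAST+ directory does not help.

```shell
#!/bin/sh
# Append a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Hypothetical location: replace it with the bin directory of your install.
BLAST_BIN="${BLAST_BIN:-$HOME/Downloads/ncbi-blast-2.2.31+/bin}"
add_to_path "$BLAST_BIN"

# Check whether the shell, and therefore Perl's system(), can find the tool.
if command -v makeblastdb >/dev/null 2>&1; then
    echo "makeblastdb found at: $(command -v makeblastdb)"
else
    echo "makeblastdb still not found; check that $BLAST_BIN contains it"
fi
```

The change only lasts for the current shell session. To make it permanent, the same `export` line is usually placed in a shell start-up file such as `~/.bashrc`.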

**uloeber** · 09-24-2015, 01:25 PM

Actually, there are a few prerequisites, such as an executable makeblastdb that can be found wherever you run the script. Did you add NCBI's BLAST to your PATH, if you are a non-root user? Be aware of the higher memory usage, although it still might be faster. You can contact me if you have any more questions. Best, Ulrike

**sekhwal** · 09-24-2015, 01:35 PM

blast+ programs

I am in the same directory as the blast+ programs. You can see my path below:

/Downloads/bp272/ncbi-blast-2.2.31+/bin$ perl blast.pl tblastn /home/sekhwalm/Oryza/blast/ e-5 6 /home/sekhwalm/Oryza/blast/ 5

**sekhwal** · 09-24-2015, 01:47 PM

speed up tblastn query

Hi, I am using the script that uloeber posted above to speed up my tblastn query, but it shows this error:

Can't exec "makeblastdb": No such file or directory at blast.pl line 41, <GEN0> line 42132.

**GenoMax** · 09-24-2015, 01:50 PM

Until you fix the PATH, this is not going to work for any blast type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.

If you are serious about learning this then spend a bit of time here understanding some basic unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1

**sekhwal** · 09-25-2015, 07:58 AM

Thanks.
I have now fixed the PATH issue and blast is running. However, after running for some time it shows the following errors:

Warning: [tblastn] Query is Empty!

cat: /home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast: No such file or directory

rm: cannot remove ‘/home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast’: No such file or directory
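
One possible explanation for the missing files, judging only from the script as posted: the blast call writes to `$outdir.subset_...` (a dot where a slash was probably intended), while the clean-up reads from `$outdir/subset_...`. The shell sketch below reproduces the two expansions, since variable interpolation works the same way here as in the Perl string. The directory `/tmp/results/` is a placeholder.

```shell
#!/bin/sh
outdir="/tmp/results/"            # hypothetical output directory with a trailing slash
name="subset_0_pep.fasta.blast"

written="${outdir}.${name}"       # what "$outdir.subset_..." expands to in the blast call
expected="${outdir}/${name}"      # what the clean-up step passes to cat and rm

echo "written:  $written"
echo "expected: $expected"
# prints:
# written:  /tmp/results/.subset_0_pep.fasta.blast
# expected: /tmp/results//subset_0_pep.fasta.blast
```

If this is what happened, the results are not lost. They would sit in hidden files (names starting with a dot) in the output directory, and `ls -a` would show them.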
