  • #31
    You wonder, then, why the program isn't doing this splitting by itself.


    • #32
      There are 4 main steps in blastn:
      1. Prepare the hash table with the mask data.
      2. Scan the database for hits. The -num_threads option is only useful in this step.
      3. Trace back the results in the database.
      4. Print the results.

      The -num_threads option (multithreading in step 2) is better than running multiple processes, because each process loads the database and the mask database into RAM separately.

      Our G-BLASTN speeds up the scan step on the GPU and the trace-back step with SSE, and changes the framework into a pipeline so that the steps can overlap.

      You can find the source code and release 1.0 here:

      G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. G-BLASTN can produce exactly the same results as NCBI-BLAST, and it also has very similar user commands.

      GitHub: OpenHero/gblastn


      • #33
        I had a speed-up problem with blast+, too. I analysed the CPU usage of the multithreading option in the "old" blast and in blast+, and I saw that the multithreading option was not efficient for my dataset. So I parallelized it with a Perl script to increase the speed. Maybe that's an option to speed up blast+ runs for you.
        The original reply from the blast team was:
        ...The overall total CPU time was about 160 minutes for both runs, but the blastall application did finish in less time than the BLAST+ application. We will work on improving the parallelization of BLAST+ for this case. I've also looked at some test cases against our nt database, but blastall and the BLAST+ application did equally well on the parallelization.
        Maybe this helps... If someone is interested in the script, just ask me.


        • #34
          I'd certainly try it. Can you post it?


          • #35
            It's not the most elegant script, but it works fine for an all-vs-all blast. It includes making the databases. You need Bio::SeqIO, Parallel::ForkManager and Time::Local. Hope it helps:
            Code:
            #!/usr/bin/perl -w
            use strict;
            use Bio::SeqIO;
            use Parallel::ForkManager;
            use Time::Local;
            #############################
            #USAGE: perl script.pl blastmethod directory_of_fasta_files eval outfmt outdir number_of_cpus
            #e.g. perl blastplus_parallel.pl blastn SampleDIR 1e-10 6 outdir 10
            #############################
            # author: Ulrike Loeber ([email protected])
            #This script works around the imperfect parallelization of BLAST+. It splits each FASTA file into smaller query files and runs them as separate jobs on more than one CPU.
            
            my $blast_method	=	$ARGV[0];
            my $seq_dir			=	$ARGV[1];
            my $evalue			=	$ARGV[2];
            my $outfmt			=	$ARGV[3];
            my $outdir			=	$ARGV[4];
            my $cpus			=	$ARGV[5];
            
            opendir (INDIR, $seq_dir) or die $!;
            my @files=grep /\.fasta$/ , readdir (INDIR);    #every file in the given directory whose name ends with ".fasta"
            closedir INDIR;
            #one cpu is used for perl, so the number of cpus left is $cpus-1
            my $numberOfProcesses=($cpus-1);
            die "number_of_cpus must be at least 2\n" if $numberOfProcesses<1;
            my $subsets=$numberOfProcesses; 	#build as many subsets as free cpus
            my $manager = Parallel::ForkManager->new( $numberOfProcesses );
            foreach my $file(@files){
            	my $time=localtime();
            	print "#########PROCESSING##########\n $file\t $time\n";
            	my $input= Bio::SeqIO->new( -file => "$seq_dir/$file",
            			-format => "fasta");
            
            	my @seq_array;
            	while( my $seq = $input->next_seq() ) {
            		push(@seq_array,$seq);
            	}
            
            	my $numberofsequences=@seq_array;
            	system "makeblastdb -dbtype nucl -in $seq_dir/$file";	#database type is hard-coded to nucleotide
            	for (my $j=0;$j<$subsets;$j++){			#creates one query file and one blast job per subset
            		#integer boundaries, so every sequence ends up in exactly one subset
            		my $first=int($j*$numberofsequences/$subsets);
            		my $last=int(($j+1)*$numberofsequences/$subsets)-1;
            		next if $last<$first;	#fewer sequences than subsets: skip the empty subset
            		open (OUTFILE , ">$seq_dir/subset_$j\_$file") or die $!;	#creates a file which is named like the infile with subset_ in front of it
            		for my $i ($first..$last){
            			my $seq=$seq_array[$i];
            			my $id=$seq->id();
            			my $sequence=$seq->seq();
            			print OUTFILE ">$id\n$sequence\n";
            		}
            		close OUTFILE;
            		$manager->start and next;	#parent continues with the next subset, child runs blast
            		system "$blast_method -query $seq_dir/subset_$j\_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir/subset_$j\_$file.blast";
            		$manager->finish;
            	}
            	print "#########END##########\n $file\t $time\n";
            }
            #cleaning up directory
            $manager->wait_all_children;
            foreach my $file(@files){
            	my $outfile=$file;
            	$outfile=~s/\.fasta$/.blast/;
            	system "touch $outdir/$outfile";						#creates one outfile per fasta file
            	for (my $j=0;$j<$subsets;$j++){
            		next unless -e "$outdir/subset_$j\_$file.blast";	#empty subsets were never run
            		system "cat $outdir/subset_$j\_$file.blast >>$outdir/$outfile";	#concatenates subfile results to one blast result
            		system "rm $outdir/subset_$j\_$file.blast";	#removes subset blast results
            		system "rm $seq_dir/subset_$j\_$file";			#removes data subsets
            	}
            	system "rm $seq_dir/$file.nhr";	#the database files are written next to the input file
            	system "rm $seq_dir/$file.nin";
            	system "rm $seq_dir/$file.nsq";
            	#print "#########COMPLETE##########\n $file\t $time\n";
            }
            Last edited by uloeber; 11-12-2013, 10:30 AM. Reason: forgot package
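
            A note on splitting the sequences into subsets: if the subset size is computed as a plain division (number of sequences / number of subsets), it is usually not a whole number, and a loop over fractional indices can leave sequences out. With 10 sequences and 3 subsets, for example, the last sequence is never written. Below is a minimal sketch of splitting with integer boundaries instead. The name chunk_bounds is made up for this illustration and is not part of BioPerl or BLAST+.

            ```perl
            #!/usr/bin/perl
            use strict;
            use warnings;

            # Return [first, last] index pairs that split $n items into at most
            # $parts contiguous chunks. Boundaries are truncated to integers, so
            # every index 0..$n-1 appears in exactly one chunk; empty chunks
            # (possible when $n < $parts) are left out.
            sub chunk_bounds {
                my ($n, $parts) = @_;
                my @bounds;
                for my $j (0 .. $parts - 1) {
                    my $first = int($j * $n / $parts);
                    my $last  = int(($j + 1) * $n / $parts) - 1;
                    next if $last < $first;    # nothing to put into this chunk
                    push @bounds, [$first, $last];
                }
                return @bounds;
            }

            # Example: 10 sequences distributed over 3 worker processes.
            for my $range (chunk_bounds(10, 3)) {
                print "$range->[0]..$range->[1]\n";
            }
            ```

            For 10 sequences and 3 subsets this prints the ranges 0..2, 3..5 and 6..9, so all ten sequences are covered. Skipping empty chunks also avoids handing an empty query file to blast when there are fewer sequences than free CPUs.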



            • #36
              Thanks! I'll give it a try.


              • #37
                I am using the script from post #35 to speed up my tblastn query, but it shows the following error:

                Can't exec "makeblastdb": No such file or directory at blast.pl line 42, <GEN0> line 42132.



                • #38
                  Do you have blast+ installed? The makeblastdb program is part of the blast+ package, which you can download from NCBI.


                  • #39
                    blast+ programs

                    I copied all the blast+ programs into the same directory as my input files, but it still shows the same error.

                    Originally posted by GenoMax
                    Do you have blast+ installed? Makeblastdb program is part of blast+ package that you can download from NCBI.


                    • #40
                      Just copying the files to a local directory is not enough if that directory is not in your PATH. Append the directory in question to your PATH.
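
                      To check this from inside a Perl script, something like the following sketch can help. It only looks through the directories listed in PATH, which is also where system "makeblastdb ..." looks. The helper name find_in_path and the install location in $blast_bin are assumptions made up for this example, so replace them with your own.

                      ```perl
                      #!/usr/bin/perl
                      use strict;
                      use warnings;
                      use File::Spec;

                      # Return the full path of the first executable file called $name in
                      # the colon-separated directory list $path_string (Linux/macOS style),
                      # or nothing if no directory contains it.
                      sub find_in_path {
                          my ($name, $path_string) = @_;
                          for my $dir (split /:/, $path_string) {
                              next if $dir eq '';
                              my $candidate = File::Spec->catfile($dir, $name);
                              return $candidate if -f $candidate && -x $candidate;
                          }
                          return;
                      }

                      my $path = $ENV{PATH} // '';

                      # Hypothetical install location: replace with your own BLAST+ bin directory.
                      my $blast_bin = ($ENV{HOME} // '') . "/ncbi-blast/bin";

                      # Prepend the directory for this script and its child processes only.
                      $ENV{PATH} = "$blast_bin:$path" unless find_in_path('makeblastdb', $path);

                      my $found = find_in_path('makeblastdb', $ENV{PATH});
                      print defined $found
                          ? "makeblastdb found at $found\n"
                          : "makeblastdb is still not on PATH\n";
                      ```

                      Note that changing $ENV{PATH} inside the script only affects that run. For a permanent change, the directory has to be added to PATH in the shell start-up file.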



                      • #41
                        Actually there are a few prerequisites, like an executable makeblastdb that can be found wherever you run the script. Did you add NCBI's blast to your PATH if you are a non-root user? Be aware of the higher memory usage, but it still might be faster. You can contact me if you have any more questions. Best, Ulrike


                        • #42
                          blast+ programs

                          I am in the same directory as the blast+ programs. You can see my path in the following command:

                          /Downloads/bp272/ncbi-blast-2.2.31+/bin$ perl blast.pl tblastn /home/sekhwalm/Oryza/blast/ e-5 6 /home/sekhwalm/Oryza/blast/ 5

                          Originally posted by GenoMax
                          Just copying the files to local directory is not enough if that directory is not in your PATH. Append your PATH to include the directory in question.


                          • #43
                            speed up tblastn query

                            Hi, I am using the following script to speed up my tblastn query, but it shows this error:

                            Can't exec "makeblastdb": No such file or directory at blast.pl line 41, <GEN0> line 42132.

                            Originally posted by uloeber
                            It's not the most elegant script, but it works fine for an all-vs-all blast. It includes making databases. You need Bio::SeqIO;Parallel::ForkManager; Time::Local. Hope it helps:
                            [script from post #35]


                            • #44
                              Until you fix the PATH this is not going to work for any blast type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.

                              If you are serious about learning this, then spend a bit of time here understanding some basic unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1


                              • #45
                                Thanks...
                                I have fixed the PATH issue now and blast is running. However, after running for some time it shows the following errors:

                                Warning: [tblastn] Query is Empty!

                                cat: /home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast: No such file or directory


                                rm: cannot remove ‘/home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast’: No such file or directory



                                Originally posted by GenoMax
                                Until you fix the PATH this is not going to work for any blast type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.

                                If you are serious about learning this, then spend a bit of time here understanding some basic unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1
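
                                One way the "No such file or directory" messages from cat and rm can arise, as a sketch: if the name given to blast's -out option is built with a "." after the output directory while the cleanup step builds it with a "/", blast writes to one file and cat looks for another. The double slash in the messages suggests the output directory was passed with a trailing slash, in which case the "." variant is a hidden file inside it. The directory, the file name and the two helper names below are hypothetical and only illustrate the string handling.

                                ```perl
                                #!/usr/bin/perl
                                use strict;
                                use warnings;

                                # Inside a double-quoted string "." is ordinary text, not a path separator.
                                sub out_path_with_dot {
                                    my ($outdir, $j, $file) = @_;
                                    return "$outdir.subset_${j}_$file.blast";
                                }

                                # Joining with "/" names a file inside the output directory.
                                sub out_path_with_slash {
                                    my ($outdir, $j, $file) = @_;
                                    return "$outdir/subset_${j}_$file.blast";
                                }

                                my $outdir = "/tmp/results/";    # hypothetical, note the trailing slash
                                my $file   = "pep.fasta";

                                print "written by blast: ", out_path_with_dot($outdir, 0, $file), "\n";
                                print "expected by cat:  ", out_path_with_slash($outdir, 0, $file), "\n";
                                ```

                                The first name is /tmp/results/.subset_0_pep.fasta.blast and the second is /tmp/results//subset_0_pep.fasta.blast, so the two steps never meet. Running ls -a in the output directory shows whether such hidden result files exist. The "Query is Empty!" warning is a separate matter and points to a subset file without any sequences in it.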
