SEQanswers

Old 11-21-2013, 02:00 PM   #1
Will Nelson
Member
 
Location: Arizona

Join Date: Nov 2010
Posts: 16
Alternative to blast?

Is it just me or does blast seem increasingly to be out of date and a major bottleneck for RNA-seq applications?

The most popular aligners currently (e.g. bowtie, blast) trade off speed for low memory use, but now memory is cheap. There are very fast, memory-intensive aligners for some problems (e.g. star, mummer) but I don't yet know of one that can replace blast for basic problems such as annotating transcripts against a protein database. This basic operation takes us sometimes two weeks using blastx on a 24-cpu machine, which isn't really sustainable for RNA-seq processing.

So my question is, does anyone know of a better aligner for this problem, and does anyone else agree that someone *should* create an aligner that is more adapted to current hardware costs?
Old 11-21-2013, 02:32 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Have you used blat? http://genome.ucsc.edu/FAQ/FAQblat.html If you are looking for homologous matches, this may be an option. It would not work well against a huge db like genpept, but against a proteome it would be fine.

You should also specify what DB you are using to blastx against for the 2 week (24 cores?) run. Are you using some kind of parallel method for that search or is it a serial job?
Old 11-21-2013, 03:02 PM   #3
Will Nelson
Member
 
Location: Arizona

Join Date: Nov 2010
Posts: 16

Blat is definitely better since it keeps the index in memory but it doesn't have the blastx mode (does it??) and it could still be much faster e.g. by using a suffix tree.
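To make the "index in memory" idea concrete, here is a toy sketch of a k-mer seed index: every length-k word in the database maps to the places it occurs, so a query only needs dictionary lookups to find candidate matches. This is an illustration only, not blat's or blast's actual implementation; the function names, the tiny sequences, and k=4 are all made up for the example.

```python
from collections import defaultdict


def build_seed_index(sequences, k=4):
    """Map every k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seed_hits(index, query, k=4):
    """Return (seq_id, db_offset, query_offset) for each exact k-mer match."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get() avoids adding empty entries to the defaultdict on a miss
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((seq_id, dpos, qpos))
    return hits


# Hypothetical toy database; real protein sets would hold millions of k-mers.
db = {"protA": "MKTAYIAKQR", "protB": "GGTAYIAWW"}
index = build_seed_index(db, k=4)
hits = find_seed_hits(index, "TAYIA", k=4)
```

A real aligner would then extend each seed hit into a full alignment; the memory cost is one index entry per database position, which is the trade-off being discussed here.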

What we usually do is blast say 100k or 200k transcripts against the Uniprot taxonomic subsets and some smaller databases. The bacterial Uniprot is the biggest and the one that takes the longest. Usually we allocate one CPU to each database target, so we could improve on that by also threading the larger targets using blast's own threading.
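A rough sketch of how such a run could be laid out so that all cores stay busy: split the transcripts into chunks and give each blastx job its share of threads through the BLAST+ `-num_threads` option. The file names, the database name, the e-value cutoff, and the helper functions are assumptions for illustration, not our actual pipeline.

```python
def split_fasta(records, n_chunks):
    """Deal (header, sequence) records round-robin into at most n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [c for c in chunks if c]  # drop empty chunks


def blastx_command(query_path, db_name, out_path, threads):
    """Build a BLAST+ blastx argument list with tabular output."""
    return [
        "blastx",
        "-query", query_path,
        "-db", db_name,
        "-out", out_path,
        "-outfmt", "6",        # tab-separated hits
        "-evalue", "1e-5",     # assumed cutoff, adjust to taste
        "-num_threads", str(threads),
    ]


def plan_jobs(n_chunks, db_name, total_cores):
    """One blastx job per query chunk, dividing the cores evenly."""
    threads = max(1, total_cores // n_chunks)
    return [
        blastx_command(f"chunk_{i}.fa", db_name, f"chunk_{i}.{db_name}.tsv", threads)
        for i in range(n_chunks)
    ]


# Hypothetical layout: 4 query chunks against one database on a 24-core box.
jobs = plan_jobs(4, "uniprot_bacteria", 24)
```

Each entry in `jobs` is an argument list that could be handed to `subprocess.run`; nothing is executed here.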

But it doesn't change the fact that blast is built on an indexing strategy which economizes memory more than necessary, with consequent reduction in speed. I would not be surprised if 100x speedup is easy to achieve with very practical memory use.
Old 11-21-2013, 03:59 PM   #4
fahmida
Member
 
Location: Australia

Join Date: Aug 2010
Posts: 54

Quote:
Originally Posted by Will Nelson View Post
Blat is definitely better since it keeps the index in memory but it doesn't have the blastx mode (does it??) and it could still be much faster e.g. by using a suffix tree..........
Have a look below into the blat options ...

Code:
options:
   -t=type     Database type.  Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna
   -q=type     Query type.  Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
Old 11-21-2013, 06:29 PM   #5
Will Nelson
Member
 
Location: Arizona

Join Date: Nov 2010
Posts: 16

True, true...I haven't messed with blast for a while.

But look: this is a 10+ year-old program which has not been updated in forever. It doesn't thread. If you want to use it in blastx mode, then the proteins have to be the *query*, meaning they are streamed and not indexed, which is extremely inefficient for searching a large protein DB. Moreover, blat uses a seed index rather than the more efficient suffix tree, again trading off time for memory.
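For reference, a minimal sketch of what that blastx-like blat call looks like, following blat's `database query output.psl` usage with the `-t=dnax` and `-q=prot` options quoted above. The file names and the helper function are hypothetical.

```python
import shlex


def blat_translated_command(transcripts_fa, proteins_fa, out_psl):
    """Build a blat call: translated DNA as the database, proteins as the query."""
    return [
        "blat",
        transcripts_fa,  # database: transcripts, translated in six frames
        proteins_fa,     # query: protein sequences, streamed rather than indexed
        "-t=dnax",
        "-q=prot",
        out_psl,
    ]


# Hypothetical file names, for illustration only.
cmd = blat_translated_command("transcripts.fa", "uniprot_subset.fa", "hits.psl")
print(shlex.join(cmd))
```

Note that the roles are the reverse of blastx: the transcripts get indexed and the protein set is what gets streamed through.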

Blat is more scalable than blast, or it would be if the two problems above were addressed, but it certainly is nowhere near the best one can do, either for standalone usage, or much less as the engine of a large-scale cloud annotation service.

Next-gen sequencing needs a next-gen alignment solution. One of the groups with serious experience at this needs to step up and build something better.
Old 11-21-2013, 07:00 PM   #6
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 166

LAST is good for homologous sequences.
Old 11-21-2013, 11:18 PM   #7
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

For a blast alternative, how about usearch?

However, it sounds to me like OP isn't setting up his blasts properly. 200k queries against some subset of uniprot (or even the whole thing) with 24 cores shouldn't take even one day given sufficient RAM.
__________________
savetherhino.org

Last edited by rhinoceros; 11-21-2013 at 11:55 PM.
Old 11-22-2013, 08:23 PM   #8
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

For BLASTX use pauda

http://ab.inf.uni-tuebingen.de/software/pauda/
Old 11-25-2013, 06:53 AM   #9
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by Will Nelson View Post
True, true...I haven't messed with blast for a while.

But look: this is a 10+ year-old program which has not been updated in forever. It doesn't thread.
Wrong on both parts. Blast does thread. And improvements to it are on-going. Just because a program was created 10+ years ago does not make it obsolete.

Being current and multi-threaded doesn't necessarily make Blast the best solution; however, I agree with 'rhinoceros' -- 200K queries vs uniprot using 24 cores should not take too long. I routinely annotate large rnaSeq results via Blast. This gives at least a 'first-pass' level of annotation. What I have given up on is Blast2Go; that program is way too slow for a large number of reads.
Old 11-26-2013, 05:25 AM   #10
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Pauda might be very interesting for you, as suggested by jimmybee.

Otherwise Gblast was just mentioned here
http://seqanswers.com/forums/showthread.php?t=36190
Tags
blast, rna-seq
