SEQanswers

08-26-2010, 06:13 AM   #1
thsuk1
Junior Member
 
Location: blacksburg

Join Date: Jun 2010
Posts: 7
Why do we use mapping programs instead of blast for mapping to a reference?

Hi guys,

I am wondering why people use mapping programs such as bwa and maq to map reads to a reference. As far as I know, blast can also find mapping positions that contain some mismatches and indels.
Sorry if this is a foolish question, but what is the reason?
08-26-2010, 06:36 AM   #2
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

They're faster than blast for short sequences (optimized for CPU and memory use).
08-26-2010, 07:07 AM   #3
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

Also, they're more sensitive.
Blast typically needs a number of 'high scoring segment pairs' to even start considering an alignment.
08-26-2010, 07:49 AM   #4
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

Blast is just too slow - 100 million reads against a big genome would take days even on a large cluster.

Blat is fine for 454 reads.
--
Jeremy Leipzig
Bioinformatics Programmer
08-26-2010, 11:36 AM   #5
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117

blastn for DNA alignments can be sensitive if the right parameters are chosen (a small word size in particular). It can find an alignment of a 42-mer with multiple mismatches AND gaps. For example, using blastn with a word size of 11 to align 42-mers to a database of all human transcripts finds alignments with up to 6 mismatches and 2 gaps. Some next-gen aligners have arbitrary limits on the number of mismatches in a single read. Furthermore, some next-gen aligners will fail to find an alignment if a mismatch or gap (or more than one of these) occurs near the beginning of the read, because that portion is used as a seed.

Another advantage of blast is that all alignments are returned: if a read has 1000 alignments, 1000 alignments are reported.

Another advantage is the ability to perform sub-string alignments. If the first or last base positions of an Illumina run have very high error rates (e.g. the first three bases of many reads in a run are garbage), you may need to trim the reads to get a successful alignment with some next-gen aligners, because these aligners tend to focus on aligning the entire read length. blast will find an alignment and report the positions within the read where the alignment starts and ends.

Another advantage of BLAST is a more sensible treatment of N's. Some next-gen aligners store bases in a 2-bit format, meaning they can internally represent only A, C, G and T. Their solution is to randomly assign each N to one of these four bases, a solution that some may find imperfect.
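
Here is a minimal Python sketch of 2-bit packing with random replacement of N's, to show where the information is lost. The A=0, C=1, G=2, T=3 assignment and the function names are assumptions for illustration. Each aligner chooses its own layout.

```python
import random

# Hypothetical 2-bit code assignment; real aligners pick their own.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str, rng: random.Random) -> int:
    """Pack seq into an int, 2 bits per base, first base in the high bits.
    N (or any other non-ACGT symbol) is replaced by a random base."""
    packed = 0
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            code = rng.randrange(4)  # the fact that this was an N is lost here
        packed = (packed << 2) | code
    return packed


def unpack_2bit(packed: int, length: int) -> str:
    """Inverse of pack_2bit for a sequence of the given length."""
    out = []
    for i in range(length):
        shift = 2 * (length - 1 - i)
        out.append(BASES[(packed >> shift) & 3])
    return "".join(out)
```

Round-tripping "ACGNT" with `unpack_2bit(pack_2bit("ACGNT", random.Random(0)), 5)` returns A, C, G and T at the known positions. The N comes back as whichever base the generator picked, and nothing in the packed value records that it was ever an N.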

As the other posts have indicated, all of these apparent advantages are trumped by the computational issue: BLAST is simply too slow. Speed is the main driving force behind the recent proliferation of aligners, and many of the advantages of BLAST listed above are gradually being addressed by next-gen aligners...
08-27-2010, 12:41 AM   #6
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 203

Quote:
Originally Posted by malachig
blastn for DNA alignments can be sensitive if the right parameters are chosen (small word size in particular). [...]
Good summary!
I might add that some of the limitations of short-read mappers can also be addressed after mapping, e.g. with GATK's local realigner:
http://www.broadinstitute.org/gsa/wi..._around_indels
08-27-2010, 09:54 AM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

Blast has other problems for short reads in addition to speed. Take 32bp reads as a slightly extreme example (32bp reads are rarely produced nowadays). By default, blast uses exact 11-mer hits as seeds. If two mismatches happen to occur at the 11th and the 22nd positions, no 11bp stretch of the read is mismatch-free, so blast will not be able to find the hit. It therefore cannot achieve the full sensitivity of eland/maq/bwa/soap2 (by default, bowtie does not guarantee full sensitivity either). Although blast can find 3-, 4- or 5-mismatch hits by chance (again without full sensitivity), these hits are more likely to be artifacts, especially when 2-mismatch hits are not guaranteed to be found. Slightly modified eland can also find a fraction of 3-mismatch hits.
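
The arithmetic behind the 11th/22nd-position example can be checked with a small Python sketch. It models only whether an exact word of fixed size (11, the default quoted above) exists in an ungapped read. It ignores how blast extends hits, so it tells you whether a seed exists, not whether blast reports the alignment. The function names are made up for illustration.

```python
def longest_exact_run(read_len: int, mismatches: set) -> int:
    """Longest stretch of consecutive mismatch-free positions (1-based)."""
    best = run = 0
    for pos in range(1, read_len + 1):
        run = 0 if pos in mismatches else run + 1
        best = max(best, run)
    return best


def has_exact_seed(read_len: int, mismatches: set, word_size: int = 11) -> bool:
    """True if some window of word_size bases matches exactly, i.e. a seed exists."""
    return longest_exact_run(read_len, mismatches) >= word_size


def worst_case_longest_run(read_len: int, n_mismatches: int) -> int:
    """Pigeonhole bound: n mismatches split the remaining read_len - n bases
    into n + 1 runs, so the longest run is at least ceil((read_len - n) / (n + 1)),
    which equals read_len // (n + 1). Evenly spaced mismatches reach this bound."""
    return read_len // (n_mismatches + 1)
```

For a 32bp read with 2 mismatches, the worst case leaves only a 10bp exact run, one short of an 11-mer seed. A 100bp read with 2 mismatches always keeps a 33bp exact run, which fits the guess further down that this problem is minor for longer reads.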

Another problem with blast lies in its local alignment itself. Suppose a true mutation occurs at the 4th bp of a read. With the default scores (match=1, mismatch=-3), the three matching bases before the mutation only cancel out the mismatch penalty, so blast will trim off the first 4bp of the alignment. You will then see more reference bases than alternate bases aligned at that position. This is reference bias. Although global-local alignment as in eland has other problems (e.g. unalignable indels), it is less affected by this bias.
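
The score arithmetic behind the trimming can be sketched in Python with the match=1 / mismatch=-3 scores from the post. This is a simplified ungapped model that finds the best-scoring segment of the read (Kadane's algorithm). It is not blast's actual X-drop extension. When trimming and not trimming give the same score, this sketch drops the prefix. Which way blast breaks such a tie is an implementation detail this sketch does not model.

```python
MATCH, MISMATCH = 1, -3  # the default scores quoted in the post


def prefix_gain(mismatch_pos: int) -> int:
    """Net score added by extending an ungapped alignment left over read
    positions 1..mismatch_pos, when only mismatch_pos (1-based) mismatches."""
    return (mismatch_pos - 1) * MATCH + MISMATCH


def best_local_segment(read_len: int, mismatches: set) -> tuple:
    """Best-scoring ungapped segment of the read (Kadane's algorithm).
    Returns (start, end, score), 1-based inclusive. A running score that
    has dropped to <= 0 is discarded, so zero-gain prefixes are trimmed."""
    best = (1, 1, float("-inf"))
    run, start = 0, 1
    for pos in range(1, read_len + 1):
        score = MISMATCH if pos in mismatches else MATCH
        if run <= 0:
            run, start = score, pos  # start a fresh segment at this base
        else:
            run += score
        if run > best[2]:
            best = (start, pos, run)
    return best
```

With a mutation at position 4, `best_local_segment(32, {4})` starts at read position 5, so that read no longer shows the alternate base. The same mutation at position 10 stays inside the alignment, because the nine preceding matches outweigh the -3 penalty.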

The two problems are greatly alleviated by longer reads. For 100bp reads I would guess the problems above are minor, but for 32bp reads the short-read aligners are better in almost all ways (faster, more sensitive and less biased). As for N, capable aligners (e.g. novoalign) have no problem with it; they may even take advantage of ambiguous bases like R. I do not know whether blast does.
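
To illustrate what taking advantage of an ambiguity code like R can mean, here is a small Python sketch. It treats each IUPAC code as the set of bases it allows and counts a position as a mismatch only when the two sets do not overlap. This is a generic illustration, not how novoalign (or blast) actually scores ambiguous bases. A real aligner might still penalize N or give partial credit to ambiguous matches.

```python
# Standard IUPAC nucleotide ambiguity codes (U omitted), each mapped to
# the set of bases it stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def compatible(a: str, b: str) -> bool:
    """True if two (possibly ambiguous) bases can denote the same base."""
    return bool(IUPAC[a.upper()] & IUPAC[b.upper()])


def count_mismatches(read: str, ref: str) -> int:
    """Ungapped mismatch count in which an ambiguity code matches any base it allows."""
    if len(read) != len(ref):
        raise ValueError("read and reference window must be the same length")
    return sum(not compatible(a, b) for a, b in zip(read, ref))
```

Under this rule, a read carrying R aligns to a reference G or A without a mismatch, whereas a 2-bit aligner would have turned the R into one fixed random base first.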

If we build an index for the genome, the main inefficiency of blast comes from the fact that it loads only ONE read into memory, scans through the whole genome and then outputs the hits. Most of that scan is a pure waste of time. A better way to use blast is to concatenate multiple short sequences into one query. Speed can be dramatically improved this way, although blast is still much slower than modern aligners. I think the blast group has already noticed this trick in blast+.
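
Here is a rough Python sketch of the concatenation trick. It joins the reads into one query with a spacer between them, records where each read starts, and maps hit coordinates back to a (read, offset) pair. The 20-N spacer and the function names are hypothetical and come from neither the post nor blast+. Whether a run of N's reliably stops seeds or extensions from crossing between reads depends on how blast handles ambiguous query bases, so it is safest to discard any hit that overlaps a spacer.

```python
from bisect import bisect_right

SPACER = "N" * 20  # hypothetical spacer; the length is an arbitrary choice


def concatenate_reads(reads: list) -> tuple:
    """Join reads into one query string; return it with each read's 0-based start."""
    offsets, parts, pos = [], [], 0
    for i, read in enumerate(reads):
        if i:
            parts.append(SPACER)
            pos += len(SPACER)
        offsets.append(pos)
        parts.append(read)
        pos += len(read)
    return "".join(parts), offsets


def locate(query_pos: int, offsets: list, reads: list):
    """Map a 0-based position in the concatenated query back to
    (read index, 0-based offset in that read); None if it lands in a spacer."""
    i = bisect_right(offsets, query_pos) - 1
    if i < 0:
        return None
    offset = query_pos - offsets[i]
    if offset >= len(reads[i]):
        return None  # position falls in the spacer after read i
    return i, offset
```

For example, with reads `["ACGT", "GGCC"]` the second read starts at query position 24, and `locate(25, offsets, reads)` maps back to read 1, offset 1. Binary search over the sorted start offsets keeps the lookup cheap even when many reads are packed into one query.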

Last edited by lh3; 08-27-2010 at 10:03 AM.
All times are GMT -8.