  • Why do we use mapping programs instead of blast for mapping to a reference?

    Hi guys,

    I am wondering why people use mapping programs such as bwa and maq to map reads to a reference. I think blast also searches for mapping positions while allowing some mismatches and indels.
    Sorry for a foolish question, but what is the reason?

  • #2
    They're faster than blast for short sequences (optimized for CPU and memory).

    • #3
      Also, they're more sensitive.
      Blast typically needs a number of 'high scoring segment pairs' to even start considering an alignment.

      • #4
        Blast is just too slow - 100 million reads against a big genome would take days even on a large cluster.

        Blat is fine for 454 reads.
        --
        Jeremy Leipzig
        Bioinformatics Programmer

        • #5
          blastn can be sensitive for DNA alignments if the right parameters are chosen (a small word size in particular). It can find an alignment of a 42-mer with multiple mismatches AND gaps. For example, using blastn with a word size of 11 to align 42-mers to a database of all human transcripts finds alignments with up to 6 mismatches and 2 gaps. Some next-gen aligners have arbitrary limits on the number of mismatches in a single read. Furthermore, some next-gen aligners will fail to find an alignment if a mismatch or gap (or more than one of these) occurs in the beginning of the read, because this portion is used as a seed.

          Another advantage of blast is that all alignments are returned: if a read has 1000 alignments, 1000 alignments are reported.

          Another advantage is the ability to perform sub-string alignments. If the first or last base positions of the reads in an Illumina run have very high error rates (e.g. the first three bases of many reads in a run are garbage), you may need to trim the reads to get a successful alignment with some next-gen aligners, which tend to focus on aligning the entire read length. blast will find an alignment and report the positions within the read where the alignment starts and ends.

          Another advantage of BLAST is a more sensible treatment of N's. Some of the next-gen aligners store bases in a 2-bit format, meaning they can only represent A, T, C and G internally. Their solution is to randomly assign each N to one of the other bases, which some may find imperfect.
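
To make the 2-bit point concrete, here is a minimal Python sketch of such an encoding. It is only an illustration, not the storage scheme of any particular aligner: the names `pack_2bit` and `unpack_2bit`, the code table and the seeded random generator are all assumptions made for this example.

```python
import random

# Two bits give exactly four codes, so there is no spare value for N.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack_2bit(seq, rng=None):
    """Pack a DNA string into one integer, two bits per base.

    Any base that is not A/C/G/T (for example N) is replaced by a
    randomly chosen base, because it has no 2-bit code of its own.
    """
    rng = rng or random.Random(0)  # seeded only to keep the demo repeatable
    packed = 0
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            code = rng.randrange(4)  # the original N is lost at this point
        packed = (packed << 2) | code
    return packed

def unpack_2bit(packed, length):
    """Recover the stored bases; an original N cannot be recovered."""
    out = []
    for shift in range(2 * (length - 1), -2, -2):
        out.append(BASES[(packed >> shift) & 3])
    return "".join(out)

read = "ACNT"
stored = unpack_2bit(pack_2bit(read), len(read))
print(read, "->", stored)  # the N comes back as some concrete base
```

After packing, nothing distinguishes the substituted base from a real one, which is why this treatment of N's can look imperfect.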

          As the other posts have indicated, all of these apparent advantages are trumped by the computational issue: BLAST is simply too slow. Speed is the main driving force behind the recent proliferation of aligners, and many of the advantages of BLAST suggested above are gradually being addressed by next-gen aligners...

          • #6
            Originally posted by malachig (post #5 above)
            Good summary!
            Might I add that some of the limitations of short read mappers can also be addressed after mapping, for example with GATK's local realigner.
            http://kevin-gattaca.blogspot.com/

            • #7
              Blast has other problems for short reads in addition to speed. Let's take 32bp reads as a slightly extreme example (32bp reads are rarely produced nowadays). By default, blast uses exact 11-mer hits as seeds. If two mismatches happen to occur at the 11th and the 22nd positions, the longest exact match left in the read is 10bp, so blast will not be able to find the hit. It cannot achieve the full sensitivity of eland/maq/bwa/soap2 (by default, bowtie does not guarantee full sensitivity). Although blast can find 3-, 4- and 5-mismatch hits by chance (again, not with full sensitivity), these hits are more likely to be artifacts, especially when 2-mismatch hits are not guaranteed to be found. A slightly modified eland can also find a fraction of 3-mismatch hits.
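
The seed argument can be checked with a few lines of Python. This is only a sketch of the counting argument, not of how blast itself looks up seeds: the 32bp sequence and the helper names `mutate`, `longest_exact_run` and `has_seed` are made up for illustration, and the read is compared to its true reference window without gaps.

```python
# Hypothetical substitution table: every base is swapped for a different one.
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}

def mutate(seq, positions):
    """Return seq with a substitution at each 1-based position."""
    bases = list(seq)
    for pos in positions:
        bases[pos - 1] = SWAP[bases[pos - 1]]
    return "".join(bases)

def longest_exact_run(read, ref):
    """Length of the longest stretch of consecutive matching bases."""
    run = longest = 0
    for a, b in zip(read, ref):
        run = run + 1 if a == b else 0
        longest = max(longest, run)
    return longest

def has_seed(read, ref, word_size=11):
    """True if an exact seed of at least word_size bases exists."""
    return longest_exact_run(read, ref) >= word_size

ref = "ACGTTGCAGGCTAATCCGATTACGGCATGCAA"  # made-up 32bp reference window

# Mismatches at positions 11 and 22 leave exact runs of 10, 10 and 10 bases.
print(has_seed(mutate(ref, [11, 22]), ref))  # False: no 11-mer seed survives

# Moving one mismatch to position 10 leaves runs of 9, 11 and 10 bases.
print(has_seed(mutate(ref, [10, 22]), ref))  # True: one 11-mer seed survives
```

So whether a 2-mismatch hit is found with an 11-mer seed depends on where the mismatches fall, which is the sense in which the search is not fully sensitive.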

              Another problem with blast lies in its local alignment itself. Suppose a true mutation occurs at the 4th bp of a read. Blast will trim off the first 4bp in the alignment (by default, match=1 and mismatch=-3, so the three leading matches are cancelled out by the one mismatch). You will then see more reference bases than alternate bases mapped at that site. This is reference bias. Although global-local alignment as in eland has other problems (e.g. unalignable indels), it is less affected by this bias.
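
The clipping effect can be reproduced with a small ungapped, Smith-Waterman-style scan. This is a simplified sketch, not blast's actual extension code: it assumes the match=1 / mismatch=-3 scores quoted above, ignores gaps entirely, and the function name `local_segment` and the test sequence are invented for the example.

```python
def local_segment(read, ref, match=1, mismatch=-3):
    """Best-scoring ungapped local segment between read and ref.

    The running score is reset whenever it drops to zero or below,
    as in Smith-Waterman. Returns (start, end, score) with 0-based,
    half-open coordinates on the read.
    """
    best = (0, 0, 0)
    running = 0
    start = 0
    for i, (a, b) in enumerate(zip(read, ref)):
        running += match if a == b else mismatch
        if running <= 0:
            running = 0
            start = i + 1          # the alignment restarts after this base
        elif running > best[2]:
            best = (start, i + 1, running)
    return best

ref = "ACGT" * 8                                   # made-up 32bp reference
near_end = ref[:3] + "A" + ref[4:]                 # mutation at the 4th base
in_middle = ref[:15] + "A" + ref[16:]              # mutation at the 16th base

print(local_segment(near_end, ref))   # (4, 32, 28): the first 4 bases are clipped
print(local_segment(in_middle, ref))  # (0, 32, 28): the whole read stays aligned
```

In the first case the mutated base falls outside the reported alignment, so that read contributes nothing to the alternate-allele count at the site, while reads carrying the reference base there align end to end.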

              Both problems are greatly alleviated by longer reads. For 100bp reads, I would guess the problems above are minor, but for 32bp reads, the short read aligners are better in almost every way (faster, more sensitive and less biased). As for N, capable aligners (e.g. novoalign) do not have any problem with it. They may even take advantage of ambiguity codes such as R. I do not know whether blast does.

              If we build an index for the genome, the main inefficiency of blast comes from the fact that it loads only ONE read into memory, scans through the whole genome and then outputs the result. Most of that scan is purely a waste of time. A better way to use blast is to concatenate multiple short sequences into one. Speed can be dramatically improved, although it is still much slower than modern aligners. I think the blast group has already picked up on this trick in blast+.
              Last edited by lh3; 08-27-2010, 09:03 AM.
