
  • Why do we use mapping programs instead of blast for mapping to a reference?

    Hi guys,

    I am wondering why people use mapping programs such as bwa and maq for mapping to a reference? I think blast also search mapping positions with some mismatches and INDELs.
    Sorry for a foolish question, but what is reason?

  • #2
    They are faster than blast for short sequences (optimized for CPU and memory use).



    • #3
      Also, they're more sensitive.
      Blast typically needs a number of 'high scoring segment pairs' to even start considering an alignment.



      • #4
        Blast is just too slow - 100 million reads against a big genome would take days even on a large cluster.

        Blat is fine for 454 reads.
        --
        Jeremy Leipzig
        Bioinformatics Programmer



        • #5
          blastn can be sensitive for DNA alignments if the right parameters are chosen (a small word size in particular). It can align a 42-mer with multiple mismatches AND gaps. For example, using blastn with a word size of 11 to align 42-mers to a database of all human transcripts finds alignments with up to 6 mismatches and 2 gaps. Some next-gen aligners have arbitrary limits on the number of mismatches in a single read. Furthermore, some next-gen aligners will fail to find an alignment if a mismatch or gap (or more than one of these) occurs within the beginning of the read, because that portion is used as a seed.

          Another advantage of blast is that all alignments are returned: if a read has 1000 alignments, 1000 alignments are reported.

          Another advantage is the ability to perform sub-string alignments. If the first or last base positions of the reads in an Illumina run have very high error rates (e.g. the first three bases of many reads in a run are garbage), you may need to trim the reads to get successful alignments with some next-gen aligners, which tend to focus on aligning the entire read length. blast will find an alignment and report the positions within the read where the alignment starts and ends.

          Another advantage of BLAST is a more sensible treatment of N's. Some of the next-gen aligners store bases in a 2-bit format, meaning they can only represent A, T, C and G internally. Their solution is to randomly assign each N to one of the other bases, a solution that some may find imperfect.

          As the other posts have indicated, all of these apparent advantages are trumped by the computational issue: BLAST is simply too slow. Speed is the main driving force behind the recent proliferation of aligners, and many of the advantages of BLAST suggested above are gradually being addressed by next-gen aligners...
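
To make the 2-bit point concrete, here is a minimal Python sketch of such an encoding. It is not taken from any particular aligner: the names `pack_2bit` and `unpack_2bit`, the code table and the seeded random generator are all assumptions chosen for illustration. It only shows that four codes leave no room for N, so an N has to be replaced by some real base and cannot be recovered afterwards.

```python
import random

# Hypothetical 2-bit code table: four codes, so there is no slot left for N.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq, rng=None):
    """Pack a DNA string into an integer, two bits per base.

    Any character other than A/C/G/T (e.g. N) is replaced by a randomly
    chosen base, which is the lossy step described in the post.
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the demo repeatable
    packed = 0
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            code = rng.randrange(4)  # the N is lost here
        packed = (packed << 2) | code
    return packed


def unpack_2bit(packed, length):
    """Decode `length` bases from an integer produced by pack_2bit."""
    return "".join(
        BASES[(packed >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


decoded = unpack_2bit(pack_2bit("ACNT"), 4)
print(decoded)  # the third base is now one of A/C/G/T, never N
```

A format that keeps N would need at least three bits per base or a separate mask, which is the memory trade-off these aligners avoid.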



          • #6
            Originally posted by malachig (#5 above)
            Good summary!
            Might I add that some of the limitations of short-read mappers can also be addressed after mapping, for example with GATK's local realigner.
            http://kevin-gattaca.blogspot.com/



            • #7
              Blast has other problems for short reads in addition to speed. Let's take 32bp reads as a slightly extreme example (32bp reads are rarely produced nowadays). By default, blast uses exact 11-mer hits as seeds. If two mismatches happen to occur at the 11th and the 22nd positions, blast will not be able to find the hit, because no exact stretch in the read is longer than 10bp. It cannot achieve the full sensitivity of eland/maq/bwa/soap2 (by default, bowtie does not guarantee full sensitivity). Although blast can find 3-, 4- and 5-mismatch hits by chance (again, not with full sensitivity), these hits are more likely to be artifacts, especially when 2-mismatch hits are not guaranteed to be found. A slightly modified eland can also find a fraction of 3-mismatch hits.
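
The seeding argument can be checked with a few lines of Python. This is only a sketch of the counting, not of how blast is implemented; the function names `longest_exact_run` and `seedable` are made up for illustration, and positions are taken to be 1-based as in the post.

```python
def longest_exact_run(read_len, mismatch_positions):
    """Length of the longest mismatch-free stretch in a read.

    mismatch_positions are 1-based positions within the read.
    """
    longest = 0
    run_start = 1
    # A sentinel one past the end of the read closes the final stretch.
    for pos in sorted(mismatch_positions) + [read_len + 1]:
        longest = max(longest, pos - run_start)
        run_start = pos + 1
    return longest


def seedable(read_len, mismatch_positions, word_size=11):
    """True if at least one exact word of length word_size survives."""
    return longest_exact_run(read_len, mismatch_positions) >= word_size


print(longest_exact_run(32, [11, 22]))  # 10: stretches 1-10, 12-21 and 23-32
print(seedable(32, [11, 22]))           # False: no exact 11-mer is left
print(seedable(32, [4, 22]))            # True: positions 5-21 give a 17bp stretch
```

So a 2-mismatch hit is found or missed depending only on where the mismatches fall, which is why the sensitivity is not guaranteed.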

              Another problem with blast lies in its local alignment itself. Suppose a true mutation occurs at the 4th bp of a read. Blast will trim off the first 4bp of the alignment (by default, match=1 and mismatch=-3, so the three leading matches do not outweigh the mismatch). You will then see more reference bases than alternate bases mapped at such sites. This is reference bias. Although global-local alignment as in eland has other problems (e.g. unalignable indels), it is less affected by this bias.
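
The trimming can be reproduced with a small ungapped local-alignment scorer. This is a sketch under stated assumptions, not blast's extension algorithm: `best_local_segment` is a hypothetical helper, gaps are ignored, and the read is compared to the reference at a fixed offset. Note that with the mutation exactly at the 4th base, the full-length and the trimmed alignments tie (3 - 3 + 28 = 28); the sketch restarts whenever the running score falls to zero, i.e. it assumes ties go to the shorter alignment.

```python
def best_local_segment(read, ref, match=1, mismatch=-3):
    """Best-scoring ungapped local segment of two equal-length strings.

    Returns (score, start, end) with 1-based inclusive read coordinates.
    The running score is reset when it drops to zero or below, so a
    prefix that does not pay for its mismatch is cut off.
    """
    best = (0, 0, 0)
    score = 0
    start = 1
    for i, (a, b) in enumerate(zip(read, ref), start=1):
        score += match if a == b else mismatch
        if score <= 0:
            score = 0
            start = i + 1
        elif score > best[0]:
            best = (score, start, i)
    return best


ref = "ACGT" * 8                  # toy 32bp reference window
read = ref[:3] + "A" + ref[4:]    # T>A change at the 4th base of the read

print(best_local_segment(read, ref))  # (28, 5, 32): the first 4bp are trimmed
```

The variant base at position 4 is therefore outside the reported alignment, while a read carrying the reference base there aligns over its full length.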

              These two problems are greatly alleviated by longer reads. For 100bp reads, I would guess the problems above are minor, but for 32bp reads, the short-read aligners are better in almost every way (faster, more sensitive and less biased). As for N, capable aligners (e.g. novoalign) do not have any problem with it. They may even take advantage of ambiguous bases like R. I do not know whether blast does.

              If we build an index for the genome, the main inefficiency of blast comes from the fact that it loads only ONE read into memory, scans through the whole genome and then outputs the result. Most of the scan is purely a waste of time. A better way to use blast is to concatenate multiple short sequences into one. Speed can be dramatically improved, although it is still much slower than modern aligners. I think the blast group has already noticed this trick in blast+.
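
One possible way to do the concatenation by hand is sketched below. The post does not say how reads should be joined, so the N spacer, the offset table and the helper names `concatenate_reads` and `locate` are all assumptions; the sketch only covers the bookkeeping needed to map a hit on the long query back to the read it came from, not the blast run itself.

```python
from bisect import bisect_right


def concatenate_reads(reads, spacer="N" * 20):
    """Join reads into one long query.

    Returns the query and the 0-based start of each read within it.
    The spacer is an assumed way to discourage hits spanning two reads.
    """
    starts = []
    offset = 0
    for read in reads:
        starts.append(offset)
        offset += len(read) + len(spacer)
    return spacer.join(reads), starts


def locate(query_pos, starts, reads):
    """Map a 0-based query position back to (read index, offset in read).

    Returns None when the position falls inside a spacer.
    """
    idx = bisect_right(starts, query_pos) - 1
    within = query_pos - starts[idx]
    if within >= len(reads[idx]):
        return None
    return idx, within


reads = ["ACGT", "GGCC", "TTAA"]
query, starts = concatenate_reads(reads, spacer="NN")
print(query)                      # ACGTNNGGCCNNTTAA
print(locate(7, starts, reads))   # (1, 1): second base of the second read
print(locate(4, starts, reads))   # None: inside the first spacer
```

Hits that cross a spacer would still have to be filtered out afterwards, which this sketch leaves to the caller.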
              Last edited by lh3; 08-27-2010, 09:03 AM.
