  • BFAST to Sourceforge.net

    We have decided to move BFAST to the sourceforge.net website. We invite people to submit questions, discussions, or other input to the BFAST sourceforge mailing lists, to the BFAST Bug Tracker, or to this seqanswers.com forum.

    BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

    * Speed: enables billions of short reads to be mapped quickly.
    * Accuracy: A priori probabilities for mapping reads with a defined set of variants.
    * An easy way to measurably tune accuracy at the expense of speed.

    Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance.

    BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos), with particular emphasis on sensitivity towards errors, SNPs and especially indels. Other algorithms take shortcuts by ignoring errors or certain types of variants (indels), or even require further alignment, all to be the "fastest" (but still not complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

    Nils Homer

  • #2
    How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?



    • #3
      Originally posted by maasha
      How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?
      I have compared BFAST to all of the above (including Zoom, which is a commercial product) as well as others (BLAT, SHRiMP, SOAP), and it is much more sensitive/robust to errors and variants, especially indels (>10bp), while having comparable or better accuracy (paper in review). If you don't search for variants you will never find them. The high sensitivity has benefits with ABI SOLiD data, where the color error rate can be greater than 10%, so to properly identify the errors as well as find variants, sensitivity is of the utmost importance. Although BFAST can be flexibly tuned, trading off speed for sensitivity, at the recommended sensitivity settings it is slower than, say, Bowtie (no ABI support) or BWA, but it does find variants (based on empirical and simulated data). As for speed, if we ask which aligner is the fastest when searching for SNPs and indels in the presence of errors, then in my (biased) opinion, it is BFAST.

      My point really is, if you want to find only perfect matches to the genome, then you can design a fast algorithm for that. If you want to find only SNPs where the data has <2% error, it is clear what shortcuts can be taken. If you want to align any type of data searching for SNPs and indels and make the aligner tunable, then you arrive at BFAST.

      I would be happy to share my results with you in private (as the paper is in review), so PM me if desired.


      Nils



      • #4
        Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with bwa for <4bp indels?



        • #5
          Originally posted by ech
          Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with bwa for <4bp indels?
          I also put up a BFAST Server version, where you can have a local web-server running BFAST and an interactive web page (inspired by the UCSC BLAT). It handles both Illumina and ABI SOLiD data natively. I put up a BFAST Server for you to see (click here), since our normal BFAST Server website is down (click here).

          For >10bp indels, it can be tuned to have any power depending on the error and polymorphism rate, with the power obviously increasing for longer reads (more room for the indel, especially insertions). Compared to BWA, which states it should be used on data with <2% error, it performs similarly (>95% power) with <4bp indels, but excels in scenarios where there is a non-trivial error rate (>2%) and/or when there is an indel and a SNP. In our own human reseq experiments, we found a 10bp deletion and a SNP 4bp downstream, which was validated with Sanger sequencing etc. The biggest increase in robustness/sensitivity is with ABI SOLiD data due to the complete gapped local alignment (see Paper).

          I think there is still room for micro-reassembly. For example, although the reads may be mapped to the correct location, their local alignment may be wrong given an insertion or a deletion breakpoint near either end of the read. I will let you ponder over why this is the case.
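One way to see the effect is with a toy scoring comparison. The sketch below is only an illustration under assumed parameters: the reference string, the scoring values (match +1, mismatch -3, gap open -5, gap extend -2) and the helper names are all hypothetical and are not BFAST's scoring scheme. It builds a read carrying a 2bp deletion and compares the score of the true gapped alignment with the score of simply aligning the read without a gap.

```python
# Toy reference and scoring values; all of these are assumptions for illustration.
REF = "GATTACAGGCTTCAGTACCGTAAGCTTGCA"
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, -3, -5, -2


def read_with_deletion(ref, read_len, del_start, del_len):
    """Build a read of read_len bases that skips ref[del_start:del_start + del_len]."""
    left = ref[:del_start]
    resume = del_start + del_len
    right = ref[resume:resume + (read_len - del_start)]
    return left + right


def ungapped_score(read, ref):
    """Score the read against the reference position by position, with no gap."""
    return sum(MATCH if a == b else MISMATCH for a, b in zip(read, ref))


def gapped_score(read_len, del_len):
    """Score the true alignment: every read base matches, plus one deletion."""
    return read_len * MATCH + GAP_OPEN + GAP_EXTEND * del_len


# Deletion breakpoint 2 bases from the end of a 20bp read.
end_read = read_with_deletion(REF, 20, 18, 2)
# The same deletion placed in the middle of the read.
mid_read = read_with_deletion(REF, 20, 10, 2)

true_score = gapped_score(20, 2)
end_ungapped = ungapped_score(end_read, REF)
mid_ungapped = ungapped_score(mid_read, REF)

print("gapped (true) alignment:", true_score)
print("ungapped, deletion near end:", end_ungapped)
print("ungapped, deletion mid-read:", mid_ungapped)
```

With these assumed values, the deletion near the read end is cheaper to explain as a couple of trailing mismatches than as a gap, so a score-maximising local alignment can place the read correctly and still miss the indel. In the middle of the read the ungapped alignment is heavily penalised and the gap wins.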



          • #6
            I have a question about how BFAST/BFAST-BWA handles SNPs vs. read errors for AB-SOLiD (CS) reads.

            On viewing the resulting aligned mappings (in base space), do single base differences to the reference represent SNPs? That is, are they a result of detecting an appropriate 2-color mismatch, with single (or more) color mismatches identified as read errors and appropriately "corrected"?
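To make the distinction concrete, here is a minimal sketch of the standard SOLiD two-base encoding, in which the color of a dinucleotide is the XOR of the 2-bit codes of its bases (A=0, C=1, G=2, T=3). It is not taken from BFAST or BWA; the function names and the toy sequences are assumptions for illustration only. It shows that a true SNP changes two adjacent colors, while a single color error, if decoded naively, changes every downstream base.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def encode(seq):
    """Return the color string for each adjacent base pair of seq."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and a color string."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[CODE[seq[-1]] ^ int(color)])
    return "".join(seq)


def color_mismatches(colors_a, colors_b):
    """Return the positions at which two color strings differ."""
    return [i for i, (a, b) in enumerate(zip(colors_a, colors_b)) if a != b]


REF = "ACGTTGCA"
SNP_READ = "ACGTAGCA"  # toy read with a T -> A SNP at base index 4

ref_colors = encode(REF)
snp_colors = encode(SNP_READ)
# A real SNP shows up as two adjacent color mismatches.
print(ref_colors, snp_colors, color_mismatches(ref_colors, snp_colors))

# A single color read error: change only color index 3 of the reference colors.
error_colors = ref_colors[:3] + "3" + ref_colors[4:]
# Naive decoding propagates the error to every base after it.
print(decode(REF[0], error_colors))
```

An isolated color mismatch is therefore most simply explained as a read error, which is why an aligner working in color space can correct it instead of reporting a run of base differences.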



            • #7
              Yes, see the accompanying papers for information.



              • #8
                Originally posted by nilshomer
                Yes, see the accompanying papers for information.
                Thanks. I assume you mean the paper linked to a few posts back? I'm looking at this now.

                I read through the original paper (SHRiMP: Accurate Mapping of Short Color-space Reads), which is the only one I've looked at so far that is specifically concerned with the notion that the read color space is degenerate (i.e. reads could in theory map to 4 alternative sequences in the reference). However, although the theory/method is presented, I was confused about how the actual reads in base space are finally output. For example, are corrected read errors marked in some way? And are base inserts ever discarded?
                Last edited by Guidobot; 03-31-2011, 09:56 AM. Reason: Never mind



                • #9
                  Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
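The "four to one" point can be sketched in a few lines. This assumes the usual csfasta layout, where a read is written as the last adapter base followed by color digits 0-3, and the standard XOR view of the two-base encoding (A=0, C=1, G=2, T=3). The helper names and the toy read are hypothetical, and this is not how BFAST itself is implemented.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and a color string."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[CODE[seq[-1]] ^ int(color)])
    return "".join(seq)


def decode_csfasta(read):
    """Decode a csfasta-style read: adapter base first, then the colors."""
    adapter_base, colors = read[0], read[1:]
    # Drop the adapter base itself; it is not part of the sequenced fragment.
    return decode(adapter_base, colors)[1:]


colors = "3201"

# Without a known starting base, the same colors fit four different sequences.
candidates = {base: decode(base, colors) for base in BASES}
for base, seq in candidates.items():
    print(base, seq)

# Knowing the adapter base (T here) selects exactly one of them.
print(decode_csfasta("T" + colors))
```

The colors alone only fix the differences between neighbouring bases, so the known adapter base is what anchors the decoding to a single nucleotide sequence.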



                  • #10
                    Originally posted by nilshomer
                    Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
                    Cheers. I understand how BFAST could use the adapter (base) to define a specific (nt) read sequence, although BWA and MAQ appear to ignore this in translation to base space (and reduce the effective read length by 2 in the process). As a programmer I get curious about some of the implementation details but will continue reading.

                    Edit: I originally read your paper (BFAST: An Alignment Tool for Large Scale Genome Resequencing) and misinterpreted the statement "...each genomic read offset is artificially started with an A base to mimic the process of decoding...", thinking this meant the adapter base (e.g. in the csfasta file) was ignored.

                    Btw, in an experiment I did with the Streptococcus suis genome and SOLiD SE reads, I found that BFAST mapped 2.34% more reads than BWA, which includes a correction for the reads BWA mapped to repeated regions. (I used the recommended 10 seeds, but because my PC had only 2GB of RAM I used an index word size of 12.)
                    Last edited by Guidobot; 04-01-2011, 07:11 AM. Reason: Added edit note
