  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    BFAST to Sourceforge.net

    We have decided to move BFAST to the sourceforge.net website. We invite people to submit questions, discussions, or other input to the BFAST sourceforge mailing lists, to the BFAST Bug Tracker, or here on seqanswers.com.

    BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

    * Speed: enables billions of short reads to be mapped quickly.
    * Accuracy: A priori probabilities for mapping reads with a defined set of variants.
    * An easy way to measurably tune accuracy at the expense of speed.

    Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance.

    BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos), with particular emphasis on sensitivity towards errors, SNPs and especially indels. Other algorithms take short-cuts by ignoring errors or certain types of variants (indels), and some even require further alignment, all to be the "fastest" (but still not complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

    Nils Homer
  • maasha
    Senior Member
    • Apr 2009
    • 153

    #2
    How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?


    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #3
      Originally posted by maasha View Post
      How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?
      I have compared BFAST to all of the above (including Zoom, which is a commercial product) as well as others (BLAT, SHRiMP, SOAP), and it is much more sensitive/robust to errors and variants, especially indels (>10bp), while having comparable or better accuracy (paper in review). If you don't search for variants you will never find them. The high sensitivity has benefits with ABI SOLiD data, where the color error rate can be greater than 10%, so sensitivity is of the utmost importance both to properly identify the errors and to find variants. Although BFAST can be flexibly tuned, trading off speed for sensitivity, it is slower than, say, Bowtie (no ABI support) or BWA when the sensitivity is at the recommended settings, but it does find variants (based on empirical and simulated data). As for speed, if we ask which aligner is the fastest when searching for SNPs and indels in the presence of errors, then in my (biased) opinion, it is BFAST.

      My point really is, if you want to find only perfect matches to the genome, then you can design a fast algorithm for that. If you want to find only SNPs where the data has <2% error, it is clear what shortcuts can be taken. If you want to align any type of data searching for SNPs and indels and make the aligner tunable, then you arrive at BFAST.

      I would be happy to share my results with you in private (as the paper is in review), so PM me if desired.


      Nils


      • ech
        Junior Member
        • Jul 2009
        • 7

        #4
        Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?


        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          Originally posted by ech View Post
          Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?
          I also put up a BFAST Server version, where you can have a local web-server running BFAST and an interactive web page (inspired by the UCSC BLAT). It handles both Illumina and ABI SOLiD data natively. I put up a BFAST Server for you to see (click here), since our normal BFAST Server website is down (click here).

          For >10bp indels, it can be tuned to have any power depending on the error and polymorphism rate, with the power naturally increasing for longer reads (more room for the indel, especially insertions). Compared to BWA, which states it should be used on data with <2% error, it performs similarly (>95% power) with <4bp indels, but excels in scenarios where there is a non-trivial error rate (>2%) and/or when there is both an indel and a SNP. In our own human reseq experiments, we found a 10bp deletion with a SNP 4bp downstream, which was validated with Sanger sequencing etc. The biggest increase in robustness/sensitivity is with ABI SOLiD data due to the complete gapped local alignment (see Paper).

          I think there is still room for micro-reassembly. For example, although the reads may be mapped to the correct location, their local alignment may be wrong given an insertion or a deletion breakpoint near either end of the read. I will let you ponder over why this is the case.
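One way to think about the breakpoint issue is to compare alignment scores directly. The sketch below is not BFAST's scoring; the match, mismatch, and gap penalties, the toy reference, and all function names are assumptions made up for illustration. It builds a read carrying a 3bp deletion and scores it two ways at the correct location: ungapped, and with the true deletion placed in the alignment.

```python
# Toy scoring scheme (assumed values, not taken from BFAST or BWA).
MATCH, MISMATCH = 1, -3
GAP_OPEN, GAP_EXTEND = 5, 2

REF = "ACGTTGCAAGCTTACGGATCCATGCAGTCA"  # made-up reference


def ungapped_score(read, ref, pos):
    """Score the read against ref starting at pos with no gaps allowed."""
    window = ref[pos:pos + len(read)]
    return sum(MATCH if r == g else MISMATCH for r, g in zip(read, window))


def deletion_score(read, ref, pos, del_start, del_len):
    """Score the read with one deletion of del_len bases after read offset del_start."""
    left = ungapped_score(read[:del_start], ref, pos)
    right = ungapped_score(read[del_start:], ref, pos + del_start + del_len)
    return left + right - (GAP_OPEN + GAP_EXTEND * del_len)


def read_with_deletion(ref, pos, read_len, del_start, del_len):
    """Build an error-free read of read_len bases that skips del_len reference bases."""
    left = ref[pos:pos + del_start]
    right_start = pos + del_start + del_len
    return left + ref[right_start:right_start + read_len - del_start]


# Deletion 2 bases from the read end: two mismatches are cheaper than a gap.
near_end = read_with_deletion(REF, 0, 20, 18, 3)
print(ungapped_score(near_end, REF, 0), deletion_score(near_end, REF, 0, 18, 3))  # 12 9

# Same deletion in the middle of the read: the gapped alignment wins clearly.
mid_read = read_with_deletion(REF, 0, 20, 8, 3)
print(ungapped_score(mid_read, REF, 0), deletion_score(mid_read, REF, 0, 8, 3))  # -28 9
```

With these assumed penalties, a read whose deletion sits two bases from its end scores higher with two mismatches than with the true gap, so a per-read aligner would likely report mismatches (or clip the tail) even though the read is placed correctly. Looking at all reads covering the locus together is what reassembly adds.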


          • Guidobot
            Junior Member
            • Jan 2011
            • 8

            #6
            I have a question about how BFAST/BFAST-BWA handles SNPs vs. read errors for AB-SOLiD (CS) reads.

            On viewing the resulting aligned mappings (in base space), do single base differences to the reference represent SNPs? That is, are they a result of detecting an appropriate 2-color mismatch, with single (or more) color mismatches identified as read errors and appropriately "corrected"?
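To make the question concrete, here is a minimal sketch of the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent bases). It is not BFAST code; the sequences and function names are made up for illustration. It shows that a SNP changes two adjacent colors, while a single color error, if decoded naively, changes every base downstream of it.

```python
BASES = "ACGT"


def encode(seq):
    """Base space to color space: one color per adjacent base pair."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base, colors):
    """Color space to base space, given the known first base."""
    code = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        code ^= color
        out.append(BASES[code])
    return "".join(out)


def diff_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ref = "ACGTACGT"
snp = "ACGAACGT"  # T>A substitution at base 3

ref_colors = encode(ref)
snp_colors = encode(snp)
print(diff_positions(ref_colors, snp_colors))  # [2, 3]: two adjacent colors change

# A single sequencing error in color 3, decoded without correction.
err_colors = list(ref_colors)
err_colors[3] = 0
naive = decode(ref[0], err_colors)
print(naive, diff_positions(ref, naive))  # ACGTTGCA [4, 5, 6, 7]
```

An aligner that works in color space can therefore treat an isolated color mismatch as a read error and a consistent pair of adjacent mismatches as a candidate SNP, which is the distinction the question is about.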


            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Yes, see the accompanying papers for information.


              • Guidobot
                Junior Member
                • Jan 2011
                • 8

                #8
                Originally posted by nilshomer View Post
                Yes, see the accompanying papers for information.
                Thanks. I assume you mean the paper linked to a few posts back? I'm looking at this now.

                I read through the original paper (SHRiMP: Accurate Mapping of Short Color-space Reads), which has been the only one I've looked at so far that is specifically concerned with the notion that the read color space is degenerate (i.e. reads could in theory map to 4 alternative sequences in the reference). However, although the theory/method is presented, I was confused about how the actual reads in base space are finally output. For example, are corrected read errors marked in some way? Are base inserts ever discarded?
                Last edited by Guidobot; 03-31-2011, 09:56 AM. Reason: Never mind


                • nilshomer
                  Nils Homer
                  • Nov 2008
                  • 1283

                  #9
                  Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
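The "four to one" point can be sketched as follows. This assumes the usual csfasta layout (one leading adapter/primer base followed by color digits) and the standard XOR color code; the read string and function names are invented for illustration and this is not BFAST's implementation.

```python
BASES = "ACGT"


def decode_after(start_base, colors):
    """Return the bases that follow start_base under the two-base color code."""
    code = BASES.index(start_base)
    out = []
    for color in colors:
        code ^= color
        out.append(BASES[code])
    return "".join(out)


def parse_csfasta_read(line):
    """Split a csfasta read line into (adapter base, list of colors)."""
    return line[0], [int(ch) for ch in line[1:]]


adapter, colors = parse_csfasta_read("T31313131")  # made-up read

# With the adapter base, the read decodes to exactly one base sequence.
print(decode_after(adapter, colors))  # ACGTACGT

# Dropping the adapter base and first color leaves four equally valid decodings,
# one per possible first base.
inner = colors[1:]
candidates = {b: b + decode_after(b, inner) for b in BASES}
print(candidates)
```

The four candidates here are ACGTACGT, CATGCATG, GTACGTAC and TGCATGCA; only the known adapter base picks out the first one.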


                  • Guidobot
                    Junior Member
                    • Jan 2011
                    • 8

                    #10
                    Originally posted by nilshomer View Post
                    Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
                    Cheers. I understand how BFAST could use the adapter (base) to define a specific (nt) read sequence, although BWA and MAQ appear to ignore this in translation to base space (and reduce the effective read length by 2 in the process). As a programmer I get curious about some of the implementation details but will continue reading.

                    Edit: I originally read your paper (BFAST: An Alignment Tool for Large Scale Genome Resequencing) and misinterpreted the statement "...each genomic read offset is artificially started with an A base to mimic the process of decoding...", thinking this meant the adapter base (e.g. in the csfasta file) was ignored.

                    Btw, in an experiment I did with the Streptococcus suis genome and SOLiD SE reads, I found that BFAST mapped 2.34% more reads than BWA, which includes a correction for the reads BWA mapped to repeated regions. (I used the recommended 10 seeds, but because my PC had only 2 GB RAM I used an index word size of 12.)
                    Last edited by Guidobot; 04-01-2011, 07:11 AM. Reason: Added edit note
