  • BFAST using GPUs

    Dear Sir,

    We are computer engineering students. We have read the BFAST paper:
    Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.

    Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.

    Conclusions: We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.


    We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing part.
    Could you explain, or point us to sources explaining, how BFAST builds its reference genome and how the indexing is done?

  • #2
    Originally posted by nikhil.stephen
    Dear Sir,

    We are computer engineering students. We have read the BFAST paper.

    We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing part.
    Could you explain, or point us to sources explaining, how BFAST builds its reference genome and how the indexing is done?
    The best place to look would be the source code found at http://bfast.sourceforge.net, especially "RGIndex.{c,h}" and "RGBinary.{c,h}".



    • #3
      This will be interesting. Good luck!



      • #4
        Difficulty understanding the code

        @nilshomer
        We are fourth-year students doing our final-year project, and we have tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
        Could you email us a detailed description of the algorithm so that we can try coding it ourselves? We're confident about the coding side, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

        Sorry for the trouble, and thank you for your time.



        • #5
          Originally posted by nikhil.stephen
          @nilshomer
          We are fourth-year students doing our final-year project, and we have tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it is almost incomprehensible to us.
          Could you email us a detailed description of the algorithm so that we can try coding it ourselves? We're confident about the coding side, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

          Sorry for the trouble, and thank you for your time.
          This will be beyond my ability to help.

          Nils



          • #6
            Originally posted by nikhil.stephen
            Dear Sir,

            We are computer engineering students. We have read the BFAST paper.

            We are trying to implement it on GPUs (CUDA), but we are having trouble with the indexing part.
            Could you explain, or point us to sources explaining, how BFAST builds its reference genome and how the indexing is done?
            Hi, interesting work. I work in an R&D lab on various high-performance computing applications and would like to see if we can collaborate on this effort. Please contact me if you are interested ([email protected]).



            • #7
              Why not try vectorizing the core parts (i.e. the Smith-Waterman alignment)? The money spent on GPUs could instead buy multi-core processors (not to mention saving rack space). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST ahead of GPU support.

              My 2 cents.



              • #8
                Originally posted by nilshomer
                Why not try vectorizing the core parts (i.e. the Smith-Waterman alignment)? The money spent on GPUs could instead buy multi-core processors (not to mention saving rack space). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST ahead of GPU support.

                My 2 cents.
                Makes sense. I am looking at using OpenCL rather than CUDA; since OpenCL can also target multi-core CPUs, that still leaves room for the path you mentioned.
