
**nilshomer** · 02-11-2011, 01:52 AM

Originally posted by nikhil.stephen:

Dear Sir,

We are Computer Engineering Students. We have read the BFAST paper

BFAST: An Alignment Tool for Large Scale Genome Resequencing

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007767

Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.

Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.

Conclusions: We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.

We are trying to implement it using GPUs (CUDA), but we are having some trouble with the indexing part.
Could you explain or provide sources for more information on how BFAST creates a reference genome and how indexing is done?

The best place to look would be the source code found at http://bfast.sourceforge.net, especially "RGIndex.{c,h}" and "RGBinary.{c,h}".
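For readers who want a feel for the general idea before opening the source, here is a minimal Python sketch of the approach the abstract describes: several independent indexes, each mapping a read to candidate reference locations. It is not BFAST's `RGIndex` implementation. The spaced-seed masks, the dictionary-based lookup, and the names `masked_key`, `build_index`, and `candidate_locations` are all assumptions made for illustration.

```python
from collections import defaultdict

def masked_key(seq, start, mask):
    """Return the bases under the '1' positions of mask, or None if the
    window is too short or contains a non-ACGT base (hypothetical helper)."""
    window = seq[start:start + len(mask)]
    if len(window) < len(mask) or any(b not in "ACGT" for b in window):
        return None
    return "".join(b for b, m in zip(window, mask) if m == "1")

def build_index(reference, mask):
    """Map every masked key in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        key = masked_key(reference, pos, mask)
        if key is not None:
            index[key].append(pos)
    return index

def candidate_locations(read, indexes):
    """Look the read up in every (mask, index) pair and pool the implied
    reference start positions; a local aligner would then check each one."""
    candidates = set()
    for mask, index in indexes:
        for offset in range(len(read) - len(mask) + 1):
            key = masked_key(read, offset, mask)
            if key is None:
                continue
            for pos in index.get(key, ()):
                if pos - offset >= 0:
                    candidates.add(pos - offset)
    return sorted(candidates)

# Toy data, made up for this example.
reference = "ACGTTGCAAGGCTTACGATCGGA"
masks = ["1111", "110011"]  # a contiguous seed and a spaced seed
indexes = [(m, build_index(reference, m)) for m in masks]
read = "GCATGGCTTA"  # reference[5:15] with one substituted base
print(candidate_locations(read, indexes))
```

The "0" positions in a mask are ignored during lookup, so a mismatch that falls under them does not prevent a hit. Using several masks means a read error has to break every index at once to hide the true location, which is presumably the robustness the abstract refers to.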

**csoong** · 02-11-2011, 03:41 AM

This will be interesting. Good luck!

**nikhil.stephen** · 02-14-2011, 10:15 PM

difficulty in understanding code

@nilshomer
We are 4th-year students doing our final-year project, and we tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it's almost incomprehensible to us.

Could you email us the algorithm in detail so that we can try coding it on our own? We're confident about the coding, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

Sorry for the trouble, and thank you for your time.

**nilshomer** · 02-14-2011, 11:04 PM

Originally posted by nikhil.stephen:

@nilshomer
We are 4th-year students doing our final-year project, and we tried to understand the code in "RGIndex.{c,h}" and "RGBinary.{c,h}", but it's almost incomprehensible to us.

Could you email us the algorithm in detail so that we can try coding it on our own? We're confident about the coding, since we have written a fully functional GPU-parallelized Smith-Waterman implementation ourselves.

Sorry for the trouble, and thank you for your time.

This will be beyond my ability to help,

Nils

**god_particle** · 02-20-2011, 09:17 PM

Originally posted by nikhil.stephen:

Dear Sir,

We are Computer Engineering Students. We have read the BFAST paper

BFAST: An Alignment Tool for Large Scale Genome Resequencing

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007767

We are trying to implement it using GPUs (CUDA), but we are having some trouble with the indexing part.
Could you explain or provide sources for more information on how BFAST creates a reference genome and how indexing is done?

Hi, interesting work. I work in an R&D lab on various high-performance computing applications and would like to see if we can collaborate on this effort. Please contact me if you are interested ([email protected]).

**nilshomer** · 02-20-2011, 10:31 PM

Why not try vectorizing the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead buy multi-core processors (not to mention saving rack space). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST, ahead of GPU support.

My 2 cents.
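To illustrate why Smith-Waterman lends itself to vectorization, here is a minimal Python sketch that fills the scoring matrix one anti-diagonal at a time. It is not code from BFAST or SHRiMP. The linear gap penalty, the scoring values, and the name `smith_waterman_score` are assumptions chosen for the example, and the inner loop runs serially here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a against b with a linear gap penalty,
    computed in anti-diagonal (wavefront) order."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # A cell (i, j) on anti-diagonal d = i + j reads only from diagonals
    # d - 1 and d - 2, so all cells on one anti-diagonal are independent.
    # That independence is what SIMD lanes or GPU threads can exploit.
    for d in range(2, rows + cols - 1):
        i_min = max(1, d - cols + 1)
        i_max = min(rows - 1, d - 1)
        for i in range(i_min, i_max + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("ACGT", "ACGT"))
```

In a vectorized version, the inner loop over `i` is the part that would be replaced by SIMD operations; production implementations typically use more elaborate memory layouts than this plain wavefront.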

**god_particle** · 02-20-2011, 11:14 PM

Originally posted by nilshomer:

Why not try vectorizing the core parts (i.e., the Smith-Waterman step)? The extra money spent on GPUs could instead buy multi-core processors (not to mention saving rack space). A vectorized implementation would have a greater impact on users than GPU support. I think SHRiMP has vectorized code embedded. This is on my wishlist for BFAST, ahead of GPU support.

My 2 cents.

Makes sense. I am looking at using OpenCL rather than CUDA, which would still allow it to take the path you mentioned.

BFAST using GPUs

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News