SEQanswers

Old 08-08-2010, 05:37 PM   #1
xinwu
Member
 
Location: Beijing

Join Date: Jul 2010
Posts: 33
CloudBurst VS Bowtie

Hi all,
I am very interested in Hadoop applications in NGS. The best-known such project in the Hadoop community is CloudBurst, but I saw an evaluation of it in the paper "Searching for SNPs with cloud computing". It said, "CloudBurst is capable of reporting all alignments for millions of human short reads in minutes, but does not scale well to human resequencing applications involving billions of reads. Whereas CloudBurst aligns about 1 million short reads per minute on a 24-core cluster, a typical human resequencing project generates billions of reads, requiring more than 100 days of cluster time or a much larger cluster".
Why does the paper claim that CloudBurst is not scalable? Crossbow, described in this paper, uses Bowtie instead of CloudBurst to map short reads to the reference genome, and I would like to know the reasons. As I understand it, CloudBurst is natively MapReduce-based while Bowtie is not, so why did the authors reach this conclusion? Is there any solid comparison of these two short-read mapping tools? And if I just want to map short reads to the reference genome, which should I use: CloudBurst, or Crossbow without SOAPsnp (only the map step, using Bowtie)? Thanks in advance.

Last edited by xinwu; 08-08-2010 at 05:46 PM.
Old 08-09-2010, 04:59 AM   #2
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Hi Xinwu,

To be clear, both techniques are "scalable", in the sense that they both make good use of additional CPUs when they are added. (Granted: the authors only show experiments using a few dozen up to a few hundred CPU cores.) The problem with CloudBurst is that it's slower than Bowtie on a comparable number of cores. So the authors (I'm one of them, as is Mike Schatz, the author of CloudBurst) are saying that when CloudBurst is scaled to a *dataset* the size of a human resequencing dataset, it takes longer than researchers are willing to wait. I hope that's more clear. Frankly, we should probably have said "but takes a very long time to finish for" instead of "but does not scale well to".

Ben
Old 08-10-2010, 03:06 AM   #3
xinwu
Member
 
Location: Beijing

Join Date: Jul 2010
Posts: 33

Hi Ben,

Thanks for the clarification. CloudBurst combines Hadoop and RMAP, so I guess RMAP may be the speed bottleneck. Is it possible to replace RMAP with Bowtie? I mean a Hadoop version of Bowtie for large-scale short-read mapping.
Old 08-10-2010, 04:25 AM   #4
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

In general, it is possible to swap different algorithms into cloud pipelines. In practice this takes some effort since programs' input and output formats might need to be changed, and you must consider whether the tool's memory footprint fits on a particular EC2 instance type, etc.
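As a rough illustration of the input-format point, here is a minimal sketch, not taken from Crossbow or CloudBurst. It assumes a line-oriented framework (for example Hadoop Streaming, which by default treats each line as one record), so a four-line FASTQ record has to be flattened to a single tab-delimited line before it can be distributed. The function name `fastq_to_tab_lines` and the name/sequence/quality column layout are hypothetical choices made for this example.

```python
import io


def fastq_to_tab_lines(handle):
    """Yield one 'name<TAB>sequence<TAB>qualities' line per 4-line FASTQ record.

    Hypothetical helper for illustration only; the column layout is an
    assumption, not a format required by any particular aligner.
    """
    while True:
        header = handle.readline()
        if not header:  # end of input
            return
        header = header.rstrip("\n")
        if not header:  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Tabs inside the read name would corrupt the column layout.
        name = header[1:].replace("\t", " ")
        yield "\t".join((name, seq, qual))


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n")
    for line in fastq_to_tab_lines(demo):
        print(line)
```

The reverse conversion (one line back to whatever the aligner expects on its input) would be needed on the worker side, which is part of the effort mentioned above.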

Thanks,
Ben
Old 09-09-2010, 10:00 PM   #5
xinwu
Member
 
Location: Beijing

Join Date: Jul 2010
Posts: 33

Hi Ben,
One more question: Bowtie is based on the BWT, while CloudBurst uses a seed-and-extend approach like RMAP. Is it true that seed-and-extend is more sensitive and has fewer limitations (for example, it allows gaps and indels) than other algorithms? If the only drawback of the seed-and-extend method is that it is time-consuming, that should be relatively easy to overcome in order to get a more "accurate" or "flexible" result.
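For readers unfamiliar with the term, the seed-and-extend idea can be shown with a toy sketch. This is not CloudBurst's or RMAP's code; the function names are hypothetical, and the example is ungapped (mismatches only), so it does not cover indels. It relies on the pigeonhole principle: if a read aligns with at most k mismatches and is cut into k+1 non-overlapping seeds, at least one seed must match the reference exactly. Exact seed hits are looked up in a k-mer index and each candidate position is then extended by comparing the full read.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, reference, max_mismatches):
    """Return sorted (ref_start, mismatches) pairs for ungapped alignments.

    Toy illustration only: no gaps, no reverse complement, no quality values.
    """
    seed_len = len(read) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("read is too short for this many mismatches")
    index = build_kmer_index(reference, seed_len)
    hits = {}
    # Pigeonhole: at least one of the k+1 seeds matches exactly.
    for i in range(max_mismatches + 1):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            # Extend: compare the full read against the reference window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    print(seed_and_extend("GATTTCA", "GATTACAGATTTCA", 1))
```

The extension step is where the cost goes: every exact seed hit triggers a full comparison, so short seeds in a repetitive reference produce many candidates.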
Old 11-06-2013, 02:09 AM   #6
haoyue
Junior Member
 
Location: china

Join Date: Nov 2013
Posts: 1
How to read CloudBurst source code

Hi Mike Schatz,
Recently I have been reading the CloudBurst source code, but it is hard to follow because it has few comments. I would like to understand the details of the implementation. Could you help me, please? Thanks!