
**lh3** · 10-16-2009, 10:35 AM

This sounds great!

**nilshomer** · 10-16-2009, 11:02 AM

Originally posted by Ben Langmead:

> Hi all,
>
> If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open-source pipeline that leverages cloud computing for whole-genome SNP discovery from short reads. Crossbow combines Bowtie and SOAPsnp under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In experiments, we observe that Crossbow aligns Illumina reads and calls accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 using a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.
>
> Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.
>
> As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.
>
> If you're looking to get started, first read the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read the TUTORIAL (which currently just points to the C. elegans example).
>
> Crossbow was written by me (Ben Langmead, Johns Hopkins University) and Michael C. Schatz (University of Maryland).
>
> Thanks!
> Ben and Mike

Awesome! Any plans to include indel calling? Also, does this run on a Hadoop-enabled cluster, or only on the Amazon cloud?
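As a rough check on the EC2 figures in the announcement, the arithmetic below converts them into per-node and per-core terms. It simply divides the quoted total ("about $100") by node-hours, so it folds whatever that total covers into one average rate; treat the result as an approximation, not a price.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope rates from the EC2 figures quoted in the announcement.
# All inputs are the approximate numbers stated there ("about $100").
nodes=40
cores=320
hours=3
cost_usd=100

node_hours=$((nodes * hours))      # 40 nodes for 3 hours
core_hours=$((cores * hours))      # 320 cores for 3 hours
cores_per_node=$((cores / nodes))  # cores in each rented node

# Bash arithmetic is integer-only, so use awk (C locale for a "." decimal point).
usd_per_node_hour=$(LC_ALL=C awk -v c="$cost_usd" -v nh="$node_hours" 'BEGIN { printf "%.2f", c / nh }')

echo "node-hours:        $node_hours"
echo "core-hours:        $core_hours"
echo "cores per node:    $cores_per_node"
echo "USD per node-hour: $usd_per_node_hour"
```

This works out to 120 node-hours, 960 core-hours, 8 cores per node, and roughly $0.83 per node-hour.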

**Ben Langmead** · 10-16-2009, 12:31 PM

Originally posted by nilshomer:

> Awesome! Any plans to include indel calling? Also, does this run on a Hadoop-enabled cluster, or only on the Amazon cloud?

Hey Nils,

Yes, we'd love to include indel calling, and anything else that fits! I think there are a lot of other things (indels, SV detection) that could fit.

And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

Ben
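The announcement describes Hadoop handling the large distributed sort between alignment and SNP calling. A convenient way to reason about that shape is Hadoop Streaming's convention of mappers and reducers that read stdin and write stdout, which lets a job be imitated locally as `mapper | sort | reducer`. The sketch below does exactly that on a toy input. Everything in it is an assumption for illustration: the one-line-per-aligned-base input format, the 1,000 bp partition size, and especially the reducer, which is a naive majority-vote rule and not SOAPsnp's model.

```bash
#!/usr/bin/env bash
# Toy imitation of an align -> sort -> call flow as mapper | sort | reducer.
# Hypothetical input: one line per aligned base, tab-separated:
#   chrom  pos  ref_base  read_base
set -euo pipefail

BIN_SIZE=1000  # hypothetical partition size in bp

map_step() {
  # Prefix each record with a partition key (chrom, bin) so one reducer gets a whole region.
  awk -v bs="$BIN_SIZE" 'BEGIN { FS = OFS = "\t" } { print $1, int($2 / bs), $2, $3, $4 }'
}

sort_step() {
  # Stand-in for Hadoop's shuffle: group by chrom and bin, then order by position.
  LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k3,3n
}

reduce_step() {
  # Naive caller (NOT SOAPsnp): report a position covered by >= 2 bases
  # where more than half of them differ from the reference base.
  awk 'BEGIN { FS = OFS = "\t" }
    function flush() {
      if (depth >= 2 && alt * 2 > depth) print chrom, pos, ref, alt "/" depth
    }
    {
      if ($1 != chrom || $3 != pos) {
        if (NR > 1) flush()
        chrom = $1; pos = $3; ref = $4; depth = 0; alt = 0
      }
      depth++
      if ($5 != $4) alt++
    }
    END { if (NR > 0) flush() }'
}

# Demo: three reads over chr1:150 (two carry G over reference A), plus uncalled sites.
printf 'chr1\t150\tA\tG\nchr1\t150\tA\tG\nchr1\t150\tA\tA\nchr1\t2300\tC\tC\nchr1\t2300\tC\tC\nchr2\t10\tT\tA\n' \
  | map_step | sort_step | reduce_step
```

The sort key (chromosome, partition, position) is what lets the reducer see every base covering a position together, which is the job Hadoop's shuffle does across machines.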

**nilshomer** · 10-18-2009, 05:33 PM

The bad: How do you perform indel calling without modelling indels during alignment? Without proper identification of such variants (among others), whole-genome resequencing is incomplete. Also, since only Bowtie is currently supported, platforms like ABI SOLiD are not supported.

The good:
The authors of Crossbow have done an amazing job providing a proof of concept for running well-known tools (Bowtie and SOAPsnp) in the cloud. A potential next step would be to generalize Crossbow to support any aligner, variant caller, or other analysis tool. Given such a general framework, Crossbow would solve the practical computational problem of human whole-genome resequencing using whichever tools the user deems most suitable or powerful. The onus would then not be on the Crossbow authors to write this support; instead, the author of any such tool could contribute support themselves.

How hard would it be to get other aligners, variant callers, or other tools to work in crossbow? Do they have to model the workflow of bowtie/SoapSNP?

I envision a workflow where you align the reads, then run many variant callers (SNP/indel, reassembly, structural variants, others...), then other analyses (assessing the potential for the SNPs to cause protein-coding changes, etc.), and many more processes that both branch and merge.

Thanks for your contribution. I look forward to watching, and potentially contributing to, the evolution of Crossbow.
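The branch-and-merge workflow described above can be sketched on a single machine with plain shell job control: one upstream stage, several independent stages launched in parallel, and a merge step that waits for all of them. Every stage here is a hypothetical stub that writes a line of text; on a cluster, a workflow engine or a chain of Hadoop jobs would do the scheduling instead.

```bash
#!/usr/bin/env bash
# Fan-out/fan-in sketch: align, then run three callers in parallel, then merge.
# All stage names are hypothetical stubs that only write a line of text.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

align()       { echo "aligned reads" > "$workdir/aln.txt"; }
call_snps()   { sed 's/aligned reads/SNP calls/'   "$workdir/aln.txt" > "$workdir/snps.txt"; }
call_indels() { sed 's/aligned reads/indel calls/' "$workdir/aln.txt" > "$workdir/indels.txt"; }
call_svs()    { sed 's/aligned reads/SV calls/'    "$workdir/aln.txt" > "$workdir/svs.txt"; }
annotate()    { cat "$workdir/snps.txt" "$workdir/indels.txt" "$workdir/svs.txt" > "$workdir/annotated.txt"; }

align                                  # single upstream stage

pids=()
for stage in call_snps call_indels call_svs; do
  "$stage" &                           # branch: each caller runs in the background
  pids+=("$!")
done
for pid in "${pids[@]}"; do
  wait "$pid"                          # merge point; a failed branch stops the script
done

annotate                               # downstream stage sees all three results
cat "$workdir/annotated.txt"
```

Waiting on each PID individually, rather than calling a bare `wait`, is deliberate: a bare `wait` returns 0 even if a background stage failed.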

**Ben Langmead** · 11-20-2009, 06:14 AM

Hi all,

The Crossbow paper, "Searching for SNPs with cloud computing", came out in provisional form today. Take a look if you're interested.

Thanks,
Ben

**lilithdog** · 12-09-2009, 08:05 PM

Bowtie is extremely impressive! However, I cannot get it to load in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?

**mmanrique** · 02-04-2010, 07:26 AM

I'm looking forward to trying Crossbow!

It's satisfying to find other guys using and promoting cloud computing for bioinformatics analyses.

Congratulations!

**jiwu2573** · 02-04-2010, 05:04 PM

Does Crossbow only analyze data from whole-genome DNA sequencing?

Is it applicable to mRNA sequencing?

**jlmlj** · 02-04-2010, 08:34 PM

Congrats! Sounds very cool!

**Xi Wang** · 02-04-2010, 11:12 PM

Originally posted by lilithdog:

> Bowtie is extremely impressive! However, I cannot get it to load in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?

Do you mean you want to run bowtie?
First, cd into the directory that contains the bowtie executable:

```bash
cd /PATH/TO/BOWTIE/DIR/
```

Then execute bowtie:

```bash
./bowtie
```
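Two things may explain the behavior in the original question. First, cd bowtie only changes directory; running a program from the current directory needs the ./bowtie form unless that directory is on PATH. Second, if the bowtie binary was downloaded for Mac OS X, it will not run on Ubuntu, which needs a Linux build (or a build from source). The helper below is a hypothetical convenience for the first point: it puts the bowtie directory on PATH for the current shell so bowtie can be run from anywhere.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put the directory holding the bowtie executable on PATH
# for the current shell, so "bowtie" works from any directory (not just "./bowtie").
add_bowtie_dir_to_path() {
  local dir=$1
  if [ -x "$dir/bowtie" ]; then
    PATH="$dir:$PATH"
    export PATH
  else
    echo "no executable 'bowtie' found in $dir" >&2
    return 1
  fi
}

# Usage, with the placeholder directory used in the reply:
#   add_bowtie_dir_to_path /PATH/TO/BOWTIE/DIR && bowtie --version
```

To make this permanent, the same PATH line can go in a shell startup file such as ~/.bashrc.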

**VIX_Z** · 04-07-2010, 09:02 PM

Crossbow on a local cluster

Hi,
Has anybody tried Crossbow on their local cluster?
I want to try the same. Any insight and experience would be appreciated.

Thanks
~Vix
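Ben's conditions for a non-EC2 cluster (working Bowtie and SOAPsnp binaries for the nodes' OS, plus an installed Hadoop) can be checked up front. The function below is a hypothetical preflight check that only confirms each named command resolves on PATH on the machine where it runs. The command names, especially soapsnp, are assumptions to adjust, and the check would need to run on every node, since Hadoop tasks typically execute on the worker machines.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: does each named command resolve on PATH here?
# Finding a file is not proof it runs (e.g. a Mac build on a Linux node would
# still be found), so treat a pass as necessary, not sufficient.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "ok       $tool -> $(command -v "$tool")"
    else
      echo "MISSING  $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]  # exit status: success only if nothing was missing
}

# Assumed command names; adjust to your installation. Run on every node.
check_tools bowtie soapsnp hadoop || echo "install or add the missing tools to PATH first" >&2
```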

**VIX_Z** · 04-13-2010, 10:30 PM

Sample dataset for crossbow on local cluster

Originally posted by Ben Langmead:

> And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.
>
> Ben

Hi Ben,
I want to try Crossbow on my local Hadoop-enabled cluster. Could you share the data you used for the "small version of the experiment" on your local Hadoop cluster? I keep running into various errors when using other read data.

With thanks,
Vix
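One way to follow Ben's advice of trying a small version of the experiment first is to cut a larger FASTQ file down to its first N reads. The helper below is a hypothetical sketch that assumes plain, unwrapped four-line FASTQ records; note that the first N reads of a file are not a random sample of the run.

```bash
#!/usr/bin/env bash
# Hypothetical helper for building a small test dataset from a larger run.
# Assumes plain, unwrapped FASTQ: exactly 4 lines per read (header, sequence, '+', qualities).
set -euo pipefail

subset_fastq() {
  local n_reads=$1 in=$2 out=$3
  # Taking whole 4-line records keeps every read intact.
  head -n $((4 * n_reads)) "$in" > "$out"
}

# Example (hypothetical file names). For paired-end data, use the same count for
# both mate files so read i in one file still pairs with read i in the other:
#   subset_fastq 100000 reads_1.fq small_1.fq
#   subset_fastq 100000 reads_2.fq small_2.fq
```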

**Dan326** · 07-20-2010, 04:55 PM

Does Crossbow Produce Standard Bowtie Results?

Hey all,
Crossbow looks like a fantastic program. At this point I am just looking for a way to run Bowtie in parallel on an EC2 cluster. Does Crossbow have an option for just running Bowtie? If not then does Crossbow produce the Bowtie outputs that I can access? Any insight or suggestions on other software that may accomplish this would be great.
Dan

**xinwu** · 07-21-2010, 12:43 AM

Hi all,
My question is the same as Dan326's. CloudBurst uses the RMAP algorithm with Hadoop, so Dan's question can be summarized as: how can I run CloudBurst using the Bowtie algorithm rather than RMAP? As the Bowtie paper indicates, Bowtie is much faster than other mapping tools, so combining it with Hadoop should give the fastest solution so far. Correct me if I am wrong. I am also trying to find this kind of short-read mapping solution.
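Because short reads are aligned independently of one another, the idea behind running Bowtie in parallel is data parallelism: split the reads into chunks, align the chunks concurrently, and concatenate the results. The sketch below shows that idea on one machine. ALIGN_CMD is a hypothetical hook that defaults to cat so the sketch runs without Bowtie; on a Hadoop cluster, a map-only Hadoop Streaming job would play the role of the split-and-run loop, and on a single machine Bowtie's own -p option (multiple search threads) is the simpler route. Paired-end reads would need both mate files split identically, which this sketch does not handle.

```bash
#!/usr/bin/env bash
# Data-parallel alignment sketch: split reads into chunks, align chunks
# concurrently, then concatenate per-chunk outputs in chunk order.
set -euo pipefail

# ALIGN_CMD is a hypothetical hook that takes a FASTQ path as its last argument
# and writes alignments to stdout. The default "cat" is only a stand-in so the
# sketch runs without Bowtie installed.
ALIGN_CMD=${ALIGN_CMD:-cat}

parallel_align() {
  local reads=$1 reads_per_chunk=$2 out=$3
  local tmp chunk pid
  local pids=()
  tmp=$(mktemp -d)
  # 4 lines per unwrapped FASTQ record, so no read is split across chunks.
  split -l $((4 * reads_per_chunk)) "$reads" "$tmp/chunk_"
  for chunk in "$tmp"/chunk_*; do
    # Unquoted on purpose so ALIGN_CMD may carry its own arguments.
    $ALIGN_CMD "$chunk" > "$chunk.out" &
    pids+=("$!")
  done
  # Wait on each job so a failed chunk aborts the run (set -e).
  for pid in "${pids[@]}"; do
    wait "$pid"
  done
  # The glob sorts chunk_aa.out, chunk_ab.out, ... so input order is kept.
  cat "$tmp"/chunk_*.out > "$out"
  rm -rf "$tmp"
}

# Example (hypothetical index and file paths):
#   ALIGN_CMD='bowtie -q /path/to/index' parallel_align reads.fq 1000000 hits.txt
```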



Crossbow: Genotyping from short reads using cloud computing
