SEQanswers

Old 03-19-2014, 12:29 AM   #1
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78
Preinstalled Genomic Analysis Tools for Cloud Computing

One challenge in harnessing cloud computing is IT-related: installing and testing bioinformatics tools.
The situation is compounded by the fact that bioinformatics software developers do not stick to any common platform, language, library, or API.
It would save time to have these tools available pre-installed and tested for cloud computing.

Objectives:
  • Creation of bootable cloud volumes (Amazon: EBS & Google: Disk) with bioinformatics tools installed
  • Periodic upgrades of the tools and the operating system when new releases appear
  • Easy scalability options for tools that support up-scaling
  • Documentation for Usage, Security, Scaling up and a mailing list


Remarks:

The tools are to be tested after installation. Each cloud volume holds tools broadly grouped by analysis task, e.g., prokaryotic assembly, prokaryotic annotation, assembly improvement, RNA-seq analysis, and metagenomic pipelines. Some degree of redundancy of tools across volumes is expected. Users' raw data can be stored in an Amazon S3 bucket or on a cloud volume.

Current Status:
I have created a bootable cloud volume with tools for prokaryotic assembly (SOAPdenovo, a5-2014 pipeline, MaSuRCA, SPAdes). I am in the process of creating volumes for prokaryotic annotation tools and assembly improvement tools. Next would be RNA-seq analysis and metagenomic pipelines.
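The grouping described under Remarks could be written down as a small manifest, which also makes the expected redundancy easy to check. This is only a sketch: the manifest layout, the group names, and the function name are assumptions, and the second group's contents are placeholders (only the prokaryotic assembly tools are taken from this post).

```python
from collections import defaultdict

# Hypothetical manifest: volume name -> tools installed on that volume.
# Only the "prokaryotic_assembly" entries come from this thread; the second
# group and its contents are placeholders used purely for illustration.
VOLUME_MANIFEST = {
    "prokaryotic_assembly": ["SOAPdenovo", "a5-2014 pipeline", "MaSuRCA", "SPAdes"],
    "assembly_improvement": ["SPAdes", "placeholder-tool"],
}


def redundant_tools(manifest):
    """Return {tool: sorted volume names} for tools installed on more than one volume."""
    volumes_by_tool = defaultdict(set)
    for volume, tools in manifest.items():
        for tool in tools:
            volumes_by_tool[tool].add(volume)
    return {
        tool: sorted(volumes)
        for tool, volumes in volumes_by_tool.items()
        if len(volumes) > 1
    }


if __name__ == "__main__":
    for tool, volumes in redundant_tools(VOLUME_MANIFEST).items():
        print(f"{tool}: {', '.join(volumes)}")
```

A list like this would show which tools need upgrading on several volumes at once when a new release comes out.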

Technical Information:
OS : Ubuntu 12.04 LTS 64 bit


Questions:
Is this effort something that users (the community) would find useful?
Old 03-19-2014, 04:31 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 835

I strongly recommend people to consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacities, as well as being able to know exactly where your data is.
Old 03-19-2014, 05:04 PM   #3
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

Reinventing the wheel?

http://cloudbiolinux.org/
Old 03-21-2014, 06:10 AM   #4
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78

Quote:
Originally Posted by Bukowski
Reinventing the wheel?

http://cloudbiolinux.org/
I am currently reading through the CloudBioLinux documentation. It should do the job if I can figure out how to control which packages get installed.
The latest Ubuntu 13.04-based AMI has a 35 GB instance size and includes a huge number of tools.

Thank you

--
prakhar
Old 03-21-2014, 06:20 AM   #5
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78

Quote:
Originally Posted by gringer
I strongly recommend people to consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacities, as well as being able to know exactly where your data is.
L. Stein (2010), Genome Biology, provides convincing arguments for moving to the cloud.
Secondly, in my country, wide-scale adoption of computational analysis has been lagging because of the high costs involved.
At ~$100 a month for small-scale prokaryotic analysis on AWS, that works out perfectly for us.


cheers,
--
prakhar
Old 03-21-2014, 01:55 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 835

Quote:
L. Stein (2010), Genome Biology, provides convincing arguments for moving to the cloud.
Okay, let me cherry-pick from that article:

Quote:
Transferring a 100 gigabyte next-generation sequencing data file across such a link will take about a week in the best case. A 10 gigabit/second connection (1.25 gigabytes/second), which is typical for major universities and some of the larger research institutions, reduces the transfer time to under a day, but only at the cost of hogging much of the institution's bandwidth. Clearly cloud services will not be used for production sequencing any time soon. If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system.
Additionally, the paper was written in 2010. Four years have passed since then, during which time Intel has pushed out quite a few power-efficient processors with large capabilities for parallel processing. Moore's law has continued to hold for computers, but sequencing volumes haven't changed much in terms of total data size (admittedly driven by customers who are content with the volumes produced), allowing computers to catch up. I don't think the paper provides definitive arguments for cloud computing, just that there are some cases where it can be more cost-effective.
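The transfer-time figures in the quoted passage come down to file size divided by usable bandwidth. A minimal sketch of that arithmetic follows, assuming decimal units (1 gigabyte = 8 gigabits); the function name and the `efficiency` parameter are illustrative assumptions, not something from the paper. It gives only the theoretical line-rate figure, and real transfers over a shared link are usually far slower.

```python
def transfer_time_seconds(size_gigabytes, link_gigabits_per_second, efficiency=1.0):
    """Seconds needed to move a file over a link.

    Assumes decimal units (1 gigabyte = 8 gigabits). `efficiency` is a
    hypothetical fudge factor for the fraction of the nominal link speed
    that is actually usable (1.0 = full line rate).
    """
    if link_gigabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_gigabits = size_gigabytes * 8
    return size_gigabits / (link_gigabits_per_second * efficiency)


if __name__ == "__main__":
    # 100 gigabyte file over a 10 gigabit/second link, as in the quoted passage.
    ideal = transfer_time_seconds(100, 10)
    print(f"At full line rate: {ideal:.0f} s")
    # Same file if only a tenth of the link is usable (assumed figure).
    shared = transfer_time_seconds(100, 10, efficiency=0.1)
    print(f"At 10% of line rate: {shared:.0f} s")
```

Plugging in your own institution's sustained upload rate, rather than the nominal link speed, gives a more realistic estimate.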

Quote:
At ~$100 a month for small-scale prokaryotic analysis on AWS, that works out perfectly for us.
Well, it's good that you've looked at the options. As I mentioned previously, a $1500 computer (15 months at $100/month, plus a bit more for power) will probably be capable of doing prokaryotic analysis (including genome assembly), and you get the additional benefit of large cheap storage (3TB for $200), as well as the knowledge of precisely where your data is.
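The break-even reasoning above can be sketched in a few lines. The $1500, $200, and $100/month figures are the ones from this thread; the function name and the optional local running cost (power and so on) are assumptions for illustration, and no real power figure is implied.

```python
def break_even_months(hardware_cost, cloud_cost_per_month, local_running_cost_per_month=0.0):
    """Months after which buying hardware becomes cheaper than renting cloud time.

    Returns None if the local machine costs as much or more per month to run
    than the cloud bill, in which case it never pays for itself.
    """
    monthly_saving = cloud_cost_per_month - local_running_cost_per_month
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Computer only, ignoring power.
    print(break_even_months(1500, 100))
    # Computer plus the 3TB drive mentioned above.
    print(break_even_months(1500 + 200, 100))
```

Passing a non-zero `local_running_cost_per_month` accounts for the "bit more for power" and pushes the break-even point later.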

edit: changed drive cost to a more reasonable value

Last edited by gringer; 03-21-2014 at 02:50 PM.
Old 03-21-2014, 02:47 PM   #7
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

Quote:
Originally Posted by gringer
Yes, sorry. I was thinking $200, but wrote $400, because I was looking at prices for 4TB at the same time.
That's OK. I deleted my comment because I realized I wasn't sure whether you were talking about US dollars or something else, and I also thought you might be factoring in the extra cost of backing up local files on a second drive. So I figured it was just getting too confusing and decided not to post.
Tags
aws, cloud computing, gce

All times are GMT -8.