SEQanswers

Old 02-12-2015, 05:43 AM   #1
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default Experiences using cloud computing?

I am considering using a cloud computing service because I have 180,000 BLAST jobs to run, and doing that on our Uni servers would take a few months. Each job will take ~30 hours, so I was wondering which services others have used and how expensive they are. The main one I am looking at is Amazon, as I am unaware of any others; however, their sales team seems to be taking ages to get back to me with pricing. Any advice would be great.
Old 02-12-2015, 06:09 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014
Default

Google: https://cloud.google.com/
Microsoft: http://azure.microsoft.com/en-us/

Amazon's EC2 pricing is on the web unless you were looking for some specific discounts for your institution: http://aws.amazon.com/ec2/pricing/

What DB are you going to blast against? You probably want to use the AMI that NCBI has for Amazon to make things simple: http://blast.ncbi.nlm.nih.gov/Blast....YPE=CloudBlast

Last edited by GenoMax; 02-12-2015 at 06:11 AM.
Old 02-12-2015, 06:30 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Given what you say -- 180,000 jobs at 30 hours each -- I suspect that Amazon will give you a big thumbs up. Your Amazon instances run about $0.20/hour, so a job is $6.00 and 180,000 jobs will be ... well ... more than I'd like to consider. :-)
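
To put a number on that, here is a rough back-of-the-envelope sketch in Python. It assumes the flat $0.20 per instance-hour quoted above, one instance per job, and no discounts, storage, or data-transfer charges; the function name is only illustrative.

```python
def total_cost(num_jobs: int, hours_per_job: float, price_per_hour: float) -> float:
    """Total on-demand cost in dollars for independent single-instance jobs."""
    return num_jobs * hours_per_job * price_per_hour


jobs = 180_000   # number of BLAST jobs from the original post
hours = 30       # approximate hours per job from the original post
price = 0.20     # dollars per instance-hour, the rough figure quoted above

per_job = hours * price
total = total_cost(jobs, hours, price)

print(f"cost per job:   ${per_job:,.2f}")
print(f"instance-hours: {jobs * hours:,}")
print(f"total cost:     ${total:,.2f}")
```

With these figures the total works out to 5.4 million instance-hours and roughly $1.08 million, which is why the later posts focus on making each job cheaper rather than on finding a cheaper provider.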
Old 02-12-2015, 06:58 AM   #4
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

Quote:
Originally Posted by westerman View Post
Given what you say -- 180,000 jobs at 30 hours each -- I suspect that Amazon will give you a big thumbs up. Your Amazon instances run about $0.20/hour, so a job is $6.00 and 180,000 jobs will be ... well ... more than I'd like to consider. :-)
I'd start thinking about alternative ways to perform that analysis... You'd be busy for months just handling the logistics of running these jobs.
Old 02-12-2015, 07:00 AM   #5
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default

We are currently BLASTing against Swiss-Prot and the TrEMBL sections specific to bacteria and archaea. Also, yes, I was hoping I could discuss prices with them, but the free 750 hours you get per month could be quite useful; I hadn't seen that before.
Old 02-12-2015, 07:05 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014
Default

Perhaps you are not running your blast jobs efficiently? Just a thought.

Swissprot/Trembl restricted to bacteria has got to be smaller than nr. 30 hours seems fairly long unless your input file has hundreds/thousands of sequences.
Old 02-12-2015, 07:13 AM   #7
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

I don't know what you are BLASTing, but if it is partly redundant, you may want to remove redundancy before running the jobs...
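
As a minimal illustration of that idea, the Python sketch below collapses exact duplicate sequences and keeps the first header seen for each. This is an assumption about what "redundancy" means here: it only catches identical sequences (ignoring case), whereas clustering tools such as CD-HIT also merge near-identical ones. The function names and the toy records are made up for the example.

```python
from typing import Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_exact_duplicates(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append((header, seq))
    return unique


# Toy input: records "a" and "b" carry the same sequence in different case.
example = [">a", "ATGC", ">b", "atgc", ">c", "ATG", "CA"]
unique = drop_exact_duplicates(parse_fasta(example))
print(unique)
```

Every query removed this way is one less sequence to search against the 30 GB database, so even a modest reduction carries through to the total run time.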
Old 02-12-2015, 07:22 AM   #8
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default

Yes, this is my issue. I just looked at the cost, and it's going to be better to buy a whole load of new nodes for our Uni HPC instead.
Old 02-12-2015, 07:26 AM   #9
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default

Quote:
Originally Posted by GenoMax View Post
Perhaps you are not running your blast jobs efficiently? Just a thought.

Swissprot/Trembl restricted to bacteria has got to be smaller than nr. 30 hours seems fairly long unless your input file has hundreds/thousands of sequences.
Each of the query files is ~9 MB, as the original 6.5 GB file was split into 1000 smaller pieces to speed up the process. However, the bacteria subset of TrEMBL is 30 GB, which seems to be the issue, but even when it is split into 1 GB subsets each job still takes 30 hours. How do you increase the efficiency of your BLAST jobs?
Old 02-12-2015, 07:59 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014
Default

Can you elaborate on what exactly you are trying to do with the BLAST searches? Are you using multiple threads for BLAST?
Old 02-12-2015, 08:37 AM   #11
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

One problem of running multiple Blast jobs on a cluster is reading in the Blast database into each cluster node. I find that if I run Blast on too many nodes, even with a screaming fast file server, my I/O wait time goes sky high.

Another possible solution is to use the program called 'Diamond' which is a blastx replacement.
Old 02-12-2015, 09:01 AM   #12
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default

Quote:
Originally Posted by GenoMax View Post
Can you elaborate on what exactly you are trying to do with the BLAST searches? Are you using multiple threads for BLAST?
We are trying to add functional annotation to an assembly we recently created. We are using 1 core per BLAST job so that as many jobs as possible can get onto the server at once.
Old 02-12-2015, 09:39 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014
Default

Quote:
Originally Posted by thh32 View Post
We are trying to add functional annotation to an assembly we recently created. We are using 1 core per BLAST job so that as many jobs as possible can get onto the server at once.
As Rick mentioned above, that is probably not good, since each of those jobs is trying to read the 30G database simultaneously on the same node.

Try using all cores on a physical server for one job with multiple threads (depending on the scheduler you should be able to ask it to run those threads on one physical machine) and see if that speeds things up. Logically it should, though I can't predict the drop in number from 30h per job (since you would still need to chunk through an equivalent number of jobs sequentially).
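
A minimal sketch of what a multi-threaded invocation could look like, written as a small Python helper that only builds the command line. It assumes BLAST+ `blastx` (suggested by later posts, not confirmed by the original poster) and uses its `-num_threads` option; the file and database names are placeholders, and how you request all cores on one machine depends on your scheduler.

```python
import os
import shlex
from typing import List, Optional


def blast_command(query: str, db: str, out: str,
                  threads: Optional[int] = None,
                  program: str = "blastx") -> List[str]:
    """Build a BLAST+ command line that runs one job with several threads."""
    if threads is None:
        # Default to every core the operating system reports on this machine.
        threads = os.cpu_count() or 1
    return [
        program,
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",               # tabular output
        "-num_threads", str(threads),
    ]


# Placeholder names: one query chunk searched with 16 threads on one node.
cmd = blast_command("chunk_0001.fasta", "trembl_bacteria", "chunk_0001.tsv", threads=16)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` inside a job script, so that each node loads the database once instead of once per core.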

If you have access to a server with enough RAM, you could try making a RAM disk, caching the database there, and doing without disk access for the index-access part. Worth a try.
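
One way to try the RAM disk idea is to copy the database files into a memory-backed directory before the jobs start. The Python sketch below assumes a Linux node where such a directory already exists (for example a tmpfs mount like /dev/shm) and that it is large enough for the database; the helper name and database name are hypothetical, and the demo uses temporary directories as stand-ins.

```python
import shutil
import tempfile
from pathlib import Path


def stage_database(db_dir: Path, db_name: str, ram_dir: Path) -> Path:
    """Copy every file named <db_name>.* from db_dir into ram_dir.

    Returns the path prefix to hand to BLAST's -db option.
    """
    ram_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(db_dir.glob(db_name + ".*")):
        if src.is_file():
            shutil.copy2(src, ram_dir / src.name)
    return ram_dir / db_name


# Demo: temporary directories stand in for the shared disk and the RAM disk.
with tempfile.TemporaryDirectory() as tmp:
    disk = Path(tmp) / "disk"
    ram = Path(tmp) / "ram"  # on a real node this might live under /dev/shm
    disk.mkdir()
    for name in ("trembl_bact.phr", "trembl_bact.pin", "trembl_bact.psq", "notes.txt"):
        (disk / name).write_text("placeholder")
    prefix = stage_database(disk, "trembl_bact", ram)
    staged = sorted(p.name for p in ram.iterdir())
    print(prefix.name, staged)
```

Only the files belonging to the named database are copied, so unrelated files in the same directory do not eat into the RAM budget.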
Old 02-12-2015, 10:04 AM   #14
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Putting everything into memory is a good idea: either a RAM disk or just letting BLAST run in a large memory space. AWS has some large-memory multi-CPU machines -- 60 GB upward -- which would allow for a test of the concept.
Old 02-12-2015, 11:04 AM   #15
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245
Default

Depending on what you are looking for and your stringency requirements, could you switch to BLAT instead? Maybe adopt a tiered approach of a first pass with BLAT to reduce the search space, then BLAST or HMMER (in parallel runs) for the higher stringency search on selected hits.
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
Old 02-17-2015, 09:06 PM   #16
FastAnnot
Junior Member
 
Location: USA

Join Date: Dec 2014
Posts: 4
Default

You aren't going to be able to do this with blastx. Try the new Diamond aligner; it is a seriously amazing program.
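
For anyone wanting to try it, a DIAMOND run has two steps: build a database from the protein FASTA, then search the nucleotide queries against it. The Python sketch below only assembles the two command lines, using DIAMOND's `makedb` and `blastx` subcommands; all file names and the thread count are placeholders, not values from this thread.

```python
import shlex
from typing import List


def diamond_makedb(protein_fasta: str, db: str) -> List[str]:
    """Command to build a DIAMOND database from a protein FASTA file."""
    return ["diamond", "makedb", "--in", protein_fasta, "--db", db]


def diamond_blastx(db: str, query: str, out: str, threads: int) -> List[str]:
    """Command to search translated nucleotide queries against the database."""
    return [
        "diamond", "blastx",
        "--db", db,
        "--query", query,
        "--out", out,
        "--threads", str(threads),
    ]


# Placeholder file names for illustration only.
makedb_cmd = diamond_makedb("trembl_bacteria.fasta", "trembl_bacteria")
search_cmd = diamond_blastx("trembl_bacteria", "assembly.fasta", "assembly_hits.tsv", 16)

for cmd in (makedb_cmd, search_cmd):
    print(shlex.join(cmd))
```

The database only needs to be built once; after that, each search can be run with `subprocess.run(search_cmd, check=True)` or pasted into a job script.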

Tags
amazon web services, blast, cloud computing
