  • thh32
    Member
    • Feb 2014
    • 60

    Experiences using cloud computing?

    I am currently considering using a cloud computing service, as I have 180,000 blast jobs that need doing, and running them on our Uni servers would take a few months. Each job will take ~30 hours, so I was wondering what services others have used and how expensive they are. The main one I am looking at is Amazon, as I am unaware of any others; however, their sales team seems to be taking ages to get back to me with pricing. Any advice would be great.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Google: https://cloud.google.com/
    Microsoft: http://azure.microsoft.com/en-us/

    Amazon's EC2 pricing is on the web unless you were looking for some specific discounts for your institution: http://aws.amazon.com/ec2/pricing/

    What DB are you going to blast against? You probably want to use the AMI that NCBI has for Amazon to make things simple: http://blast.ncbi.nlm.nih.gov/Blast....YPE=CloudBlast
    Last edited by GenoMax; 02-12-2015, 06:11 AM.

    • westerman
      Rick Westerman
      • Jun 2008
      • 1104

      #3
      Given what you say -- 180,000 jobs at 30 hours each -- I suspect that Amazon will give you a big thumbs up. Your Amazon instances run about $0.20/hour so a job is $6.00 and 180,000 jobs will be ... well ... more than I'd like to consider. :-)
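To put a number on "more than I'd like to consider", here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this thread (180,000 jobs, ~30 hours each, roughly $0.20/hour); the rate is a rough estimate rather than an official price, and the function name is made up for illustration.

```python
def estimate_cost(n_jobs, hours_per_job, rate_per_hour):
    """Return (total instance-hours, total cost), assuming one job per instance."""
    total_hours = n_jobs * hours_per_job
    return total_hours, total_hours * rate_per_hour


# Figures taken from the posts above; the hourly rate is only a rough estimate.
hours, cost = estimate_cost(180_000, 30, 0.20)
print(f"{hours:,} instance-hours, about ${cost:,.0f}")
```

That comes to 5.4 million instance-hours and a bill on the order of a million dollars, before storage and data transfer are counted.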

      • sarvidsson
        Senior Member
        • Jan 2015
        • 137

        #4
        Originally posted by westerman
        Given what you say -- 180,000 jobs at 30 hours each -- I suspect that Amazon will give you a big thumbs up. Your Amazon instances run about $0.20/hour so a job is $6.00 and 180,000 jobs will be ... well ... more than I'd like to consider. :-)
        I'd start thinking about alternative ways to perform that analysis... You'd be busy for months just handling the logistics of running these jobs.

        • thh32
          Member
          • Feb 2014
          • 60

          #5
          We are currently blasting against Swiss-Prot and the TrEMBL sections specific to bacteria and archaea. Also, yes, I was hoping I could discuss prices with them, but the free 750 hours you get per month could be quite useful; I hadn't seen that before.

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Perhaps you are not running your blast jobs efficiently? Just a thought.

            Swissprot/Trembl restricted to bacteria has got to be smaller than nr. 30 hours seems fairly long unless your input file has hundreds/thousands of sequences.

            • sarvidsson
              Senior Member
              • Jan 2015
              • 137

              #7
              I don't know what you are BLASTing, but if it is partly redundant, you may want to remove redundancy before running the jobs...
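As a minimal sketch of what removing redundancy could look like, the Python below collapses exact duplicate sequences in a FASTA input before it is split into jobs. It handles only identical sequences (ignoring case); near-identical sequences would need a clustering tool such as CD-HIT, which is a suggestion of this sketch rather than something named in the thread. The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record seen for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Tiny in-memory example: records "a" and "b" carry the same sequence.
example = [">a", "ATGC", "GGTA", ">b", "atgcggta", ">c", "TTTT"]
kept = drop_exact_duplicates(read_fasta(example))
print([header for header, _ in kept])
```

For a multi-gigabyte input, storing a hash of each sequence in the set instead of the sequence itself would keep memory use down.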

              • thh32
                Member
                • Feb 2014
                • 60

                #8
                Yes, this is my issue. I just looked at the cost and it's going to be better to buy a whole load of new nodes for our Uni HPC instead.

                • thh32
                  Member
                  • Feb 2014
                  • 60

                  #9
                  Originally posted by GenoMax
                  Perhaps you are not running your blast jobs efficiently? Just a thought.

                  Swissprot/Trembl restricted to bacteria has got to be smaller than nr. 30 hours seems fairly long unless your input file has hundreds/thousands of sequences.
                  Each of the query files is ~9Mb, as the original 6.5Gb file was split into 1000 smaller pieces to speed up the process. However, the bacteria subset of TrEMBL is 30Gb, which seems to be the issue, but even when it is split into 1Gb subsets each job still takes 30 hours. How do you increase the efficiency of your blast jobs?

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Can you elaborate on what exactly you are trying to do with the blasting? Are you using multiple threads for the blast?

                    • westerman
                      Rick Westerman
                      • Jun 2008
                      • 1104

                      #11
                      One problem with running multiple Blast jobs on a cluster is reading the Blast database into each cluster node. I find that if I run Blast on too many nodes, even with a screaming fast file server, my I/O wait time goes sky high.

                      Another possible solution is to use the program called 'Diamond' which is a blastx replacement.
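For anyone wanting to try this, here is a hedged Python sketch that only assembles the two DIAMOND command lines (build the database, then search). The file names and thread count are placeholders, and it assumes nucleotide queries, which is what blastx mode expects. Check `diamond help` on your installed version before relying on the options shown.

```python
import shlex


def diamond_commands(db_fasta, db_name, query_fasta, out_tsv, threads=8):
    """Build (makedb, search) argument lists for a DIAMOND blastx run."""
    makedb = ["diamond", "makedb", "--in", db_fasta, "--db", db_name]
    search = [
        "diamond", "blastx",
        "--db", db_name,
        "--query", query_fasta,
        "--out", out_tsv,
        "--outfmt", "6",            # BLAST-style tabular output
        "--threads", str(threads),
    ]
    return makedb, search


# Placeholder file names, for illustration only.
makedb, search = diamond_commands(
    "trembl_bacteria.fasta", "trembl_bacteria", "chunk_0001.fna", "chunk_0001.tsv"
)
print(shlex.join(makedb))
print(shlex.join(search))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once DIAMOND is installed.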

                      • thh32
                        Member
                        • Feb 2014
                        • 60

                        #12
                        Originally posted by GenoMax
                        Can you elaborate on what exactly you are trying to do with the blasting? Are you using multiple threads for the blast?
                        We are trying to provide functional annotation for an assembly we have recently created. We are using 1 core per blast job so as to allow as many as possible to get onto the server at once.

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          Originally posted by thh32
                          We are trying to provide functional annotation for an assembly we have recently created. We are using 1 core per blast job so as to allow as many as possible to get onto the server at once.
                          As Rick mentioned above, that is probably not good, since each of those jobs is trying to read the 30G database simultaneously on the same node.

                          Try using all cores on a physical server for one job with multiple threads (depending on the scheduler you should be able to ask it to run those threads on one physical machine) and see if that speeds things up. Logically it should, though I can't predict the drop in number from 30h per job (since you would still need to chunk through an equivalent number of jobs sequentially).

                          If you have access to a server with enough RAM, you could try making a RAMdisk, caching the database there, and doing without disk access for the index access part. Worth a try.
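A minimal sketch of the "one multi-threaded job per physical server" suggestion, written in Python so it can be dropped into a job script. It only builds the command line: `-num_threads` is the BLAST+ option for multi-threading, while the default program name (blastx), the file names, and the thread count are assumptions for illustration. How you reserve a whole node depends on your scheduler and is not shown.

```python
import os
import shlex


def blast_command(query, db, out, program="blastx", threads=None):
    """Build a BLAST+ argument list that uses several threads for one job."""
    if threads is None:
        # Default to every core the operating system reports on this machine.
        threads = os.cpu_count() or 1
    return [
        program,
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",              # tabular output
        "-num_threads", str(threads),
    ]


# Placeholder names; 16 threads stands in for "all cores on the node".
cmd = blast_command("chunk_0001.fna", "trembl_bacteria", "chunk_0001.tsv", threads=16)
print(shlex.join(cmd))
```

With this layout, one job reads the database once per node instead of many single-core jobs competing for the same files.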

                          • westerman
                            Rick Westerman
                            • Jun 2008
                            • 1104

                            #14
                            Putting everything into memory is a good idea, either with a RAM disk or by just letting blast run in a large memory space. AWS has some large memory multi-cpu machines -- 60 GB upward -- which would allow for a test of the concept.
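One way to test the in-memory idea is to copy the database files onto a RAM-backed filesystem and point blast at the copy. The Python sketch below does the copying; it assumes a Linux machine where a directory such as `/dev/shm` is a tmpfs mount with enough free space for the whole database, and the helper name and paths are hypothetical.

```python
import glob
import os
import shutil


def stage_database(db_prefix, ram_dir):
    """Copy every file named db_prefix.* into ram_dir and return the new prefix."""
    os.makedirs(ram_dir, exist_ok=True)
    files = glob.glob(glob.escape(db_prefix) + ".*")
    if not files:
        raise FileNotFoundError(f"no database files match {db_prefix}.*")
    for path in files:
        shutil.copy2(path, ram_dir)
    # BLAST databases are addressed by prefix, so return the relocated prefix.
    return os.path.join(ram_dir, os.path.basename(db_prefix))
```

A call such as `stage_database("/data/db/trembl_bacteria", "/dev/shm/blastdb")` would return the prefix to pass to `-db`. Remember to delete the copy afterwards, since files on a RAM disk count against the node's memory.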

                            • mbblack
                              Senior Member
                              • Aug 2009
                              • 245

                              #15
                              Depending on what you are looking for and your stringency requirements, could you switch to BLAT instead? Maybe adopt a tiered approach of a first pass with BLAT to reduce the search space, then BLAST or HMMER (in parallel runs) for the higher stringency search on selected hits.
                              Michael Black, Ph.D.
                              ScitoVation LLC. RTP, N.C.
