  • Preinstalled Genomic Analysis Tools for Cloud Computing

    One challenge in harnessing cloud computing is IT-related, i.e., the installation and testing of bioinformatics tools.
    The situation is compounded by the fact that there is no common platform/language/library/API that bioinformatics software developers stick to.
    In view of this, it would save time to have all these tools available pre-installed and tested for cloud computing.

    Objectives:
    • Creation of bootable cloud volumes (Amazon: EBS & Google: Disk) with bioinformatics tools installed
    • Periodic upgrades of tools and operating system updates whenever new releases come out
    • Easy scalability options for tools that support up-scaling
    • Documentation for usage, security and scaling up, plus a mailing list



    Remarks:

    The tools are to be tested after installation. The cloud volumes have tools broadly grouped according to analysis task, e.g., prokaryotic assembly, prokaryotic annotation, assembly improvement, RNA-seq analysis and metagenomic pipelines. Some degree of redundancy of tools across volumes is expected. The user's raw data can be stored in an Amazon S3 bucket or on a cloud volume.

    Current Status:
    I have created a bootable cloud volume with tools for prokaryotic assembly (SOAPdenovo, the a5-2014 pipeline, MaSuRCA, SPAdes). I am in the process of creating volumes for prokaryotic annotation tools and assembly improvement tools. Next will be RNA-seq analysis and metagenomic pipelines.

    Technical Information:
    OS: Ubuntu 12.04 LTS 64-bit


    Questions:
    Is this effort something that users (the community) would find useful?

  • #2
    I strongly recommend that people consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacity, as well as letting you know exactly where your data is.



    • #3
      Reinventing the wheel?



      • #4
        Originally posted by Bukowski
        Reinventing the wheel?

        http://cloudbiolinux.org/
        I am currently reading through the CloudBioLinux documentation; it should do the job if I can figure out how to control which packages get installed.
        The latest Ubuntu 13.04-based AMI has a 35 GB instance size and a huge number of tools.

        Thank you

        --
        prakhar



        • #5
          Originally posted by gringer
          I strongly recommend that people consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacity, as well as letting you know exactly where your data is.
          Stein L. (2010), Genome Biology, provides convincing arguments for moving to the cloud.
          Secondly, in my country, wide-scale adoption of computational analysis has been lagging due to the high costs involved.
          At ~$100 a month for small-scale prokaryotic analysis on AWS, that works out perfectly for us.


          cheers,
          --
          prakhar



          • #6
            Stein L. (2010), Genome Biology, provides convincing arguments for moving to the cloud.
            Okay, let me cherry-pick from that article:

            Transferring a 100 gigabyte next-generation sequencing data file across such a link will take about a week in the best case. A 10 gigabit/second connection (1.25 gigabytes/second), which is typical for major universities and some of the larger research institutions, reduces the transfer time to under a day, but only at the cost of hogging much of the institution's bandwidth. Clearly cloud services will not be used for production sequencing any time soon. If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system.
            Additionally, the paper was written in 2010. Four years have passed since then, during which time Intel has released quite a few power-efficient processors with large capabilities for parallel processing. Moore's law has continued to hold for computers, but sequencing volumes haven't changed as much in terms of total data size (admittedly driven by customers who are content with the volumes produced), allowing computers to catch up. I don't think the paper provides definitive arguments for cloud computing, only that there are some cases where it can be more cost-effective.

            At ~$100 a month for small-scale prokaryotic analysis on AWS, that works out perfectly for us.
            Well, it's good that you've looked at the options. As I mentioned previously, a $1500 computer (15 months at $100/month, plus a bit more for power) will probably be capable of doing prokaryotic analysis (including genome assembly), and you get the additional benefit of large, cheap storage (3 TB for $200), as well as knowing precisely where your data is.
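            The comparison above is a simple break-even calculation: a local machine pays for itself once the cumulative cloud bill exceeds its purchase price plus its own running costs. The Python sketch below is illustrative only, assuming a fixed up-front hardware price, a flat monthly cloud bill and an optional flat monthly running cost for the local machine; the $10/month power figure is a hypothetical placeholder, not a number from this thread.

```python
# Rough break-even sketch: after how many months does a flat monthly cloud
# bill add up to the cost of buying and running a local machine?
# All figures are illustrative assumptions, not measured costs.


def break_even_months(hardware_cost: float,
                      cloud_per_month: float,
                      local_running_per_month: float = 0.0) -> float:
    """Return the month count at which local and cloud costs are equal.

    Local cost = hardware_cost + local_running_per_month * m
    Cloud cost = cloud_per_month * m
    Setting them equal gives m = hardware_cost / (cloud - local_running).
    """
    saving_per_month = cloud_per_month - local_running_per_month
    if saving_per_month <= 0:
        # Running the local box costs as much as the cloud: it never pays off.
        return float("inf")
    return hardware_cost / saving_per_month


if __name__ == "__main__":
    # $1500 machine vs ~$100/month on AWS, ignoring power.
    print(break_even_months(1500, 100))                # 15.0
    # Same, with a hypothetical $10/month for electricity.
    print(round(break_even_months(1500, 100, 10), 1))  # 16.7
```

            Any running cost for the local machine pushes the break-even point later, which is why the "plus a bit more for power" caveat matters.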

            edit: changed drive cost to a more reasonable value
            Last edited by gringer; 03-21-2014, 01:50 PM.



            • #7
              Originally posted by gringer
              Yes, sorry. I was thinking $200, but wrote $400, because I was looking at prices for 4TB at the same time.
              That's OK. I deleted my comment because I realized I wasn't sure whether you were talking about US dollars or another currency, and I also thought you might be factoring in the extra cost of backing up local files to a second drive. I figured it was getting too confusing and decided not to post.
