  • Idiots guide to setting up Amazon instance for sequence analysis

    Does anyone know of a good "how-to" on setting up an Amazon instance for sequence analysis? In an ideal world, this would not be targeted at the uber-expert user, but at someone with middling Unix skills and some experience of common tools (e.g. bowtie, MAQ, samtools, GATK).

    Also, can someone point me to a list of the available resources (e.g. reference sequences, HapMap data, RNA-seq sets) that can be accessed through such an instance? Conversely, what would you recommend should be loaded and stored "locally"?

    I have already investigated DNAnexus, Galaxy and geschicten, each of which has a lot to offer, but this would be more for smaller jobs that require non-routine use and custom scripting of standard tools.

  • #2
    Did you try CloudMan from Galaxy? It is a web-based manager for cloud resources that provisions and manages all of the components required to run Galaxy, and it requires no command-line manipulation. Instructions and a video are available at usegalaxy.org/cloud.


    • #3
      One small addition regarding CloudMan: although it doesn't require any command-line manipulation out of the box to use the included Galaxy instance, you could certainly ssh in and do all of the custom work you wanted, and still get the benefits of being able to scale a cluster up or down to handle the load, use Galaxy for certain tasks, etc.


      • #4
        I don't know of a good how-to, but have some notes I should probably distill & put in my blog (and will try to do so this week). My word of encouragement is that if I can do it, so can you; I am notoriously all thumbs at installs and sys admin.

        The default EC2 instance is not a bad place to start, though you will have to install plenty, and it is a Red Hat-type Linux, which I find more difficult than Ubuntu. It is quite bare bones; for example, I needed to install emacs (emacs!). The two key skills you will need are creating, mounting and unmounting volumes, and using the package manager (yum).
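As a rough illustration of those two skills, here is a minimal Python sketch that only prints the shell commands in the order they would typically be run; it does not execute anything. The device name `/dev/xvdf`, the mount point `/data`, and the ext4 filesystem are assumptions for illustration, not details from this thread. It also assumes the volume has already been created and attached to the instance, and the actual device name on your instance may differ.

```python
import shlex


def volume_setup_commands(device="/dev/xvdf", mount_point="/data", packages=("emacs",)):
    """Return the shell commands, in order, without running any of them.

    device and mount_point are hypothetical placeholders; check which
    device name your attached volume actually received before using them.
    """
    return [
        # Install missing tools with the package manager.
        ["sudo", "yum", "install", "-y", *packages],
        # Create a filesystem. This ERASES the device, so only do it
        # once, on a brand-new empty volume.
        ["sudo", "mkfs", "-t", "ext4", device],
        # Create the mount point and mount the volume there.
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mount", device, mount_point],
        # Unmount cleanly before detaching the volume from the instance.
        ["sudo", "umount", mount_point],
    ]


if __name__ == "__main__":
    for cmd in volume_setup_commands():
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try anywhere; on a real instance you would type them by hand, skipping the `mkfs` step for any volume that already holds data.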

        There are instances configured for bioinformatics, such as bioperl-max.


        As far as installing software goes, I tend to grab most stand-alone bioinformatics tools from their home sites and compile them locally. A few are challenging (Cufflinks requires Boost, which I've sometimes found troublesome; dindel has given me trouble, but I need to check my notes as to why), but most compile straight out. Extensions to languages such as R and Perl can generally be installed from within that language; the Bio::DB::Sam package for reading SAM/BAM in Perl is a notable exception.


        • #5
          cloudman makes it easy



          makes it even easier.


          • #6
            You might also want to check out CloudBioLinux:



            which is an EC2 AMI that has many commonly used bioinformatics tools and data sets preinstalled. I've used it for small jobs that sound like what you might be talking about.


            • #7
              turtles all the way down

              FWIW

              runblast is built on top of cloudman

              cloudman is built on top of cloudbiolinux


              • #8
                I have to say RunBlast is pretty slick. I tried it yesterday. The only thing you need to do, really, is find your AWS access keys and RunBlast takes care of everything else.
