
  • Seeking advice for Amazon Web Services usage

    I had been searching for some specific sequences in the 1000 Genomes Project data, using samtools view and BreakSeq, until my university's IT services contacted me because I was using too much bandwidth. The 1000G people then suggested that I use AWS. It looks like a good solution, but I have some doubts, and I would appreciate it if other AWS users could ease my concerns.

    1. I don't understand the language used on the AWS website ("instances", "API", blah, blah, blah). May I assume that if I start an EC2 instance, I will connect to it through ssh as with any remote machine, and be able to install samtools and whatnot?

    2. They claim most of the 1000 Genomes Project data is available in a "bucket", and they mention several ways of accessing it that I don't know about. Will I be able to samtools-view the bam files or read fastq files?

    3. Assuming so, how fast would the data be transferred from that bucket to my EC2 instance? Until now, most of the time my pipeline took was spent downloading. I need to know the data transfer speed to estimate the cost.

    4. Almost the first thing AWS asks you for is your credit card number. I don't want to give mine, and there's none available for the lab. Do you know of alternative ways to pay? We have a budget, but it's managed by the University, which requires invoices and so on.

    Thank you.

  • #2
    Hello Lluc,
    Perhaps you've resolved your questions by now, but I'll just post my answer anyway, and hope someone adds or corrects me.
    I have had the same problem, and I haven't found a really "for dummies" page. Up to now, what I have found out is:
    AWS is a service where you rent servers offsite. The way they do it is by renting out virtual servers, which they call "instances", on "EC2". You have complete control of your instance, so it's like having your own server. You have ssh command line access, as well as a web-based control panel. You can rent several instances at a time, and there is a cluster option to rent several instances that work as a cluster. There are several types of instances, which differ in RAM, number of processors, number of cores per processor and instance disk storage. You will need external storage, which they call "S3". When you "initiate an instance" you have to load an "image" of a server (RAM and disk) so that you don't have to install everything from zero. These images are called "AMIs". Amazon provides several pre-made images with different pre-installed OSes (Debian, RedHat, Windows, etc.). Once you install something new on your instance, you will have to save that image on the S3 storage in order to have it ready when you connect to your instance again. The S3 storage space is organized into containers called "buckets", which can be accessed at the time of instance creation (or re-creation) or even through the web using keys that you can give to third parties.
    There are several applications, both native and third party, that you can access directly from your instance without installing the whole thing. These are the "APIs". A common API is the storefront, which lets you use Amazon's web store functions with your own domain and products. There are some APIs for science and sequencing.
    So, to answer your question, the transfer would be between the 1000G's bucket and your instance, without going through your local network. The speed can be anything from 1.5 to 10 Mbps, from what I've read, depending on your luck. Once you configure your instance you can use it as your own server.
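    To turn that speed range into a rough time and cost estimate, a minimal Python sketch could look like the one below. The 20 GB file size and the 0.10 USD hourly rate are made-up placeholders, not real 1000G or AWS figures, and the sketch assumes you pay for the instance for as long as the transfer runs.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    # Decimal units: 1 GB = 1000 MB, and 1 byte = 8 bits.
    megabits = size_gb * 1000 * 8
    seconds = megabits / rate_mbps
    return seconds / 3600


def instance_cost(hours: float, usd_per_hour: float) -> float:
    """Cost of keeping one instance running for the given number of hours."""
    return hours * usd_per_hour


if __name__ == "__main__":
    size_gb = 20.0        # hypothetical size of the data to transfer
    usd_per_hour = 0.10   # hypothetical hourly price, check the real one
    for rate_mbps in (1.5, 10.0):  # the speed range quoted above
        hours = transfer_hours(size_gb, rate_mbps)
        cost = instance_cost(hours, usd_per_hour)
        print(f"{rate_mbps:>4} Mbps: {hours:.1f} h, about {cost:.2f} USD")
```

    Swapping in your own file sizes and the current price of the instance type you pick should give a first-order budget figure; it ignores storage charges and any time spent on the analysis itself.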
    There is no way of avoiding the credit card step; I've asked. In theory, you can use a "Free Tier" level for one year and not have any charges made to your card, but they will not tell you if you go over the limit; they will simply start charging.
    I don't know what sequences you're querying at 1000G, but perhaps it would be best to download them first and do the queries locally. It would be a one-time huge download that could be done overnight with your IT department's approval.

    Hope this helps, and I hope someone else who is more knowledgeable jumps in.

    • #3
      This is a really old topic, but I came across this post while searching for AWS EC2 AMI-related threads on here.
      Just in case there are some other newbies like me, you can try to learn more by exploring this page:
      Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file form...

      It is a really good resource that introduces AWS, and I found it easier to understand than the tutorials on AWS itself. You can also explore their lectures on AWS and on RNA-seq analysis using AWS.
      Thanks!
