Old 11-22-2012, 05:39 AM   #1
Location: Barcelona

Join Date: Aug 2010
Posts: 12
Seeking advice for Amazon Web Services usage

I had been searching for some specific sequences in the 1000 Genomes Project data, using samtools view and BreakSeq, until my University's IT services contacted me because I was using too much bandwidth. The 1000G people then suggested that I use AWS. It looks like a good solution, but I have some doubts, and I would appreciate it if other AWS users could ease my concerns.

1. I don't understand the language used on the AWS website ("instances", "API", bla, bla, bla). May I assume that if I start an EC2 instance, I will connect to it through ssh as with any remote machine, and be able to install samtools and whatnot?

2. They claim most of the 1000 Genomes Project data is available in a "bucket", and they mention several ways of accessing it that I don't know about. Will I be able to run samtools view on the BAM files or read the FASTQ files?

3. Assuming so, how fast would the data be transferred from that bucket to my EC2 instance? Most of the time my pipeline took before was spent downloading. I need to know the speed of data transfer to estimate the cost.

4. Almost the first thing AWS asks you for is your credit card number. I don't want to give mine, and there's none available for the lab. Do you know of alternative ways to pay? We have a budget, but it's managed by the University, which requires invoices and so on.

Thank you.
Lluc
Old 01-17-2013, 04:11 PM   #2
Junior Member
Location: Mexico City

Join Date: Jan 2012
Posts: 8

Hello Lluc,
Perhaps you've resolved your questions by now, but I'll post my answer anyway, and hope someone adds to it or corrects me.
I have had the same problem, and I haven't found a really "for dummies" page. So far, what I have found out is:
AWS is a service where you rent servers offsite. The way they do it is by renting out virtual servers, which they call "instances", on a service called "EC2". You have complete control of your instance, so it's like having your own server. You have ssh command line access, as well as a web-based control panel. You can rent several instances at a time, and there is a cluster option to rent several instances that work as a cluster. There are several types of instances, which differ in RAM, number of processors, number of cores per processor and instance disk storage.
When you "initiate an instance" you have to load an "image" of a server's disk so that you don't have to install everything from zero. These images are called "AMIs". Amazon provides several pre-made images with different pre-installed operating systems (Debian, RedHat, Windows, etc.). Once you install something new on your instance, you will have to save your own image of it in order to have the same setup ready when you start a fresh instance again.
For longer-term data you will need external storage, which they call "S3". Data in S3 is grouped into containers called "buckets", which can be accessed from your instance or even through the web, using keys that you can give to third parties.
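To make the bucket idea a bit more concrete, here is a minimal Python sketch of how a file in a public bucket maps to a web address, and how a samtools view command for a region of that file could be put together. Treat it as an illustration under assumptions, not a tested recipe: the bucket name `1000genomes`, the placeholder key `path/to/sample.bam` and the region string are only examples, the helper names are made up for this post, and reading a remote BAM this way depends on your samtools build supporting remote URLs and on an index file being available next to the BAM.

```python
from urllib.parse import quote


def s3_https_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style HTTPS address of an object in a bucket.

    Assumes the bucket is publicly readable and reachable through the
    generic s3.amazonaws.com endpoint.
    """
    # Percent-encode the key but keep "/" so the folder-like layout survives.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key.lstrip('/'))}"


def samtools_view_command(bam_url: str, region: str) -> list:
    """Return the argument list for `samtools view <bam> <region>`.

    The command is only assembled here, not executed.
    """
    return ["samtools", "view", bam_url, region]


if __name__ == "__main__":
    # Hypothetical key: replace with a real path from the bucket listing.
    url = s3_https_url("1000genomes", "path/to/sample.bam")
    print(" ".join(samtools_view_command(url, "20:1000000-1010000")))
```

The printed line can be pasted into a shell on the instance, or the list can be handed to `subprocess.run` once you have checked that the path exists.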
The "APIs" are the programmatic interfaces to these services: instead of clicking through the web control panel, a script or a command-line tool can start an instance or read from a bucket directly. There are also native and third-party tools built on top of them, including some for science and sequencing.
So, to answer your question: the transfer would go directly between the 1000G bucket and your instance, without passing through your local network. The speed can be anything from 1.5 to 10 Mbps, from what I've read, depending on your luck. Once you configure your instance you can use it as your own server.
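Since the cost depends mostly on how long the instance has to stay up while the data comes in, a rough back-of-the-envelope calculation can be sketched in Python. The numbers here are assumptions, not AWS figures: the 50 GB data size and the 0.10 hourly price are made up, the rates are just the two ends of the range quoted above (read as megabits per second), and billing per started hour is assumed.

```python
import math


def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 1 GB = 1000 MB = 8000 megabits
    return megabits / rate_mbps / 3600


def instance_cost(hours: float, hourly_price: float) -> float:
    """Cost of keeping one instance up, assuming every started hour is billed."""
    return math.ceil(hours) * hourly_price


if __name__ == "__main__":
    size_gb = 50.0        # hypothetical amount of BAM data to read
    hourly_price = 0.10   # hypothetical price per instance hour
    for rate in (1.5, 10.0):  # the range of speeds quoted above
        hours = transfer_hours(size_gb, rate)
        cost = instance_cost(hours, hourly_price)
        print(f"{rate:>4} Mbps: {hours:6.1f} h, about {cost:.2f} in instance time")
```

Plug in your own data size and the current price of the instance type you pick; the point is only that the transfer rate, not the computation, is likely to dominate the bill.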
There is no way of avoiding the credit card step; I've asked. In theory, you can use the "Free Tier" level for one year and not have any charges made to your card, but they will not tell you when you go over the limit; they will simply start charging.
I don't know what sequences you're querying at 1000G, but perhaps it would be best to download them first and do the queries locally. It would be a one-time huge download that could be done overnight with your IT department's approval.

Hope this helps, and I hope someone more knowledgeable jumps in.
keo
Old 01-19-2019, 10:59 AM   #3
Junior Member
Location: The Bay Area

Join Date: Jan 2019
Posts: 1

This is a really old topic, but while I was searching for AWS EC2 AMI related threads here, I came across this post.
Just in case there are some other newbies like me, you can try to learn more by exploring this page:
It is a really good resource that introduces AWS, and I found it easier to understand than the tutorials on AWS itself. You can also explore their lectures on AWS and on RNA-seq analysis using AWS.
nhntran
