SEQanswers

Old 10-03-2011, 02:37 AM   #1
dalesan
Member
 
Location: portugal

Join Date: Feb 2011
Posts: 14
Newbie needing advice on required computing power for small-scale NGS facility

Hello, friendly SEQanswers community.

Recently, we received a sizeable grant to start up a next-gen sequencing facility. Since I am the only bioinformatician at my institute (though a biologist by training), I am having some difficulty estimating the computing infrastructure needed to handle a thus-far unknown number of sequencing projects.

However, here are some things that I am fairly certain about:
  1. We will probably buy one HiSeq 1000 machine.
  2. We will probably buy one MiSeq machine.
  3. We may buy non-Illumina technology, e.g., Ion Torrent or Roche 454.
  4. I would prefer to use open source software only.

Here's what I do not know:
  1. How often these sequencing machines will be used.
  2. How many users at the institute will be sequencing.
  3. How many users at the institute will want direct access to the server to run downstream jobs.
  4. If users outside of the institute will be allowed access to the servers to run other jobs (e.g., climate studies).

I suppose my problem is manifold. I cannot reliably estimate the usage that these sequencing machines will receive, so I am having trouble coming up with the computing requirements to handle the data.

As my institute has not even placed the order for the sequencing machines (there's still a lot of bickering about which sequencing technologies to choose), should I simply wait and see what we REALLY end up getting before putting together a parts list?

I am thinking about polling the institute and talking with higher-ups to figure out how many groups will be doing sequencing at our currently non-existent facility, as this information is probably critical in determining the HDD/memory/CPU requirements.

Nevertheless, what would be an economical and scalable setup to start with, assuming that a single HiSeq machine will be used to capacity every month?

Thank you all a thousand times for any information.
Old 10-03-2011, 03:24 AM   #2
gprakhar
Member
 
Location: India

Join Date: Aug 2010
Posts: 78

Hello,

Please refer to this thread; a similar topic has been discussed before.
http://seqanswers.com/forums/showthread.php?t=13995

Regards,
--
pg
Old 10-03-2011, 03:29 AM   #3
dalesan
Member
 
Location: portugal

Join Date: Feb 2011
Posts: 14

Sweet, thanks. I had a feeling a similar post was already out there -- but my crappy searching didn't reveal it. Thanks a bunch!
Old 10-03-2011, 03:31 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,365

To me there is a vital question missing: Will your new sequencing service also be expected to offer some analysis services, or just provide the raw data after basic QC?

e.g., mapping RNA-seq reads onto a reference genome (relatively straightforward and can be automated), or de novo assembly (still very hands-on and demanding in terms of bioinformatician time as well as computational load).

Will you have access to any existing computational resources, e.g. an institute cluster?

Also, what kind of organisms will you be working with? Bacterial and viral genomes, being small, require fewer computational resources.

I'm sure you'll be thinking about this too, but you will need more staff (a wet-lab expert for library preparation and loading the machines, bioinformaticians, and probably a Linux systems administrator). In your shoes I would try to head-hunt someone from an existing sequencing center to run it, and try to do this as soon as possible (so they can deal with many of these choices).

Also, I would suggest you sign up to the bioinfo-core mailing list at http://bioinfo-core.org/ and ask their advice too.
Old 10-03-2011, 03:45 AM   #5
dalesan
Member
 
Location: portugal

Join Date: Feb 2011
Posts: 14

Quote:
Originally Posted by maubp View Post
To me there is a vital question missing: Will your new sequencing service also be expected to offer some analysis services, or just provide the raw data after basic QC? [...]
These are excellent points that you bring up. I am still trying to find out the extent of the services required by our in-house researchers. The majority of the sequencing will be done on eukaryotic organisms; this much I know. Regarding analysis services, I think basic mapping and assembly is a given. Anything beyond that I still do not know, as it depends a lot on the particular research group or individual, as well as on our staffing.

Last I heard, people from the University wanted to use our new (but non-existent) computing resources to run their jobs. I feel like gouging my eyes out with a spoon.

I am working with so little information, and it's entirely frustrating. I think I am going to send an institute-wide email with some very pertinent questions to get a handle on things before committing to anything.
Old 10-03-2011, 04:54 AM   #6
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 206

You really should be in on the discussions about which vendors the PIs plan to look at, and on the whole setup of the core. The reason I say that is that you can often get analytical hardware bundled into a complete system quote for less than buying it separately.

When we bought our ABI SOLiD system, we ended up also purchasing a Penguin computing cluster (ABI has partnered with Penguin to provide downstream computing resources for SOLiD customers). The whole deal as quoted to us was a better deal than we could get purchasing a cluster separately. And while the Penguin cluster had BioScope preinstalled, it is just a basic Beowulf cluster at heart, so you are not limited in what else you can do with it.

It has just been my experience that the optimal way to do this, when starting from scratch, is to consider the whole core as one integrated purchase: sequencers and associated lab equipment, along with computational and storage needs, all discussed together (and keep in mind you may need to address things like network capacity for data transfer, environmentally controlled server/cluster space, power requirements for the hardware, backup and archival storage systems, and so on).

Storage is not a trivial issue and needs to be discussed. How many jobs per week or month do you anticipate? If the core is performing primary and secondary analyses, will the PIs also insist on access to raw data? Who is responsible for storing and archiving the final data and results? Will data be archived permanently? If not, for how long (and hence, how much storage do you need)? Do you have the network bandwidth to handle the data, or will you need to upgrade there as well? If the core is not storing data permanently, how is final data to be delivered to the PIs?
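To make the storage question concrete, here is a back-of-envelope sketch in Python. Every figure in it is an assumed placeholder (a rough 2011-era Illumina output per run, an arbitrary retention window), not a spec or a quote; the point is the shape of the calculation, not the numbers.

```python
# Back-of-envelope storage budget for a sequencing core.
# All figures below are illustrative assumptions -- replace with real ones.

def storage_needed_tb(runs_per_month, tb_per_run, retention_months, overhead=2.0):
    """Estimate live storage needed, in TB.

    overhead accounts for intermediate files (alignments, indexes,
    scratch space), which often dwarf the raw FASTQ data.
    """
    return runs_per_month * tb_per_run * retention_months * overhead

# Assumed: one HiSeq run per month producing ~0.5 TB of compressed FASTQ,
# with data kept for 12 months before hand-off to the PI or to archive.
print(storage_needed_tb(runs_per_month=1, tb_per_run=0.5, retention_months=12))
# With these assumed figures: 12.0 TB of live storage.
```

The overhead factor matters more than people expect: alignments, indexes, and scratch files routinely double or triple the footprint of the raw reads, so budgeting only for FASTQ output will leave you short.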

There is a whole host of data and analysis issues involved in setting up a core, and they need to be considered upfront and budgeted for appropriately. Far too often, academic cores are set up by PIs who think solely of the data-generation side. Then, when there is no money left for bioinformatics resources, the lab resources end up grossly underused (and never recoup their costs) because the downstream support was never anticipated or budgeted. I've been there, seen that (and I know of several academic NGS "cores" that sit largely idle because their own institution's PIs farm their NGS work out, since the in-house core cannot provide any support for post-sequencing data or analysis).

You need to make it clear to the PIs that you are not talking about a few off-the-shelf desktop computers and a couple of cheap disk drives here. They need to think about data analysis and storage issues right up front and factor them fully into their initial and long-term plans for the core, including some available bioinformatics expertise to at least guide them in both analysis and interpretation. Otherwise you don't really have a core, as you cannot offer end-user services.
Old 10-03-2011, 05:02 AM   #7
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

As previously mentioned, there is really no way to know how much computing power will be needed without knowing how much the machines will be used and what they will be used for. That being said, as a core facility you should have enough computing power to align every run to the human genome. As an institution you will obviously need more.
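To give a feel for what "enough computing power to align every run" means, here is a rough sketch; the reads-per-run and per-core throughput figures are assumptions for illustration, not benchmarks of any particular aligner:

```python
# Rough CPU budget for aligning one HiSeq run to the human genome.
# The throughput figure is an assumed ballpark for a short-read aligner
# such as BWA or Bowtie -- benchmark your own setup before buying hardware.

def alignment_cpu_hours(reads, reads_per_second_per_core):
    """CPU-hours needed to align `reads` reads at the given per-core rate."""
    return reads / reads_per_second_per_core / 3600.0

# Assumed: 1e9 reads per run, ~5000 reads/s per core.
cpu_hours = alignment_cpu_hours(1e9, 5000)
print(round(cpu_hours))          # total CPU-hours for the run
print(round(cpu_hours / 16, 1))  # wall-clock hours on one 16-core node
```

Under these assumptions a full run costs on the order of tens of CPU-hours, so even a single well-specified multi-core node can keep pace with one HiSeq, provided storage and I/O are not the bottleneck.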

On actual usage of the machines, my guess is that they will not get much use. It is extremely expensive to run a HiSeq (and even more expensive, on a per-read or per-MB basis, to run a MiSeq). I don't know the exact cost of disposables per run, but it is high. I doubt the average lab at your institution in Portugal has the funding to have many samples run, and it will certainly be hard to attract external users: your operational costs for disposables will be higher than what heavy users are paying elsewhere. It is much easier to get funding to 'bring cutting-edge genomics' to your institution by buying expensive equipment than it is to get funding to run it. There is also the issue of human capital. How many people at your institution have any experience with next-generation sequencing? How many are working on projects that use it? From the sound of it, not a whole lot, or you would be getting better advice.

The much more logical way to get genomics going at an institution is to come up with projects that use next-generation sequencing, prepare the libraries, and send them out to be sequenced somewhere else. When institutional demand reaches the point where a machine would be running full time, then you can start your own facility. To buy a HiSeq and have it sit idle is a huge waste of money: it will be obsolete in less than two years. Especially when that money could have been used to sequence samples, actually do cutting-edge genomics at your institution, and publish in high-impact journals.
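A toy break-even calculation makes this argument concrete; every cost below is a made-up placeholder, so substitute real quotes before drawing any conclusions:

```python
# Toy break-even: outsourcing lanes vs. running your own HiSeq.
# Every figure used here is a placeholder assumption, not a real quote.

def breakeven_lanes_per_year(instrument_cost, lifetime_years,
                             own_cost_per_lane, outsource_cost_per_lane):
    """Lanes per year at which owning the machine matches outsourcing."""
    annual_capital = instrument_cost / lifetime_years        # amortized purchase
    saving_per_lane = outsource_cost_per_lane - own_cost_per_lane
    return annual_capital / saving_per_lane

# Assumed: a $700k instrument amortized over 3 years, $1500/lane in-house
# consumables vs. $2500/lane outsourced.
print(round(breakeven_lanes_per_year(700000, 3, 1500, 2500)))
```

With these invented numbers you would need on the order of 230 lanes a year (roughly 29 eight-lane flow cells) before owning beats outsourcing, which is exactly the "machine running full time" threshold described above.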

Genomics is not the machine. Genomics is the experimental design and the analysis.

I hope that didn't sound too harsh. There was a post by a guy in Bulgaria contemplating getting a GAIIx a few days ago. Probably worth a read if you can find it.

I didn't really answer your question, but I think that goes to the reason why no one knows the answer.
__________________
--------------
Ethan
Old 10-03-2011, 05:15 AM   #8
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,365

Quote:
Originally Posted by ETHANol View Post
The much more logical way to get genomics going at an institution is come up with projects that use next generation sequencing, prepare libraries and send them out to be sequenced somewhere else. When the institutional demand reaches a point where the machine will be running full time then you can start your own facility.
That's pretty much the approach our institute has taken thus far. Initially we outsourced both the library preparation and the sequencing, but we are now looking to do the library preparation in house. My guess is that once that is up and running, we may look at the new "desktop sequencers", but already it seems the bottleneck is bioinformatics staff rather than data generation. Again, some of the analysis can be outsourced (or done via collaborations).

Perhaps you can get your bosses to invite some existing sequencing-center managers over for a visit, and go through some of these issues with their first-hand advice?