  • Newbie needing advice on required computing power for small-scale NGS facility

    Hello, friendly SEQanswers community.

    Recently, we received a sizeable grant to start up a next-gen sequencing facility. Since I am the only bioinformaticist at my institute (and a biologist by training), I am having some difficulty estimating the computing infrastructure needed to handle an as-yet-unknown number of sequencing projects.

    However, here are some things that I am fairly certain about:
    1. We will probably buy one HiSeq 1000 machine.
    2. We will probably buy one MiSeq machine.
    3. We may buy non-Illumina technology, e.g., Ion Torrent or Roche 454.
    4. I would prefer to use open source software only.


    Here's what I do not know:
    1. How often these sequencing machines will be used.
    2. How many users at the institute will be sequencing.
    3. How many users at the institute will want direct access to the server to run downstream jobs.
    4. If users outside of the institute will be allowed access to the servers to run other jobs (e.g., climate studies).


    I suppose my problem is manifold. I cannot reliably estimate the usage these sequencing machines will receive, so I am having trouble working out the computing requirements to handle the data.

    As my institute has not even placed the order for the sequencing machines (there's still a lot of bickering about which sequencing technologies to choose), should I simply wait and see what we REALLY end up getting before putting together a parts list?

    I am thinking about polling the institute and talking with higher-ups to figure out how many groups will be doing sequencing at our currently non-existent facility as this information is probably critical in determining HDD/Memory/CPU requirements.

    Nevertheless, what would be an economical and scalable set-up that one might start with, assuming that a single HiSeq machine will be used to capacity every month?

    Thank you all a thousand times for any information.

  • #2
    Hello,

    Please refer to this thread; a similar topic has been discussed before.
    http://seqanswers.com/forums/showthread.php?t=13995

    Regards,
    --
    pg



    • #3
      Sweet, thanks. I had a feeling a similar post was already out there, but my crappy searching didn't reveal it. Thanks a bunch!



      • #4
        To me there is a vital question missing: Will your new sequencing service also be expected to offer some analysis services, or just provide the raw data after basic QC?

        e.g. mapping RNA-seq onto a reference genome (relatively straightforward and can be automated), or de novo assembly (still very hands-on and demanding in terms of bioinformatician time as well as computational load).

        Will you have access to any existing computational resources, e.g. an institute cluster?

        Also, what kind of organisms will you be working with? Bacterial and viral genomes are small, so they will require fewer computational resources.

        I'm sure you'll be thinking about this too, but you will need more staff (a wet-lab expert for library preparation and loading the machines, bioinformaticians, and probably a Linux systems admin). In your shoes I would try to head-hunt someone from an existing sequencing center to run it, and try to do this as soon as possible (so they can deal with many of these choices).

        Also, I would suggest you sign up to the bioinfo-core mailing list at http://bioinfo-core.org/ and ask their advice too.



        • #5
          (Replying to maubp, #4)
          These are excellent points. I am still trying to find out the extent of the services required by our in-house researchers. The majority of the sequencing will be done on eukaryotic organisms; this much I know. Regarding analysis services, I think basic mapping and assembly are a given. Anything beyond that I still do not know, as it depends a lot on the particular research group or individual, as well as on our staffing.

          Last I heard, people from the University wanted to use our new (but non-existent) computing resources to run their jobs. I feel like gouging my eyes out with a spoon.

          I am working with so little information and it's entirely frustrating. I think I am going to end up posting an institute-wide email with some very pertinent questions to get a handle on things before committing to anything.



          • #6
            You really should be in on the discussions of which vendors the PIs plan to look at and the whole setup of the core. The reason I say that is that you can often get analytical hardware bundled into a complete system quote for less than buying it separately.

            When we bought our ABI SOLiD system, we ended up also purchasing a Penguin computing cluster (ABI has partnered with Penguin to provide downstream computing resources for SOLiD customers). The whole deal as quoted to us was a better deal than we could get purchasing a cluster separately. And while the Penguin cluster had BioScope preinstalled, it is just a basic Beowulf cluster at heart, so you are not limited in what else you can do with it.

            It has been my experience that the optimal way to do this, when starting from scratch, is to consider the whole core system as one integrated purchase: sequencers and associated lab equipment, along with computational and storage needs, all discussed together (and keep in mind you may need to address things like network capacity for data transfer, environmentally controlled server/cluster space, power requirements for the hardware, backup and archival storage systems, and so on).

            Storage is not a trivial issue and needs to be discussed. How many jobs per week or month do you anticipate? If the core is performing primary and secondary analyses, will the PIs also insist on access to raw data? Who is responsible for storing and archiving the final data and results? Will data be archived permanently? If not, for how long (and hence, how much storage do you need)? Do you have the network bandwidth to handle the data, or will you need to upgrade there as well? If the core is not storing data permanently, how is final data to be delivered to the PI?
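To make the storage questions concrete, here is a minimal back-of-the-envelope sketch. The function name is hypothetical, and every number in the example is a placeholder, not a HiSeq or MiSeq specification; substitute your own run counts, per-run data volume (raw plus derived files), and retention policy. It assumes data is purged once it passes the retention window, so the disk requirement levels off at runs per month × data per run × months retained.

```python
def steady_state_storage_tb(runs_per_month: float,
                            gb_per_run: float,
                            retention_months: float,
                            overhead_factor: float = 1.0) -> float:
    """Hypothetical helper: TB on disk once old runs are purged after retention.

    runs_per_month   -- expected sequencing runs per month (your estimate)
    gb_per_run       -- GB kept per run, raw plus derived files (your estimate)
    retention_months -- how long each run stays on primary storage
    overhead_factor  -- multiplier for extra copies, e.g. 2.0 for one backup
    """
    if min(runs_per_month, gb_per_run, retention_months) < 0:
        raise ValueError("run counts, sizes and retention must be non-negative")
    if overhead_factor < 1:
        raise ValueError("overhead_factor must be at least 1")
    gb = runs_per_month * gb_per_run * retention_months * overhead_factor
    return gb / 1000.0  # decimal TB, the unit disk vendors quote


# Placeholder inputs: 2 runs/month, 500 GB kept per run,
# 12 months on disk, plus one backup copy.
print(steady_state_storage_tb(2, 500, 12, overhead_factor=2.0))  # 24.0
```

Re-running this with a few retention policies (3, 12, 36 months) is a quick way to show the PIs how much the "for how long?" answer drives the storage budget.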

            There are a whole host of data and analysis issues involved in setting up a core, and they need to be considered upfront, and budgeted appropriately. Far too often, academic cores are set up by PIs who think solely of the data generation aspects. Then, when there is no money left for the bioinformatics resources, the lab resources end up being grossly underused (and thus never recoup their costs) because the downstream support was never anticipated and allowed for. I've been there, seen that (and know of several academic NGS "cores" that have sat largely idle, as their own institution's PIs have farmed their NGS work out, since their in-house cores cannot provide any support for post-sequencing data or analysis).

            You need to make it clear to the PIs that you are not talking about a few off-the-shelf desktop computers here and a couple of cheap disk drives. They need to think about data analysis and storage issues right up front, and factor that fully into their initial and long term plans for a core, including some available bioinformatics expertise to at least guide them in both analysis and interpretation. Otherwise, you don't really have a core as you cannot offer end-user services.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.



            • #7
              As previously mentioned, there is really no way to know how much computing power will be needed without knowing how much the machines will be used and what they will be used for. That being said, as a core facility you should have enough computing power to align every run to the human genome. As an institution, you will obviously need more.
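One way to turn "enough computing power to align every run" into a hardware number is to work backwards from a turnaround target. This is a rough sketch under stated assumptions: the function name is hypothetical, reads per run and aligner throughput (reads per core-hour) are placeholders you would need to measure on your own data and aligner, and it assumes alignment scales linearly with core count, which is only approximately true once I/O and memory become limiting.

```python
import math


def cores_for_turnaround(reads_per_run: float,
                         reads_per_core_hour: float,
                         turnaround_hours: float) -> int:
    """Hypothetical helper: smallest core count that aligns one run in time.

    Assumes throughput scales linearly with cores (an approximation).
    """
    if reads_per_core_hour <= 0 or turnaround_hours <= 0:
        raise ValueError("throughput and turnaround must be positive")
    core_hours = reads_per_run / reads_per_core_hour  # total work for one run
    return math.ceil(core_hours / turnaround_hours)   # round up to whole cores


# Placeholder inputs: 1e9 reads per run, an aligner managing 2e6 reads per
# core-hour, and a goal of finishing each run's alignment within 24 hours.
print(cores_for_turnaround(1e9, 2e6, 24))  # 21
```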

              On actual usage of the machines, my guess is that they will not get much use. It is extremely expensive to run a HiSeq (and, on a per-read or per-Mb basis, even more expensive to run a MiSeq). I don't know the exact cost of disposables per run, but it is high. I doubt your average lab at your institution in Portugal has the funding to have many samples run. It will certainly be hard to attract any external users. Your operational costs for disposables will be higher than what heavy users are paying. It is much easier to get funding to 'bring cutting edge genomics' to your institution by buying expensive equipment than it is to get the funding to run it. There is also the issue of human capital. How many people at your institution have any experience with next-generation sequencing? How many are working on projects that use next-generation sequencing? From the sounds of it, not a whole lot, or you would be getting better advice.

              The much more logical way to get genomics going at an institution is to come up with projects that use next-generation sequencing, prepare libraries, and send them out to be sequenced somewhere else. When the institutional demand reaches the point where the machine would be running full time, then you can start your own facility. To buy a HiSeq and have it sit idle is a huge waste of money; it will be obsolete in less than 2 years. That is especially true if the money could have been used to sequence samples, actually do cutting-edge genomics at your institution, and publish in high-impact journals.
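The "wait until demand justifies it" argument can be made quantitative with a simple break-even calculation. This is a minimal sketch, not a costing model: the function name is hypothetical, and all figures in the example are invented placeholders, not real instrument, reagent, or service prices. It spreads the purchase price evenly over the instrument's useful life (straight-line depreciation, so a short life such as the 2 years suggested above raises the annual cost) and asks how many runs per year are needed before in-house sequencing beats outsourcing.

```python
import math


def break_even_runs_per_year(purchase_price: float,
                             useful_life_years: float,
                             other_fixed_per_year: float,
                             in_house_cost_per_run: float,
                             outsourced_price_per_run: float) -> float:
    """Hypothetical helper: runs/year above which owning beats outsourcing.

    other_fixed_per_year covers things like a service contract and salaries.
    """
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    # Straight-line depreciation plus the other recurring fixed costs.
    fixed_per_year = purchase_price / useful_life_years + other_fixed_per_year
    saving_per_run = outsourced_price_per_run - in_house_cost_per_run
    if saving_per_run <= 0:
        return math.inf  # each in-house run costs as much or more: never pays off
    return fixed_per_year / saving_per_run


# Invented placeholder figures: a 600,000 instrument written off over 2 years,
# 60,000/year of other fixed costs, 10,000 per run in-house vs 16,000 outsourced.
print(break_even_runs_per_year(600_000, 2, 60_000, 10_000, 16_000))  # 60.0
```

If the break-even figure is more runs per year than your institution's labs can realistically fund, that is the quantitative version of "outsource first".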

              Genomics is not the machine. Genomics is the experimental design and the analysis.

              I hope that didn't sound too harsh. There was a post by a guy in Bulgaria contemplating getting a GAIIx a few days ago. Probably worth a read if you can find it.

              I didn't really answer your question, but I think this goes to the reason why no one knows the answer.
              --------------
              Ethan



              • #8
                Originally posted by ETHANol
                The much more logical way to get genomics going at an institution is to come up with projects that use next-generation sequencing, prepare libraries, and send them out to be sequenced somewhere else. When the institutional demand reaches the point where the machine would be running full time, then you can start your own facility.
                That's pretty much the approach our Institute has taken thus far. Initially we outsourced the library preparation and sequencing, but are now looking to do the library preparation in house. My guess is once that is up and running, we may look at the new "desktop sequencers", but already it seems the bottleneck is bioinformatics staff rather than data generation. Again, some of the analysis can be outsourced (or done via collaborations).

                Perhaps you can get your bosses to invite some existing sequencing center managers over to visit and go through some of these issues with their first-hand advice?
