  • Computing power for in-house Next Gen analysis

    With the cost of outsourcing the standard bioinformatics needed for next-gen data quickly reaching dizzying heights (I was just quoted 1,000 Euros for clustering and contig assembly of one cDNA library), I wonder if anyone could offer advice on a decent, powerful setup for in-lab use?

    The trade-off between computing time and accumulating outsourcing costs is important, but I would not be too upset if contig assembly and clustering of a cDNA library took one week on our own machine.

    Does anyone have a powerful setup of their own (not an expensive cluster system) able to handle the basic necessities, e.g. alignments, BLAST searches and SNP calling?

    Thanks

    Jack

    Biology
    Dalhousie University
    Canada

  • #2
    I am also very interested in learning about this. If there are no contributions, let's start an effort to come up with our own answer.

    Comment


    • #3
      You can probably buy a 24 GB RAM desktop for 2.5k USD, install Linux on it and work on that.
      If you are tech-savvy, you can create a Rocks cluster by stringing together a couple of these desktops into a Beowulf-type cluster.
      http://kevin-gattaca.blogspot.com/

      Comment


      • #4
        I looked into this about two months ago. The minimum recommendations I was given were a 64-bit processor, at least 8 cores (so 2x quad-core), at least 16 GB RAM and at least 1 TB of disk. I put one together with only 8 GB RAM to start out with, and the total cost was about $1.7k USD.
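        For anyone assembling a box like this, here is a minimal sanity-check sketch, assuming a Linux machine with /proc/meminfo; the thresholds are simply the ones suggested in this post, not hard requirements of any particular tool.

```python
# Quick check that a Linux box meets the minimums mentioned above.
import platform
import multiprocessing

def mem_total_gb():
    """Return total RAM in GB by parsing /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as fh:
        for line in fh:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)  # kB -> GB
    return 0.0

print("64-bit OS   :", platform.machine() in ("x86_64", "amd64"))
print("Cores >= 8  :", multiprocessing.cpu_count() >= 8)
print("RAM (GB)    :", round(mem_total_gb(), 1), "(want >= 16)")
```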

        Comment


        • #5
          If you're only talking about analysing a relatively small number of experiments for a single lab, then I don't think you need anything too fancy. The main requirement is lots of memory (16 GB is a pretty cost-effective option these days), which in turn requires a 64-bit OS to make use of it. Most tools work most easily under Linux, so that would probably be the way to go.

          In terms of CPU, you don't actually get much advantage from having lots of cores. Very few mapping or assembly tools are multi-threaded, so they're only going to occupy a single core, and memory constraints will probably prevent you from running multiple jobs in parallel.

          The other thing I'd look into is ensuring you can back up whatever data you create. It's surprising how quickly your storage will fill up. Having a large, fast local disk is a must; mirror it to an external storage system so you don't lose anything if data on the main system is lost.
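          A minimal sketch of that mirroring idea, assuming rsync is installed; the two paths are placeholders for the local results directory and the mounted external storage.

```python
# Mirror a local data directory to external storage with rsync.
# --delete makes the destination an exact mirror, so point it at a
# dedicated backup directory only.
import subprocess

SRC = "/data/sequencing/"        # placeholder: local results directory
DST = "/mnt/backup/sequencing/"  # placeholder: mounted external storage

subprocess.run(["rsync", "-a", "--delete", "--stats", SRC, DST], check=True)
```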

          Comment


          • #6
            In-house analysis

            We recently bought a 24-thread (dual 6-core Xeon with HT), 32 GB RAM server for mapping (6,000 EUR). I've been able to work with Solexa data without much trouble. One important thing was putting in an 8-disk SATA RAID 5; this makes a huge difference, because I can get up to 600 MB/s read speed.
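            A rough sketch for checking what sequential read speed a disk or array actually delivers; the file path is a placeholder, and the file should be much larger than RAM (or caches dropped first) or the figure will mostly reflect the page cache.

```python
# Crude sequential-read benchmark: read a large file in chunks and report MB/s.
import time

PATH = "/data/large_test_file.fastq"  # placeholder: any multi-GB file on the array
CHUNK = 8 * 1024 * 1024               # read in 8 MB chunks

start = time.time()
total = 0
with open(PATH, "rb") as fh:
    while True:
        block = fh.read(CHUNK)
        if not block:
            break
        total += len(block)
elapsed = time.time() - start
print(f"{total / 1e6:.0f} MB in {elapsed:.1f} s = {total / 1e6 / elapsed:.0f} MB/s")
```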

            Now I only use the cluster when I need to verify alignments with high-sensitivity BLAT.

            Comment


            • #7
              If you don't have node-locked licenses, another possibility is to "rent" time on machines with a lot of RAM through Amazon Web Services.
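              A minimal sketch of what renting such a machine looks like programmatically, using the boto3 Python library; the AMI ID, key pair and instance type are placeholders, and the memory-optimised type shown is just an example, not a recommendation.

```python
# Request a single memory-optimised EC2 instance (all identifiers are placeholders).
import boto3  # AWS SDK for Python

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder: a 64-bit Linux AMI of your choice
    InstanceType="r5.2xlarge",   # placeholder: a 64 GiB memory-optimised type
    KeyName="my-keypair",        # placeholder: an existing EC2 key pair
    MinCount=1,
    MaxCount=1,
)
print("Started instance:", response["Instances"][0]["InstanceId"])
```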

              Comment


              • #8
                Costing is a deeply slippery subject in computing in general. I've done it often in my 30 years in the business, and you really can make the numbers come out the way you want to. The infamous jargon phrase "total cost of ownership" (TCO) is infamous for a reason; check out the Wikipedia article on where that comes from.

                For this particular industry, here is a pertinent blog entry I came across last month on politigenomics.com.



                The money quote (hahahahahahaha) there is:

                Using the entire cost of the Dell workstation (even though you require less than 25% of its computational capacity), the break even point is about 14 genomes. It would take about 1.5 years (about half the expected life of IT hardware) at current throughput to sequence 14 genomes with a single Illumina GA IIx. At data rates expected in January 2010, it would take less than a year to break even.
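                A minimal sketch of that break-even arithmetic; the figures below are illustrative placeholders rather than the blog's actual inputs (they are chosen so the ratio lands near the quoted ~14 genomes), but the structure of the comparison is the same: up-front hardware cost divided by the per-genome cost it avoids.

```python
# Break-even point: how many genomes before in-house hardware pays for itself.
workstation_cost = 7000.0      # placeholder: up-front hardware cost (USD)
outsource_per_genome = 500.0   # placeholder: per-genome cost of the alternative (USD)

break_even_genomes = workstation_cost / outsource_per_genome
print(f"Break-even after ~{break_even_genomes:.0f} genomes analysed in-house")
```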

                You must, of course, consider the source. I don't mean to be mean to the author, David Dooling, who reads like a really decent human being, but he is the LIMS and IS man at the School of Medicine in St. Louis.

                My experience: I'm learning here, so I'm making many false steps with data downloads/re-downloads and running and re-running, and that immediately points to doing it in-house for me. When I actually got my act together, the IT part of SNP generation for a 100 Mb mouse chromosome on a 4-node, 10-core cluster took 3 hours, plus about 1 hour of index preparation, about 5 hours of download time at an effective 3 Mbps of bandwidth, and about 30 GB of storage. Oh, and the temperatures on the 4 PCs went up to an average of 64 °C. (I should be able to work out something from that; next run I'll put a Kill-A-Watt meter on them to see what power they actually draw.)

                Unquestionably, while I'm learning, in-house is the way to go. Once I'm in production (will this field ever become productionized?), EC2/Amazon/Azure (not Google's cloud computing concept; that is quite different, and cuts a task off if any instance takes more than 30 seconds of elapsed time) will be very strong candidates again.

                I am very much in favour of Amazon-style cloud computing; computing power as a utility feels like the way to go, but the fact that they threw WikiLeaks off their cloud deeply worried me!

                I was reasonably in favour of outsourcing software development, on a project-by-project basis, to the right setup in India.

                For this stuff - for me, not yet.

                Comment


                • #9
                  On a computer with 16 GB of RAM and 4 fast processors, it takes 2 days to assemble a single HiSeq lane of data to the reference human genome.

                  Comment


                  • #10
                    Originally posted by JackieBadger View Post
                    With the cost of outsourcing the standard bioinformatics needed for next-gen data quickly reaching dizzying heights (I was just quoted 1,000 Euros for clustering and contig assembly of one cDNA library), I wonder if anyone could offer advice on a decent, powerful setup for in-lab use?

                    The trade-off between computing time and accumulating outsourcing costs is important, but I would not be too upset if contig assembly and clustering of a cDNA library took one week on our own machine.

                    Does anyone have a powerful setup of their own (not an expensive cluster system) able to handle the basic necessities, e.g. alignments, BLAST searches and SNP calling?

                    Thanks

                    Jack

                    Biology
                    Dalhousie University
                    Canada

                    My 2 cents:
                    If it's de novo transcriptomics you are outsourcing, I can understand the pricing, given the difficulties. If I could, I would outsource it myself; there are so many ways you can tweak it to get the most out of your data.

                    To me, the machine price is 'cheap', as it won't be for the sole purpose of said project; it can always double for something else.
                    Development time and manpower costs are always the killer in the cost equation.
                    http://kevin-gattaca.blogspot.com/

                    Comment


                    • #11
                      Originally posted by NextGenSeq View Post
                      On a computer with 16 GB of RAM and 4 fast processors, it takes 2 days to assemble a single HiSeq lane of data to the reference human genome.
                      I assume you mean align, not assemble? And if you use one of the BWT-based aligners (Bowtie/BWA/SOAP2), you could probably do that in 2 hours, not 2 days.
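                      For reference, a minimal sketch of such a BWT-based run using BWA's aln/sampe workflow, driven from Python; the reference, read files and thread count are placeholders, and Bowtie or SOAP2 would be driven analogously.

```python
# Paired-end alignment of one lane with BWA (aln/sampe), producing a SAM file.
import subprocess

REF = "hg19.fa"                             # placeholder: reference indexed with `bwa index`
R1, R2 = "lane_1.fastq", "lane_2.fastq"     # placeholder: paired-end reads for one lane
THREADS = "4"

# Align each end separately, then pair the alignments into a single SAM file.
subprocess.run(["bwa", "aln", "-t", THREADS, "-f", "lane_1.sai", REF, R1], check=True)
subprocess.run(["bwa", "aln", "-t", THREADS, "-f", "lane_2.sai", REF, R2], check=True)
with open("lane.sam", "w") as sam:
    subprocess.run(["bwa", "sampe", REF, "lane_1.sai", "lane_2.sai", R1, R2],
                   stdout=sam, check=True)
```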

                      Comment


                      • #12
                        With a single HiSeq lane already producing 100M reads of 2 x 100 bp, in-house resources seem unlikely to keep up with the growth in data. I vote for cloud solutions until some revolutionary computing architecture becomes available (a quantum computer, perhaps?).
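                        A back-of-the-envelope sketch of what that lane amounts to; the bytes-per-base factor is a rough placeholder for uncompressed FASTQ (bases, qualities and headers).

```python
# Rough data volume for one HiSeq lane of 100M read pairs at 2 x 100 bp.
read_pairs = 100_000_000
read_length = 100
bases = read_pairs * 2 * read_length
fastq_bytes = bases * 2.5          # rough placeholder: ~2.5 bytes per base of FASTQ

print(f"{bases / 1e9:.0f} Gbp per lane, roughly {fastq_bytes / 1e9:.0f} GB of FASTQ")
```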

                        Comment


                        • #13
                          Originally posted by csoong View Post
                          With a single HiSeq lane already producing 100M reads of 2 x 100 bp, in-house resources seem unlikely to keep up with the growth in data. I vote for cloud solutions until some revolutionary computing architecture becomes available (a quantum computer, perhaps?).
                          Which does not exactly come cheap either. The biggest instance Amazon currently has to offer is ~68 GiB and will hurt you at USD 2.28 per hour. A machine of your own with that much memory can be had for ~10k EUR, or perhaps less.
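                          A minimal sketch of that comparison in instance-hours; the exchange rate is a placeholder.

```python
# Hours of the ~68 GiB instance at USD 2.28/hour versus buying a ~10k EUR machine.
instance_per_hour_usd = 2.28
machine_cost_eur = 10_000
eur_to_usd = 1.35                  # placeholder exchange rate

break_even_hours = machine_cost_eur * eur_to_usd / instance_per_hour_usd
print(f"~{break_even_hours:.0f} instance-hours "
      f"(~{break_even_hours / 24:.0f} days of continuous use)")
```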

                          Especially in the "learning" phase, where you try out different programs, analyses and so on, the cloud prices quickly add up, not to mention data transfer to and from the cloud.

                          Once you have an SOP (standard operating procedure) for a certain kind of data, it might be OK.

                          B.

                          Comment


                          • #14
                            One other consideration: if you can build your own computer(s), the hardware cost for a top-end set of clustered PCs is quite a lot less. If you can overclock or unlock not-so-duff cores, you get even more bang for the buck, and if you watch for the deals that come along at the various online sites, there's yet another advantage. I added a 6-core AMD Phenom II, 16 GB of RAM, a motherboard and a 2 TB disk, cannibalizing existing bits for the rest, for USD 420.

                            Comment


                            • #15
                              High-throughput sequencing technologies might follow most other once-high-throughput technologies, such as microarrays, in that biology labs will eventually lose interest in dealing with the nitty-gritty statistics and computing issues themselves.

                              Comment
