  • Building new sequencing system - IT questions

    Hi all,

    I just moved from physics to the biology world and everything is new to me. I recently joined a lab that is setting up a new DNA sequencing system, and I will be in charge of the IT system as well as the analysis system. I would appreciate any advice, experience, suggestions or other input.

    OK, I will start with our limitations:
    - SPACE: we plan to keep the sequencer system in the lab, where space is limited, so we are trying to minimize the footprint and the number of computers.
    - MONEY: obvious. We are willing to buy a pricey server, but we do not want to buy junk that will not be useful in the future.

    The system we plan to have is a Genome Analyzer IIx from Illumina. It comes with one PC to control the Analyzer, a Cluster Station with a PC to control it, and a Pipeline Analysis server. I am now considering the primary analysis computer/server, which will handle all the analysis jobs (alignment, assembly, etc.) for our own purposes. My questions are:

    1. How powerful should it be? I read some posts here about the computers people use for analysis: quad-core chips and around 64 - 128 GB of memory?

    2. To save space, can we get a much more powerful Pipeline Server to handle the jobs? (The pipeline server has around 8 GB of memory and a quad-core chip, which is not so attractive to me, and it already costs around 60k.) It seems to me that Illumina does not offer higher specs.

    3. To save space, I am thinking of using the supercomputer and cluster system that we have here at the institute. Can you share what you think about a powerful stand-alone server versus the cluster system? The cluster is of course much faster and has more memory, but transferring data could be a pain in the neck. What are the other advantages and disadvantages of using a cluster compared to a stand-alone server?

    Sorry for the long message and thank you all in advance,

    D.

  • #2
    When you say the sequencer is coming with a "Pipeline Analysis server", do you mean the IPAR? If so, you will probably also get a 1/4 height computer rack. The rack has enough room that you can add servers there.

    We have our sequencer sitting on a table on wheels (not sure where it was ordered from, but can find out if necessary). When we ordered the table, we specified the height. We got one high enough that the IPAR fits underneath. You could do something like that.

    For an analysis server, we have a dual quad-core Dell "PowerEdge 2950 III" server with 24 GB of memory and 4 TB of local storage. It works well for our needs. The person who bought it ordered the server with only a small hard drive and a small amount of RAM, then bought additional RAM and hard drives elsewhere to save money. The only changes I would make if I were to buy another one are: 1) increase the RAM to 32 GB and 2) increase the hard drive space to 6 TB.

    A sequencing run of 36 bases will generate around 1.5 TB of images and analysis files. Once the analysis is done, you can delete most of the files and keep less than 100 GB if you are short on hard drive space.
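
To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. The 1.5 TB and 100 GB per-run figures and the 4 TB and 6 TB disk sizes come from the paragraphs above; the function names, the decimal conversion (1 TB = 1000 GB) and the idea of reserving room for one full run still being analyzed are assumptions for illustration only.

```python
# Rough capacity planning from the per-run figures quoted above.
# Sizes are whole decimal gigabytes (1 TB = 1000 GB) to avoid float rounding.
RUN_FULL_GB = 1500     # images + analysis files for one 36-base run
RUN_TRIMMED_GB = 100   # upper bound after deleting most of the files


def full_runs_that_fit(disk_gb):
    """Number of untrimmed runs that fit on the disk at once."""
    return disk_gb // RUN_FULL_GB


def trimmed_runs_alongside(disk_gb, full_runs_in_flight=1):
    """Trimmed runs that fit while keeping room for runs still being analyzed."""
    free_gb = disk_gb - full_runs_in_flight * RUN_FULL_GB
    return max(0, free_gb // RUN_TRIMMED_GB)


if __name__ == "__main__":
    for disk_gb in (4000, 6000):  # the 4 TB and 6 TB sizes mentioned above
        print(disk_gb, full_runs_that_fit(disk_gb), trimmed_runs_alongside(disk_gb))
```

With these assumptions, a 4 TB disk holds two untrimmed runs, or one run in progress plus about 25 trimmed ones; real usage will vary with read length and how aggressively you clean up.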

    For us, we do the analysis on the Dell server, then move the files to a Sun Fire X4500 server (a "thumper") with 48 TB of storage.

    The Pipeline software that Illumina provides runs on Linux, and Linux is the only platform they support. Right now there isn't a parallelized version of the software, so the cluster/supercomputer will only be helpful if they let you run the software as you normally would and use a lot of processors.

    Transferring the data is a potential bottleneck. We transfer the image files during the run. As soon as the run is finished, I can start the analysis.

    Hope this helps


    • #3
      With the release of SCS 2.4 the IPAR is no longer needed. Image analysis and basecalling are done in real time on the instrument control computer. Illumina had also sold a prebuilt system for doing downstream analysis which I think was called the Pipeline Analysis Server. Again, with SCS v2.4 running on the instrument computer, the pipeline is not required. You can still download the Pipeline software (currently version 1.4) and install it on a general purpose Linux box.

      I would agree with picabo's recommendations above for the analysis computer. Ours has 12 TB of available HD space, and I back up the image files to a tape (LTO-3) system immediately after the run. We keep the images for 90 days in case there is a problem or we wish to reanalyze them (which came in handy when the pipeline moved from 1.3 to 1.4).


      • #4
        Is the Genome Analyzer IIX default setup going to write over the images as part of the processing to save space?


        • #5
          Originally posted by dcfargo
          Is the Genome Analyzer IIX default setup going to write over the images as part of the processing to save space?
          The behavior of the instrument is controlled by a set of configurable Run Parameters. In SCS v2.4 two options related to image handling are "Copy images to network folder" and "Delete images". The default is to enable both of these options. As images are collected they would be simultaneously copied to a network location. This is typically an SMB share on a Linux box mounted on the instrument computer. After the real time analysis software on the instrument computer has finished the analysis of an image it is then deleted from the instrument computer.

          Further, it appears that under SCS 2.4 the instrument checks disk space prior to each cycle. If insufficient disk space is available on the instrument computer, the recipe will pause and wait for space to be made available (e.g. by manually deleting images).
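
As an illustration only (this is not Illumina's code, and I have not seen how SCS implements it), a pre-cycle check of this kind could look roughly like the Python sketch below. The function names, the polling interval and the space requirement you pass in are all hypothetical.

```python
import shutil
import time


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


def wait_for_space(path, needed_bytes, poll_seconds=60, max_polls=None):
    """Pause until enough space is free; give up after `max_polls` checks if set.

    Returns True once there is room, False if the poll limit is reached.
    """
    polls = 0
    while not has_room(path, needed_bytes):
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_seconds)  # someone has to free up space in the meantime
    return True
```

The point of the sketch is only that the check happens before data is written, so a full disk pauses the run instead of corrupting it.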

          I don't know how much disk space is provided on currently shipping GA-IIx instruments. Our instrument (originally a GA, upgraded to GA-II) has ~900 GB of available space. A 36-cycle, single-end run generates 789 GB of image data. The IIx collects 20% more data than the II (120 tiles per lane vs 100), or ~947 GB per run. When you add the data generated by the real time analysis, which I expect would be > 200 GB, there is no way our instrument could complete a run without deleting image files as it goes along.
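
The arithmetic behind that estimate, as a small Python sketch. The 789 GB, 100 vs 120 tiles and ~900 GB figures are the ones given above; the 200 GB for real time analysis output is only my expected lower bound, not a measurement, and the names are made up for the example.

```python
GA_II_IMAGES_GB = 789          # 36-cycle, single-end run on a GA-II
TILES_II, TILES_IIX = 100, 120 # tiles per lane
RTA_OUTPUT_GB = 200            # assumed lower bound for real time analysis output
INSTRUMENT_DISK_GB = 900       # approximate free space on the instrument computer


def iix_images_gb():
    """Scale the GA-II image volume by the tile count of the IIx."""
    return GA_II_IMAGES_GB * TILES_IIX / TILES_II


def shortfall_gb():
    """How far the instrument disk falls short if no images are deleted."""
    return iix_images_gb() + RTA_OUTPUT_GB - INSTRUMENT_DISK_GB


if __name__ == "__main__":
    print(round(iix_images_gb()), round(shortfall_gb()))
```

Under these assumptions the disk would be short by roughly 250 GB, which is why deleting images during the run is unavoidable on an instrument like ours.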


          • #6
            Originally posted by picabo
            When you say the sequencer is coming with a "Pipeline Analysis server", do you mean the IPAR? If so, you will probably also get a 1/4 height computer rack. The rack has enough room that you can add servers there.
            Yes, the IPAR. The system we will get has four dual-core 3.4 GHz processors, 32 GB of memory and an HP modular array with 9 TB of total storage. We also have "unlimited storage" on our institutional server (I don't really know how "unlimited" it is, and I am trying to find out), so storing the data there after the analysis is one option for minimizing the space and the number of computers.

            I have no experience with sequencing analysis at all, but from reading this forum I see that some people run quite powerful servers with around 128 GB of memory. So I am really not sure whether our incoming server will be capable enough.

            As for data transfer, our supercomputer/cluster has a 1 Gb connection (I tested it and a 10 GB file takes about 8 minutes, which is not very fast). Will it be a pain in the neck if we want to do the analysis on the cluster?
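
To put my test in numbers, here is a quick Python sketch. The 10 GB in 8 minutes figure is my measurement and the 1.5 TB per run is picabo's estimate from above; the function names and the decimal unit conversions (1 GB = 1000 MB, 1 Gb/s = 125 MB/s) are just assumptions for the calculation.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Average throughput in MB/s for a transfer of `size_gb` taking `minutes`."""
    return size_gb * 1000 / (minutes * 60)


def transfer_hours(size_gb, mb_per_s):
    """Hours needed to move `size_gb` at a sustained rate of `mb_per_s`."""
    return size_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    measured = throughput_mb_per_s(10, 8)   # my test file
    line_rate = 1000 / 8                    # nominal 1 Gb/s link in MB/s
    run_gb = 1500                           # one run, per picabo's estimate
    print(f"measured: {measured:.1f} MB/s")
    print(f"one run at measured rate: {transfer_hours(run_gb, measured):.1f} h")
    print(f"one run at line rate: {transfer_hours(run_gb, line_rate):.1f} h")
```

That works out to about 20.8 MB/s, so moving a full 1.5 TB run would take roughly 20 hours at the rate I measured, versus about 3.3 hours if the link actually delivered its nominal 1 Gb/s.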

            Thanks,

            D.


            • #7
              Originally posted by kmcarr
              The behavior of the instrument is controlled by a set of configurable Run Parameters. In SCS v2.4 two options related to image handling are "Copy images to network folder" and "Delete images". The default is to enable both of these options. As images are collected they would be simultaneously copied to a network location. This is typically an SMB share on a Linux box mounted on the instrument computer. After the real time analysis software on the instrument computer has finished the analysis of an image it is then deleted from the instrument computer.
              I am sorry if I am too much of a noob, but what is the SCS 2.4 that you are talking about? If SCS 2.4 has been released, then why do they still sell the IPAR to us?

              I also have a question about the electrical requirements we should prepare for before the system arrives. The Preparation Guide for the GA-IIx and IPAR suggests the APC Smart-UPS model SUA3000RM2U. Is that UPS good enough? What do you use for your system so that you never have problems with the sequencer?

              Thanks,

              D.


              • #8
                Originally posted by dukevn
                I am sorry if I am too much of a noob, but what is the SCS 2.4 that you are talking about? If SCS 2.4 has been released, then why do they still sell the IPAR to us?
                No questions are too noobish. I apologize for falling back on abbreviations.

                SCS v2.4 is Illumina's latest software release for the GA-II and GA-IIx. It has been officially released to current GA users. SCS == Sequencing Control Software. The SCS handles the tasks of setting up the run, configuring the instrument, managing run "recipes" and data collection. In addition to SCS v2.4 there is a component called RTA 1.4 (RTA == Real Time Analysis). Here is the description of what RTA does from Illumina's publication "Using Genome Analyzer Sequencing Control Software Version 2.4" (Part# 15003831, April 2009):

                SCS Real Time Analysis

                The Genome Analyzer Sequencing Control Software (SCS) v2.4 performs real time image analysis and base calling, and provides fast access to quality metrics. The analysis is performed during the chemistry and imaging cycles of a sequencing run, which saves downstream analysis time and allows you to quickly decide whether or not your run is progressing as expected.

                SCS real time analysis runs automatically on the instrument computer, and is configured through the SCS interface at the beginning of a run.
                This is the function formerly performed by the IPAR unit. (Actually, the IPAR never did base calling in real time; you had to do that post-run on another computer running the Pipeline software.) Our Illumina FAS was just at our site and I asked him what I was supposed to do with our IPAR. He just gave a sheepish shrug. He acknowledged that even within Illumina, the fact that the software development team was obsoleting the IPAR was kept pretty low key.

                Originally posted by dukevn
                I also have a question about the electrical requirements we should prepare for before the system arrives. The Preparation Guide for the GA-IIx and IPAR suggests the APC Smart-UPS model SUA3000RM2U. Is that UPS good enough? What do you use for your system so that you never have problems with the sequencer?

                Thanks,

                D.
                The IPAR unit itself shipped with that very model of UPS in its cabinet. Illumina recommended that in addition to supporting the IPAR you plug the GA-II, the instrument control computer, the Cluster Station and its computer into that UPS (assuming the equipment was all close to each other).

                I would clarify with your contacts at Illumina whether the order really does include an IPAR. As I mentioned up thread Illumina also sold a turnkey analysis computer meant to perform downstream analysis, particularly using their software (e.g. Eland and CASAVA).


                • #9
                  Originally posted by kmcarr
                  I would clarify with your contacts at Illumina whether the order really does include an IPAR. As I mentioned up thread Illumina also sold a turnkey analysis computer meant to perform downstream analysis, particularly using their software (e.g. Eland and CASAVA).
                  You got it right, kmcarr. I was totally confused and mixed things up. What I read before was a Sequencing Site Preparation Guide (Rev A, July 2008), which apparently covers the old version of the GA and the IPAR. I checked the documents again, and we do have another version (Rev E, April 2009) which describes the GA-IIx and the Pipeline Server (no IPAR). I also checked the order and saw that the incoming system includes: GA-IIx, Cluster Station, PE Module and GA Pipeline Analysis Server. Sorry for the confusion.

                  So the Sequencing Site Guide (Rev. E, April 2009) recommends that we have the APC Smart-UPS Model SUA3000 for the GA-IIx and the APC Smart-UPS Model SUA3000RM2U. The salesman suggested that we purchase the APC Smart-UPS SUA2200 for the GA-IIx to save money. Yes, we want to save money, but obviously we do not want to buy something that runs for a few minutes and then stops or breaks.

                  Does anybody have any advice for us?

                  Thanks,

                  D.


                  • #10
                    Originally posted by kmcarr
                    With the release of SCS 2.4 the IPAR is no longer needed. Image analysis and basecalling are done in real time on the instrument control computer. Illumina had also sold a prebuilt system for doing downstream analysis which I think was called the Pipeline Analysis Server. Again, with SCS v2.4 running on the instrument computer, the pipeline is not required. You can still download the Pipeline software (currently version 1.4) and install it on a general purpose Linux box.
                    Sorry, I don't think I understand this correctly. What is the pipeline for if we have SCS 2.4 on the instrument computer? And what can the Pipeline Analysis software do?

                    Thanks,

                    D.


                    • #11
                      space/room temp

                      Regarding space: I read in other posts elsewhere on this site that maintaining a lower room temperature around the machine is desirable. If this is still considered the case for optimum operation (and it does make a lot of sense), then the closet sequencer is not too far-fetched.
