  • Hardware for NGS analysis - GPU vs CPU?

    Hi all,

    Our small core lab purchased two Dell Precision T7610 Tower Workstations equipped with 1 Intel Xeon E5-2687W v2 Eight-core 3.4 GHz Turbo, 25 MB processor, 64 GB 1866MHz DDR3 RAM, 1GB NVIDIA Quadro K600 Video card, 256 GB Solid-state drive and two 1TB SATA drives, DVD-RW drive, 10Gb Network adapter, and an Nvidia Tesla K20C Computer Processor.

    I am a novice user, but some initial thoughts I have are:

    1) Do we have enough RAM to support multiple (2-3) RNA-seq analyses? For example, alignments, mapping, differential expression analysis, etc.

    2) Do we need an additional CPU? (Assuming we will be analyzing at least 2 RNA-seq experiments at any given time and will have additional users (2-3) logged on and trying to analyze their own data.)

    3) It is my understanding that the greatest limiting factor in computational requirements for NGS analysis is I/O. At this point, is there any advantage to having a GPU versus CPU when it comes to NGS analysis?

  • #2
    Originally posted by eb0906
    Hi all,

    Our small core lab purchased two Dell Precision T7610 Tower Workstations equipped with 1 Intel Xeon E5-2687W v2 Eight-core 3.4 GHz Turbo, 25 MB processor, 64 GB 1866MHz DDR3 RAM, 1GB NVIDIA Quadro K600 Video card, 256 GB Solid-state drive and two 1TB SATA drives, DVD-RW drive, 10Gb Network adapter, and an Nvidia Tesla K20C Computer Processor.

    I am a novice user, but some initial thoughts I have are:

    1) Do we have enough RAM to support multiple (2-3) RNA-seq analyses? For example, alignments, mapping, differential expression analysis, etc.

    2) Do we need an additional CPU? (Assuming we will be analyzing at least 2 RNA-seq experiments at any given time and will have additional users (2-3) logged on and trying to analyze their own data.)

    3) It is my understanding that the greatest limiting factor in computational requirements for NGS analysis is I/O. At this point, is there any advantage to having a GPU versus CPU when it comes to NGS analysis?

    It is tricky to provide meaningful answers to these kinds of questions, since the actual workflow will vary from time to time and it is hard for outsiders to completely understand how your lab and users operate on a daily basis.

    But here goes.

    #1. Probably. Depending on memory usage, you may have to limit the number of jobs running at any given time. If you work with small genomes, it may not be a big problem.

    #2. If you do get an additional CPU, you should also look into getting more RAM, at least for one of the two machines. (Hopefully the RAM slots are not maxed out; otherwise you will need to discard some memory sticks to make room for higher-capacity ones.) Also, 2 x 1 TB is not much storage and will not be enough to support multiple users, so hopefully you have other storage available over the network.

    #3. At this time there is likely no practical benefit to GPU computing in your case, so I would not worry about it.
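
One way to act on the advice in #1 without installing a full job queuing system is a small wrapper that caps how many analyses run at once. The following is only a minimal sketch: the function name `run_jobs` and the `max_parallel` parameter are made up for illustration, and the placeholder commands stand in for whatever aligner or quantification commands a lab actually runs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, max_parallel=2):
    """Run each command (a list of argv strings), at most max_parallel at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run(cmd):
        # Each worker thread blocks here until its child process finishes.
        return subprocess.run(cmd).returncode

    # The pool size is what limits concurrency: extra commands wait their turn.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Placeholder jobs; replace with real analysis commands.
    jobs = [[sys.executable, "-c", f"print('job {i} done')"] for i in range(4)]
    print(run_jobs(jobs, max_parallel=2))
```

This caps the number of simultaneous processes only. It does not limit how much memory each job uses, so `max_parallel` would still have to be chosen with the largest expected job in mind.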

    • #3
      Thanks, Genomax!

      You are right; it is hard to anticipate workflows.

      1) It's interesting that you mention we probably have enough RAM. Currently, one of my colleagues is running cuffdiff on 16 C. elegans samples (15M reads/sample), and it looks like it's stalling at the 'Processing Loci' step with 98% of the memory in use. Is this typical? This is our first time using these workstations for RNA-seq analysis, so we are not sure what to expect in terms of processing time.

      2) I agree, and yes, we do have additional server space, 20 TB local and 110 TB on the network.

      • #4
        Is there anything else running on the system (what OS are you running, BTW)? On a single server without a job queuing system, you (or a sysadmin) are going to have to keep an eye on things, since resource-constrained jobs can slow everything to a crawl or, in the worst case, lead to a hung or non-responsive server.

        With newer UNIX/Linux distros, just looking at free memory (in top or a similar tool) is not enough. The OS normally uses otherwise idle RAM for caching and hands it back to programs as needed, so low free memory alone is not alarming. If the system starts using a large amount of swap space, then there may be a problem. How much swap is configured on your machines, and have you looked at the swap usage?
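
The check described above (look past "free" memory and watch swap) can be sketched in a few lines of Python on Linux. This is a rough illustration, not a monitoring tool: it assumes the standard `/proc/meminfo` fields (`MemTotal`, `MemFree`, `Buffers`, `Cached`, `SwapTotal`, `SwapFree`) and uses `MemAvailable` only when the kernel provides it. The function names are invented for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip blank or malformed lines
        info[name.strip()] = int(rest.split()[0])
    return info


def memory_summary(info):
    """Return the fraction of RAM that is effectively available and of swap in use."""
    # Free memory plus buffers and page cache approximates what the kernel
    # can hand back to programs; newer kernels report MemAvailable directly.
    reclaimable = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    available = info.get("MemAvailable", reclaimable)
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "available_fraction": available / info["MemTotal"],
        "swap_used_fraction": swap_used / swap_total if swap_total else 0.0,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:  # Linux only
        print(memory_summary(parse_meminfo(handle.read())))
```

A high swap fraction combined with a low available fraction is the pattern to worry about; a low "free" number on its own usually is not.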

        • #5
          The OS is Red Hat Enterprise Linux, and it's a single server with no job queuing system (as far as I know; I have not personally run anything yet).

          This is what my colleague sent for the current run:
          top - 15:26:10 up 3 days, 5:09, 7 users, load average: 19.31, 19.18, 19.27
          Tasks: 392 total, 4 running, 388 sleeping, 0 stopped, 0 zombie
          Cpu(s): 51.7%us, 26.1%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
          Mem: 65919692k total, 65459552k used, 460140k free, 5228k buffer
          Swap: 25001980k total, 18036092k used, 6965888k free, 292000k cached

          PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
          5673 usr 20 0 78.8g 60g 1960 S 1016.5 96.9 4879:22 cuffdiff

          Is the above helpful? This is all new to me.
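
For readers new to top, the numbers posted above can be turned into the two quantities asked about earlier in the thread: how much memory is still reclaimable and how much swap is in use. The sketch below is plain arithmetic on the posted values; the function name `summarize_top` is invented here, and the CPU figure assumes top's default mode, where 100% corresponds to one fully busy core.

```python
def summarize_top(mem_total_k, mem_free_k, buffers_k, cached_k,
                  swap_total_k, swap_used_k, cpu_percent):
    """Summarize a top header: reclaimable RAM, swap in use, and cores kept busy."""
    # Free + buffers + cached is roughly what the kernel could still hand out.
    reclaimable_k = mem_free_k + buffers_k + cached_k
    return {
        "reclaimable_mem_pct": round(100 * reclaimable_k / mem_total_k, 1),
        "swap_used_pct": round(100 * swap_used_k / swap_total_k, 1),
        "cores_busy": round(cpu_percent / 100, 1),
    }


if __name__ == "__main__":
    # Values copied from the top output posted in this thread.
    print(summarize_top(
        mem_total_k=65919692, mem_free_k=460140, buffers_k=5228, cached_k=292000,
        swap_total_k=25001980, swap_used_k=18036092, cpu_percent=1016.5,
    ))
```

On these figures, only about 1% of RAM is reclaimable and about 72% of the configured swap is already in use, which is the situation the previous reply warned about.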

          • #6
            You might need to mask rRNA and other abundant RNA species. I've had similar issues with Cufflinks hanging at this step when processing human RNA-Seq data on a very similarly built workstation. Building a GTF of rRNA from the UCSC RepeatMasker table to use with the -M flag fixed it right up for me. I couldn't find the original thread where I found the solution, but this one seems pretty similar:

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
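
The rRNA mask mentioned in this reply could be built along the following lines. This is a hedged sketch rather than the poster's actual script: it assumes a tab-separated export of the UCSC RepeatMasker (rmsk) table with a header line naming the columns `genoName`, `genoStart`, `genoEnd`, `strand`, `repName`, and `repClass`, and it writes each rRNA repeat as a single `exon` feature. The function name and the `rmsk` source label are illustrative choices.

```python
import csv


def rmsk_rrna_to_gtf(lines):
    """Yield one GTF line per RepeatMasker row whose repClass is rRNA.

    `lines` is an iterable of tab-separated text lines; the first line is a
    header (a leading '#' is allowed) naming the rmsk columns used below.
    """
    rows = csv.reader(lines, delimiter="\t")
    header = [name.lstrip("#") for name in next(rows)]
    for n, fields in enumerate(rows, start=1):
        row = dict(zip(header, fields))
        if row.get("repClass") != "rRNA":
            continue
        # UCSC tables use 0-based, end-exclusive coordinates; GTF is 1-based
        # and inclusive, so only the start needs shifting.
        start = int(row["genoStart"]) + 1
        end = int(row["genoEnd"])
        ident = f'{row["repName"]}_{n}'
        attributes = f'gene_id "{ident}"; transcript_id "{ident}";'
        yield "\t".join([
            row["genoName"], "rmsk", "exon", str(start), str(end),
            ".", row["strand"], ".", attributes,
        ])


if __name__ == "__main__":
    import sys
    # Usage: python make_mask.py < rmsk_table.txt > rRNA_mask.gtf
    for gtf_line in rmsk_rrna_to_gtf(sys.stdin):
        print(gtf_line)
```

The resulting file would then be passed through the -M flag as described above. Other abundant species, such as mitochondrial transcripts, would need to be added separately if they are also a problem.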

            • #7
              Of course Google finds it for me right after I post. I'm not sure if it's a STAR-specific issue, but I was using STAR as my aligner when I ran into the problem, and I found the solution on their message boards.
