
  • number of cores in DEXSeq using parallel library

    I am using DEXSeq, and one of the steps is estimateDispersions. It supports parallel computing via the parallel R package. In particular, I can run:

    estimateDispersions(pExons, nCores=3, quiet=TRUE)

    I think nCores=3 here assumes that 4 CPUs are available and that we use only 3 of them to avoid any overhead. My question is about running on a cluster, where the command:

    $ less /proc/cpuinfo

    lists a total of 15 processors, each with 4 CPU cores. In this case, what should I specify for the "nCores=" argument? Still 3? Thank you!
    Last edited by alittleboy; 06-26-2013, 06:52 PM.

  • #2
    If you're looking at the number of "processor" lines in /proc/cpuinfo, that will typically be the number of cores (or hyperthreads, depending on the cpu) available. For example, my desktop computer has a single processor with 6 cores (each with hyperthreading, so sort of 12). /proc/cpuinfo, then, reports 12 processors. The nodes on our cluster have two processors like this, so they'll report 24 "processor" lines in /proc/cpuinfo.

    It's usually most convenient to just think about number of available cores, not processors. If you use, say, 12 cores, they'll likely just be distributed over all of the physical processors.
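As a rough sketch of the counting described above (not something posted in the thread): the shell functions below count the "processor" lines, the distinct "physical id" values (sockets), and the highest processor index in a cpuinfo-style file. The function names are made up for illustration, and relying on the "physical id" field is an assumption that holds on typical x86 Linux systems but may not hold on other architectures.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names). Each takes an optional path to a
# cpuinfo-style file and defaults to /proc/cpuinfo.

# Number of logical CPUs: one "processor" line per core or hyperthread.
count_logical() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Number of physical packages (sockets): distinct "physical id" lines.
# Assumption: the field is present, as on typical x86 Linux systems.
count_sockets() {
    grep '^physical id' "${1:-/proc/cpuinfo}" | sort -u | wc -l | tr -d ' '
}

# Highest "processor" index. Indices start at 0, so the CPU count is this
# value plus one.
highest_index() {
    awk -F: '/^processor/ { v = $2 + 0; if (v > max) max = v }
             END { print max + 0 }' "${1:-/proc/cpuinfo}"
}

# Report on the current machine only if /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    echo "logical CPUs:  $(count_logical)"
    echo "sockets:       $(count_sockets)"
    echo "highest index: $(highest_index)"
fi
```

Because the indices start at 0, a highest index of 15 corresponds to 16 logical CPUs. The number of logical CPUs is the figure that matters when picking a value for nCores.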



    • #3
      Originally posted by dpryan (#2):
      Hi @dpryan:

      Thanks for the information! So in my situation, with 15 processors of 4 cores each, can I specify nCores=45 so that 3 cores are used on each processor?



      • #4
        On average, at least (there's no guarantee that things will be equally distributed).

        I should note that having 15 actual CPUs on one system is unusual (I've never heard of it, at least). I suspect that you actually have a system with 16 cores (the counting starts at 0), which would be more common. Also, even with an infinite number of cores, performance won't always increase with increasing number of allocated cores. So, if you'll be doing this a lot then just run a few tests to find out what's fastest.
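One possible way to "run a few tests" is to time the same job at several core counts. The sketch below is only an illustration under stated assumptions: `time_run` and the `NCORES` environment variable are hypothetical names, and the job is a placeholder command. In practice the command might be an Rscript call whose script reads `NCORES` with `Sys.getenv()` and passes it on as nCores.

```shell
#!/bin/sh
# time_run N CMD...: run CMD with NCORES=N in its environment and print
# "N elapsed_seconds exit_status". NCORES is a hypothetical variable name
# that the timed job is assumed to read.
time_run() {
    n=$1
    shift
    start=$(date +%s)
    env NCORES="$n" "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "$n $((end - start)) $rc"
}

# Placeholder job so the sketch runs as-is; replace it with the real command.
for n in 1 2 4; do
    time_run "$n" sh -c 'sleep 0'
done
```

Timing a small subset of the data first keeps each trial short. Whole-second resolution from `date +%s` is coarse, so this only makes sense for jobs that run for at least a few minutes.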



        • #5
          Originally posted by dpryan (#4):
          Hi @dpryan:

          Yes, you're right, there are 16 CPUs on the system... I set nCores=4 and so far the program runs well.

          I am a little surprised that for a simple two-group comparison (control vs. treatment) with >55,000 "genes" (some rows are like ENSG1+ENSG2+...), it might take more than 2 days to get the estimateDispersions results. I guess the differential exon testing part may take just as long. I also tried including another covariate in the model, and after 4 hours not a single dot (each dot meaning 100 genes processed) had appeared on screen during dispersion estimation!

          Is it common for exon-level inference to take such a long time?

          Thanks ;-)
          Last edited by alittleboy; 06-27-2013, 12:53 PM.



          • #6
            Yes, a lot of the DEXSeq steps can take quite a long time to complete because there are so many exons to process. You might try increasing nCores to 12 or 16.



            • #7
              Originally posted by dpryan (#6):
              Hi @dpryan:

              Thanks! It turned out to be faster than expected: the time taken per batch of 100 genes varies, and I notice that the first dot takes much longer to appear than the rest ;-) The dispersion estimation took ~9 hours and DEU <2 hours.



              • #8
                Originally posted by alittleboy (#7):
                Hi alittleboy,

                Did you see any dots when you used multiple cores? I see dots when I use only one core, but none at all when I use multiple cores (even after one day). Is that normal?

                Thank you,

                -Jia



                • #9
                  Originally posted by jialu (#8):
                  I see the same thing as you when I set nCores=8.
