  • Illumina pipeline multicoring

    Hi, we just installed the Illumina pipeline 1.3.2. I haven't used it myself, but someone else who used it told me that only 6 of the 16 cores were used during a multithreaded run.

    They asked me whether I had capped core usage for some users, which I have not. In fact, we run the basecalling for the 454 on our own server, and that process uses all 16 cores. My own Perl scripts that query the DB for analysis also use threads, and they reach 1600% CPU usage, so I am using all 16 cores.

    Is there any logical reason why the Illumina pipeline would use only 6 cores? Has anyone else experienced this? This is the 1.3.2 Illumina pipeline.

  • #2
    I don't run the Illumina software, so I cannot answer your question directly.

    However, your message did raise a warning flag in my sysadmin brain. You say "...I haven't used it myself, but someone else who used it told me..." My suggestion is to double-check this claim yourself before wondering whether the pipeline is limited to 6 cores. The pipeline may or may not be so limited, but without independent replication of the problem I would not trust the end user to be correct.


    • #3
      I have used the Illumina pipeline regularly but have no connection with the developers, so I can't speak with authority. The pipeline is made up of a large collection of Perl and Python scripts plus some compiled executables. It is my understanding that none of the components of the pipeline are themselves multiprocessing capable. The operation of the pipeline is managed by the make utility. To utilize multiple CPUs (or cores), you have to pass the '-j N' option to make, where 'N' is the number of CPUs you want to use for running the pipeline.

      Since the data processing performed by the pipeline falls into the class of embarrassingly parallel problems, multiple CPUs are utilized by launching multiple instances of the programs to work on different chunks of data. By default, most of the pipeline programs work on the data one tile at a time. For a full flow cell on the GAII this would be 800 independent chunks of data. The make utility handles launching the pipeline programs and passing out chunks of data to them. It will use at most the number of CPUs specified by the '-j' parameter. If the user inadvertently passed '-j 6' instead of '-j 16' to make when launching the pipeline, that may explain the problem.
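As a rough illustration of the scheduling described above (not the pipeline's own code), here is a minimal Python sketch in which a fixed-size worker pool plays the role of 'make -j N' and hands out 800 independent tiles. The names `run_tiles` and `process_tile` are hypothetical, and the short sleep is only a stand-in for real per-tile work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tiles(num_tiles, jobs):
    """Process num_tiles independent chunks with at most `jobs` workers.

    Returns (tiles processed, peak number of simultaneously active workers).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def process_tile(tile_id):
        # Record how many workers are busy at the same time.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.001)  # stand-in for the real per-tile work (assumption)
        with lock:
            state["active"] -= 1
        return tile_id

    # The pool size caps concurrency, much like the N in 'make -j N'.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        done = list(pool.map(process_tile, range(num_tiles)))
    return len(done), state["peak"]


if __name__ == "__main__":
    for jobs in (6, 16):
        processed, peak = run_tiles(800, jobs)
        print(f"jobs={jobs}: processed {processed} tiles, peak workers {peak}")
```

With a pool of 6, no more than 6 tiles are ever in flight, regardless of how many cores the machine has, which is the behaviour a mistyped '-j 6' would produce.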


      • #4
        Well, I don't know what the other people were using. They only told me that they ran a validation run and only 6 of the 16 cores were used. I was just wondering whether that was perhaps an Illumina pipeline limitation, or whether somebody else had experienced the same.

        And btw, Perl is capable of multicoring; I just discovered some nice tricks last week. Read up on threads on CPAN: it is possible to take a 'for' loop and push each iteration into its own thread. With some tweaking you can, for example, spread 16 iterations across 16 cores and speed up your code significantly. You can contact me via PM if you want more information.


        • #5
          PEBCAK?

          -j6 rather than -j16 sounds pretty likely.


          • #6
            An alternative explanation:

            The pipeline parallelizes with different granularities for different tasks. For most steps, the granularity is at the tile level, while for the ELAND and reporting steps the granularity is at the lane level. Depending on how many lanes you run and exactly when you look, it is quite possible to see fewer processors in use than you specified.

            In this specific case, the user could have been running 6 lanes and looked at the core usage during ELAND.

            (I think -j6 rather than -j16 is more likely, though.)

            Curt
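The arithmetic behind this explanation can be sketched in a few lines of Python: the number of busy workers in a step cannot exceed either the requested job count or the number of independent units at that step's granularity. The function name `busy_workers` and the figure of 100 tiles per lane are assumptions for illustration (the latter taken from 800 tiles spread over an 8-lane flow cell).

```python
def busy_workers(jobs, independent_units):
    """Upper bound on simultaneously busy workers for one pipeline step."""
    return min(jobs, independent_units)


# Hypothetical run: 16 jobs requested, 6 lanes in use.
JOBS = 16
LANES = 6
TILES_PER_LANE = 100  # assumption: 800 tiles over 8 lanes on a full flow cell

steps = {
    "tile-level step": LANES * TILES_PER_LANE,
    "lane-level step (e.g. ELAND)": LANES,
}

for name, units in steps.items():
    limit = busy_workers(JOBS, units)
    print(f"{name}: {units} units -> at most {limit} busy workers")
```

Under these assumptions the tile-level steps can keep all 16 workers busy, while a lane-level step tops out at 6, even though '-j 16' was requested.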


            • #7
              Curt,

              Good point. When I wrote my response, I couldn't recall whether ELAND splits tasks at the tile or the lane level, so I didn't go into that possibility. I would say your explanation, looking at usage while ELAND is processing 6 lanes, is equally probable. That assumes the original user had set ELAND_MULTIPLE_INSTANCES in his/her GERALD config file.
