  • yaximik
    Senior Member
    • Apr 2011
    • 199

    GNU parallel - any benefits?

    I wonder if anyone has tried speeding up bioinformatics jobs with GNU parallel. Any experience, or just thoughts?
  • maasha
    Senior Member
    • Apr 2009
    • 153

    #2
    GNU parallel is brilliant for executing command line tools in a Unix/Linux setup with multiple servers/CPUs. It works very well with Biopieces. See the HowTo.


    • turnersd
      Senior Member
      • May 2011
      • 115

      #3
      I use it all the time in place of xargs.


      • yaximik
        Senior Member
        • Apr 2011
        • 199

        #4
        Yep, Biopieces is one example, although the HowTo carefully says it can be used for some tasks. Since the examples for parallel do not include many bioinformatics tasks, I wonder if there is a general idea of which tasks can benefit from it. More specifically, could compute-intensive, long-running jobs like BLAST, alignment or de novo assembly benefit from parallel?


        • turnersd
          Senior Member
          • May 2011
          • 115

          #5
          Parallel won't parallelize an intrinsically serial job, but very easily allows you to launch many serial jobs in parallel. I use it all the time to run an operation on lots of files by using something like, e.g.:

          Code:
          find . -maxdepth 1 -name '*.fq' | parallel fastqc {} --outdir .    # run fastqc on all .fq files in this directory
          find . -maxdepth 1 -name '*.bam' | parallel samtools index {}      # index all bam files in this directory
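
Since fastqc and samtools are not installed everywhere, here is a hedged sketch of the same pattern that is safe to try first: GNU parallel's --dry-run prints the commands it would launch without running them, which is a cheap way to check the quoting. The names a.bam and b.bam are placeholders, not files from this thread, and they need not exist for a dry run.

```shell
# Preview the jobs instead of running them; -k keeps output in input order.
# a.bam and b.bam are placeholder names (assumptions for illustration only).
preview=$(parallel --dry-run -k samtools index {} ::: a.bam b.bam)
printf '%s\n' "$preview"
```

Dropping --dry-run runs the jobs for real, by default roughly one job per CPU core at a time.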


          • Richard Finney
            Senior Member
            • Feb 2009
            • 701

            #6
            This looks perfect. I've got my own homebrewed program I called "tetris" which does the same thing but I'll definitely switch to this.

            Note the --max-procs parameter (the same as --jobs or -j), which throttles the serial jobs so that only the specified number run at once.
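
A minimal sketch of that throttling, with echo standing in for a real tool: -j 2 (a synonym for --max-procs 2) allows at most two jobs at a time, and the {%} replacement string reports which job slot each job used. The inputs a to d are made up for illustration.

```shell
# At most 2 jobs run at once; {#} is the job's sequence number, {%} its slot.
# -k prints results in input order. The inputs a..d are placeholders.
slots=$(parallel -j 2 -k echo 'job {#} on input {} used slot {%}' ::: a b c d)
printf '%s\n' "$slots"
```

Whatever the timing, the slot number never goes above 2, which is what keeps the machine from being flooded.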

            Anybody hooked this thing up to "gnu niceload"? Any examples?


            • yaximik
              Senior Member
              • Apr 2011
              • 199

              #7
              Interesting. A little off topic, but I encountered a strange difference in CPU use with fastqc. I made a small script to process 10 files at once (the box has 2 quad-core processors with multithreading enabled, that is, 8 physical cores and 16 threads), like
              fastqc -t 10 [file1 ... file10]
              When I launched the script, CPU usage only got to about 26% us in top. But when I just copied the above command to the command line, CPU usage jumped to about 85% us in top. What might be the reason for the difference? Did you notice anything like that with parallel?


              • tange
                Junior Member
                • Feb 2013
                • 7

                #8
                If used for research please remember:

                Code:
                parallel --bibtex


                • maasha
                  Senior Member
                  • Apr 2009
                  • 153

                  #9
                  @yaximik I see these major benefits of parallel: 1) use parallel instead of a for; do & done; loop to execute some command in parallel in a way that optimizes the CPU usage (parallel cleverly decides to wait for jobs to complete before starting new jobs without flooding the machine). 2) use the parallel --pipe to parallelize the processing of huge files. 3) combine 1) and 2). And then there are all the other things that parallel can do for you.
                  Last edited by maasha; 02-14-2013, 10:53 AM.
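
Points 1) and 2) above can be sketched roughly as follows. This is only an illustration under stated assumptions: gzip and wc -l stand in for real bioinformatics tools, and the temporary directory and sample file names are invented for the demo.

```shell
# Demo data in a throwaway directory (names are placeholders).
demo=$(mktemp -d)
for i in 1 2 3 4; do seq 1000 > "$demo/sample$i.txt"; done

# 1) Instead of:  for f in "$demo"/*.txt; do gzip "$f" & done; wait
#    which starts every job at once, parallel caps the number of running jobs
#    and starts a new one only when a slot frees up.
parallel gzip {} ::: "$demo"/*.txt

# 2) --pipe splits stdin into blocks on line boundaries and feeds each block
#    to its own copy of the command; each wc -l counts one block, awk sums them.
total=$(seq 100000 | parallel --pipe --block 100k wc -l | awk '{ s += $1 } END { print s }')
echo "lines counted across blocks: $total"
```

Note that --pipe only makes sense when the command can work on an arbitrary chunk of records independently of the rest of the file.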


                  • ersgupta
                    Member
                    • Jun 2011
                    • 26

                    #10
                    I have been using it for the past 6-8 months. I am very happy whenever I can run my jobs using parallel, because it saves a hell of a lot of time. It really helps you make the best use of the computational facilities you have.

                    Here is an example of the time that I normally save:
                    Say I have to convert around 8 sam files to bam files, and one conversion generally takes 8 min. Run serially, that would take 64 min, but when I run them on a cluster using GNU parallel, it takes just ~8 min.
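
The arithmetic here is roughly wall time = (number of jobs / jobs running at once, rounded up) x time per job, so 8 jobs in 8 slots take about as long as one job. A scaled-down sketch, assuming sleep 1 as a stand-in for one conversion and 4 jobs instead of 8; real speedups also depend on having enough free cores and disk bandwidth.

```shell
# Four stand-in "conversions" of one second each, run one after another.
start=$(date +%s)
for f in 1 2 3 4; do sleep 1; done
serial_secs=$(( $(date +%s) - start ))

# The same four stand-in jobs, all running at once.
start=$(date +%s)
parallel -j 4 'sleep 1; echo "converted {}"' ::: 1 2 3 4
parallel_secs=$(( $(date +%s) - start ))

echo "serial: ${serial_secs}s, parallel: ${parallel_secs}s"
```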


                    • maasha
                      Senior Member
                      • Apr 2009
                      • 153

                      #11
                      Over at Biostars there is this tool description.


                      • yaximik
                        Senior Member
                        • Apr 2011
                        • 199

                        #12
                        Originally posted by maasha
                        Over at Biostars there is this tool description.
                        Oh, that is a cool set of examples. Tnx!


                        • rflrob
                          Member
                          • May 2010
                          • 50

                          #13
                          Another nice thing about parallel is that it makes it easy to generate filenames in an intelligent way. Say you want to convert a bunch of bam files to sam files, you can easily do:

                          parallel 'samtools view -h -o {.}.sam {}' ::: *.bam

                          which does exactly what you want, instead of potentially ending up with .bam.sam or the like. That's just a trivial example (and possibly not correct, I never exactly remember the syntax), and there's a lot more you can do with it.
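
For anyone unsure about the syntax, a small sketch that only echoes the generated names, so nothing is converted: {} is the input as given, {.} is the input with its last extension removed, and {/.} is the basename with its last extension removed. The file names are placeholders.

```shell
# Show what the replacement strings expand to; echo stands in for samtools.
# a.bam and dir/b.sorted.bam are placeholder names and need not exist.
names=$(parallel -k echo 'in={} out={.}.sam base={/.}' ::: a.bam dir/b.sorted.bam)
printf '%s\n' "$names"
```

Only the last extension is stripped, so b.sorted.bam becomes b.sorted.sam rather than b.sam.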


                          • yaximik
                            Senior Member
                            • Apr 2011
                            • 199

                            #14
                            I tried to run a conversion between two assembly formats using parallel and amos2ace, but got an error:
                            Code:
                            $ cat /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg | parallel --block 100M -k --pipe --recstart '{' --recend '}' amos2ace > /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.ace
                            substr outside of string at /usr/bin/parallel line 333.
                            Any idea what this means and how to fix the problem?


                            • maasha
                              Senior Member
                              • Apr 2009
                              • 153

                              #15
                              @yaximik

                              New questions in new threads. Do your homework first:

                              read this:



                              and then

                              man parallel

                              Notice the section:

                              Your bug report should always include:

                              · The error message you get (if any).

                              · The output of parallel --version. If you are not running
                              the latest released version you should specify why you
                              believe the problem is not fixed in that version.

                              · A complete example that others can run that shows the
                              problem. This should preferably be small and simple. A
                              combination of seq, cat, echo, and sleep can reproduce most
                              errors. If your example requires large files, see if you
                              can make them by something like seq 1000000 > file.

                              · The output of your example. If your problem is not easily
                              reproduced by others, the output might help them figure out
                              the problem.

                              If you suspect the error is dependent on your environment or
                              distribution, please see if you can reproduce the error on
                              one of these VirtualBox images:
                              VirtualBoxes - Free VirtualBox(R) Images


                              Specifying the name of your distribution is not enough as you
                              may have installed software that is not in the VirtualBox
                              images.

                              If you cannot reproduce the error on any of the VirtualBox
                              images above, you should assume the debugging will be done
                              through you. That will put more burden on you and it is extra
                              important you give any information that help.
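
One possible way to collect what that checklist asks for, sketched with a deliberately trivial command built from seq and echo. The report layout and section labels are my own assumptions, not something the man page prescribes; swap in the smallest command that still shows your problem.

```shell
# Gather version, command, and output (including error messages) in one file.
report=$(mktemp)
{
  echo "== parallel version =="
  parallel --version | head -n 1
  echo "== command =="
  echo "seq 5 | parallel -k echo line {}"
  echo "== output =="
  seq 5 | parallel -k echo line {} 2>&1
} > "$report"
cat "$report"
```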
