Install GNU parallel with Ansible

GNU parallel lets you run tasks in parallel and use more CPU cores to speed up the job at hand. This works if the job can be split into multiple independent tasks that would otherwise be executed serially.

An example: you find files in a directory, and want to compress all of them:

find /path/to/directory -type f -exec bzip2 -9 {} \;

The line above finds all the files and compresses them one after the other. Most modern systems have multiple CPU cores, but this command uses only one of them. GNU parallel solves this by starting multiple compression processes at once. The command changes to:

find /path/to/directory -type f -print0 | parallel -0 --no-run-if-empty bzip2 -9 {}

By default, parallel starts one process per available CPU core. The --jobs option sets either a fixed number (for example "8") or a number relative to the core count: with 8 cores available, "-2" starts 6 processes and "+2" starts 10.
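The arithmetic behind relative --jobs values can be sketched in a few lines of shell. The helper name effective_jobs is hypothetical and not part of GNU parallel, and the floor of one job is an assumption based on the parallel manual; treat this as an illustration, not as how parallel is implemented.

```shell
# Hypothetical helper (not part of GNU parallel): mimics how a --jobs
# value is turned into a process count for a given number of cores.
effective_jobs() {
    cores=$1
    spec=$2
    n=${spec#?}                            # value without its first character
    case "$spec" in
        +*) count=$(( cores + n )) ;;      # "+N": cores plus N
        -*) count=$(( cores - n )) ;;      # "-N": cores minus N
        *)  count=$spec ;;                 # plain number: used as-is
    esac
    # Assumption: the result never drops below one job.
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    echo "$count"
}

effective_jobs 8 8     # prints 8
effective_jobs 8 -2    # prints 6
effective_jobs 8 +2    # prints 10
```

On the command line the same values go straight to the option, for example "parallel --jobs -2" in the compression pipeline.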

However, when you start "parallel", it nags you to confirm that you will cite it whenever you use it to process data for an academic article:

Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.

That's OK if you use it manually, but in a server environment no one will ever see this notice.
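For unattended hosts the notice can be acknowledged without an interactive prompt. A minimal sketch, assuming the installed version honours the marker file ~/.parallel/will-cite (a commonly used convention that the notice text itself does not mention, so check it against your version before relying on it):

```shell
# Assumption: GNU parallel skips the citation notice when this marker
# file exists in the home directory of the user running it.
mkdir -p "$HOME/.parallel"
touch "$HOME/.parallel/will-cite"
```

This has to be done for every account that runs parallel, and the request to cite the tool in academic work still applies.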

 
