Every time Ansible runs a Playbook, the first step (by default) is gathering facts about the target system, shown in the output as the Gathering Facts task.
This step is implicit, so there is no need to add a fact-gathering task to every Playbook, although it is possible. The module which retrieves the information is setup, and by default it tries to gather as much information about the target system as possible. When a setup task is added as an extra step in the Playbook, the information about the target system is refreshed. That might be necessary when a Playbook has changed vital system settings.
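A minimal sketch of such a refresh, assuming a hypothetical play that changes the hostname (the hosts pattern, the chosen setting, and the task names are illustrative and not taken from the original Playbook):

```yaml
---
- hosts: all
  tasks:
    # Hypothetical task that changes a vital system setting
    - name: Set the hostname to the inventory name
      hostname:
        name: "{{ inventory_hostname }}"

    # Run the setup module again so later tasks see the new facts
    - name: Refresh facts after the change
      setup:

    - name: Show the refreshed fact
      debug:
        var: ansible_facts.hostname
```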
Gathering the facts is a time-consuming process, and for a short Playbook it is quite possible that this is the longest-running task. And it’s repeated every time the Playbook runs.
Ansible provides cache plugins which can store the gathered facts. If the system facts don’t change between Playbook runs, this greatly reduces the runtime of Playbooks. The facts cache can be stored in JSON files, in a Redis database, in Memcached, and in a few other backends. The simplest option, requiring no additional tools, is the jsonfile cache. Central implementations like Redis or Memcached allow multiple Ansible controller hosts to share the same facts cache, whereas local caches like jsonfile are only available on a single host, and every Ansible controller must build and maintain its own cache.
For a performance test, I picked a Playbook which runs on 37 hosts, both physical hosts and virtual machines, using ssh into LXC hosts. I ran a very simple Playbook multiple times and measured the runtime.
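The exact Playbook is not reproduced here. A comparable minimal sketch, assuming the hosts pattern all and a hypothetical uptime command, might look like this:

```yaml
---
- hosts: all
  tasks:
    # Hypothetical command; any cheap, read-only command would do
    - name: Run a simple shell command
      shell: uptime
      changed_when: false
```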
That’s just a simple shell command, and of course the implicit setup task. I repeated the test 10 times, without and with the Ansible cache.
|Without Cache||With Cache|
For the 10 runs without cache, all Playbook runs are in the range of 37s to 70s, with an average of 46s. The spread is explainable: Ansible runs tasks on 10 hosts in parallel, and then moves on to the next host once a host finishes. I used a larger number of hosts (37) in this example to show a longer Playbook run, rather than testing this on just one or two hosts.
For the 10 runs with cache, only the first run (which populates the empty cache) takes as long as the average run without cache. All the other runs take less than 10s, and on average (over the 10 runs) the runtime is 13s. This average includes the first run with 54s. With more runs and the cache enabled, the overall average time will approach or drop below 10s.
How to enable the cache?
As mentioned before, this example uses the jsonfile cache. The following settings need to be added to ansible.cfg in the Playbook directory:
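Assembled from the settings discussed in this section, the file might look like the following sketch. The [defaults] section header and the choice of smart for gathering are assumptions; the other values are the ones named in the text.

```ini
[defaults]
# Assumption: "smart" is chosen so that cached facts are actually used
gathering = smart
gather_subset = all
fact_caching = jsonfile
# A relative path is used here; an absolute path works as well
fact_caching_connection = facts.json
# Cached facts expire after one day (in seconds)
fact_caching_timeout = 86400
```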
Ansible has different policies for when to gather facts: implicit is the default, and re-gathers facts at the beginning of every Playbook run. This setting ignores any facts cache. The inverse is explicit, which does not gather facts unless explicitly requested in the play, for example by a setup task. The smart option only gathers facts if no cached facts are available.
gather_subset = all is the Ansible default, and gathers all details about the target host. This can be limited to a certain subset if not all of the information is required.
fact_caching = jsonfile defines the cache type, and fact_caching_connection = facts.json specifies the name of the directory which Ansible uses to store the cache. If no absolute path is specified, the directory facts.json is created in the Playbook directory.
fact_caching_timeout = 86400 specifies that the gathered data is valid for one day (86400 seconds). After that time, Ansible will gather new data from the destination hosts. This number should depend on how often the Playbook runs, and how often critical system changes are applied.
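The gather_subset limit mentioned above can also be set per play instead of in ansible.cfg. A minimal sketch, assuming only network facts are needed (the hosts pattern and the chosen subset are illustrative):

```yaml
---
- hosts: all
  # Skip the full collection and gather only the network subset
  gather_subset:
    - "!all"
    - network
  tasks:
    - name: Show the default IPv4 address
      debug:
        var: ansible_facts.default_ipv4
```

A smaller subset shortens the gathering step, but facts outside that subset will be missing from the cache as well.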
Wiping the cache
Cleaning the cache (and forcing Ansible to re-read the facts) is easy: it is enough to delete the facts.json directory. Ansible will recreate it during the next Playbook run.
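As an alternative to deleting the directory by hand, the cache can also be reset from within a play. The following is a sketch, assuming the hosts pattern all; it uses the clear_facts meta task and then gathers fresh facts:

```yaml
---
- hosts: all
  gather_facts: false
  tasks:
    # Drop the facts for the hosts in this play, including cached ones
    - name: Clear gathered and cached facts
      meta: clear_facts

    # Gather fresh facts, which repopulates the cache
    - name: Gather facts again
      setup:
```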
Disabling the “setup” step
In Ansible it is possible to skip the gathering step completely and not retrieve any information about the target host. This might be necessary, for example, before the Python interpreter is installed on the host. Such initial steps are usually executed using the raw module, and obviously without Python on the host it is not possible to gather facts before the Playbook run. Therefore the implicit fact-gathering step can be disabled on a per-Playbook basis by setting gather_facts: false in the play.
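A minimal sketch of such a bootstrap play, assuming a Debian-style host where apt-get is available and a connection user that is allowed to install packages (both are assumptions, as is the hosts pattern):

```yaml
---
- hosts: all
  # No Python on the target yet, so skip the implicit setup task
  gather_facts: false
  tasks:
    # raw sends the command over ssh and does not need Python on the target
    - name: Install Python
      raw: apt-get update && apt-get install -y python3

    # With Python in place, facts can be gathered explicitly
    - name: Gather facts now that Python is available
      setup:
```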
Using the Ansible facts cache is an easy way to speed up Playbook runs. Gathering the facts can easily be the longest-running task in a Playbook.