Every time Ansible runs a Playbook, the first step (by default) is gathering facts about the target system. It shows up in the output as the "Gathering Facts" task.
This step is implicit, and it is not necessary (but possible) to add the gather facts step to every Playbook. The module which retrieves all the information is setup, and by default it tries to gather as much information about the target system as possible. When the setup task is added as an extra step in the Playbook, the information about the destination system is refreshed and updated. That might be necessary when a Playbook has changed vital system settings.
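A minimal sketch of such an explicit refresh, assuming a Playbook that changes the hostname first. The hostname task and the value new-hostname are only placeholders for "a vital system setting" and are not taken from the original Playbook:

```yaml
- hosts: all
  tasks:
    # Hypothetical change to a vital system setting
    - name: Change the hostname
      hostname:
        name: new-hostname

    # Explicit setup task: re-reads the facts after the change
    - name: Refresh facts
      setup:

    - name: Show the refreshed fact
      debug:
        var: ansible_facts['hostname']
```

Without the explicit setup task, the debug task would still show the hostname gathered at the beginning of the run.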
Gathering the facts is a time-consuming process, and for a short Playbook it is quite possible that this is the longest-running task. And it’s repeated every time the Playbook runs.
Ansible provides cache plugins which can store the gathered facts. If the system facts don’t change between Playbook runs, this will greatly speed up the runtime of Playbooks. The facts cache can be stored in JSON files, in a Redis DB, in Memcached, and in a few other backends. The simplest way, without additional tools required, is the jsonfile cache. Central implementations like Redis or Memcached allow multiple Ansible controller hosts to use the same facts cache, whereas local caches like jsonfile are only available on a single host, and every Ansible controller must build and maintain its own cache.
Performance
For a performance test, I picked a Playbook which runs on 37 hosts, both physical hosts and virtual machines, using ssh into LXC hosts. I ran this very simple Playbook multiple times and measured the runtime. It contains just a simple shell command, and of course the implicit setup task. I repeated the test 10 times, without and with the Ansible cache.
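The test Playbook itself is not reproduced here. A stand-in of the same shape might look like the following sketch; the uptime command is purely hypothetical and only represents "a simple shell command":

```yaml
- hosts: all
  tasks:
    # Hypothetical command; the one used in the original test is not known
    - name: Run a simple shell command
      shell: uptime
      changed_when: false
```

Because the only explicit task is trivial, almost all of the measured runtime comes from connecting to the hosts and from the implicit setup task.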
| | Without Cache | With Cache |
|---|---|---|
| First run | 37.6s | 54.6s |
| Last run | 48.9s | 8.5s |
| Count | 10 | 10 |
| Min | 37.6s | 7.0s |
| Max | 70.3s | 54.6s |
| Average | 46.32s | 13.87s |
For the 10 runs without cache, all Playbook runs are in the range of 37s to 70s, with an average of 46s. The spread is explainable: Ansible runs tasks on a limited number of hosts in parallel (10 in this test, controlled by the forks setting), and moves on to the next host once a host finishes. I used a larger number of hosts (37) in this example to show a longer Playbook run, rather than testing on just one or two hosts.
For the 10 runs with cache, only the first run (which populates the empty cache) takes about as long as a run without cache; at 54.6s it is even somewhat above the uncached average. All the other runs finish in less than 10s, and the average over all 10 runs is about 14s. This average includes the first run with 54.6s. With more runs and the cache enabled, the overall average will drop to around 10s or below.
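As a quick check on that last claim, the average of the nine runs that actually used the cache can be backed out of the table. The numbers are the table's; the calculation is only a rough sanity check:

$$
\frac{10 \times 13.87\,\mathrm{s} - 54.6\,\mathrm{s}}{9} = \frac{84.1\,\mathrm{s}}{9} \approx 9.3\,\mathrm{s}
$$

That is roughly a fifth of the 46.32s average without cache.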
How to enable the cache?
As mentioned before, this example is using the jsonfile cache. The settings described in this section need to be added to ansible.cfg in the Playbook directory.
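Pieced together from the settings discussed in this section, the ansible.cfg fragment would look roughly like this. Two things are assumptions on my side: that the keys sit in the [defaults] section, and that gathering is set to smart, which is the only mode described here that actually makes use of the cache:

```ini
[defaults]
# Assumed: only gather facts when none are cached for a host
gathering = smart
# Ansible default: collect all facts
gather_subset = all
# Cache type and location (a directory, relative to the Playbook directory)
fact_caching = jsonfile
fact_caching_connection = facts.json
# Cached facts are valid for one day
fact_caching_timeout = 86400
```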
The gathering setting controls when Ansible gathers facts: implicit is the default, and will re-gather facts at the beginning of every Playbook run. This setting will ignore any facts cache. The inverse is explicit, which will not gather facts unless this is explicitly requested, for example by a setup task. The smart option will only gather facts if no cached facts are available.
gather_subset = all is the Ansible default, and will gather all destination host details. This can be limited to a certain subset if not all of the information is required.
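The subset can also be limited per play with the gather_subset keyword. The following is only a sketch, and the choice of the network subset is an arbitrary example, not something the original setup uses:

```yaml
- hosts: all
  # Skip the full collection and only add the network facts.
  # A minimal set of facts is still gathered in addition.
  gather_subset:
    - "!all"
    - network
  tasks:
    - name: Show the default IPv4 address
      debug:
        var: ansible_facts['default_ipv4']
```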
fact_caching = jsonfile defines the cache type, and fact_caching_connection = facts.json specifies the directory name which Ansible will use to store the cache. The directory facts.json will be created in the Playbook directory if no absolute path is specified.
Finally, fact_caching_timeout = 86400 specifies that the gathered data is valid for one day (86400 seconds). After that time, Ansible will gather new data from the destination hosts. This number should depend on how often the Playbook runs, and how often critical system changes are applied.
Wiping the cache
Cleaning the cache (and forcing Ansible to re-read the facts) is easy: it is enough to delete the facts.json directory. Ansible will recreate it during the next Playbook run.
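Deleting the directory is the approach used here. As an alternative that is not part of the original setup, the cache can also be cleared from within a Playbook using the clear_facts action of the meta module. A minimal sketch:

```yaml
- hosts: all
  gather_facts: false
  tasks:
    # Clears the gathered facts, including the fact cache,
    # for the hosts targeted by this play
    - name: Drop cached facts
      meta: clear_facts
```

This only affects the hosts in the play, whereas deleting the directory wipes the cache for all hosts at once.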
Disabling the “setup” step
In Ansible it is possible to completely skip the gathering step and not retrieve any information about the destination host. This might be necessary, for example, before the Python interpreter is installed on the host. Such initial steps are usually executed using the raw module, and obviously without any running Python it’s not possible to gather facts before the Playbook run. Therefore the implicit facts gathering step can be disabled on a per-Playbook basis with the gather_facts keyword.
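A minimal sketch of such a bootstrap Playbook. The package command is an assumption (a Debian-like host with apt-get) and only illustrates the kind of task that runs before Python exists:

```yaml
- hosts: all
  # Skip the implicit setup task; it would fail without Python
  gather_facts: false
  become: true
  tasks:
    # Hypothetical bootstrap command, assuming a Debian-like host
    - name: Install Python using the raw module
      raw: apt-get install -y python3
```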
Summary
Using the Ansible facts cache is an easy way to speed up Playbook runs. The time to gather the facts can easily be the longest-running task in a Playbook.