Local caching of Ansible Facts
Every time Ansible runs a Playbook, the first step (by default) is gathering facts about the target system:
PLAY [all-systems]
TASK [Gathering Facts]
ok: [host1]
ok: [host2]
This step is implicit, so there is no need to add a fact-gathering step to every Playbook (although it is possible). The module which retrieves the information is "setup", and by default it gathers as much information about the target system as possible. When a "setup" task is added as an extra step in the Playbook, the information about the destination system is refreshed:
tasks:
  - name: Refresh destination information
    setup:
That might be necessary when a Playbook has changed vital system settings.
Gathering the facts is a time-consuming process, and for a short Playbook it is quite possible that this is the longest-running task. And it's repeated every time the Playbook runs.
Ansible provides cache plugins which can store the gathered facts. If the system facts don't change between Playbook runs, this greatly reduces the runtime of Playbooks. The facts cache can be stored in JSON files, in a Redis DB, in Memcached, and in a few other backends. The simplest option, which requires no additional tools, is the "jsonfile" cache. Central implementations like Redis or Memcached allow multiple Ansible controller hosts to use the same facts cache, whereas local caches like "jsonfile" are only available on a single host, and every Ansible controller must build and maintain its own cache.
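To get a feeling for what such a local cache contains, the cached facts of a single host can be read back with a few lines of Python. This is a minimal sketch which assumes that the "jsonfile" cache writes one JSON file per host into its cache directory, and that the file is named after the inventory hostname; the function name `load_cached_facts` is made up for this example.

```python
import json
from pathlib import Path


def load_cached_facts(cache_dir, host):
    """Return the cached facts for one host, or None if no cache file exists.

    Assumption: one JSON file per host, named after the inventory hostname.
    """
    cache_file = Path(cache_dir) / host
    if not cache_file.is_file():
        return None
    with cache_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_cached_facts("facts.json", "host1")` would then return a dictionary of facts for "host1", provided a Playbook has populated a cache directory of that name.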
Performance
For a performance test, I picked a Playbook which runs on 37 hosts, both physical hosts and virtual machines, using ssh into LXC hosts. I ran a very simple Playbook multiple times and measured the runtime:
---
- hosts: all-systems
  become: yes
  tasks:
    - name: Uptime test
      command: uptime
That's just a single simple command, plus of course the implicit "setup" task. I repeated the test 10 times without and 10 times with the Ansible facts cache.
| | Without Cache | With Cache |
|---|---|---|
| First run | 37.6s | 54.6s |
| Last run | 48.9s | 8.5s |
| Count | 10 | 10 |
| Min | 37.6s | 7.0s |
| Max | 70.3s | 54.6s |
| Average | 46.32s | 13.87s |
For the 10 runs without cache, every Playbook run took between 37s and 70s, with an average of 46s. The spread is explainable: Ansible runs tasks on 10 hosts in parallel, and moves on to the next host as soon as one host finishes. I used a larger number of hosts (37) in this example to get a longer Playbook run, rather than testing on just one or two hosts.
For the 10 runs with cache, only the first run (which populates the empty cache) takes about as long as a run without cache. All the other runs finish in less than 10s, and the average over the 10 runs is just under 14s. This average includes the first run with 54.6s. With more runs and the cache enabled, the overall average will approach or drop below 10s.
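The numbers in the table are enough to estimate how fast a run with a populated cache is on average, and how the overall average develops with more runs. The following sketch only uses the figures from the table; it assumes that later runs stay at the same warm-cache average and that the cache does not expire in between. The helper `projected_average` is made up for this example.

```python
# Figures taken from the results table above (all times in seconds).
RUNS = 10
AVG_WITHOUT_CACHE = 46.32
AVG_WITH_CACHE = 13.87        # includes the first run, which populates the cache
FIRST_RUN_WITH_CACHE = 54.6

# Average of the nine runs that found a populated cache.
warm_avg = (AVG_WITH_CACHE * RUNS - FIRST_RUN_WITH_CACHE) / (RUNS - 1)


def projected_average(total_runs):
    """Overall average after total_runs, assuming later runs stay at warm_avg."""
    return (FIRST_RUN_WITH_CACHE + (total_runs - 1) * warm_avg) / total_runs


print(f"warm-cache average: {warm_avg:.2f}s")
print(f"speedup vs. no cache: {AVG_WITHOUT_CACHE / warm_avg:.1f}x")
print(f"projected average after 100 runs: {projected_average(100):.2f}s")
```

Under these assumptions the nine warm runs average roughly 9.3s, which is about five times faster than the average run without cache.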
How to enable the cache?
As mentioned before, this example is using the "jsonfile" cache. The following settings need to be added to the "[defaults]" section of "ansible.cfg" in the Playbook directory:
[defaults]
gathering = smart
gather_subset = all
fact_caching = jsonfile
fact_caching_connection = facts.json
fact_caching_timeout = 86400
Ansible has three policies for when to gather facts. "implicit" is the default, and re-gathers facts at the beginning of every Playbook run; this setting ignores any facts cache. The inverse is "explicit", which does not gather facts unless explicitly requested, for example by a "setup" task. The "smart" option only gathers facts for hosts which have no cached facts available.
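The three policies can be summarized as a small decision function. This is a simplified model for illustration only, not Ansible's actual code; the function and parameter names are made up. `play_gather_facts` stands for the "gather_facts" keyword of a play (`None` when the keyword is absent), and `has_cached_facts` states whether facts for the host are already available.

```python
from typing import Optional


def should_gather(policy: str, play_gather_facts: Optional[bool],
                  has_cached_facts: bool) -> bool:
    """Simplified model: does the implicit fact gathering run for a host?"""
    # Gathering is "implied" unless the play sets gather_facts to False.
    implied = play_gather_facts is None or play_gather_facts
    if policy == "implicit":
        # Always gather, regardless of any cache.
        return implied
    if policy == "explicit":
        # Only gather when the play asks for it.
        return bool(play_gather_facts)
    if policy == "smart":
        # Gather only for hosts without cached facts.
        return implied and not has_cached_facts
    raise ValueError(f"unknown gathering policy: {policy}")
```

For example, `should_gather("smart", None, True)` is `False`, which is the case that saves time in the measurements, while `should_gather("implicit", None, True)` is `True`.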
The "gather_subset = all" setting is the Ansible default, and gathers all details of the destination host. This can be limited to a certain subset if not all of the information is required.
"fact_caching = jsonfile" defines the cache type, and "fact_caching_connection = facts.json" specifies the directory name which Ansible will use to store the cache. The directory "facts.json" will be created in the Playbook directory if no absolute path is specified.
Finally "fact_caching_timeout = 86400" specifies that the gathered data is valid for one day (86400 seconds). After that time, Ansible will gather new data from the destination hosts. This number should depend on how often the Playbook runs, and how often critical system changes are applied.
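To see which hosts will be re-scanned on the next run, the age of the cache files can be compared with the timeout. The sketch below assumes that the "jsonfile" cache keeps one file per host in the cache directory and that the file's modification time decides whether an entry has expired; the function name `stale_cache_files` is made up for this example.

```python
import time
from pathlib import Path


def stale_cache_files(cache_dir, timeout_seconds, now=None):
    """List cache files whose modification time is older than the timeout."""
    now = time.time() if now is None else now
    directory = Path(cache_dir)
    if not directory.is_dir():
        # No cache directory yet, so nothing can be stale.
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and now - entry.stat().st_mtime > timeout_seconds
    )
```

With the settings from above, `stale_cache_files("facts.json", 86400)` would list the hosts whose facts are older than one day.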
Wiping the cache
Cleaning the cache (and forcing Ansible to re-read the facts) is easy: it is enough to delete the "facts.json" directory. Ansible will recreate it during the next Playbook run.
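If only one host changed, it may be enough to drop the cache entry of that host instead of the whole directory. The following sketch assumes, again, one cache file per host named after the inventory hostname; `drop_cached_host` is a made-up helper name. As an alternative for the whole inventory, "ansible-playbook" also offers the "--flush-cache" option.

```python
from pathlib import Path


def drop_cached_host(cache_dir, host):
    """Delete the cache file of a single host; return True if a file was removed."""
    cache_file = Path(cache_dir) / host
    if cache_file.is_file():
        cache_file.unlink()
        return True
    return False
```

After `drop_cached_host("facts.json", "host1")`, the "smart" gathering policy should re-gather the facts for "host1" on the next run, while the other hosts keep using their cached facts.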
Disabling the "setup" step
In Ansible it is possible to skip the "gathering" step completely and not retrieve any information about the destination host. This might be necessary, for example, before the Python interpreter is installed on the host. Such initial steps are usually executed using the "raw" module, and obviously without Python on the host it's not possible to gather facts before the Playbook run. Therefore the implicit facts gathering step can be disabled on a per-Playbook basis:
- hosts: all-systems
  gather_facts: False
  become: yes
Summary
Using the Ansible facts cache is an easy way to speed up Playbook runs. Gathering the facts can easily be the longest-running task in a Playbook.