Local caching of Ansible Facts

Posted by ads' corner on Sunday, 2020-03-29
Posted in [Ansible][Linux]

Every time Ansible runs a Playbook, the first step (by default) is gathering facts about the target system:

PLAY [all-systems]

TASK [Gathering Facts]
ok: [host1]
ok: [host2]

This step is implicit, and it is not necessary (but possible) to add the gather facts step to every Playbook. The module which retrieves all the information is setup, and by default it tries to gather as much information about the target system as possible. When the setup task is added as an extra step in the Playbook, the information about the destination system is refreshed and updated:

  tasks:
    - name: Refresh destination information
      setup:

That might be necessary after a Playbook has changed vital system settings.
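As a minimal sketch of such a refresh, the following play changes a setting and then re-reads the facts. The hostname change is only a hypothetical example of a "vital system setting", chosen for illustration; any task that alters what the setup module reports would do.

```yaml
---

- hosts: all-systems
  become: yes

  tasks:
    # Hypothetical change: set the hostname to the inventory name
    - name: Change the hostname
      hostname:
        name: "{{ inventory_hostname }}"

    # Re-run the setup module so the facts reflect the change
    - name: Refresh destination information
      setup:

    # Show one of the refreshed facts
    - name: Show refreshed hostname fact
      debug:
        var: ansible_facts['hostname']
```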

Gathering the facts is a time-consuming process, and for a short Playbook it is quite possible that this is the longest-running task. And it’s repeated every time the Playbook runs.

Ansible provides cache plugins which can store the gathered facts. If the system facts don’t change between Playbook runs, this greatly reduces the runtime of Playbooks. The facts cache can be stored in JSON files, in a Redis database, in Memcached, and in a few other backends. The simplest option, which requires no additional tools, is the jsonfile cache. Central implementations like Redis or Memcached allow multiple Ansible controller hosts to share the same facts cache, whereas a local cache like jsonfile is only available on a single host, and every Ansible controller must build and maintain its own cache.

Performance

For a performance test, I picked a group of 37 hosts, a mix of physical hosts and virtual machines (using ssh into LXC hosts). I ran a very simple Playbook multiple times and measured the runtime:

---

- hosts: all-systems
  become: yes

  tasks:
    - name: Uptime test
      command: uptime

That’s just a simple command, plus of course the implicit setup task. I repeated the test 10 times each, without and with the Ansible facts cache.

            Without Cache   With Cache
First run   37.6s           54.6s
Last run    48.9s           8.5s
Count       10              10
Min         37.6s           7.0s
Max         70.3s           54.6s
Average     46.32s          13.87s

For the 10 runs without cache, all Playbook runs are in the range of 37s to 70s, with an average of 46s. The spread is explainable: Ansible runs tasks on 10 hosts in parallel, and moves on to the next host once a host finishes. I used a larger number of hosts (37) in this example to show a longer Playbook run, rather than testing on just one or two hosts.

For the 10 runs with cache, only the first run (which populates the empty cache) takes about as long as a run without cache. All the other runs finish in less than 10s, and the average over the 10 runs is just under 14s. This average includes the first run with 54.6s. With more runs and the cache enabled, the overall average will approach or drop below 10s.

How to enable the cache?

As mentioned before, this example is using the jsonfile cache. The following settings need to be added to ansible.cfg in the Playbook directory:

[defaults]
gathering = smart
gather_subset = all
fact_caching = jsonfile
fact_caching_connection = facts.json
fact_caching_timeout = 86400

Ansible offers different policies for when to gather facts: implicit is the default, and re-gathers facts at the beginning of every Playbook run. This setting ignores any facts cache. The inverse is explicit, which does not gather facts unless explicitly requested, for example by a setup task. The smart option only gathers facts if no cached facts are available.

gather_subset = all is the Ansible default, and gathers all destination host details. This can be limited to a certain subset if not all of the information is required.
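The subset can also be limited per play with the gather_subset keyword. The following is only a sketch: the choice of the network subset is an assumption for illustration, and Ansible still collects a minimal set of facts in addition to the subsets listed.

```yaml
---

- hosts: all-systems
  become: yes
  # "!all" drops the default full collection, "network" adds
  # the network facts back (example choice, adjust as needed)
  gather_subset:
    - "!all"
    - network

  tasks:
    - name: Show the default IPv4 address fact
      debug:
        var: ansible_facts['default_ipv4']
```

Keep in mind that with the cache enabled, only the facts which were actually gathered end up in the cache.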

fact_caching = jsonfile defines the cache type, and fact_caching_connection = facts.json specifies the name of the directory in which Ansible stores the cache. If no absolute path is specified, the directory facts.json is created in the Playbook directory.

Finally fact_caching_timeout = 86400 specifies that the gathered data is valid for one day (86400 seconds). After that time, Ansible will gather new data from the destination hosts. This number should depend on how often the Playbook runs, and how often critical system changes are applied.

Wiping the cache

Cleaning the cache (and forcing Ansible to re-read the facts) is easy: it is enough to delete the facts.json directory. Ansible will recreate it during the next Playbook run.
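As an alternative to deleting the directory, the cache can be cleared from within a Playbook using the clear_facts meta task, or from the command line with the --flush-cache option of ansible-playbook. The play below is a minimal sketch which drops the cached facts for the hosts in the play and then gathers them again; the task names are made up for this example.

```yaml
---

- hosts: all-systems
  # Skip the implicit gathering, facts are refreshed explicitly below
  gather_facts: False

  tasks:
    # Removes the facts for the hosts in this play, including the cache
    - name: Drop cached facts
      meta: clear_facts

    # Gathers fresh facts, which are then written back to the cache
    - name: Gather fresh facts
      setup:
```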

Disabling the “setup” step

In Ansible it is possible to skip the gathering step completely and not retrieve any information about the destination host. This might be necessary, for example, before the Python interpreter is installed on the host. Such initial steps are usually executed using the raw module, and obviously without Python on the host it’s not possible to gather facts at the beginning of the Playbook run. Therefore the implicit facts gathering step can be disabled on a per-Playbook basis:

- hosts: all-systems
  gather_facts: False
  become: yes
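A bootstrap Playbook along these lines could look like the following sketch. The apt-get command and the python3 package name are assumptions for a Debian-style target; other distributions need a different install command.

```yaml
---

- hosts: all-systems
  gather_facts: False
  become: yes

  tasks:
    # raw runs the command over ssh and needs no Python on the target
    # (assumption: Debian/Ubuntu host with apt-get available)
    - name: Install Python
      raw: apt-get install -y python3

    # With Python in place, the facts can be gathered explicitly
    - name: Gather facts
      setup:
```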

Summary

Using the Ansible facts cache is an easy way to speed up Playbook runs, because gathering the facts can easily be the longest-running task in a Playbook.

