Monitor additional details in Telegraf with the "Exec" input plugin

Posted by ads' corner on Thursday, 2020-06-25
Posted in [Ansible][Influxdb][Linux][Restic]

After installing Telegraf and feeding everything into InfluxDB, I was still missing the status of my backups. Every system here creates encrypted backups every night and stores them on a central NAS and off-site. But I want statistics about the backups, and I want to see when something is not working.

I’m using Restic for the backups (I will blog about this another time). Telegraf does not support Restic directly, so I need a few workarounds. This blog post, however, is not about monitoring the backups as such, but about how to write your own plugin for Telegraf.

Telegraf can include external programs as plugins through the [Exec Input Plugin](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec). The theory is simple:

  • Telegraf runs the plugin command and reads its STDOUT
  • The plugin can send data in different formats (for example CSV, JSON, or InfluxDB line protocol), selected with the data_format option

Since I’m using InfluxDB, I opted for the line protocol. Every data point is a single line:

  • The first word is the measurement (backup in my case); this ends up being the table name in InfluxDB
  • A comma (not a space)
  • The tags, in key=value format; multiple tags are comma-separated (example: host=sunlight,name=home). Tags are optional; without tags, the comma is dropped as well
  • A space
  • The fields with the actual data, again in key=value format (example: size=123456i,status=true,age=98765i)
  • A space
  • The timestamp in nanoseconds since the Unix epoch (usually it’s fine to take the Unix timestamp in seconds and multiply it by 1000000000). The timestamp is optional; if it is missing, the current time is used
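The structure above can be sketched in Python, the language of the backup script. This is a minimal, hypothetical helper (build_line and escape_tag are names made up for this example, not part of any library): it expects field values that are already encoded, such as 123456i, takes the timestamp in whole seconds, and backslash-escapes commas, equals signs and spaces in tag keys and values, which line protocol would otherwise read as separators.

```python
def escape_tag(text: str) -> str:
    """Backslash-escape the characters that separate parts of a line: comma, equals sign, space."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def build_line(measurement: str, tags: dict, fields: dict, timestamp_s=None) -> str:
    """Assemble: measurement[,tag=value...] field=value[,field=value...] [timestamp]."""
    head = measurement
    # Tags are attached to the measurement with commas, not spaces.
    # They are sorted by key, which InfluxDB recommends for write performance.
    for key, value in sorted(tags.items()):
        head += f",{escape_tag(key)}={escape_tag(value)}"
    # Field values are expected to be encoded already, e.g. "123456i" or "true".
    field_set = ",".join(f"{key}={value}" for key, value in fields.items())
    line = f"{head} {field_set}"
    if timestamp_s is not None:
        # Convert whole seconds since the Unix epoch to nanoseconds.
        line += f" {int(timestamp_s) * 1_000_000_000}"
    return line


print(build_line("backup", {"host": "sunlight", "name": "home"},
                 {"size": "123456i", "status": "true", "age": "98765i"}, 1593043200))
# prints: backup,host=sunlight,name=home size=123456i,status=true,age=98765i 1593043200000000000
```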

The field values can have different data types; the type is determined by how the value is written:

  • Integer, with an i suffix: size=123456i
  • Float (the default for numbers): temperature=27
  • String, in double quotes: path="/home"
  • Boolean: status=true or status=false

Even if a field value looks like an integer, InfluxDB will store it as a floating point number if it has no i suffix.
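These type rules can be captured in a small, hypothetical encoder (encode_field is a name invented for this sketch). One Python detail matters: bool is a subclass of int, so booleans have to be checked first, otherwise True would be written as 1i.

```python
import math


def encode_field(value) -> str:
    """Encode a Python value as a line protocol field value."""
    # bool first: isinstance(True, int) is also True in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers need the i suffix
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            raise ValueError("InfluxDB does not accept NaN or infinite field values")
        return repr(value)  # floats are written as plain numbers
    if isinstance(value, str):
        # Strings must be double-quoted; escape backslashes and quotes inside.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


fields = {"size": 123456, "temperature": 27.0, "path": "/home", "status": True}
print(",".join(f"{k}={encode_field(v)}" for k, v in fields.items()))
# prints: size=123456i,temperature=27.0,path="/home",status=true
```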

The Restic-specific logic lives in a Python script, /usr/bin/restic-telegraf.py, which prints its results in this format. Now it must be hooked up to Telegraf.
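The post does not show the script itself. As a rough, hypothetical sketch only, assuming the repository is configured through restic's environment variables (such as RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE) and that `restic snapshots --json` returns a list of snapshot objects with hostname and time keys, an Exec plugin could report the number of snapshots and the age of the newest one per host. The function names and the snapshots and age fields are assumptions, not the author's actual script.

```python
import json
import re
import subprocess
import time
from datetime import datetime


def parse_restic_time(value: str) -> float:
    """Convert an RFC 3339 timestamp from restic's JSON output to epoch seconds."""
    value = value.replace("Z", "+00:00")
    # restic (Go) writes 0-9 fractional digits; older fromisoformat() wants exactly 6.
    value = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), value)
    return datetime.fromisoformat(value).timestamp()


def snapshots_to_lines(snapshots, now: float):
    """Build one hypothetical 'backup' line per host: snapshot count and age of the newest."""
    newest, counts = {}, {}
    for snap in snapshots:
        host = snap["hostname"]
        ts = parse_restic_time(snap["time"])
        counts[host] = counts.get(host, 0) + 1
        newest[host] = max(newest.get(host, ts), ts)
    lines = []
    for host in sorted(newest):
        age = int(now - newest[host])  # seconds since the newest snapshot
        lines.append(
            f"backup,host={host} snapshots={counts[host]}i,age={age}i "
            f"{int(now) * 1_000_000_000}"
        )
    return lines


def main():
    # check=True raises if restic fails, so the script exits with a non-zero status.
    result = subprocess.run(["restic", "snapshots", "--json"],
                            capture_output=True, text=True, check=True)
    for line in snapshots_to_lines(json.loads(result.stdout), time.time()):
        print(line)
```

In a real script, main() would be called from the usual `if __name__ == "__main__":` guard. The whole run has to fit into the Exec plugin's timeout, and host names containing spaces or commas would additionally need tag escaping.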

Telegraf Configuration

As mentioned in my previous blog post about Telegraf, the playbook I’m using can re-create telegraf.conf from scratch, and it configures the input and output plugins. I have to add the exec input plugin to the list in telegraf-config.yml, and I introduce a new flag, monitor_backup, because multiple scripts can use the exec plugin:

input_filters: "cpu:mem:disk:diskio:swap:net:system:postfix:processes:smart:temp:wireless:exec"

# also include "exec" in "input_filters"!
monitor_backup: True

The exec plugin uses the [[inputs.exec]] section in the configuration; however, the commands list is pre-populated with three demo commands spread over multiple lines. That’s a problem, because the Ansible ini_file module can’t replace a multi-line value and would produce an invalid config file. I’m already loading the content of the config file into the variable etc_telegraf_telegraf_conf, and I use it to check whether the demo commands are still in the file. If they are, I replace the entire commands section with the replace module before moving on to the actual configuration:

# the initial demo configuration spans multiple lines
# that's not working with the "ini_file" module
# set this to an empty list
- name: Remove default Exec command configuration
  replace:
    path: "/etc/telegraf/telegraf.conf"
    regexp: 'commands = \[[\s\S]*?\]'
    replace: 'commands = []'
  when: etc_telegraf_telegraf_conf.find('/usr/bin/mycollector') != -1
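Why the pattern in this task works can be checked with Python's re module, whose regular expression syntax the replace module uses. The sample below is a hypothetical, simplified config resembling the demo entries: [\s\S] matches any character including newlines, and the lazy *? stops at the first closing bracket. A greedy version would run to the last ] in the file and swallow the following sections.

```python
import re

# Hypothetical sample resembling a config with the multi-line demo commands.
sample = """[[inputs.exec]]
  commands = [
    "/tmp/test.sh",
    "/usr/bin/mycollector --foo=bar",
    "/tmp/collect_*.sh"
  ]
  timeout = "5s"

[[inputs.cpu]]
  percpu = true
"""

LAZY = r'commands = \[[\s\S]*?\]'   # pattern from the Ansible task
GREEDY = r'commands = \[[\s\S]*\]'  # for comparison only: do not use

lazy_result = re.sub(LAZY, 'commands = []', sample)
greedy_result = re.sub(GREEDY, 'commands = []', sample)

print(lazy_result)    # keeps timeout and the [[inputs.cpu]] section
print(greedy_result)  # the [[inputs.cpu]] header is gone
```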

With that problem out of the way, I can configure the plugin:

- name: Update telegraf.conf Exec settings
  ini_file:
    path: "/etc/telegraf/telegraf.conf"
    section: "{{ item.section }}"
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    state: "{{ item.state }}"
  loop:
    - { section: "[inputs.exec]", option: "commands", value: ' ["/usr/bin/restic-telegraf.py"]', state: present }
    - { section: "[inputs.exec]", option: "timeout", value: '"5s"', state: present }
    - { section: "[inputs.exec]", option: "data_format", value: '"influx"', state: present }
    - { section: "[inputs.exec]", option: "interval", value: '"300s"', state: present }
    - { section: "[inputs.exec]", option: "name_suffix", value: '""', state: present }
  notify:
    - restart telegraf

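The data format is set to influx, and the interval is increased to 5 minutes (300s): I don’t really need to check the backups every 10 seconds. The script has to finish within the 5s timeout, and the empty name_suffix keeps the measurement name exactly as the script prints it.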
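The loop relies on a small trick: ini_file writes a section header as the section name wrapped in square brackets, so passing "[inputs.exec]" as the section yields the TOML array-of-tables header [[inputs.exec]] that Telegraf expects. The function below is a simplified, hypothetical model of that output, not ini_file's actual implementation; the values are taken from the task (without the leading space in the commands value).

```python
# Options set by the Ansible loop, with their TOML quoting kept as-is.
EXEC_OPTIONS = [
    ("commands", '["/usr/bin/restic-telegraf.py"]'),
    ("timeout", '"5s"'),
    ("data_format", '"influx"'),
    ("interval", '"300s"'),
    ("name_suffix", '""'),
]


def render_section(section: str, options) -> str:
    """Roughly mimic ini_file's output: a '[<section>]' header, then 'option = value' lines."""
    lines = [f"[{section}]"]
    lines += [f"{option} = {value}" for option, value in options]
    return "\n".join(lines) + "\n"


print(render_section("[inputs.exec]", EXEC_OPTIONS))
```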

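The entire set of tasks is wrapped in a block with a when condition, which checks if backup monitoring is enabled. The block also starts with a task that fails early if exec is missing from input_filters: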

- block:
  - name: Check if the exec plugin is included
    fail:
      msg: "Please include the 'exec' input_filters plugin!"
    when: telegraf_config.input_filters.find('exec') == -1

  # the initial demo configuration spans multiple lines
  # that's not working with the "ini_file" module
  # set this to an empty list
  - name: Remove default Exec command configuration
    replace:
      path: "/etc/telegraf/telegraf.conf"
      regexp: 'commands = \[[\s\S]*?\]'
      replace: 'commands = []'
    when: etc_telegraf_telegraf_conf.find('/usr/bin/mycollector') != -1

  - name: Update telegraf.conf Exec settings
    ini_file:
      path: "/etc/telegraf/telegraf.conf"
      section: "{{ item.section }}"
      option: "{{ item.option }}"
      value: "{{ item.value }}"
      state: "{{ item.state }}"
    loop:
      - { section: "[inputs.exec]", option: "commands", value: ' ["/usr/bin/restic-telegraf.py"]', state: present }
      - { section: "[inputs.exec]", option: "timeout", value: '"5s"', state: present }
      - { section: "[inputs.exec]", option: "data_format", value: '"influx"', state: present }
      - { section: "[inputs.exec]", option: "interval", value: '"300s"', state: present }
      - { section: "[inputs.exec]", option: "name_suffix", value: '""', state: present }
    notify:
      - restart telegraf

  when: telegraf_config.monitor_backup | bool
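One caveat about the first task in the block: .find('exec') is a substring test, and Telegraf also ships an execd input plugin, so a filter list containing only execd would pass the check. A stricter check compares against the individual colon-separated names; in Jinja that could be written as 'exec' not in telegraf_config.input_filters.split(':'). The Python equivalent below (exec_filter_enabled is a hypothetical helper) shows the difference.

```python
def exec_filter_enabled(input_filters: str) -> bool:
    """Return True only if 'exec' is one of the colon-separated input filters."""
    return "exec" in input_filters.split(":")


filters = "cpu:mem:disk:diskio:swap:net:system:postfix:processes:smart:temp:wireless:exec"
print(exec_filter_enabled(filters))          # True
print(exec_filter_enabled("cpu:mem:execd"))  # False
print("cpu:mem:execd".find("exec") != -1)    # True: the substring test also matches 'execd'
```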

That’s it: the backup status data is now fed into InfluxDB.

