Monitor additional details in Telegraf with the "Exec" input filter
After installing Telegraf and hooking up everything into InfluxDB, I was missing the status of my backups. Every system here creates encrypted backups every night, and stores them on a central NAS, and off-site. But I want to know statistics about the backups, and see if something is not working.
I'm using Restic for the backups (will blog about this another time). However, Telegraf does not support Restic directly, so I need a few workarounds. This blog post is not directly about monitoring the backups, though, but about how to write your own plugin for Telegraf.
Telegraf can run external plugins through the "Exec Input Plugin". The theory is easy:
- Telegraf will read the STDOUT coming from the plugin
- The plugin can send data in different formats (CSV, JSON, or InfluxDB line protocol)
Since I'm using InfluxDB, I opted for the line protocol. Every data set is a single line:
- The first word is the measurement ("backup" in my case) - this ends up being the table name in InfluxDB
- A comma (not a space: the tags are attached directly to the measurement name)
- The second set are the tags, in "key=value" format, and multiple tags are comma-separated (example: "host=sunlight,name=home")
- A space
- The third set is the actual data, again in "key=value" format (example: "size=123456i,status=true,age=98765i")
- A space
- The timestamp in nanoseconds (usually it's ok to take the Unix timestamp in seconds, the Epoch, and multiply it by 1000000000)
The data fields use different data types, and the type is determined by how the value is written:
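The layout described above can be sketched in a few lines of Python. This is only an illustration, not the actual script: the function name `build_line` is made up for this example, the field values are passed in already encoded, and escaping of special characters (spaces or commas in names and values) is left out. Note that the measurement and the tags are joined by a comma, while the tags, the fields, and the timestamp are separated by spaces.

```python
import time


def build_line(measurement, tags, fields, timestamp_ns=None):
    """Assemble one line: measurement,tags fields timestamp.

    Hypothetical helper for illustration only. 'fields' must already
    contain encoded values (for example "123456i"), and nothing is escaped.
    """
    if timestamp_ns is None:
        # current time in nanoseconds since the Unix epoch
        timestamp_ns = time.time_ns()
    tag_set = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_set = ",".join(f"{key}={value}" for key, value in fields.items())
    # the tags follow the measurement after a comma, not a space
    head = f"{measurement},{tag_set}" if tag_set else measurement
    return f"{head} {field_set} {timestamp_ns}"


line = build_line(
    "backup",
    {"host": "sunlight", "name": "home"},
    {"size": "123456i", "status": "true", "age": "98765i"},
    timestamp_ns=1700000000 * 1000000000,
)
# Telegraf reads whatever the plugin writes to STDOUT
print(line)
```

With the fixed timestamp used here, the script prints `backup,host=sunlight,name=home size=123456i,status=true,age=98765i 1700000000000000000`.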
- Integer: size=123456i
- Float (the default): temperature=27
- String: path="/home"
- Boolean: status=true or status=false
If a value looks like an integer but has no "i" suffix, InfluxDB will treat it as a floating point number.
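The type rules above map quite naturally onto Python types. The following helper is a sketch under that assumption; the name `format_field_value` is invented for this example and is not taken from the actual script. The one trap is that `bool` is a subclass of `int` in Python, so the boolean check has to come first.

```python
def format_field_value(value):
    """Encode a Python value as a line protocol field value (illustrative sketch)."""
    # bool must be tested before int, because isinstance(True, int) is True
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        # the "i" suffix marks the value as an integer
        return f"{value}i"
    if isinstance(value, float):
        # no suffix: floating point is the default type
        return repr(value)
    if isinstance(value, str):
        # strings are double-quoted; backslashes and quotes are escaped
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


for example in (123456, 27.0, "/home", True, False):
    print(format_field_value(example))
```

Special float values such as infinity or NaN are not handled in this sketch.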
This is coded into a Python script, in "/usr/bin/restic-telegraf.py". Now this must be hooked up into Telegraf.
Telegraf Configuration
As mentioned in my previous blog post about Telegraf, the Playbook I'm using can re-create the "telegraf.conf" from scratch, and it configures the Input and Output plugins. I have to add the "exec" Input plugin to the list in "telegraf-config.yml", and I introduce a new flag "monitor_backup" - because multiple scripts can use the "exec" plugin:
input_filters: "cpu:mem:disk:diskio:swap:net:system:postfix:processes:smart:temp:wireless:exec"
# also include "exec" in "input_filters"!
monitor_backup: True
The "exec" plugin uses the "[[inputs.exec]]" section in the configuration, however the "commands" list is pre-populated with 3 demo scripts. That's a problem, because the list spans multiple lines: the Ansible ini_file plugin can't replace it, and will produce an invalid config file. I'm already loading the content of the config file into $etc_telegraf_telegraf_conf, and use this to check if the demo scripts are still in the file. If that is true, I replace the entire "commands" list with the "replace" module before moving on to the actual configuration:
# the initial demo configuration spans multiple lines
# that's not working with the "ini_file" plugin
# set this to an empty list
- name: Remove default Exec command configuration
  replace:
    path: "/etc/telegraf/telegraf.conf"
    regexp: 'commands = \[[\s\S]*?\]'
    replace: 'commands = []'
  when: etc_telegraf_telegraf_conf.find('/usr/bin/mycollector') != -1
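Ansible's "replace" module is built on Python regular expressions, so the effect of the pattern can be tried out directly in Python. The sample text below is an assumption about what the pre-populated section roughly looks like, not a copy of the real file; only "/usr/bin/mycollector" is taken from the check above.

```python
import re

# Assumed sample of the demo section; the real telegraf.conf may differ.
sample = '''[[inputs.exec]]
  commands = [
    "/tmp/test.sh",
    "/usr/bin/mycollector --foo=bar",
    "/tmp/collect_*.sh"
  ]
  timeout = "5s"
'''

# [\s\S] matches any character including newlines, and the non-greedy *?
# stops at the first closing bracket instead of the last one in the file.
cleaned = re.sub(r'commands = \[[\s\S]*?\]', 'commands = []', sample)
print(cleaned)
```

The multi-line list collapses into a single `commands = []` line, which the ini_file plugin can then handle like any other option.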
After that problem is moved out of the way, I can configure the plugin:
- name: Update telegraf.conf Exec settings
  ini_file:
    path: "/etc/telegraf/telegraf.conf"
    section: "{{ item.section }}"
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    state: "{{ item.state }}"
  loop:
    - { section: "[inputs.exec]", option: "commands", value: ' ["/usr/bin/restic-telegraf.py"]', state: present }
    - { section: "[inputs.exec]", option: "timeout", value: '"5s"', state: present }
    - { section: "[inputs.exec]", option: "data_format", value: '"influx"', state: present }
    - { section: "[inputs.exec]", option: "interval", value: '"300s"', state: present }
    - { section: "[inputs.exec]", option: "name_suffix", value: '""', state: present }
  notify:
    - restart telegraf
The data format is changed to "influx", and I increase the interval to 5 minutes: I don't really need to check the backup every 10 seconds.
The entire block is wrapped into a "when" condition which checks if the backup monitoring is enabled:
- block:
    - name: Check if the exec plugin is included
      fail:
        msg: "Please include the 'exec' input_filters plugin!"
      when: telegraf_config.input_filters.find('exec') == -1
    # the initial demo configuration spans multiple lines
    # that's not working with the "ini_file" plugin
    # set this to an empty list
    - name: Remove default Exec command configuration
      replace:
        path: "/etc/telegraf/telegraf.conf"
        regexp: 'commands = \[[\s\S]*?\]'
        replace: 'commands = []'
      when: etc_telegraf_telegraf_conf.find('/usr/bin/mycollector') != -1
    - name: Update telegraf.conf Exec settings
      ini_file:
        path: "/etc/telegraf/telegraf.conf"
        section: "{{ item.section }}"
        option: "{{ item.option }}"
        value: "{{ item.value }}"
        state: "{{ item.state }}"
      loop:
        - { section: "[inputs.exec]", option: "commands", value: ' ["/usr/bin/restic-telegraf.py"]', state: present }
        - { section: "[inputs.exec]", option: "timeout", value: '"5s"', state: present }
        - { section: "[inputs.exec]", option: "data_format", value: '"influx"', state: present }
        - { section: "[inputs.exec]", option: "interval", value: '"300s"', state: present }
        - { section: "[inputs.exec]", option: "name_suffix", value: '""', state: present }
      notify:
        - restart telegraf
  when: telegraf_config.monitor_backup == True
That's it, the backup status data is now fed into InfluxDB.