Posted by ads' corner on Wednesday, 2020-06-24. Posted in [Ansible][Influxdb][Linux]
I have an InfluxDB instance up and running in my network, and decided to monitor all devices (well, all where that is possible - the QNAP seems to be a problem). That’s quite easy to do by installing Telegraf as a server agent and adding some configuration. Everything is deployed using Ansible, so I can re-use the same Playbook for many devices.
Let’s assume I have a laptop which I want to monitor. The hostname is sunlight (it’s not, I’m just using this as an example).
InfluxDB
First I need to create a database and credentials in InfluxDB.
At this point I can decide if I:
have everything in a single database, or use a separate database for monitoring the devices - I decided to use a separate database home
make it easy to have one account for all devices, or use separate accounts per device - I decided to roll out separate accounts
grant ALL permissions, or only WRITE permission to the device account - if the database already exists, the account only needs WRITE
The following files are in the InfluxDB Playbook directory:
-r-------- 1 mon mon 13 Jun 17 17:46 credentials/influxdb-admin-password.txt
-r-------- 1 mon mon 6 Jun 17 17:45 credentials/influxdb-admin-username.txt
-r-------- 1 mon mon 5 Jun 23 00:44 credentials/influxdb-sunlight-dbname.txt
-r-------- 1 mon mon 13 Jun 23 00:43 credentials/influxdb-sunlight-password.txt
-r-------- 1 mon mon 8 Jun 23 00:44 credentials/influxdb-sunlight-username.txt
-r-------- 1 mon mon 5 Jun 23 00:44 credentials/influxdb-home-dbname.txt
-r-------- 1 mon mon 13 Jun 23 02:02 credentials/influxdb-home-password.txt
-r-------- 1 mon mon 8 Jun 23 00:44 credentials/influxdb-home-username.txt
credentials/influxdb-sunlight-dbname.txt and credentials/influxdb-home-dbname.txt hold the same database name: home, but I use different files in case I have to separate this in the future.
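The database and the per-device account can be created from the Playbook as well. The following is only a sketch under assumptions, not necessarily how my Playbook does it: it uses the community.general.influxdb_database and community.general.influxdb_user modules (which need the Python influxdb library on the target), and it assumes InfluxDB listens on localhost. Only the credential file names are taken from the listing above.

```yaml
# Sketch: create the "home" database and a WRITE-only account for sunlight.
# Assumptions: InfluxDB 1.x on localhost, community.general collection installed.
- name: Create the monitoring database
  community.general.influxdb_database:
    hostname: "localhost"
    username: "{{ lookup('file', playbook_dir + '/credentials/influxdb-admin-username.txt') }}"
    password: "{{ lookup('file', playbook_dir + '/credentials/influxdb-admin-password.txt') }}"
    database_name: "{{ lookup('file', playbook_dir + '/credentials/influxdb-home-dbname.txt') }}"
    state: present

- name: Create the device account with WRITE permission only
  community.general.influxdb_user:
    hostname: "localhost"
    username: "{{ lookup('file', playbook_dir + '/credentials/influxdb-admin-username.txt') }}"
    password: "{{ lookup('file', playbook_dir + '/credentials/influxdb-admin-password.txt') }}"
    user_name: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-username.txt') }}"
    user_password: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-password.txt') }}"
    grants:
      - database: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-dbname.txt') }}"
        privilege: "WRITE"
    state: present
```

Because the database is created by the admin account first, the device account never needs more than WRITE.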
In InfluxDB I ensure that authentication is enabled.
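A minimal sketch of how that can be enforced with Ansible, assuming InfluxDB 1.x with its configuration in /etc/influxdb/influxdb.conf. The "restart influxdb" handler is a hypothetical name and has to exist in the Playbook.

```yaml
# Sketch: switch on HTTP authentication in InfluxDB 1.x.
# Assumptions: default config path, a handler named "restart influxdb".
- name: Enable authentication in InfluxDB
  ini_file:
    path: "/etc/influxdb/influxdb.conf"
    section: "http"
    option: "auth-enabled"
    value: "true"
    state: present
  notify:
    - restart influxdb
```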
That’s it for InfluxDB, and the server side. Now for the laptop.
Telegraf: Sunlight
In Ansible I outsource the configuration into a variable file, so I can re-use the same Playbook. Another way to do it is to have a Role, and define all the variables in your hosts.cfg.
The telegraf-config.yml:
    ---

    hostname: "sunlight"
    input_filters: "cpu:mem:disk:diskio:swap:net:system:postfix:processes:smart:temp:wireless"
    output_filters: "influxdb"
    # the space at the beginning is required, otherwise Ansible
    # parses the string as a list
    influxdb_url: ' ["http://192.168.0.21:8086"]'
    influxdb_database: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-dbname.txt') }}"
    influxdb_username: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-username.txt') }}"
    influxdb_password: "{{ lookup('file', playbook_dir + '/credentials/influxdb-sunlight-password.txt') }}"
This defines the hostname for this device, the list of input and output filters I want to use, and the InfluxDB credentials. The value of influxdb_url needs a space at the beginning of the string, otherwise Ansible parses it as a list.
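The tasks in this post access these settings as telegraf_config.hostname, telegraf_config.input_filters and so on. One way to get there is to load the file under that name. This is a sketch; the location of telegraf-config.yml next to the Playbook is an assumption.

```yaml
# Sketch: load the variable file into a dictionary named telegraf_config.
# Assumption: telegraf-config.yml lives in the Playbook directory.
- name: Load the Telegraf settings for this device
  include_vars:
    file: "{{ playbook_dir }}/telegraf-config.yml"
    name: telegraf_config
```

With a different variable file per device, the rest of the Playbook stays unchanged.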
The only time the configfile is deleted by the Playbook is when the telegraf package is installed. Make sure the package task uses state=present, not state=latest - otherwise the deletion will also be triggered when the package is updated.
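The install step could look roughly like this. It is a sketch: the register name telegraf_installed is hypothetical, and it assumes the telegraf package is available from a configured repository.

```yaml
# Sketch: install the package and drop the packaged configfile once.
# "telegraf_installed" is a hypothetical variable name.
- name: Install telegraf
  package:
    name: telegraf
    state: present
  register: telegraf_installed

# only runs when the package task reported a change, which with
# state=present means the first installation
- name: Remove the packaged telegraf.conf
  file:
    path: "/etc/telegraf/telegraf.conf"
    state: absent
  when: telegraf_installed.changed
```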
Because the Playbook re-creates /etc/telegraf/telegraf.conf from scratch whenever it is missing, one can wipe this file at any time and have the Playbook take care of the situation. This comes in handy when additional input filters are added. Re-creating the configfile:
    - name: Figure out if configfile exists
      stat:
        path: "/etc/telegraf/telegraf.conf"
      register: etc_telegraf_telegraf_conf

    # create a new telegraf.conf
    - block:
      # configfile does not exist
      # can only happen when the configfile was moved away right after
      # the package was installed
      # generate a configfile using "telegraf config"
      - name: Generate telegraf.conf
        shell: "telegraf --input-filter {{ telegraf_config.input_filters }} --output-filter {{ telegraf_config.output_filters }} config > /etc/telegraf/telegraf.conf"

      # will fail when the config is not alright
      - name: Test generated config
        command: telegraf --config /etc/telegraf/telegraf.conf --test
        register: test_config

      when: etc_telegraf_telegraf_conf.stat.exists != True
The block above also tests the configfile, and fails early if the file is faulty. In theory a temp file should be used: create it first, test it, and only replace the configfile if it is valid. But things work smoothly in my setup and I don’t really need this extra step.
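For completeness, the temp file variant could be sketched like this. I don't use it myself; the variable name telegraf_tmp_conf and the file mode are assumptions.

```yaml
# Sketch: generate into a temp file, test it, then install it.
# "telegraf_tmp_conf" and the mode are assumptions, adjust as needed.
- name: Create a temporary file for the new config
  tempfile:
    state: file
    suffix: ".telegraf.conf"
  register: telegraf_tmp_conf

- name: Generate the config into the temporary file
  shell: "telegraf --input-filter {{ telegraf_config.input_filters }} --output-filter {{ telegraf_config.output_filters }} config > {{ telegraf_tmp_conf.path }}"

# a faulty config stops the Playbook here, before telegraf.conf is touched
- name: Test the temporary config
  command: "telegraf --config {{ telegraf_tmp_conf.path }} --test"

- name: Install the tested config
  copy:
    src: "{{ telegraf_tmp_conf.path }}"
    dest: "/etc/telegraf/telegraf.conf"
    remote_src: true
    mode: "0644"

- name: Remove the temporary file
  file:
    path: "{{ telegraf_tmp_conf.path }}"
    state: absent
```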
Configuration
Next I’m reading the configuration file into a variable. That’s necessary because some of the input and output filters need to verify if specific content is already/still in the file. Especially the exec input filter needs this step, but that’s for another blog post.
After this step, the $etc_telegraf_telegraf_conf holds the current content of the file. Keep in mind that any of the following steps which change telegraf.conf will not update the content of the variable.
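A minimal sketch of reading the file, assuming the content is registered from a plain cat (in that case the text lives in the stdout field of the variable). The second task only illustrates how later steps can check the content.

```yaml
# Sketch: keep the current telegraf.conf in a registered variable.
# Assumption: the content is read with "cat" and used via ".stdout".
- name: Read the current telegraf.conf into a variable
  command: cat /etc/telegraf/telegraf.conf
  register: etc_telegraf_telegraf_conf
  changed_when: false

- name: Example check against the stored content
  debug:
    msg: "The InfluxDB output section is present"
  when: "'[[outputs.influxdb]]' in etc_telegraf_telegraf_conf.stdout"
```

Registering under the same name as the earlier stat result simply overwrites that result, which is no longer needed at this point.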
In a first step I’m changing a couple of settings to make sure that Telegraf has a larger buffer available to store data temporarily. This is a laptop, and occasionally it’s not in the home network, so I want to keep as much data around as possible. This only works if the laptop is not rebooted, because the buffer is kept in memory, but a reboot is a rare occasion anyway. I’m also explicitly defining the hostname which is used to tag the data in InfluxDB. I don’t want to rely on Telegraf to figure out the hostname, and maybe get fooled by auto-assigned hostnames from a DHCP service.
    # set a couple default settings
    # especially increase buffers to ensure more data is stored if
    # the device is not able to reach the InfluxDB service
    - name: Update telegraf.conf default settings
      ini_file:
        path: "/etc/telegraf/telegraf.conf"
        section: "{{ item.section }}"
        option: "{{ item.option }}"
        value: "{{ item.value }}"
        state: "{{ item.state }}"
      loop:
        - { section: "agent", option: "hostname", value: '"{{ telegraf_config.hostname }}"', state: present }
        - { section: "agent", option: "metric_buffer_limit", value: '1000000', state: present }
        - { section: "agent", option: "metric_batch_size", value: '10000', state: present }
      notify:
        - restart telegraf
The InfluxDB output is only configured if influxdb is in the list of output filters. That step also turns off automatic database creation. Automatic creation is not needed because the database already exists, and it would not work anyway because the user only has WRITE permission (which can’t be used to create a database).
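The output configuration can be sketched with the same ini_file approach as above. Assumptions in this sketch: the section is written as "[outputs.influxdb]" so that ini_file produces the double brackets Telegraf uses, and the condition splits the filter string on colons. The option names (urls, database, username, password, skip_database_creation) are the ones from Telegraf's InfluxDB output plugin.

```yaml
# Sketch: point the InfluxDB output at the server and skip database creation.
# Assumption: ini_file wraps the section name in one more pair of brackets.
- name: Update telegraf.conf InfluxDB output settings
  ini_file:
    path: "/etc/telegraf/telegraf.conf"
    section: "[outputs.influxdb]"
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    state: present
  loop:
    - { option: "urls", value: "{{ telegraf_config.influxdb_url }}" }
    - { option: "database", value: '"{{ telegraf_config.influxdb_database }}"' }
    - { option: "username", value: '"{{ telegraf_config.influxdb_username }}"' }
    - { option: "password", value: '"{{ telegraf_config.influxdb_password }}"' }
    - { option: "skip_database_creation", value: "true" }
  # keep the password out of the Ansible output
  no_log: true
  when: "'influxdb' in telegraf_config.output_filters.split(':')"
  notify:
    - restart telegraf
```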
Since I have a local Postfix installed on the laptop, I also monitor the queue.
The Telegraf agent runs as an unprivileged user, and this user must be able to see the Postfix queue. That’s possible with group rights, but it is easier to do with ACLs. Ansible has the acl module, which is used to set these permissions - but only if postfix is included in the input filter list.
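A sketch of the ACL step, under several assumptions: the agent runs as user telegraf, the queue lives in /var/spool/postfix, and read access to the four queue directories listed here is sufficient. The setfacl tooling must be installed on the laptop.

```yaml
# Sketch: give the telegraf user read access to the Postfix queue.
# Assumptions: user "telegraf", default queue directory, these four queues.
- name: Allow telegraf to read the Postfix queue directories
  acl:
    path: "/var/spool/postfix/{{ item }}"
    entity: telegraf
    etype: user
    permissions: rX
    recursive: true
    state: present
  loop:
    - active
    - hold
    - incoming
    - deferred
  when: "'postfix' in telegraf_config.input_filters.split(':')"
```

New files in the queue do not inherit this ACL automatically; a second task with default: true on the same directories would cover that.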