Icinga Director and disk checks for fuse mountpoints

When I rolled out my new Icinga2 installation and added disk checks for all laptops, I ran into a small problem: there is a fuse mountpoint for logged-in users which only the user can read. Apparently it has something to do with Flatpak.

cat /proc/mounts | grep doc
/dev/fuse /run/user/1000/doc fuse rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0

By default, the Icinga2 ITL has a number of file system types excluded for the "check_disk" check, even some special fuse types, but plain "fuse" is not among them. Kind of makes sense, a fuse mountpoint can be anything, and you don't want to exclude all of them by default.
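To illustrate why the mountpoint slips through, here is a minimal Python sketch that parses the line from /proc/mounts shown above and compares the file system type against an exclude list. The exclude list is a made-up sample for illustration, not the actual ITL default, and the exact-match comparison is an assumption about how the exclusion is applied.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its six whitespace-separated fields."""
    device, mountpoint, fstype, options, _dump, _passno = line.split()
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
    }


# Illustrative sample only: NOT the real default exclude list from the ITL.
EXAMPLE_EXCLUDED_TYPES = {"tmpfs", "fuse.gvfsd-fuse"}


def is_excluded(mount, excluded_types=EXAMPLE_EXCLUDED_TYPES):
    """Assumes exact matching: 'fuse' is not the same type as 'fuse.gvfsd-fuse'."""
    return mount["fstype"] in excluded_types


line = ("/dev/fuse /run/user/1000/doc fuse "
        "rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0")
mount = parse_mount_line(line)

# Plain "fuse" is not in the sample list, so the mountpoint would still be checked.
print(mount["mountpoint"], mount["fstype"], is_excluded(mount))
```

With plain "fuse" added to the set, `is_excluded(mount, {"fuse"})` returns True for this mountpoint.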

This results in the following error message when the check is rolled out on our laptops:

Plugin Output
DISK CRITICAL - /run/user/1000/doc is not accessible: Permission denied

Fortunately the fix is rather easy:

 

 

Continue reading "Icinga Director and disk checks for fuse mountpoints"

How to configure notifications in Icinga2 Director

I've been using Icinga2 for a long time, but recently installed a new system and am using Director for the first time. I know how to configure notifications in Icinga2 config files, but getting them working in Director (with Director options only) is a bit of a challenge.

Here is a step-by-step to get simple mail notifications working. From there it should be easier to configure more advanced notifications.

 

Continue reading "How to configure notifications in Icinga2 Director"

Monitor software version changes with Huginn

Huginn is a great piece of software, but the documentation is ... a bit sparse. Especially when it comes to details of the agents. I'm going to blog about a couple more examples in the future.

For another project I'm using Leaflet, a JavaScript library for rendering maps in a browser. New versions are released occasionally, and I want to know when it's time to update the project website. Huginn can do that.

 

Continue reading "Monitor software version changes with Huginn"

Monitor website status with Huginn

After setting up Huginn and implementing the actions on my todo list, I had a look at the available agents and started thinking about what else they could be useful for.

One of the ideas I came up with is monitoring if a website is available, or has some trouble. I already have a monitoring system in place, but it's a nice exercise to learn more about the other agents.

 

Continue reading "Monitor website status with Huginn"

Grafana: select host for a dashboard

InfluxDB is running on a Raspberry Pi in my home network (with a separately attached disk), and I installed Grafana on top of it to visualize crucial data.

In Grafana it is possible to define a variable for a dashboard and this variable can query the data source and use the returned list of values. Let's say the variable is $host, then the data query can use:

WHERE host =~ /^$host$/

and limit the current dashboard to the selected host. Also the variable will provide a select field at the top of the dashboard, which allows selecting the system one wants to see:

Now usually - according to the documentation - a "SHOW TAG VALUES" in the data source should be sufficient. However, this did not work for me, and the query came back empty:

> show tag values from system with key = host

Looks like I'm not the only one with this problem.

 

Luckily there is a way around with another query:

select distinct("host") from (select "host","load1" from system)

The result:

> select distinct("host") from (select "host","load1" from system)
name: system
time distinct
---- --------
0    host1
0    host2
0    host3
0    host4
0    host5
0    host6

Grafana ignores the "time" column and uses the second column for the host list. Voila.
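As a side note on the WHERE clause above: the ^ and $ anchors matter, because without them a selected host would also match every host name that merely contains it. The following Python sketch mimics that behaviour. The host names "host10" and "myhost1" are made up for the example, and the re.escape() call is an addition of this sketch, not something the query above does.

```python
import re


def host_filter(selected):
    """Build an anchored pattern like /^$host$/ for one selected host.

    re.escape() is an addition for this sketch, so that dots in host
    names are matched literally.
    """
    return re.compile(f"^{re.escape(selected)}$")


# Hypothetical host list; "host10" and "myhost1" are invented for the example.
hosts = ["host1", "host10", "myhost1"]

anchored = host_filter("host1")
print([h for h in hosts if anchored.search(h)])    # only the exact host

unanchored = re.compile("host1")
print([h for h in hosts if unanchored.search(h)])  # every name containing "host1"
```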

Monitor additional details in Telegraf with the "Exec" input filter

After installing Telegraf and hooking up everything into InfluxDB, I was missing the status of my backups. Every system here creates encrypted backups every night, and stores them on a central NAS, and off-site. But I want to know statistics about the backups, and see if something is not working.

I'm using Restic for the backups (will blog about this another time). Telegraf does not support Restic directly, so I need a few workarounds. This blog post, however, is not directly about monitoring the backups, but about how to write your own plugin for Telegraf.
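To give an idea of what such a plugin can look like: a script called by the "Exec" input only has to print metrics in a format Telegraf understands, for example InfluxDB line protocol. The Python sketch below formats one such line. The measurement name "backup_status", the tag "host" and the fields "age_seconds" and "ok" are hypothetical placeholders and not taken from my actual setup.

```python
def escape_key(value):
    """Escape commas, equals signs and spaces, as line protocol requires for tags and keys."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value):
    """Render a field value; bool is tested before int because bool is an int subclass."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical metric: names and values are placeholders for illustration.
print(to_line("backup_status", {"host": "host1"}, {"age_seconds": 3600, "ok": True}))
```

Without a timestamp the receiving side assigns the current time, which is usually good enough for a status value that is collected on every interval.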

 

Continue reading "Monitor additional details in Telegraf with the "Exec" input filter"

Install Telegraf using Ansible

I have an InfluxDB up and running in my network, and decided to monitor all (well, all possible - the QNAP seems to be a problem) devices. That's quite easy to do by installing Telegraf as a server agent and adding some configuration. Everything is deployed using Ansible, so I can re-use the same Playbook for many devices.

 

Continue reading "Install Telegraf using Ansible"

Add InfluxDB settings in Telegraf using Ansible: [WARNING]: The value [...] (type list) in a string field was converted to "[...]" (type string)

I'm in the process of updating my entire home setup and integrating everything properly. Part of this process is to automate everything, and use Ansible Playbooks to deploy devices and configurations.

Today: install Telegraf and send data to InfluxDB

Along the way something broke, and Ansible doesn't really like me anymore. But let's start at the beginning.

In the Telegraf configuration in "/etc/telegraf/telegraf.conf" one can specify output plugins. One of them (probably the most used one) is "InfluxDB". The InfluxDB instance(s) are specified as a [...] list. In Ansible I somehow need to have this list as a string, and write it into the configuration file. This happens:

TASK [Update telegraf.conf InfluxDB settings] ***************************************************
changed: [localhost] => (item={'section': '[outputs.influxdb]', 'option': 'urls', 'value': ['http://192.168.xxx.xxx:8086'], 'state': 'present'})
[WARNING]: The value ['http://192.168.xxx.xxx:8086'] (type list) in a string field was converted to "['http://192.168.xxx.xxx:8086']" (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.

Looks nasty ...
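What the warning describes can be reproduced in a few lines of Python: a list that ends up in a string field is converted using its Python representation. The sketch below shows that conversion, and one possible way to avoid it by rendering the list as a string up front. Using json.dumps() here is an assumption for illustration (the Ansible counterpart would be the to_json filter), not necessarily the fix used in the full post.

```python
import json

# Same placeholder address as in the task output above.
urls = ["http://192.168.xxx.xxx:8086"]

# Implicit conversion: this is the string the warning complains about.
as_python_repr = str(urls)
print(as_python_repr)   # ['http://192.168.xxx.xxx:8086']

# Explicit conversion: the value is already a string, with double quotes,
# before it ever reaches a string field.
as_json_string = json.dumps(urls)
print(as_json_string)   # ["http://192.168.xxx.xxx:8086"]
```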

 

Continue reading "Add InfluxDB settings in Telegraf using Ansible: [WARNING]: The value [...] (type list) in a string field was converted to "[...]" (type string)"

Monitor ChromeCast status in openHAB

I really like to monitor things, to catch issues early on. In our home we have a couple of ChromeCasts, both Audio and Video. They are all connected to the openHAB system. Once in a while they stop working, and need to be restarted (unplugged and plugged in again). Unfortunately you usually only find that out when you want to stream something, and wonder why the ChromeCast either does not show up in the device list, or does show up but does not accept the media.

Therefore I decided to monitor the devices in openHAB.

 

Continue reading "Monitor ChromeCast status in openHAB"

Raspberry Pi watchdog for openHAB

The openHAB display in the kitchen is still the problem child. Occasionally it just stops, other times it does not refresh the HABpanel, even though it has a connection to the openHAB server. Then there is the problem with the network card in the Pi. And - ok, that's a server-side problem - occasionally the weather stops updating. All in all that's a lot of trouble for a display which is just supposed to run standalone.

In the latest iteration I looked into activating the integrated hardware watchdog in the Raspberry Pi. The temperature never goes above ~55°C, even though the display is in an almost closed frame and can't exchange much heat with the environment. But nevertheless occasionally the Pi just halts, and stops operating.

 

Continue reading "Raspberry Pi watchdog for openHAB"

Reboot the Raspberry Pi on network failures (brcmfmac: brcmf_cfg80211_scan: scan error -110)

In one of my earlier blog posts I reported that occasionally the HABpanel will disconnect from the server. Turns out it's not HABpanel, but the Pi itself which is causing the trouble. Part of the reason it took me so long to investigate is that the display is in the kitchen, and someone had to have a look and spot the small red error message. To work around that problem, I hooked the device up to the network monitoring, and had an alarm triggered when the device is not reachable. Sure enough, that happens occasionally.

Because I moved /var/log to a small RAM disk to avoid wearing out the SD card, all logs are lost once the device is rebooted. I had to bring keyboard and mouse to the kitchen in order to save the logfiles once the device was no longer reachable over the network.

 

Continue reading "Reboot the Raspberry Pi on network failures (brcmfmac: brcmf_cfg80211_scan: scan error -110)"