Raspberry Pi watchdog for openHAB
The openHAB display in the kitchen is still the problem child. Occasionally it just stops; other times it does not refresh the HABpanel, even though it has a connection to the openHAB server. Then there is the problem with the network card in the Pi. And - ok, that's a server-side problem - occasionally the weather stops updating. All in all that's a lot of trouble for a display which is just supposed to run standalone.
In the latest iteration I looked into activating the integrated hardware watchdog in the Raspberry Pi. I checked the temperature, and it never goes above ~55°C, even though the display sits in an almost closed frame and can't exchange much heat with the environment. Nevertheless, occasionally the Pi just halts and stops operating.
Every Raspberry Pi comes with an integrated hardware watchdog, which can be used to trigger a reboot if the device is not responsive. The way this works is that the Linux kernel offers a device, which must be updated every few seconds by a userland application:
crw------- 1 root root 10, 130 May 4 04:00 /dev/watchdog
There are software versions of this, and hardware versions. Obviously the software version requires at least a running kernel, whereas the hardware version can trigger a reboot even if the operating system stopped entirely. The mechanism is only activated after the first write into this device, which avoids a reboot loop if no application is able to update the trigger. But it also leaves a gap: if the userland never comes up, the watchdog is never armed and no reboot is triggered.
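Before relying on the watchdog it can be worth checking that the kernel actually exposes the device. The following is only a minimal sketch in Ansible, assuming the default device path /dev/watchdog; the task names and the registered variable "watchdog_device" are made up for illustration. The "stat" module only inspects the file and does not write to it, so this check should not arm the watchdog.

```yaml
# Sketch: verify that the kernel exposes a watchdog character device.
# "watchdog_device" is an illustrative variable name, not part of the playbook.
- name: Check for the watchdog device
  stat:
    path: /dev/watchdog
  register: watchdog_device

- name: Fail early if the watchdog device is missing
  assert:
    that:
      # "and" short-circuits, so "ischr" is only read if the file exists
      - watchdog_device.stat.exists and watchdog_device.stat.ischr
    fail_msg: "/dev/watchdog is missing or is not a character device"
```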
After installing and enabling the watchdog, at least the display in the kitchen is now up and running all the time. Progress ...
The Raspbian OS comes with the "watchdog" package. The Ansible Playbook can install and configure everything.
- name: Install watchdog packages
  apt:
    name:
      - watchdog
    state: present
  register: watchdog_installed
A couple of settings are necessary; all of them go into /etc/watchdog.conf:
- name: Update /etc/watchdog.conf
  lineinfile:
    dest: /etc/watchdog.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: "{{ item.state }}"
  with_items:
    - { regexp: '^#? *watchdog-device', line: 'watchdog-device = /dev/watchdog', state: present }
    - { regexp: '^#? *watchdog-timeout', line: 'watchdog-timeout = 15', state: present }
    - { regexp: '^#? *realtime', line: 'realtime = yes', state: present }
    - { regexp: '^#? *priority', line: 'priority = 1', state: present }
  notify:
    - restart watchdog
The "restart watchdog" handler is just a "service" call, and goes into the Ansible "handlers" section:
- name: restart watchdog
  service:
    name: watchdog
    state: restarted
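The handler only restarts the daemon after a configuration change; it does not make sure the service starts at boot. A possible task for the enabling step could look like the sketch below. It assumes the service is called "watchdog", as in the handler; the task name is made up for illustration.

```yaml
# Sketch: make sure the watchdog daemon starts at boot and is running now.
# Assumes the service name "watchdog", the same one the handler restarts.
- name: Enable and start watchdog
  service:
    name: watchdog
    enabled: yes
    state: started
```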
"watchdog" is able to monitor more than just the OS. It can also monitor certain PIDs, or applications, or the system load, etc. But that is beyond my use case here.
Note: certain installation instructions might require loading the "bcm2835_wdt" or "bcm2708_wdog" kernel module. In recent kernels this driver is already compiled into the kernel, and no module needs to be loaded.
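For reference, such additional checks would go into the same configuration file. The sketch below follows the same "lineinfile" pattern as above; the option names "max-load-1" and "pidfile" come from the watchdog.conf documentation, but the threshold and the PID file path are assumptions chosen for illustration, not values I run here. Pointing "pidfile" at a file that does not exist on the system is likely to end in a reboot, so verify the path first.

```yaml
# Sketch: enable two optional checks in /etc/watchdog.conf.
# The load threshold and the PID file path are illustrative assumptions.
- name: Enable additional watchdog checks
  lineinfile:
    dest: /etc/watchdog.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: present
  with_items:
    # Reboot if the 1-minute load average exceeds 24.
    # The "\s*=" keeps the pattern from also matching max-load-15.
    - { regexp: '^#? *max-load-1\s*=', line: 'max-load-1 = 24' }
    # Reboot if the process behind this PID file is gone (assumed path).
    - { regexp: '^#? *pidfile', line: 'pidfile = /var/run/sshd.pid' }
  notify:
    - restart watchdog
```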
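On an older kernel where the driver is still built as a module, loading it could be handled in the playbook as well. This is only a sketch under that assumption: it uses the module name "bcm2835_wdt" from the note above, the task names are made up, and on current Ansible versions the "modprobe" module lives in the community.general collection, which must be installed.

```yaml
# Sketch for older kernels only: load the watchdog driver as a module.
# Not needed when the driver is compiled into the kernel.
- name: Load the watchdog kernel module
  modprobe:
    name: bcm2835_wdt
    state: present

# modprobe only affects the running system, so also list the module
# in /etc/modules to have it loaded on every boot.
- name: Load the watchdog kernel module at boot
  lineinfile:
    dest: /etc/modules
    line: bcm2835_wdt
    state: present
```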