Raspberry Pi watchdog for openHAB

The openHAB display in the kitchen is still the problem child. Sometimes it just stops; other times it does not refresh the HABpanel, even though it has a connection to the openHAB server. Then there is the problem with the network card in the Pi. And - ok, that's a server-side problem - occasionally the weather stops updating. All in all that's a lot of trouble for a display which is just supposed to run standalone.

In the latest iteration I looked into activating the integrated hardware watchdog in the Raspberry Pi. The temperature never goes above ~55°C, even though the display sits in an almost closed frame and can't exchange much heat with the environment. Nevertheless, the Pi occasionally just halts and stops operating.

Every Raspberry Pi comes with an integrated hardware watchdog, which can be used to trigger a reboot if the device is not responsive. The way this works is that the Linux kernel offers a device, which must be written to every few seconds by a userland application:

crw------- 1 root root  10, 130 May  4 04:00 /dev/watchdog

There are software versions of this, and hardware versions. Obviously the software version requires at least a running kernel, whereas the hardware version can trigger a reboot even if the operating system has stopped entirely. The mechanism is only activated after the first write to this device - this avoids a reboot loop if no application is able to update the trigger. But it also leaves a gap: if the userland never comes up, the watchdog is never armed and no reboot is triggered.

After installing and enabling the watchdog, at least the display in the kitchen is now up and running all the time. Progress ...

Raspbian provides the "watchdog" package. An Ansible Playbook can install and configure everything.

- name: Install watchdog packages
  apt:
    name:
      - watchdog
    state: present
  register: watchdog_installed

A couple of settings are necessary; all of them go into /etc/watchdog.conf:

- name: Update /etc/watchdog.conf
  lineinfile:
    dest: /etc/watchdog.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: "{{ item.state }}"
  with_items:
    - { regexp: '^#? *watchdog-device', line: 'watchdog-device = /dev/watchdog', state: present }
    - { regexp: '^#? *watchdog-timeout', line: 'watchdog-timeout = 15', state: present }
    - { regexp: '^#? *realtime', line: 'realtime = yes', state: present }
    - { regexp: '^#? *priority', line: 'priority = 1', state: present }
  notify:
    - restart watchdog

The "restart watchdog" handler is simply a "service" call, and goes into the Ansible "handlers" section:

- name: restart watchdog
  service:
    name: watchdog
    state: restarted
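
The handler only fires when the configuration changes. To make sure the daemon also comes up at boot, a task along these lines could be added. This is a minimal sketch, not part of the original playbook; it assumes the service installed by the package is named "watchdog", the same name the handler uses:

```yaml
# Sketch: make sure the watchdog daemon is started now and on every boot.
# Assumption: the service name matches the one used in the handler above.
- name: Enable and start watchdog
  service:
    name: watchdog
    enabled: yes
    state: started
```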

"watchdog" is able to monitor more than just the OS. It can also monitor certain PIDs, applications, the system load, etc. But that is beyond my use case here.
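
For anyone who does want such checks, they are further lines in /etc/watchdog.conf and can be managed the same way. The following is only a sketch: "max-load-1" and "ping" are options described in the watchdog.conf man page, but the threshold and the IP address are made-up example values, not settings from my setup:

```yaml
# Sketch: optional extra checks, using the same lineinfile pattern as above.
# Assumptions: the load threshold (24) and the ping target (192.168.1.1)
# are placeholders and must be adapted to the actual system and network.
- name: Add optional checks to /etc/watchdog.conf
  lineinfile:
    dest: /etc/watchdog.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: present
  with_items:
    # Reboot if the 1-minute load average exceeds this value
    - { regexp: '^#?\s*max-load-1\s*=', line: 'max-load-1 = 24' }
    # Reboot if this address stops answering pings
    - { regexp: '^#?\s*ping\s*=', line: 'ping = 192.168.1.1' }
  notify:
    - restart watchdog
```

A ping check in particular should be used with care: if the target goes down, every device that watches it reboots.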

Note: certain installation instructions might require loading the "bcm2835_wdt" or "bcm2708_wdog" kernel module. In recent kernels this driver is already compiled into the kernel, and no module needs to be loaded.
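
Whether the driver is available can be checked from the playbook before anything else is configured. A minimal sketch, assuming the device shows up as /dev/watchdog as in the listing above; the registered variable name "watchdog_device" is arbitrary:

```yaml
# Sketch: stop the play early if the kernel does not expose a watchdog device.
- name: Check for the watchdog device
  stat:
    path: /dev/watchdog
  register: watchdog_device

- name: Fail if no watchdog device is available
  assert:
    that:
      - watchdog_device.stat.exists
    fail_msg: "/dev/watchdog not found, the watchdog driver may need to be loaded"
```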
