Avoid "wear out" of SSD-cards in an openHAB system

You might know the problem: the brand new SSD or SD card in your system is super fast, but after a good while in use, the card is dead. Unlike spinning disks, which usually fail gradually and show I/O errors block by block, flash cards are prone to wear: every block survives only a limited number of writes. Wear leveling in the controller spreads the writes around to delay this, but it cannot prevent it: eventually blocks "wear out" and become unresponsive. More writes increase this risk. And a typical openHAB system writes all the time: every time an external status changes, it's written to the event log. By default the syslog is written to disk as well, and then there is a myriad of systemd services writing status information into files.


For my first openHAB test installation I did not pay much attention to this problem. I did, however, make sure that I installed everything using Ansible, and not in manual steps. So when the first card died, I was able to spin up the system again in a matter of hours (Ansible took over after the base system was initialized). Now it was time to do something about the wear.

After spending some time on research, I decided on 3 steps:

  • Remove swapfiles: the Raspberry Pi has 1 GB of RAM, and does not swap
  • Update the systemd journal configuration: make storage volatile, and reduce logfile size
  • Move /var/tmp and /var/log to a RAM disk: these two directories are hit most by I/O writes


Remove swapfiles


This part is fairly easy: openHABian has a package preinstalled which I just had to remove.

- name: Remove packages
  apt:
    name: "{{ item }}"
    state: absent
    purge: yes
  with_items:
    - dphys-swapfile

At first I missed the "purge: yes" line, but when I later inspected the logfiles, I found out that merely uninstalling the package without purging it leaves the systemd service entry around, which then just throws more errors into the logfile.
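
To confirm that no swap is active once the package is gone, a check along these lines could follow the removal task. This is a minimal sketch and not part of the original playbook: the task names and the swap_status variable are made up for illustration, and it assumes that swapon from util-linux is available on the target.

```yaml
# Sketch only: task names and the swap_status variable are hypothetical.
- name: List active swap areas
  command: swapon --show --noheadings
  register: swap_status
  changed_when: false   # read-only check, never reports a change

- name: Fail if any swap area is still active
  assert:
    that:
      - swap_status.stdout | trim == ""
    fail_msg: "Swap is still active: {{ swap_status.stdout }}"
```

If the assertion fails, the output of the first task shows which swap area is still in use.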


Update systemd journal configuration


By default, the journal is written to disk. That can be changed to "volatile", then it is kept in memory. I also reduced the size of the log - I'm mostly only interested in the last few entries anyway.

- name: check if systemd is used
  stat:
    path: /etc/systemd/journald.conf
  register: journald_conf_exists

- name: Update /etc/systemd/journald.conf
  lineinfile:
    dest: /etc/systemd/journald.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: "{{ item.state }}"
    create: yes
  with_items:
    - { regexp: '^#? *Storage', line: 'Storage=volatile', state: present }
    - { regexp: '^#? *SystemMaxUse', line: 'SystemMaxUse=50M', state: present }
    - { regexp: '^#? *SystemMaxFileSize', line: 'SystemMaxFileSize=25M', state: present }
  when:
    - journald_conf_exists.stat.exists == True
  notify:
    - restart systemd-journald

And the handler:

    - name: restart systemd-journald
      service:
        name: systemd-journald
        state: restarted
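
One detail from the journald.conf man page is worth keeping in mind: the System* limits apply to the persistent journal in /var/log/journal, while the in-memory journal that Storage=volatile keeps in /run/log/journal is governed by the Runtime* options. The following is a hedged sketch of how the same limits could be carried over to the volatile journal. It reuses the 50M and 25M values from above, assumes Ansible's lineinfile module, and the task name is made up.

```yaml
# Sketch only: mirrors the System* limits for the volatile journal.
- name: Limit the size of the in-memory journal
  lineinfile:
    dest: /etc/systemd/journald.conf
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: present
  with_items:
    - { regexp: '^#? *RuntimeMaxUse', line: 'RuntimeMaxUse=50M' }
    - { regexp: '^#? *RuntimeMaxFileSize', line: 'RuntimeMaxFileSize=25M' }
  notify:
    - restart systemd-journald
```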


Move logs to a RAM disk


This was the complicated part, for several reasons. First of all, if everything lives on a RAM disk, the logs are gone once the system is rebooted. So the question is whether I need persistent logs or not. Looking back at my usage history of the logs, I found that again I'm mostly only interested in the most recent logs. And if something goes wrong with the system: I can just set up another one. I already proved that this works.

The second problem to take into account is the size of the RAM disk, or disks in my case. systemd is storing "stuff" in /var/tmp, so I wanted to move that onto a RAM disk. After some observation, I found that the directory never gets really big, so 10 MB should be sufficient.

The biggest problem is /var/log, which can grow quite a bit. In my case it stays around 20-30 MB, until logrotate jumps in and cleans up. This can probably be tuned more, but I'm not really interested in too much fine tuning there. I decided on a 50 MB RAM disk for /var/log.
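
The sizes mentioned above can be measured before settling on the RAM disk sizes. A minimal sketch, assuming GNU du on the target; the task names and the log_dir_sizes variable are invented for this example.

```yaml
# Sketch only: task names and the log_dir_sizes variable are hypothetical.
- name: Measure current size of the directories that will move to RAM
  command: du -sm /var/tmp /var/log
  register: log_dir_sizes
  changed_when: false   # read-only measurement
  become: yes           # some log files are only readable by root

- name: Show sizes in MB
  debug:
    var: log_dir_sizes.stdout_lines
```

Running this a few times over several days, ideally shortly before logrotate cleans up, gives a better picture than a single measurement.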

Let's create the /etc/fstab entries:

- name: Update /etc/fstab
  lineinfile:
    dest: /etc/fstab
    line: "{{ item.line }}"
    state: "{{ item.state }}"
    create: yes
  with_items:
    - { line: 'tmpfs     /var/tmp        tmpfs   size=10M,nodev,nosuid,noatime,mode=1777     0  0', state: present }
    - { line: 'tmpfs     /var/log        tmpfs   size=50M,nodev,nosuid,noatime,mode=0755     0  0', state: present }
  notify:
    - cleanout old logs
    - restart system

When something is changed here, I just reboot to activate the changes. The handlers:

    - name: cleanout old logs
      shell: rm -f /var/log/*.gz /var/log/apt/*.gz /var/log/openhab2/events.log /var/log/openhab2/openhab.log /var/log/samba/*.gz /var/log/unattended-upgrades/*.gz
      args:
        # "warn" is only understood by older Ansible releases; newer ones no longer accept it
        warn: false

    - name: restart system
      shell: ( /bin/sleep 5 ; shutdown -r now "Ansible triggered" ) &
      async: 30
      poll: 0
      ignore_errors: true
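
On Ansible 2.7 or newer, the built-in reboot module is an alternative to the backgrounded shutdown call. It waits until the host answers again, so follow-up tasks can verify the result. The sketch below is not the handler used here: the task names, the timeout and the log_fstype variable are assumptions, and the check relies on findmnt from util-linux.

```yaml
# Sketch only: alternative to the shell-based "restart system" handler.
- name: Reboot and wait for the host to come back
  reboot:
    msg: "Ansible triggered"
    reboot_timeout: 300   # seconds; arbitrary value chosen for this example

- name: Look up the filesystem type mounted at /var/log
  command: findmnt -n -o FSTYPE /var/log
  register: log_fstype
  changed_when: false     # read-only check

- name: Make sure /var/log is now a RAM disk
  assert:
    that:
      - log_fstype.stdout | trim == "tmpfs"
```

If /var/log is not a mount point at all, findmnt exits with an error and the second task already fails, which points to the same problem.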

When the system came back online, everything seemed to be working, at first sight. After checking the entire log since the reboot (the entries before that are no longer available), I found that a number of services reported problems, mainly because files and directories in /var/log were now missing.

To fix that problem, I wrote a quick systemd service, which is fired before affected services come up (the "Before" line in the unit file), but fired after the RAM disk is mounted (the "After" line in the unit file).

[Unit]
Description=Create logfile directory (on RAM fs)
# The After= line did not survive in this listing; var-log.mount is the
# mount unit systemd derives from the /var/log entry in /etc/fstab.
After=var-log.mount
Before=openhab2.service samba.service nmbd.service smbd.service

[Service]
# Several ExecStart= lines are only allowed for Type=oneshot
Type=oneshot
ExecStart=/bin/mkdir -p /var/log/openhab2
ExecStart=/bin/chown openhab:openhabian /var/log/openhab2
ExecStart=/bin/chmod 0775 /var/log/openhab2
ExecStart=/usr/bin/setfacl -m u::rwx,g::rwx,o::r-x /var/log/openhab2
ExecStart=/usr/bin/setfacl -d -m u::rwx,g::rwx,o::r-x /var/log/openhab2
ExecStart=/bin/mkdir -p /var/log/samba
ExecStart=/bin/chown root:adm /var/log/samba
ExecStart=/bin/chmod 0750 /var/log/samba
ExecStart=/bin/mkdir -p /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/chown root:root /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/chmod 0700 /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/mkdir -p /var/log/lightdm
ExecStart=/bin/chown root:root /var/log/lightdm
ExecStart=/bin/chmod 0711 /var/log/lightdm
ExecStart=/bin/mkdir -p /var/log/sysstat
ExecStart=/bin/chown root:root /var/log/sysstat
ExecStart=/bin/chmod 0755 /var/log/sysstat
ExecStart=/bin/mkdir -p /var/log/apt
ExecStart=/bin/chown root:root /var/log/apt
ExecStart=/bin/chmod 0755 /var/log/apt
ExecStart=/bin/mkdir -p /var/log/unattended-upgrades
ExecStart=/bin/chown root:adm /var/log/unattended-upgrades
ExecStart=/bin/chmod 0750 /var/log/unattended-upgrades
ExecStart=/usr/bin/touch /var/log/lastlog
ExecStart=/bin/chown root:utmp /var/log/lastlog
ExecStart=/bin/chmod 0664 /var/log/lastlog

[Install]
# Assumed target; an [Install] section is needed so that the unit can be enabled
WantedBy=multi-user.target
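
systemd also ships a mechanism built for this kind of boot-time directory setup: systemd-tmpfiles, configured through files in /etc/tmpfiles.d. The following is a hedged sketch of how part of the unit above might be expressed that way. It is not what is used in this setup: the file name openhab-logdir.conf is invented, only a few of the directories are shown, and the ACL calls are left out.

```yaml
# Sketch only: tmpfiles.d alternative to the unit file, with a hypothetical file name.
- name: Describe log directories for systemd-tmpfiles
  copy:
    dest: /etc/tmpfiles.d/openhab-logdir.conf
    owner: root
    group: root
    mode: '0644'
    content: |
      # Type  Path                Mode  User     Group       Age
      d       /var/log/openhab2   0775  openhab  openhabian  -
      d       /var/log/samba      0750  root     adm         -
      d       /var/log/apt        0755  root     root        -
      f       /var/log/lastlog    0664  root     utmp        -
```

Each line names an entry type (d for a directory, f for a file), the path, the mode and the owner. As far as I can tell, these files are applied early during boot once the local file systems are mounted, so no ordering against individual services has to be maintained by hand.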


Last but not least, I need to upload and enable this service:

- name: Install logdir service
  copy:
    src: files/openhab-logdir.service
    dest: /etc/systemd/system/openhab-logdir.service
    owner: root
    group: root
    mode: 0664
  register: logdir_service

- name: Enable logdir service
  systemd:
    name: openhab-logdir.service
    enabled: yes
    state: started

- name: Restart logdir service
  systemd:
    name: openhab-logdir.service
    state: restarted
    # make systemd re-read the changed unit file before restarting
    daemon_reload: yes
  when: logdir_service.changed

The name "logdir" is based on the first few lines, from when I just tried to fix the openHAB log directory. More problems were found after that.



All in all this has been running fine for a few days already. I'm keeping an eye on the system, and at some point I need to integrate it into my monitoring as well.

