You might know that problem: the brand new SSD in your system is super fast, but after using it for a good while, the card is dead. Unlike spinning disks, which usually fail gradually and show I/O errors block by block, SSD cards are prone to flash wear, the problem that wear leveling tries to mitigate. Blocks which are written more often will “wear out”, and become unresponsive. More writes increase this risk. And a typical openHAB system does a number of writes all the time: every time an external status changes, it’s written to the event log. By default the syslog is written to disk as well, and then there is a myriad of systemd services, writing status information into files.
For my first openHAB test installation I did not pay much attention to this problem. I did however make sure that I installed everything using Ansible, and not in manual steps. So when the first card died, I was able to spin up the system again in a matter of hours (Ansible took over after the base system was initialized). Now it was time to do something about the wear.
After spending some time on research, I decided on 3 steps:
- Remove swapfiles: the Raspberry Pi has 1 GB of RAM, and does not swap
- Update the systemd journal configuration: make storage volatile, and reduce logfile size
- Move /var/tmp and /var/log to a RAM disk: these two directories are hit most by I/O writes
Remove swapfiles
This part is fairly easy: openHABian has a package preinstalled which I just had to remove.
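A minimal sketch of such a task, assuming the preinstalled package is dphys-swapfile (the usual swapfile manager on Raspberry Pi OS; the package name is not given here, so treat it as an assumption):

```yaml
# Sketch only: the package name dphys-swapfile is an assumption.
- name: Remove the swapfile package
  ansible.builtin.apt:
    name: dphys-swapfile
    state: absent
    purge: yes   # also purge the package's configuration files
```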
First I missed the purge=yes line, but when I later inspected the logfiles, I found out that just uninstalling the package without purging it leaves the systemd service entry around - which then just throws more errors into the logfile.
Update systemd journal configuration
By default, the journal is written to disk. That can be changed to “volatile”, then it is kept in memory. I also reduced the size of the log - I’m mostly only interested in the last few entries anyway.
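Tasks for this might look like the following sketch. Storage and RuntimeMaxUse are real journald.conf options (RuntimeMaxUse is the size limit that applies to the in-memory journal), but the 16M value and the handler name "restart journald" are assumptions chosen for illustration.

```yaml
# Sketch only: the size limit and the handler name are assumptions.
- name: Keep the journal in memory only
  ansible.builtin.lineinfile:
    path: /etc/systemd/journald.conf
    regexp: '^#?Storage='
    line: 'Storage=volatile'
  notify: restart journald

- name: Limit the size of the in-memory journal
  ansible.builtin.lineinfile:
    path: /etc/systemd/journald.conf
    regexp: '^#?RuntimeMaxUse='
    line: 'RuntimeMaxUse=16M'
  notify: restart journald
```

The regexp also matches the commented-out defaults that ship in journald.conf, so the existing line is replaced instead of a second one being appended.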
And the handler:
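A handler that restarts journald after a configuration change could be sketched like this; the handler name "restart journald" is hypothetical and only has to match whatever the tasks notify.

```yaml
# Sketch only: the handler name is an assumption.
- name: restart journald
  ansible.builtin.systemd:
    name: systemd-journald
    state: restarted
```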
Move logs to a RAM disk
This was the complicated part, for several reasons. First of all, if everything lives on a RAM disk, the logs are gone once the system is rebooted. So I had to assess whether I need persistent logs, or not. Looking back at my usage history of the logs, I found that again I’m mostly only interested in the most recent logs. And if something goes wrong with the system: I can just set up another one. I already proved that this works.
The second problem to take into account is the size of the RAM disk, or disks in my case. systemd is storing “stuff” in /var/tmp, so I wanted to move that onto a RAM disk. After some observation, I found that the directory never really gets big, so 10 MB should be sufficient.
The biggest problem is /var/log, which can grow quite a bit. In my case it stays around 20-30 MB, until logrotate jumps in and cleans up. This can probably be tuned more, but I’m not really interested in too much fine tuning there. I decided on a 50 MB RAM disk for /var/log.
Let’s create the RAM disks:
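One way to express the two tmpfs mounts is sketched below, using the 10 MB and 50 MB sizes from above. The mount options other than size, the handler name "reboot system", and the use of the ansible.posix.mount module are assumptions. With state: present the module only writes the /etc/fstab entries and does not mount anything right away.

```yaml
# Sketch only: options other than size= and the handler name are assumptions.
- name: Add RAM disk for /var/tmp to /etc/fstab
  ansible.posix.mount:
    path: /var/tmp
    src: tmpfs
    fstype: tmpfs
    opts: "defaults,noatime,nosuid,nodev,size=10M,mode=1777"
    state: present
  notify: reboot system

- name: Add RAM disk for /var/log to /etc/fstab
  ansible.posix.mount:
    path: /var/log
    src: tmpfs
    fstype: tmpfs
    opts: "defaults,noatime,nosuid,nodev,size=50M,mode=0755"
    state: present
  notify: reboot system
```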
When something is changed here, I just reboot to activate the changes. The handlers:
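A reboot handler could be sketched as follows, assuming Ansible runs from a separate control machine (the reboot module waits for the host to come back) and that the handler is called "reboot system"; both the name and the timeout are hypothetical.

```yaml
# Sketch only: the handler name and the timeout are assumptions.
- name: reboot system
  ansible.builtin.reboot:
    reboot_timeout: 300   # seconds to wait for the Pi to come back
```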
When the system came back online, everything seemed to be working, at first sight. After checking the entire log since the reboot (the entries from before are no longer available), I found that a number of services reported problems, mainly because files and directories in /var/log were now missing.
To fix that problem, I wrote a quick systemd service, which is fired before the affected services come up (the Before line in the unit file), but after the RAM disk is mounted (the After line in the unit file).
    [Unit]
    Description=Create logfile directory (on RAM fs)
    After=var-log.mount
    Before=openhab2.service samba.service nmbd.service smbd.service

    [Service]
    Type=oneshot
    ExecStart=/bin/mkdir -p /var/log/openhab2
    ExecStart=/bin/chown openhab:openhabian /var/log/openhab2
    ExecStart=/bin/chmod 0775 /var/log/openhab2
    ExecStart=/usr/bin/setfacl -m u::rwx,g::rwx,o::r-x /var/log/openhab2
    ExecStart=/usr/bin/setfacl -d -m u::rwx,g::rwx,o::r-x /var/log/openhab2
    ExecStart=/bin/mkdir -p /var/log/samba
    ExecStart=/bin/chown root:adm /var/log/samba
    ExecStart=/bin/chmod 0750 /var/log/samba
    ExecStart=/bin/mkdir -p /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
    ExecStart=/bin/chown root:root /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
    ExecStart=/bin/chmod 0700 /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
    ExecStart=/bin/mkdir -p /var/log/lightdm
    ExecStart=/bin/chown root:root /var/log/lightdm
    ExecStart=/bin/chmod 0711 /var/log/lightdm
    ExecStart=/bin/chown root:root /var/log/lightdm
    ExecStart=/bin/mkdir -p /var/log/sysstat
    ExecStart=/bin/chown root:root /var/log/sysstat
    ExecStart=/bin/chmod 0755 /var/log/sysstat
    ExecStart=/bin/mkdir -p /var/log/apt
    ExecStart=/bin/chown root:root /var/log/apt
    ExecStart=/bin/chmod 0755 /var/log/apt
    ExecStart=/bin/mkdir -p /var/log/unattended-upgrades
    ExecStart=/bin/chown root:adm /var/log/unattended-upgrades
    ExecStart=/bin/chmod 0750 /var/log/unattended-upgrades
    ExecStart=/usr/bin/touch /var/log/lastlog
    ExecStart=/bin/chown root:utmp /var/log/lastlog
    ExecStart=/bin/chmod 0664 /var/log/lastlog
    RemainAfterExit=true

    [Install]
    WantedBy=default.target
Last but not least, I need to upload and enable this service:
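Uploading and enabling a unit like this can be sketched with two tasks. The file name logdir.service is a hypothetical placeholder, and the destination /etc/systemd/system is an assumption (it is the usual place for locally installed units).

```yaml
# Sketch only: the unit file name and destination are assumptions.
- name: Upload the unit file
  ansible.builtin.copy:
    src: logdir.service
    dest: /etc/systemd/system/logdir.service
    owner: root
    group: root
    mode: "0644"

- name: Enable the service
  ansible.builtin.systemd:
    name: logdir.service
    enabled: yes
    daemon_reload: yes   # make systemd pick up the new unit file
```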
The name logdir is based on the first few lines, from when I was only trying to fix the openHAB log directory. More problems turned up after that.
All in all this has been running fine for a few days now. I’m keeping an eye on the system, and at some point I need to integrate it into my monitoring as well.