You might know the problem: the brand new SD card in your system is super fast, but after a good while of use, the card is dead. Unlike spinning disks, which usually fail gradually and show I/O errors on individual blocks, SD cards are prone to flash wear. Blocks which are written more often will “wear out” and become unusable; wear leveling in the card’s controller spreads the writes around, but that only delays the problem. More writes increase this risk. And a typical openHAB system does a number of writes all the time: every time an external status changes, it’s written to the event log. By default the syslog is written to disk as well, and then there is a myriad of systemd services writing status information into files.
For my first openHAB test installation I did not pay much attention to this problem. I did, however, make sure that I installed everything using Ansible, and not in manual steps. So when the first card died, I was able to spin up the system again in a matter of hours (Ansible took over after the base system was initialized). Now it was time to do something about the wear.
After spending some time on research, I decided on 3 steps:
- Remove swapfiles: the Raspberry Pi has 1 GB of RAM, and does not swap
- Update the systemd journal configuration: make storage volatile, and reduce logfile size
- Move /var/tmp and /var/log to a RAM disk: these two directories are hit most by I/O writes
Remove swapfiles
This part is fairly easy: openHABian has a package preinstalled which I just had to remove.
At first I missed the purge=yes line, but when I later inspected the logfiles, I found out that uninstalling the package without purging it leaves the systemd service entry around, which then just throws more errors into the logfile.
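A minimal sketch of such a task, assuming the swap package is dphys-swapfile (the usual swap manager on Raspbian-based images; the package name is my assumption and may differ on your system):

```yaml
# Sketch only: the package name dphys-swapfile is an assumption.
- name: Remove the swapfile package
  apt:
    name: dphys-swapfile
    state: absent
    # purge also removes configuration and the leftover service entry
    purge: yes
```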
Update systemd journal configuration
By default, the journal is written to disk. That can be changed to “volatile”, then it is kept in memory. I also reduced the size of the log - I’m mostly only interested in the last few entries anyway.
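A sketch of how these two settings could be applied with Ansible. Storage and RuntimeMaxUse are regular options in journald.conf; the 16M limit and the handler name are assumptions chosen for illustration:

```yaml
# Sketch only: the size limit (16M) and the handler name are assumptions.
- name: Keep the systemd journal in memory only
  lineinfile:
    path: /etc/systemd/journald.conf
    regexp: '^#?Storage='
    line: 'Storage=volatile'
  notify: restart systemd-journald

- name: Limit the size of the in-memory journal
  lineinfile:
    path: /etc/systemd/journald.conf
    regexp: '^#?RuntimeMaxUse='
    line: 'RuntimeMaxUse=16M'
  notify: restart systemd-journald
```

RuntimeMaxUse is the relevant limit here because a volatile journal lives under /run/log/journal; SystemMaxUse only applies to a journal stored on disk.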
And the handler:
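A minimal sketch of such a handler, assuming the journal tasks notify a handler named "restart systemd-journald" (the name is hypothetical):

```yaml
# Sketch only: the handler name must match whatever the tasks notify.
- name: restart systemd-journald
  service:
    name: systemd-journald
    state: restarted
```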
Move logs to a RAM disk
This was the complicated part, for several reasons. First of all, if everything lives on a RAM disk, the logs are gone once the system is rebooted. So I had to decide whether I need persistent logs or not. Looking back at how I used the logs in the past, I found that, again, I’m mostly only interested in the most recent ones. And if something goes wrong with the system, I can just set up another one. I already proved that this works.
The second problem to take into account is the size of the RAM disk, or disks in my case. systemd is storing “stuff” in /var/tmp, so I wanted to move that to a RAM disk. After some observation, I found that the directory never gets really big, so 10 MB should be sufficient.
The biggest problem is /var/log, which can grow quite a bit. In my case it stays around 20-30 MB, until logrotate kicks in and cleans up. This can probably be tuned more, but I’m not really interested in too much fine tuning there. I decided on a 50 MB RAM disk for /var/log.
Let’s create the /etc/fstab entries:
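A sketch of how the two tmpfs entries could be written with Ansible’s mount module. The sizes come from the text above; the mount options, the directory modes and the handler name are assumptions:

```yaml
# Sketch only: mount options, modes and the handler name are assumptions.
- name: Add RAM disk entries to /etc/fstab
  mount:
    path: "{{ item.path }}"
    src: tmpfs
    fstype: tmpfs
    opts: "defaults,noatime,nosuid,nodev,mode={{ item.mode }},size={{ item.size }}"
    # "present" only writes the fstab entry, it does not mount anything yet
    state: present
  loop:
    - { path: /var/tmp, size: 10M, mode: "1777" }
    - { path: /var/log, size: 50M, mode: "0755" }
  notify: reboot system
```

With state set to present the running system is left untouched, so the new mounts only become active after the next reboot.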
When something is changed here, I just reboot to activate the changes. The handlers:
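A minimal sketch of a reboot handler, assuming the fstab task notifies a handler named "reboot system" (a hypothetical name) and an Ansible version that ships the reboot module:

```yaml
# Sketch only: the handler name and the timeout are assumptions.
- name: reboot system
  reboot:
    # seconds to wait for the Pi to come back before giving up
    reboot_timeout: 600
```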
When the system came back online, everything seemed to be working, at first sight. After checking the entire log since the reboot (the entries before that are no longer available), I found that a number of services reported problems, mainly because files and directories in /var/log were missing now.
To fix that problem, I wrote a quick systemd service which is started before the affected services come up (the Before line in the unit file), but after the RAM disk is mounted (the After line in the unit file).
[Unit]
Description=Create logfile directory (on RAM fs)
After=var-log.mount
Before=openhab2.service samba.service nmbd.service smbd.service
[Service]
Type=oneshot
ExecStart=/bin/mkdir -p /var/log/openhab2
ExecStart=/bin/chown openhab:openhabian /var/log/openhab2
ExecStart=/bin/chmod 0775 /var/log/openhab2
ExecStart=/usr/bin/setfacl -m u::rwx,g::rwx,o::r-x /var/log/openhab2
ExecStart=/usr/bin/setfacl -d -m u::rwx,g::rwx,o::r-x /var/log/openhab2
ExecStart=/bin/mkdir -p /var/log/samba
ExecStart=/bin/chown root:adm /var/log/samba
ExecStart=/bin/chmod 0750 /var/log/samba
ExecStart=/bin/mkdir -p /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/chown root:root /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/chmod 0700 /var/log/samba/cores /var/log/samba/cores/smbd /var/log/samba/cores/nmbd
ExecStart=/bin/mkdir -p /var/log/lightdm
ExecStart=/bin/chown root:root /var/log/lightdm
ExecStart=/bin/chmod 0711 /var/log/lightdm
ExecStart=/bin/mkdir -p /var/log/sysstat
ExecStart=/bin/chown root:root /var/log/sysstat
ExecStart=/bin/chmod 0755 /var/log/sysstat
ExecStart=/bin/mkdir -p /var/log/apt
ExecStart=/bin/chown root:root /var/log/apt
ExecStart=/bin/chmod 0755 /var/log/apt
ExecStart=/bin/mkdir -p /var/log/unattended-upgrades
ExecStart=/bin/chown root:adm /var/log/unattended-upgrades
ExecStart=/bin/chmod 0750 /var/log/unattended-upgrades
ExecStart=/usr/bin/touch /var/log/lastlog
ExecStart=/bin/chown root:utmp /var/log/lastlog
ExecStart=/bin/chmod 0664 /var/log/lastlog
RemainAfterExit=true
[Install]
WantedBy=default.target
Last but not least, I need to upload and enable this service:
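A sketch of the two tasks this takes, assuming the unit file is stored as logdir.service next to the playbook and is installed into /etc/systemd/system (both the file name and the paths are assumptions):

```yaml
# Sketch only: file name and destination path are assumptions.
- name: Upload the logdir unit file
  copy:
    src: logdir.service
    dest: /etc/systemd/system/logdir.service
    owner: root
    group: root
    mode: '0644'

- name: Enable the logdir service
  systemd:
    name: logdir.service
    enabled: yes
    # make systemd pick up the new or changed unit file
    daemon_reload: yes
```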
The name logdir goes back to the first few lines, when I was just trying to fix the openHAB log directory. More problems were found after that.
Conclusion
All in all this has been running fine for a few days already. I’m keeping an eye on the system, and at some point I need to integrate it into my monitoring as well.