Restic backup
I have been asked quite a few times how I do my backups with Restic.
For more than 10 years I used "Duplicity" for backups, but in 2019 I switched to Restic. The main reason for the change was that Duplicity still can't handle "Big Data", as in: larger directories. In 2009 someone opened an issue on the Duplicity bugtracker, and the problem still exists today. For about two years I worked around the problem by excluding files and trying to make the sigfile smaller. But at some point I decided that enough was enough and that I needed to change the tool.
Duplicity knows two backup modes: "full backup" and "incremental backup". Once in a while you take a full backup, and then you add incremental backups on top of that full backup. In order to restore a certain backup you need the full backup and all the incremental backups taken since then, up to the point you want to restore. Therefore my go-to mode was to always have two full backups and a couple of incremental backups in between. Even if something goes wrong with the latest full backup, I can still go back to the previous full backup (of course with some changes lost, but that's still better than nothing). When taking a new full backup, the oldest one is only deleted once the new one is complete. Likewise, every new incremental backup is written as its own set of files, and removing that backup removes exactly those files. That worked well, but needed scheduling. Over time I wrote a wrapper script around Duplicity, which scheduled new full and incremental backups.
Restic works in a different way. There is no concept of "full backup" and "incremental backup". Basically every backup is a full backup, and Restic figures out which files were changed, deleted, or added. It also does deduplication: if files are moved around, or appear multiple times, they are not added to the backup multiple times. Deduplication is something which Duplicity can't do. But because Restic deduplicates, there is no separate set of files which belongs to just one snapshot. Data blobs from one backup can stay in the repository forever, and removing snapshots might not remove any files at all.
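To illustrate why deduplicated data cannot be tied to a single snapshot, here is a minimal sketch of a content-addressed store. It is a toy model and not Restic's actual repository format: the `ToyRepo` class and its methods are made up for this example, and it hashes whole files, whereas Restic splits files into smaller chunks before hashing them.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store (hypothetical, not Restic's format)."""

    def __init__(self):
        self.blobs = {}      # blob id (SHA-256 hex digest) -> content
        self.snapshots = {}  # snapshot name -> {path: blob id}

    def backup(self, name, files):
        """Record a snapshot; store only content that is not known yet."""
        tree = {}
        for path, data in files.items():
            blob_id = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(blob_id, data)  # no-op if already stored
            tree[path] = blob_id
        self.snapshots[name] = tree


repo = ToyRepo()
repo.backup("monday", {"a.txt": b"hello", "b.txt": b"world"})
# a.txt was moved and copied, b.txt was changed
repo.backup("tuesday", {
    "moved/a.txt": b"hello",
    "copy.txt": b"hello",
    "b.txt": b"world!",
})

print(len(repo.blobs))  # 3: "hello", "world" and "world!"
```

Both snapshots are complete on their own, yet the content "hello" is stored only once and is referenced by both of them. Neither snapshot "owns" that blob.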
Unlike Duplicity, Restic therefore needs "prune" to remove old data. A snapshot can be removed according to the policy specified, but this does not remove the data from the backup directory. A "prune" run will go over the data and remove any data blob which is no longer needed.
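The two steps (removing a snapshot, then pruning) can be sketched with a small toy model. This is only an illustration under simplified assumptions, not how Restic is implemented: the function names `backup`, `forget` and `prune` are borrowed from the Restic commands of the same name, but the data structures here are hypothetical.

```python
import hashlib

blobs = {}      # blob id (SHA-256 hex digest) -> content
snapshots = {}  # snapshot name -> {path: blob id}


def backup(name, files):
    """Record a snapshot, storing each distinct content only once."""
    tree = {}
    for path, data in files.items():
        blob_id = hashlib.sha256(data).hexdigest()
        blobs.setdefault(blob_id, data)
        tree[path] = blob_id
    snapshots[name] = tree


def forget(name):
    """Remove the snapshot only; the stored data is left untouched."""
    del snapshots[name]


def prune():
    """Delete every blob that no remaining snapshot references."""
    referenced = {bid for tree in snapshots.values() for bid in tree.values()}
    unused = [bid for bid in blobs if bid not in referenced]
    for bid in unused:
        del blobs[bid]
    return len(unused)


backup("old", {"a": b"1", "b": b"2"})
backup("new", {"a": b"1", "c": b"3"})

forget("old")
after_forget = len(blobs)  # still 3: forgetting frees no space

removed = prune()          # 1: only b"2" was unique to "old"
print(after_forget, removed, len(blobs))  # 3 1 2
```

The blob for `b"1"` survives the prune because the "new" snapshot still references it, even though it was first written by the "old" backup.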
My first question, after figuring out which other backup tool to use, was: should I replicate the wrapper script, or try something else? Given that the backup doesn't need complex scheduling, I decided against writing a complex wrapper. And since I am now deploying all devices with Ansible, I decided to integrate this into my Playbooks, and deploy a set of shell scripts. The goal was to have a small number of dedicated scripts doing the daily backup work, and another set of "helper" scripts which I can use to inspect the backup, modify it, or restore something.
My main goals for this: "small number of programs/scripts" (Unix style: each tool does one job), "rapid development" (don't spend weeks writing another scheduler), "rapid deployment" (re-run Playbooks and let Ansible deploy this to all devices).