Skip to content

PGSQL Phriday #002: PostgreSQL Backup and Restore

Thanks to Ryan Booz we now have the #PGSQLPhriday blogging series. The second edition is about backups. And of course also about restore.

Backups? Do you need that?

If your data is not important, you certainly don't need to bother about backups. Then again, why do you even store the data in the first place?

For anyone else, especially everyone who runs business critical or mission critical applications, having a backup is important. Equally important, but sometimes ignored, is the ability to restore the backup.

The requirements for backups are manifold:

  • How quickly must you be able to restore the backup (SLA)?
  • Do you need to be able to restore every transaction (catch all changes) or is a snapshot (backup freuency, hourly, daily, weekly, monthly) sufficient?
  • How can the backup integrity be verified? 
  • Where do store the backup, and make sure that a disaster (as example: lost of data center) does not affect the backup?
  • Who can restore the backup, what is the plan for that?
  • ...

Your #PGSQLPhriday task

Describe how you do backups for your PostgreSQL databases.

Which tool(s) are you using, where do you store backups, how often do you do backups?
Are there any recommendations you can give the reader how to improve their backups?
Any lesser known features in your favorite backup tool?
Any new and cool features in a recently released version?

Bonus question: Is pg_dump a backup tool?

Continue reading "PGSQL Phriday #002: PostgreSQL Backup and Restore"

Changes to the public schema in PostgreSQL 15 and how to handle upgrades

In September 2021, a patch for the upcoming PostgreSQL version 15 was committed which introduces a visible change for users: the CREATE privilege for the public schema is no longer set by default. This is a recommendation from CVE-2018-1058.

What does that mean for the average (non-superuser) user?

In PostgreSQL 14:

postgres=# SELECT version();
                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

postgres=# CREATE ROLE unprivileged WITH LOGIN;
CREATE ROLE

postgres=# CREATE DATABASE priv_test;
CREATE DATABASE

postgres=# \c priv_test 
You are now connected to database "priv_test" as user "ads".

priv_test=# \dn+ public
                       List of schemas
  Name  | Owner | Access privileges |      Description       
--------+-------+-------------------+------------------------
 public | ads   | ads=UC/ads       +| standard public schema
        |       | =UC/ads           | 
(1 row)

We see that PUBLIC (the second line in Access privileges) has USAGE (U) and CREATE (C) privileges. A regular user can create a table in the public schema:

priv_test=# SET SESSION ROLE unprivileged;
SET
priv_test=> SHOW search_path;
   search_path   
-----------------
 "$user", public
(1 row)

priv_test=> CREATE TABLE priv_test (id INT);
CREATE TABLE
priv_test=> \dp priv_test 
                               Access privileges
 Schema |   Name    | Type  | Access privileges | Column privileges | Policies 
--------+-----------+-------+-------------------+-------------------+----------
 public | priv_test | table |                   |                   | 
(1 row)

 

This is how the public schema privileges look in PostgreSQL 15:

postgres=# SELECT version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 15beta3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

postgres=# CREATE ROLE unprivileged WITH LOGIN;
CREATE ROLE

postgres=# CREATE DATABASE priv_test;
CREATE DATABASE

postgres=# \c priv_test 
You are now connected to database "priv_test" as user "ads".

priv_test=# \dn+ public
                                       List of schemas
  Name  |       Owner       |           Access privileges            |      Description       
--------+-------------------+----------------------------------------+------------------------
 public | pg_database_owner | pg_database_owner=UC/pg_database_owner+| standard public schema
        |                   | =U/pg_database_owner                   | 
(1 row)

The C (CREATE) is missing. And the CREATE TABLE for a regular user is no longer working by default:

priv_test=# SET SESSION ROLE unprivileged;
SET
priv_test=> SHOW search_path;
   search_path   
-----------------
 "$user", public
(1 row)

priv_test=> CREATE TABLE priv_test (id INT);
ERROR:  permission denied for schema public
LINE 1: CREATE TABLE priv_test (id INT);

But how is this change handled in upgrades?

 

 

Continue reading "Changes to the public schema in PostgreSQL 15 and how to handle upgrades"

PostgreSQL Upgrades are hard!

Together with Lætitia Avrot and Nikolay Samokhvalov I was invited to participate in a Community Panel (YouTube video) about PostgreSQL Upgradability at Postgres Vision 2022. The panel was moderated by Bruce Momjian and initiated and organized by Jimmy Angelakos. Bruce did talk with each of us before, which helped a lot to guide the discussion in the right direction. The recording of the panel discussion is available on the Postgres Vision website.

During this panel each of us provided examples for how easy or complicated PostgreSQL upgrades still are.
 

 

Continue reading "PostgreSQL Upgrades are hard!"

Delete directories recursive on Hetzner Storage Box

Among other external solutions, I store some data on Storage Boxes from Hetzner. The Storage Box allows you to have sub-accounts, so for every server and system storing data there, I use a separate account. For each sub-account, one can select a subdirectory where the data is stored, and the sub-account then can only see this data. The Admin account can see all data, and see all directories.

The usual way I access the Storage Box from other systems is by using sftp protocol from ssh (don't confuse this with the "other" sftp). That's all good, until I remove a sub-account and want to delete the subdirectory with the data. The server doesn't know "rm -r" for recursive deletion, which means I have to traverse into every directory, delete all files, then delete the empty directories. And the encrypted backup I'm using is creating plenty of subdirectories.

Or I find a better tool.

 

Continue reading "Delete directories recursive on Hetzner Storage Box"

Restic backup

Was asked quite a few times how I do my backups with Restic.

For more than 10 years I was using "Duplicity" for backups, but in 2019 I changed to Restic. The main reason for the change was that Duplicity still can't handle "Big Data", as in: larger directories. In 2009 someone opened an issue on the Duplicity bugtracker, and this problem still exists as of today. For about two years I was shifting around the problem, excluding files, trying to make the sigfile smaller. But at some point I decided that it is enough and I need to change the tool.

Duplicity knows two backup modes: "full backup" and "incremental backup". Once in a while you take a full backup, and then you add incremental backups to that full backup. In order to restore a certain backup you need the full backup and the incremental backups. Therefore my go-to mode was to always have two full backups and a couple incremental backups in-between. Even if something goes wrong with the latest full backup, I can still go back to the previous full backup (of course with some changes lost, but that's still better than nothing). When taking a new full backup, the oldest one is only deleted when the new one is completed. Accordingly when a new incremental backup is created, it's a new set of files. Removing the backup removes all the files from this incremental backup. That worked well, but needed scheduling. Over time I wrote a wrapper script around Duplicity, which did schedule new full and incremental backups.

Restic works in a different way. There is no concept of "full backup" and "incremental backup". Basically every backup is a full backup, and Restic figures out which files changed, got deleted, or added. Also it does deduplication: if files are moved around, or appear multiple times, they are not added multiple times into the backup. Deduplication is something which Duplicity can't do. But because Restic can do deduplication, there is no common set of files which belong to a single snapshot. Data blobs from one backup can stay in the repository forever, removing snapshots might not remove any files at all.

Restic on the other hand needs "prune" to remove old data. A snapshot can be removed according to the policy specified, but this does not remove the data from the backup directory. A "prune" run will go over the data and remove any block which is no longer needed.

My first question - after figuring out which other backup tool to use: shall I replicate the wrapper script, or try something else? Given that the backup doesn't need complex scheduling, I decided against writing a complex wrapper. And since I am now deploying all devices with Ansible, I decided to integrate this into my Playbooks, and deploy a set of shell scripts. The goal was to have a small number of dedicated scripts doing the daily backup work, and another set of "helper" scripts which I can use to inspect the backup, modify it, or restore something.

My main goals for this: "small number of programs/scripts" (Unix style: each tool does one job), "rapid development" (don't spend weeks writing another scheduler), "rapid deployment" (re-run Playbooks and let Ansible deploy this to all devices).

 

Continue reading "Restic backup"

Restic upgrade on Debian Buster

A while ago I switched backups from "Duplicity" to "Restic". About time: I was using Duplicity for many years (I think I started using it around 2010, long before "Restic" became available) and it served me well. But recently I ran into more and more issues, especially with archives getting larger and larger. There is an 11 years old open bug in the Duplicity bugtracker, which describes a showstopper for backing up larger archives. And it doesn't look like this will be solved anytime soon. Therefore it was time for something new.

Since I'm rolling out my backups with Ansible, it was relatively easy to create a set of scripts for Restic which use almost the same infrastructure as the old Duplicity backups. That works as expected on all our laptops. But the Raspberry Pi, which does the fileserver backups, seem to had a problem. Backups took way longer than before, jumped from 30-60 minutes (depending on the amount of changes) to constantly around 10 hours.

After some investigation (means: --verbose --verbose --verbose debugging), it turns out that Restic identifies most of the files as new, even though they did not change at all. Some background information: the Raspberry mounts the QNAP fileserver using the SMB3 protocol. The "mount -t cifs" uses the "serverino" option, but apparently that is not enough to provide a stable inode number. And if the inode for a file changes, Restic assumes it is a new file.

On the bright side, because the content of the files do not change, the deduplication still works, and no additional content is added to the backup. The size of the backup does not increase. Still, Restic fetches all the data from the server, and that takes a long time.

 

Continue reading "Restic upgrade on Debian Buster"

How to reset your KDE (without deleting everything else)

I'm a (more or less) happy KDE user, ever since the KDE 3 days. Before that, I used fvwm2 for a long time, but that is a different story. It also happens that I never really reinstalled my home directory - the oldest files I can find are from 1997, and that is pretty much when I switched from an old Slackware system with self-compiled updates, to something with a more modern distribution. That means, that all the time from 1997 to today, I carry the same /home/ads across my computers. The home directory grew from a few MB to 133 GB today (maybe I should clean it up, but then again it's cheaper to buy a bigger harddisk).

It also means, that I never deleted my KDE config, even when upgrading to KDE 4 or Plasma.

 

Continue reading "How to reset your KDE (without deleting everything else)"

How I do backup of my mobile devices

Recently I was asked how I do backups of my mobile devices. The discussion started when I told that I "survived" a bricked device without data loss.

Disclaimer: I work (well, was working, before we got spinned of into a subsidiary) for EMC. Part of EMCs portfolio is "backup" and for sure I learned a lot from my employer. All my own devices and servers are backed up, some of them multiple times. Backups, devices and communication is encrypted. I'm not using public services (Dropbox, Drive, ...) for the backups.

So, what happened? One Friday afternoon, my Samsung S3 bricked itself. It was connected to the charger on my desk, the display flashed for a moment and then the device was dead. Reset, reboot, remove battery - nothing helped. No access to the data on the device.


Continue reading "How I do backup of my mobile devices"

Die Wolke hat es schon wieder getan

Einen Anbieter zentralisierter Dienste habe ich in meinem letzten Posting über Probleme mit genau diesen Diensten übersehen: RIM - Research in Motion. Die Firma, die den BlackBerry herstellt, jenes Gerät, dass Zugang zu den eigenen E-Mails von überallher verspricht.

Um dies zu gewährleisten, hat sich RIM quasi unentbehrlich gemacht: die Kommunikation des BlackBerry läuft über Server von RIM, die Nachrichten werden dort passend aufbereitet und auf das Gerät geschoben (Push, statt dem sonst üblichen Abrufen/Pollen der Nachrichten).

Nun hat so ein Konzept eine große Schwachstelle: den Anbieter. Dumm halt, wenn genau dort nicht genügend Kapazitäten zur Verfügung stehen oder irgendwelche Server ausfallen. Dann ist man offline, trotz all der großmundigen Versprechen des verkäufers.

Selbiges ist T-Mobile in den USA passiert, selbiges ist Palm mit dem supertollen Backup-Konzept des Palm Pre passiert. Und nun hat es zum wiederholten Male auch RIM erwischt und die Nutzer der mobilen Endgeräte standen mit einem Stück wenig sinnvollen Elektronikschrott da. Zum Glück ist in das Gerät zusätzlich ein Mobiltelefon eingebaut, so dass man nun wenigstens noch anrufen kann.

Daran zeigt sich wieder einmal, dass Anbieter von zentralisierten Diensten damit zwar gutes Geld verdienen wollen, die Kunden aber um solche Konzepte tunlichst einen sehr weiten Bogen machen sollten.