Uh, what is this #PGSQLPhriday thing about? Everyone is writing blog posts about this today … For details, please read the introduction from Ryan Booz.
This month’s topic: pick your three favorite or most important best practices for #PostgreSQL. Present them such that two are true and one is a lie, and see if you can spot the lie. The resolution is at the end of this blog posting.
Let’s dive in: These days it’s rare that I start with a fresh PostgreSQL installation that has not been modified in some way. Whenever we install a new database at work, it already comes with a templated configuration, backups (where necessary: with many hundreds of databases, not all of them need a backup), monitoring, and pre-configured access. Even the database on my private laptop is configured using Ansible. That is a good example in itself: which steps do I find important enough to write a Playbook for? My top three:
- Change a few of the most important configuration parameters, then restart the database
- Use `pg_dump` for backups
- Create the necessary accounts
Which one is the lie?
The top three
Configuration Parameters
In a database that is not used as a production system (the one on my laptop, for example), there are only a few settings you want to change for decent performance. My local database is used for a couple of Django systems, such as developing features and patches for the PostgreSQL Europe conference system. It holds a copy of the contact database for the Regional Contact for Germany, and it also holds databases for a couple of private projects I’m working on. All in all nothing big, but still enough that I want to tune the database a bit. Here are the parameters I change:
- `listen_addresses`: changed to "localhost, 192.168.30.1", because I’m using virtual machines and make the database available to the network used for the VMs.
- `shared_buffers`: changed to "256MB"; this is enough for the projects I’m running, but your setup might vary.
- `work_mem`: changed to "64MB"; this usually results in better query performance, again a setting which depends on your applications.
- `random_page_cost`: changed to 1.0 (same as `seq_page_cost`); my laptop has a fast SSD, and virtually all systems we use at work also run on SSDs or NVRAM, or both. There is no difference between sequential access and random access anymore, therefore it doesn’t make sense to have different values for the two settings.
- `timezone`: I like my stuff in my local time zone (`Europe/Berlin`), at least on my private computer. This makes it easy to use PostgreSQL as a world clock, and quite often I use `SELECT '<date/time>'::TIMESTAMPTZ AT TIME ZONE '<insert another time zone here>'` to quickly figure out a time for someone else, like when I have to create an invite.
- `lc_*`: There are a couple of LC settings, and I have them in either German or English:
  - `lc_messages = 'en_US.UTF-8'`: all my devices are set to English
  - `lc_monetary = 'de_DE.UTF-8'`: but I mostly deal with German currency
  - `lc_numeric = 'de_DE.UTF-8'`: and German numbers
  - `lc_time = 'de_DE.UTF-8'`: and, most importantly, German times
- `include_dir`: enabled, and set to `'conf.d'`, because all the settings above live in `conf.d/postgresql.deploy.conf`, installed by Ansible (a sketch of that file follows below).
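For illustration, this is roughly what the deployed file can look like, with the values from the list above (the file name and layout are just my conventions; treat this as a sketch, not a canonical configuration):

```
# conf.d/postgresql.deploy.conf -- installed by Ansible
# (postgresql.conf itself only needs: include_dir = 'conf.d')
listen_addresses = 'localhost, 192.168.30.1'
shared_buffers   = '256MB'
work_mem         = '64MB'
random_page_cost = 1.0          # same as seq_page_cost: SSD/NVRAM storage
timezone         = 'Europe/Berlin'
lc_messages      = 'en_US.UTF-8'
lc_monetary      = 'de_DE.UTF-8'
lc_numeric       = 'de_DE.UTF-8'
lc_time          = 'de_DE.UTF-8'
```

Keep in mind that changing `listen_addresses` or `shared_buffers` requires a restart, not just a reload.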
Backups
Most, but not all, databases here have backups, and some kind of retention for the backups. There are a few people who will vigorously claim that `pg_dump` is not a backup tool. I beg to differ: it works great and gives you a consistent snapshot of your data which can be restored using either `psql` or `pg_restore`. Occasionally you also need `pg_dumpall` to dump the global objects (roles and tablespaces), but in most of my cases these are installed using a Playbook or some other kind of automation or template, and can be restored using a different method.
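As a quick sketch of that workflow (the database name and file names are placeholders):

```
# Dump one database in the custom format (compressed, usable with pg_restore)
pg_dump -Fc -f mydb.dump mydb

# Restore into a freshly created database
createdb mydb_restored
pg_restore -d mydb_restored mydb.dump

# Global objects (roles, tablespaces) are not part of pg_dump output
pg_dumpall --globals-only > globals.sql
psql -f globals.sql -d postgres
```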
Accounts
As mentioned in the backup section, most accounts, both at work and on my private infrastructure, are created using automation tools. This ensures that credentials are the way they are supposed to be, and can quickly be added, replicated, or removed. For my private stuff I mostly use Ansible, and the very cool community.postgresql collection. In order to use the collection, the target host needs to have the psycopg2 library installed, which also happens via Ansible. The “Talks” section on this blog has the slides for one of my talks, “Managing PostgreSQL with Ansible”, which explains how one can use Ansible to manage credentials.
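A minimal sketch of such a task, assuming the collection is installed via ansible-galaxy (the role name and the password variable are placeholders):

```yaml
# Requires: ansible-galaxy collection install community.postgresql
# and the psycopg2 library on the target host.
- name: Create an application account
  community.postgresql.postgresql_user:
    name: app_user                       # placeholder role name
    password: "{{ app_user_password }}"  # e.g. pulled from Ansible Vault
    state: present
  become: true
  become_user: postgres
```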
The Resolution
Did you spot which one is the lie? Well, it’s hard, because it’s only half a lie, but it is the backup section.
A backup using `pg_dump` might be sufficient for you. But in many cases, especially in business environments, a simple snapshot as created by `pg_dump` is not enough.
The very rich PostgreSQL ecosystem provides a couple of alternatives that offer seamless backup and point-in-time recovery. To name a few:
- pgBackRest
- Barman
- WAL-G
- and others …
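To make the point-in-time aspect concrete, here is a rough sketch with pgBackRest (the stanza name and target timestamp are placeholders; check the pgBackRest documentation for the details of your setup):

```
# Take a full backup of the configured stanza
pgbackrest --stanza=demo --type=full backup

# Later: restore the cluster to a specific moment in time
pgbackrest --stanza=demo --delta --type=time \
    --target="2024-05-01 12:00:00+02" restore
```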
Additionally, many environments require not only point-in-time backups and restores, but also high availability. There are great tools for this as well, but that is a topic for another blog posting.
Thank You
I would like to thank Ryan Booz for organizing the #PGSQLPhriday initiative.