Skip to content

PGSQL Phriday #001: Two truths and a lie

Uh, what is this #PGSQLPhriday thing about? Everyone is writing blog posts about this today ... For details, please read the introduction from Ryan Booz.

This month's topic: pick your three favorite or most important best pratices for #PostgreSQL. Tell them in a form that two are true, one is a lie. See if you can spot which one is true and which one is false. The resolution is at the end of this blog posting.

Let's dive in: These days it's rare that I start with a fresh PostgreSQL installation which is not in any way already modified. Whenever we install a new database at work, it's already coming with a templated configuration, backup (if necessary: despite running many hundred databases not all of them need a backup), monitoring, and pre-configured access. Even the database on my private laptop is configured using Ansible. But that's a good example by itself: what are the steps I find important enough to have a Playbook for. My top three list:

  1. Change a few most important configuration parameters, then restart the database
  2. Use pg_dump for backups
  3. Create necessary accounts

Which one is the lie?

 

The top three

Configuration Parameters

In a database which is not used as a production system (the one on my laptop, as example), there are only a few settings you want to change for decent performance. As example, the local database is used for a couple Django systems, like for developing features and patches for the PostgreSQL Europe conference system. It holds a copy of the contact database for the Regional Contact for Germany. It also holds databases for a couple private projects I'm working on. All in all nothing big, but still enough that I want to tune the database a bit. Here are the parameters I'm changing:

  • listen_addresses: changed to "localhost, 192.168.30.1", because I'm using virtual machines and make the database available to the network used for the VMs.
  • shared_buffers: changed to "256MB", this is enough for the projects I'm running, but your setup might vary.
  • work_mem: changed to "64MB", this usually results in better query performance, again a setting which depends on your applications.
  • random_page_cost: changed to "1.0" (same as seq_page_cost), my laptop has a fast SSD, and virtually any systems we use at work also use either SSD or NVRAM, or both. There is no difference between sequential access and random access anymore, therefore it doesn't make sense to have different values for both settings.
  • timezone: I like my stuff in my local time zone ('Europe/Berlin'), at least on my private computer. This makes it easy to use PostgreSQL as a world clock and quite often I use "SELECT '<date/time>'::TIMESTAMPTZ AT TIME ZONE '<insert another time zone here>'" to quickly figure out a time for someone else, like when I have to create an invite.
  • lc_*: There are a couple of LC settings, and I have them in either German or English:
    • lc_messages = 'en_US.UTF-8': all my devices are set to English language
    • lc_monetary = 'de_DE.UTF-8': but I mostly deal with German currency
    • lc_numeric = 'de_DE.UTF-8': and German numbers
    • lc_time = 'de_DE.UTF-8': and most importantly German times
  • include_dir: enabled, and set to 'conf.d', because all settings above are in conf.d/postgresql.deploy.conf, installed by Ansible

Backups

Most, but not all databases here have backups, and some kind of retention for the backups. There are a few people who will vigorously claim that pg_dump is not a backup tool. I beg to differ, and it works great. Gives you a consistent snapshot of your data which can be restored using either psql or pg_restore. Occasionally you also need pg_dumpall to dump the global objects, but in most of my cases these are installed using a Playbook or some other kind of automation or template, and can rather be restored using a different method.

Accounts

As mentioned in the backup section, most accounts both at work and on my private infrastructure are created using automation tools. This ensures that credentials are the way they are supposed to be, can be quickly added, replicated, or removed. For my private stuff I'm mostly using Ansible, and the very cool Community.Postgresql collection. In order to use the collection, the target host needs to have the psycopg2 libraries installed. This also happens using Ansible. The "Talks" section on this blog has the slides for one of my talks: "Managing PostgreSQL with Ansible", this talk explains how one can use Ansible to manage the credentials.

The Resolution

Did you spot which one is a lie? Well, it's hard, because it's only a half lie, but the backup section is it.

A backup using pg_dump might be sufficient for you. But in many cases, especially in business environments, a simple snapshot as it is created by pg_dump is not enough.

The very rich PostgreSQL ecosystem provides a couple of alternatives which offer seamless backup and point-in-time restore. To name a few:

Additionally, many environments not only require point-in-time backups and restore, but also high availability. There are great tools for this as well, but that is for another blog posting.

Thank You

I would like to thank Ryan Booz for organizing the #PGSQLPhriday initiative.

 

Trackbacks

No Trackbacks

Comments

Display comments as Linear | Threaded

No comments

Add Comment

Enclosing asterisks marks text as bold (*word*), underscore are made via _word_.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
To leave a comment you must approve it via e-mail, which will be sent to your address after submission.
Form options