Skip to content

PGSQL Phriday #008 – pg_stat_statements

The topic for this month's PGSQL Phriday blogging challenge is: pg_stat_statement. And Michael Christofides gave me a perfect opener in his invitation.

For anyone who doesn't know, I'm running a weekly interview series with people from the PostgreSQL community. It's called "PostgreSQL Person of the Week". One of the questions in the default set I give everyone is:

What is your favorite PostgreSQL extension?

And guess what the answer is: by far everyone's favorite is pg_stat_statements!

What does this extension do? It tracks statistics for planning and execution for queries run by users. This can be used to find long-running queries, users who run too many or too heavy queries, or just generate statistics about the workload. In short: very useful data. And the extension itself does not need a lot of resources. Even better.

This extension is so popular that it has double the interview mentions than the next one (which is PostGIS - by itself also a very popular extension). From the slides I occasionally present at conferences or meetups:

PostgreSQL has a lot of extensions, head over to the PostgreSQL Extension network (PGXN) which is operated by David E. Wheeler to find out about around 360 extensions. Currently in my interviews I have 51 different extensions mentioned.

This extension is so useful that people say:

  • Julia Gugel: "I like pg_stat_statements as it helps a lot with performance troubleshooting."
  • Daniel Westermann: "pg_stat_statements, because it is just required if you want to troubleshoot performance related issues. I still wonder why it comes as an extension and is not there by default."
  • Lætitia Avrot: "Of course, I advise my customers to use pg_stat_statements to monitor their performance"
  • Alexander Kukushkin: "The pg_stat_statements extension is something that everyone must enable for performance monitoring and troubleshooting."
  • Flavio Gurgel: "I cannot live without pg_stat_statements, I think it’s mandatory for server optimisation."
  • Anthony Nowocien: "pg_stat_statements and I will be glad to see it in core."

This is just a small selection of quotes, but this shows that everyone loves pg_stat_statements. I encourage you to head over and read more interviews. There's plenty of insight from community members.

PGSQL Phriday #007 – Triggers for tracking changes in a table

This month's #PGSQLPhriday is hosted by Lætita Avrot, and she asks about triggers.

History time. Shortly after I started using PostgreSQL, I had a need to track changes in tables. Back then - this was the early version 7.x days - there was no such option available. I set out to write a tool for it. The logical choice to do that was to pick triggers to implement this. Today the world is different, PostgreSQL gained replication, and along with this, one can hook tools into the replication and stream all the changes. Back then there was no replication.

The second thing I discovered was that pl/pgSQL can't do the job. That was a rather big disappointment. My idea was to have one function which can be used with a trigger, the function figures out the columns and writes the changes into a separate table. However one can't access arbitrary column names in NEW and OLD in pl/pgSQL trigger functions. Something like this doesn't work:

columnname1 := "created_at";
columnname2 := "changed_at";

NEW.$columnname1 := OLD.$columnname1;
NEW.$columnname2 := OLD.$columnname2;

In pl/pgSQL, you have to "know" the column names in advance. Depesz recently posted some workarounds for this, but these options also have not been available back then.

Which made me write the tool in C. This at least allowed me to access the NEW and OLD values, and recoed changes. The tool is called "table_log", I also presented it at the first PGDay in Prato, Italy in 2007, and originally it was hosted on pgfoundry. This site is also long gone, I later copied the code to GitHub. But I also know that PostgreSQL 9.x had some internal changes which renders the tool non-working. However because the entire ecosystem had improved, and other tools are available, I did not update the old code anymore.

My conclusion: Triggers were one of the first "advanced" features I used in PostgreSQL, and I like them very much. They allowed me to implement an audit feature I need.

PGSqlPhriday #006: One Thing You Wish You Knew While Learning PostgreSQL: psql commands

For this month's #PGSQLPhridayGrant Fritchey asks: What is the one thing you wish you knew while you learn PostgreSQL.

My preferred client for PostgreSQL is psql, and while it is very powerful I like a few features a most:

  • \timing
  • \watch

Both are internal commands in psql, \timing is there for a very long time, but got improved at some point. \watch came later, but is also there for a couple years now.

 

Continue reading "PGSqlPhriday #006: One Thing You Wish You Knew While Learning PostgreSQL: psql commands"

PostgreSQL 95

Someone at FOSDEM 2023 asked the question: "What happens when PostgreSQL rolls over the version number to 95? Will this cause problems like back then in Windows?"

What does that even mean? When Microsoft released the version after Windows 98, they opted for naming it Windows 10, not Windows 9. Because apparently a lot of code out there checks if the string starts with "Windows 9" and then assumes that the OS is one of the very old ones. This might not be the only such problem, as another blog article by Microsoft shows. Apparently they used "3.95" for the Windows 95 internal version, because lazy programmers.

You may also remember that there is already a "Postgres95", released around 1994 (shortly before Windows 95 was released). This was before the name was "PostgreSQL", and this is the first version which had SQL support. Before that, QUEL was the query language.

Enough history. And even though the release of PostgreSQL 95 is still a few years out, this raises two questions:

  1. Does today's source code work (aka: compile, and pass tests) with version number 95?
  2. Does your application work against version PostgreSQL 95?

 

Continue reading "PostgreSQL 95"

Relational and Non-relational Data: PGSQL Phriday #005

Ryan Lambert asks in this month's PGSQL Phriday:

  • What non-relational data do you store in Postgres and how do you use it?
  • Have you attempted non-relational uses of Postgres that did not work well? What was the problem?
  • What are the biggest challenges with your data, whatever its structure?
  • Bonus: How do you define non-relational data?

Looking over the many databases we operate at work, I can say that we are mostly a "relational shop". Almost every database we have uses relational structures, and does not attempt to go in on "full NoSQL", or "store non-relational data". Many of the databases are on the larger size, often in the hundreds of GB or even several TBs. We have good people in our dev teams who understand both PostgreSQL and SQL, and can develop a proper schema for the data, and write performant queries.

There are a few exceptions though, but at least for us it's only a small number. Whenever we retrieve or get data which comes in from web requests, we may store this as JSON, and then we use JSONb. Quite often however it's rather "storage as JSON", not "querying the JSON in the database". It's just more convenient to keep the data in JSON format all the way, instead of transforming it several times.

 

Continue reading "Relational and Non-relational Data: PGSQL Phriday #005"

PGSQL Phriday #004: PostgreSQL and Software Development

Henrietta Dombrovskaya asks about Software Development and PostgreSQL. That's a very broad topic.

Not being a big developer for myself, and only occasionally submitting patches here and there, I nevertheless have a vast amount of data available on the topic: the "PostgreSQL Person of the Week" interviews! There are a few questions in the interviews which relate to this topic:

  • Could you describe your PostgreSQL development toolbox?
  • Which skills are a must have for a PostgreSQL developer/user?
  • Do you use any git best practices, which makes working with PostgreSQL easier?
  • Do you think PostgreSQL has a high entry barrier?
  • What is your advice for people who want to start PostgreSQL developing - as in, contributing to the project. Where and how should they start?

And having 143 published interviews at this point, there is really interesting data about this topic. I picked two points which aim directly at developers:

 

 

Continue reading "PGSQL Phriday #004: PostgreSQL and Software Development"

PGSQL Phriday #003: What is the PostgreSQL Community to me

Pat Wright is running this month’s #PGSQLPhriday, and the topic is “What is the PostgreSQL Community to you?

The answer for me is very versatile, and covers quite a lot. Took me a moment to gather the facets, and likely I’m still missing some parts.
 

 

 

Continue reading "PGSQL Phriday #003: What is the PostgreSQL Community to me"

PGSQL Phriday #002: PostgreSQL Backup and Restore

Thanks to Ryan Booz we now have the #PGSQLPhriday blogging series. The second edition is about backups. And of course also about restore.

Backups? Do you need that?

If your data is not important, you certainly don't need to bother about backups. Then again, why do you even store the data in the first place?

For anyone else, especially everyone who runs business critical or mission critical applications, having a backup is important. Equally important, but sometimes ignored, is the ability to restore the backup.

The requirements for backups are manifold:

  • How quickly must you be able to restore the backup (SLA)?
  • Do you need to be able to restore every transaction (catch all changes) or is a snapshot (backup freuency, hourly, daily, weekly, monthly) sufficient?
  • How can the backup integrity be verified? 
  • Where do store the backup, and make sure that a disaster (as example: lost of data center) does not affect the backup?
  • Who can restore the backup, what is the plan for that?
  • ...

Your #PGSQLPhriday task

Describe how you do backups for your PostgreSQL databases.

Which tool(s) are you using, where do you store backups, how often do you do backups?
Are there any recommendations you can give the reader how to improve their backups?
Any lesser known features in your favorite backup tool?
Any new and cool features in a recently released version?

Bonus question: Is pg_dump a backup tool?

Continue reading "PGSQL Phriday #002: PostgreSQL Backup and Restore"

PGSQL Phriday #001: Two truths and a lie

Uh, what is this #PGSQLPhriday thing about? Everyone is writing blog posts about this today ... For details, please read the introduction from Ryan Booz.

This month's topic: pick your three favorite or most important best pratices for #PostgreSQL. Tell them in a form that two are true, one is a lie. See if you can spot which one is true and which one is false. The resolution is at the end of this blog posting.

Let's dive in: These days it's rare that I start with a fresh PostgreSQL installation which is not in any way already modified. Whenever we install a new database at work, it's already coming with a templated configuration, backup (if necessary: despite running many hundred databases not all of them need a backup), monitoring, and pre-configured access. Even the database on my private laptop is configured using Ansible. But that's a good example by itself: what are the steps I find important enough to have a Playbook for. My top three list:

  1. Change a few most important configuration parameters, then restart the database
  2. Use pg_dump for backups
  3. Create necessary accounts

Which one is the lie?

 

Continue reading "PGSQL Phriday #001: Two truths and a lie"

Changes to the public schema in PostgreSQL 15 and how to handle upgrades

In September 2021, a patch for the upcoming PostgreSQL version 15 was committed which introduces a visible change for users: the CREATE privilege for the public schema is no longer set by default. This is a recommendation from CVE-2018-1058.

What does that mean for the average (non-superuser) user?

In PostgreSQL 14:

postgres=# SELECT version();
                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

postgres=# CREATE ROLE unprivileged WITH LOGIN;
CREATE ROLE

postgres=# CREATE DATABASE priv_test;
CREATE DATABASE

postgres=# \c priv_test 
You are now connected to database "priv_test" as user "ads".

priv_test=# \dn+ public
                       List of schemas
  Name  | Owner | Access privileges |      Description       
--------+-------+-------------------+------------------------
 public | ads   | ads=UC/ads       +| standard public schema
        |       | =UC/ads           | 
(1 row)

We see that PUBLIC (the second line in Access privileges) has USAGE (U) and CREATE (C) privileges. A regular user can create a table in the public schema:

priv_test=# SET SESSION ROLE unprivileged;
SET
priv_test=> SHOW search_path;
   search_path   
-----------------
 "$user", public
(1 row)

priv_test=> CREATE TABLE priv_test (id INT);
CREATE TABLE
priv_test=> \dp priv_test 
                               Access privileges
 Schema |   Name    | Type  | Access privileges | Column privileges | Policies 
--------+-----------+-------+-------------------+-------------------+----------
 public | priv_test | table |                   |                   | 
(1 row)

 

This is how the public schema privileges look in PostgreSQL 15:

postgres=# SELECT version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 15beta3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

postgres=# CREATE ROLE unprivileged WITH LOGIN;
CREATE ROLE

postgres=# CREATE DATABASE priv_test;
CREATE DATABASE

postgres=# \c priv_test 
You are now connected to database "priv_test" as user "ads".

priv_test=# \dn+ public
                                       List of schemas
  Name  |       Owner       |           Access privileges            |      Description       
--------+-------------------+----------------------------------------+------------------------
 public | pg_database_owner | pg_database_owner=UC/pg_database_owner+| standard public schema
        |                   | =U/pg_database_owner                   | 
(1 row)

The C (CREATE) is missing. And the CREATE TABLE for a regular user is no longer working by default:

priv_test=# SET SESSION ROLE unprivileged;
SET
priv_test=> SHOW search_path;
   search_path   
-----------------
 "$user", public
(1 row)

priv_test=> CREATE TABLE priv_test (id INT);
ERROR:  permission denied for schema public
LINE 1: CREATE TABLE priv_test (id INT);

But how is this change handled in upgrades?

 

 

Continue reading "Changes to the public schema in PostgreSQL 15 and how to handle upgrades"

2 + 2 * 4 = 16?

Jens Kubieziel recently posted a Twitter poll: 2 + 2 * 4

The available options:

  • 16
  • 15
  • 14
  • 13

Notably absent is the correct result (at least for our ordinary mathematics). This poll is a No-Win Situation: you can't win, because the correct result is not available as an option.

Nevertheless I had an idea, but for this I had to wait until the poll closes.

2 + 2 <strong> 4 = 16

 

Continue reading "2 + 2 * 4 = 16?"

ctid and other PostgreSQL table internals

While attending a conference (yay) the discussion turned to system columns, and especially ctid. This was in context of backups, and the physical position of data on disk. The question was if that can be used to determine if an incremental backup is required, and if it is enough to just copy that field where the ctid points to. Not quite, but I’d like to use this question to describe the system columns more in detail.

In a PostgreSQL table one may find - depending on the server version - the following additional system columns:

  • tableoid
  • xmin
  • xmax
  • cmin
  • cmax
  • ctid
  • oid

Most of the columns exist in every table, but are excluded when doing a SELECT *. However one can include the column(s) in a query, and then PostgreSQL will return the values for the columns as well.

 

 

Continue reading "ctid and other PostgreSQL table internals"

PostgreSQL Upgrades are hard!

Together with Lætitia Avrot and Nikolay Samokhvalov I was invited to participate in a Community Panel (YouTube video) about PostgreSQL Upgradability at Postgres Vision 2022. The panel was moderated by Bruce Momjian and initiated and organized by Jimmy Angelakos. Bruce did talk with each of us before, which helped a lot to guide the discussion in the right direction. The recording of the panel discussion is available on the Postgres Vision website.

During this panel each of us provided examples for how easy or complicated PostgreSQL upgrades still are.
 

 

Continue reading "PostgreSQL Upgrades are hard!"

PostgreSQL Project @ GSoC 2022

As is a good tradition, the PostgreSQL Project participates in Google Summer of Code (GSoC). Last year we submitted 7 projects for 7 students - and got all 7 projects accepted. This year we got quite a few more good proposals from students, and more mentors are helping. Guess what? Google accepted all 12 proposals!

Google modified the program again. For 2021 they cut the time for every project in half, to accommodate for the at-home work during the Covid-19 pandemic. This turned out to be suboptimal, and many larger projects need more time. This year students can choose between “medium” (175 hours) and “large” (350 hours) projects. This gives everyone a chance to scope the project accordingly.

 

 

Continue reading "PostgreSQL Project @ GSoC 2022"