
Not so equal texts in PostgreSQL - and how to compare texts in a more elaborate way

Christophe Pettus posted an interesting challenge on Twitter: two strings which look equal on the surface, but if you ask PostgreSQL to compare them, they are not equal.

Now let me start with a note: Twitter totally screws this challenge up.

How so? Although the two strings are different in the original, when posting this to Twitter the strings are made equal. Where is the fun in that?

I asked Christophe for the original query:

INSERT INTO t VALUES (E'Zo\u0065\u0301', E'Zo\u00e9');

And you end up with the following texts in the table:

  a  |  b  
-----+-----
 Zoé | Zoé
(1 row)

If you translate the UTF-8 strings into hex, you get "0x5a 0x6f 0x65 0xcc 0x81" and "0x5a 0x6f 0xc3 0xa9". Clearly they are different.

However if you convert the two strings from the Tweet, you get "0x5a 0x6f 0xc3 0xa9" and "0x5a 0x6f 0xc3 0xa9". Same string. Poor Twitter.
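The hex comparison can be reproduced outside the database with a short Python sketch. Python and the helper name `utf8_hex` are assumptions made for illustration and are not part of the original challenge. The last line shows that the two strings become equal after Unicode NFC normalization, which is consistent with the Tweet ending up with two identical strings (that Twitter normalizes the text this way is an assumption, the post only shows the resulting bytes).

```python
import unicodedata

# The two strings from the INSERT statement above.
a = "Zo\u0065\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
b = "Zo\u00e9"        # precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE


def utf8_hex(s):
    """Return the UTF-8 bytes of s as space-separated hex values (hypothetical helper)."""
    return " ".join(f"0x{byte:02x}" for byte in s.encode("utf-8"))


print(utf8_hex(a))  # 0x5a 0x6f 0x65 0xcc 0x81
print(utf8_hex(b))  # 0x5a 0x6f 0xc3 0xa9

# A plain comparison sees two different code point sequences.
print(a == b)  # False

# After composing to NFC, both strings are the same sequence.
print(unicodedata.normalize("NFC", a) == b)  # True
```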

Checking the hex values was actually one of my first ideas when I spotted this challenge. Nevertheless, based on my experience from my "Data Types in PostgreSQL" and "Advanced Data Types in PostgreSQL" talks, I figured it should be possible to "solve" this puzzle even if the strings are in fact equal.

Buckle up! We are about to dive deep into how extensible PostgreSQL really is!



GSoC 2021 completed

The Google Summer of Code 2021 for the PostgreSQL Project is wrapped up. The timeline this year was shortened to half, compared to previous years. That’s good, because smaller projects can be worked on, and students have a chance to cope with a changing environment at home and university. On the other hand, the shorter time doesn’t allow diving into more complex projects. Nevertheless, with the help of all mentors, six students successfully concluded their projects.



PostgreSQL Project @ GSoC 2021

Wow! The PostgreSQL Project got all 7 proposals accepted into Google Summer of Code 2021!

This year Google changed the participation terms a bit, and cut the time for the students in half. This is supposed to help students who can’t work full-time from home, especially in light of the global pandemic situation. It also means smaller projects, which are easier to handle even for students new to the project.

The PostgreSQL project got a great number of initial applications (29), and we talked with many of the students about refining their proposals. 27 out of the 29 applications were finally submitted by the students. Some are duplicates, some are clearly just copied from somewhere, but many propose good ideas.

After talking with available mentors, and “recruiting” a few more, we settled on 7 final applications, and submitted them to Google.

As usual many of the proposals are not directly developing code for core PostgreSQL, but work on tools and applications from the PostgreSQL ecosystem. Expect some great output over the following months.

Make Ansible "postgresql_ping" fail if the database does not exist

Ansible has a very useful module "postgresql_ping" which checks connectivity to the database server. I'm using it in quite a few Playbooks as the first step, just to ensure that the database server is present. This fails early if there is a problem which would otherwise prevent the rest of the Playbook from working properly.

TASK [Check if database is available]
[WARNING]: PostgreSQL server is unavailable: could not connect to server: No such file or directory         Is the server running locally and accepting         connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
fatal: []: FAILED! => {"changed": false, "failed_when_result": true, "is_available": false, "server_version": {}}


However, this module does not check whether the database exists, only whether the server is reachable. Example Playbook:

- name: Check if database is available
  postgresql_ping:
    db: "testdb"
  become: yes
  become_user: postgres

When I run the Playbook:

TASK [Check if database is available]
[WARNING]: PostgreSQL server is unavailable: FATAL:  database "testdb" does not exist
ok: []


As you can see, the database "testdb" does not exist. For the module, that is a reason to raise a warning, but not a reason to fail.

One possible solution is to let this module do its work, and add a "postgresql_db" call next, which ensures that the database is created. But not every Playbook is supposed to create and populate a database, and not every Playbook has all the required parameters (owner, encoding, template etc.) available. Therefore it would be nice if "postgresql_ping" failed early when the database in question doesn't exist. That's possible, with two more lines of code:

- name: Check if database is available
  postgresql_ping:
    db: "testdb"
  become: yes
  become_user: postgres
  register: ping_database
  failed_when: ping_database.warnings is defined

And the Playbook run:

TASK [Check if database is available]
[WARNING]: PostgreSQL server is unavailable: FATAL:  database "testdb" does not exist
fatal: []: FAILED! => {"changed": false, "failed_when_result": true, "is_available": false, "server_version": {}}

Together with "any_errors_fatal: True", this ends the entire Playbook early, so I don't have to debug the problem later on.

Google Summer of Code 2020 - Intermediate status update

The three PostgreSQL projects for this year’s Google Summer of Code are on track, and making good progress. All projects expect to finish on time.

Performance Farm

The data gathering for performance farm members is completed, as well as the new implementation for the JSON data transfer. The project iteratively updated its goals, and adjusted for newly identified UI issues.

Current work centers around making the website more pretty and useful, as well as reducing the number of used JavaScript libraries. The next step is presenting the work to the PostgreSQL Community for broader feedback.

PL/Java build system

The PL/Java project has just merged (PR #288) the first major pull request of new code from GSoC, creating a new plugin for the Maven build system that allows its actions to be guided by script snippets clearly exposed in the build files.

The same effect was formerly achieved by a workable but brittle combination of an existing Maven plugin that could handle most of the build requirements with another plugin that was able to run Ant, which was able to run scripts. That resulted in a non-ideal division of labor, where a good deal of build logic was hidden away inside plugins, while some parts were exposed in script out of necessity, rather than because they were interesting or likely to need adjustment.

This pull request proves the concept of a new plugin where the hardcoded Java portions are the uninteresting building blocks, and the overall logic of the build is clearly exposed in script.

For now, the new plugin is used to retire the maven-javadoc-plugin and remove the constraints it had imposed on the project's javadocs (such as the need for absolute URLs for intermodule references, making the resulting tree hard to preview or relocate).

Work continues to reimplement the C native build and retire the nar-maven-plugin and maven-antrun-plugin, to be delivered in a future PR.

WAL-G Performance

We’ve just completed the decoupling of the complex WAL-G internal class. Thanks to it, the new functionality developed in July for a more intelligent backup creation process can now be safely integrated. This feature involves major changes so it requires time to verify that everything is working as expected. We plan to finish the integration in parallel with working on other features.

Currently, we are working on merging the new series of commands for the WAL archives that have been uploaded to storage. These commands will allow end users to analyze the storage for any missing WAL segments that may prevent performing a PITR. Also, Dan is now in the process of implementing the last feature, and he expects to finish it on time.

Thanks to all mentors for the status update!

Google Summer of Code 2020 started

The PostgreSQL Project participates in Google Summer of Code (GSoC) 2020, with 3 projects. After the “Community Bonding” period finished last week, we are now in the active development phase - “Coding” as Google calls it. All three projects make good progress!


Performance Farm

The project defined a number of milestones, and evaluated the current database structure. Modifications are required on this front, and will be applied over the following days. Also the structure for sending data from the client to the project server is re-evaluated and modified. The student started with the database design modifications, and also with documenting the changes and terminology used.


PL/Java build system

Thanks to the ongoing work setting up the continuous integration, PL/Java's master branch - which will become the 1.6 release - is now getting regular CI builds against several PostgreSQL and Java versions on amd64 Ubuntu and Mac OS X, and the student has moved on to setting up the same for Windows. We had a goal to enable test options in the CI builds that were otherwise impractically strict, and to identify ways to filter the output down to a manageable volume that exposes real issues. Through a combination of fixes to some real PL/Java warnings, and a small state machine now keeping known non-PL/Java ones out of the log, the project now builds -Xcheck:jni clean. The first actual bug found through the student's work got fixed before the bonding period ended - the bug had been there for fifteen years.

The later part of the work will involve more straight-up coding, to replace a Maven plugin now used in PL/Java's build that isn't quite suited to the need. The proposal outlined a few reasons for that and the preliminary work has already uncovered more reasons. It was already a goal of that work to improve the signal-to-noise ratio of diagnostics from that plugin, so already solving a similar problem for -Xcheck:jni was a good warmup.


WAL-G performance

The first 2 weeks are going almost as we planned. The student started working on the first task during the proposal phase. In the first week he updated his PR and we merged it, so the first feature already works. This week he updated it with several improvements (mostly refactorings), started working on test coverage for the first feature, and made some drafts for the second feature. We are now discussing the design details of the second feature. I hope this will help to implement it according to the current plan.


Thank you to all mentors for the status update!

Related Projects

The PostgreSQL main website has a new page: "Related Projects".

This page lists the projects which help running and maintaining the PostgreSQL project, the infrastructure, and other things like the translations for press releases. For each project it lists links to the source, as well as information where to send updates, patches, or input.

If you want to get involved in one of the projects, that's your starting point. If a project is missing, please send a note or a patch to pgsql-www.

Many thanks to Jonathan S. Katz for polishing the patch, and making it look nice!

PostgreSQL @ FOSDEM 2020

The PostgreSQL Project has been present with a booth at FOSDEM ever since 2007. Since 2008 we have organized a Devroom, and since 2013 we have had our own PGDay on the Friday before FOSDEM. This year marks the 8th FOSDEM PGDay.

This blog post presents useful information about the PGDay, the booth and Devroom at FOSDEM.

