Now let me start with a note: Twitter totally screws this challenge up.
How so? Although the two strings are different in the original, when posting this to Twitter the strings are made equal. Where is the fun in that?
I asked Christophe for the original query:
INSERT INTO t VALUES (E'Zo\u0065\u0301', E'Zo\u00e9');
And you end up with the following texts in the table:
SELECT * FROM t; a | b -----+----- Zoé | Zoé (1 row)
If you translate the UTF-8 strings into hex, you get "0x5a 0x6f 0x65 0xcc 0x81" and "0x5a 0x6f 0xc3 0xa9". Clearly they are different.
However if you convert the two strings from the Tweet, you get "0x5a 0x6f 0xc3 0xa9" and "0x5a 0x6f 0xc3 0xa9". Same string. Poor Twitter.
Checking the hex values was actually one of my first ideas when I spotted this challenge. But nevertheless based on my experience from my "Data Types in PostgreSQL" and "Advanced Data Types in PostgreSQL" talks, I figured it should be possible to "solve" this puzzle even if the strings are in fact equal.
Buckle up! We are about to dive deep into how extendible PostgreSQL really is!
In my "Advanced Data Types in PostgreSQL" talk in the last chapter I explain that it is possible to extend PostgreSQL with your own data types, and corresponding functions, operators and operator classes. Let's put this knowledge to good use and make equal strings not equal.
Looking at the query it seems easy, all I need is:
- A new data type (similar to string)
- A new operator for "=", which in fact makes the strings not equal
How hard can it be ...
In order to save my regular databases and not accidentally screw up the catalog, I create a new database where all the following tests take place:
\set ON_ERROR_STOP false \c postgres DROP DATABASE IF EXISTS "Zoé"; CREATE DATABASE "Zoé"; \c "Zoé" \set ON_ERROR_STOP true \set ECHO queries CREATE SCHEMA "Zoé"; SET search_path TO "Zoé";
I name the new database "Zoé" - because I can. And to make it look nice, I also create a "Zoé" schema. Because why not. This also helps to show where I use my own functions and objects, because they are always schema-qualified with "Zoé".
New data type
A new data type in PostgreSQL needs a couple support functions, input and output, receive and send. That's the bare minimum.
Lucky me, I'm mimicking the TEXT type, so I can re-use all the text functions for this.
CREATE FUNCTION "Zoé".varchar2_in(cstring) RETURNS varchar2 AS 'textin' LANGUAGE internal STRICT; CREATE FUNCTION "Zoé".varchar2_out(varchar2) RETURNS cstring AS 'textout' LANGUAGE internal STRICT; CREATE FUNCTION "Zoé".varchar2_recv(internal) RETURNS varchar2 AS 'textrecv' LANGUAGE internal STRICT; CREATE FUNCTION "Zoé".varchar2_send(varchar2) RETURNS bytea AS 'textsend' LANGUAGE internal STRICT;
Time to create the new type. Let me know if you recognize the name ...
CREATE TYPE "Zoé".varchar2 ( INTERNALLENGTH = variable, INPUT = "Zoé".varchar2_in, OUTPUT = "Zoé".varchar2_out, RECEIVE = "Zoé".varchar2_recv, SEND = "Zoé".varchar2_send, COLLATABLE = true, CATEGORY = 'S', PREFERRED = true, ALIGNMENT = int4, STORAGE = extended );
As you can see, this uses the newly created support functions, which I placed into my dedicated schema.
At this point I can already use this type, create the table, and insert data:
CREATE TABLE "Zoé".t (a "Zoé".varchar2, b "Zoé".varchar2 ); INSERT INTO "Zoé".t VALUES ('Zoé', 'Zoé');
Checking the content:
SELECT * FROM "Zoé".t; a | b -----+----- Zoé | Zoé (1 row)
However I can not yet compare the two columns:
SELECT * FROM "Zoé".t WHERE a = b; SELECT * FROM "Zoé".t WHERE a = b; ERROR: operator does not exist: varchar2 = varchar2 LINE 1: SELECT * FROM "Zoé".t WHERE a = b; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
PostgreSQL has no idea how to compare two strings of my new type. I need to create the "=" operator for my new type.
In fact I not only need to create the "=" operator, but also the "<>" operator, because the "=" operator needs a negator.
In order to create these two operators, I also need two functions which can compare two of my varchar2 values, and tell PostgreSQL if they are equal or not. This is where the change happens.
The original functions for TEXT:
CREATE FUNCTION pg_catalog.texteq (TEXT, TEXT) RETURNS boolean AS 'texteq' LANGUAGE internal STRICT; CREATE FUNCTION pg_catalog.textne (TEXT, TEXT) RETURNS boolean AS 'textne' LANGUAGE internal STRICT;
My functions for the new data type:
CREATE FUNCTION "Zoé".texteq ("Zoé".varchar2, "Zoé".varchar2) RETURNS boolean AS 'textne' LANGUAGE internal STRICT; CREATE FUNCTION "Zoé".textne ("Zoé".varchar2, "Zoé".varchar2) RETURNS boolean AS 'texteq' LANGUAGE internal STRICT;
Do you spot the difference?
Now that I have the support functions, I can create the operators:
CREATE OPERATOR "Zoé".<> ( FUNCTION = textne, LEFTARG = "Zoé".varchar2, RIGHTARG = "Zoé".varchar2, COMMUTATOR = OPERATOR("Zoé".<>), NEGATOR = OPERATOR("Zoé".=), RESTRICT = neqsel, JOIN = neqjoinsel ); CREATE OPERATOR "Zoé".= ( FUNCTION = texteq, LEFTARG = "Zoé".varchar2, RIGHTARG = "Zoé".varchar2, COMMUTATOR = OPERATOR("Zoé".=), NEGATOR = OPERATOR("Zoé".<>), MERGES, HASHES, RESTRICT = eqsel, JOIN = eqjoinsel );
SELECT * FROM "Zoé".t WHERE a = b; a | b ---+--- (0 rows)
Since I just switched the operators, I can turn the query around and ask if the strings are not equal:
SELECT * FROM "Zoé".t WHERE a != b; a | b -----+----- Zoé | Zoé (1 row)
Switching the texteq and textne functions for my new type tells PostgreSQL that the strings are not equal, when in fact they are. Another option which works equally "well" is that I create the texteq and nextne functions with the "correct" internal functions, but switch these functions when I create the operators. This will give the same result.
The extensibility of PostgreSQL gave me the freedom to design the results I wanted to see, switching true and false around, and making you believe that strings are not equal when in fact they are.
That of course is just a quick example developed as an answer to the challenge. However it shows that PostgreSQL can really be expanded to handle any kind of needs you have for your applications.