During Nordic PGDay 2015, I attended Heikki’s talk about “pg_rewind”. Someone in the audience asked whether it is possible to roll a PostgreSQL database forward, ahead of the master. The answer was that it is currently not possible, but that there is no technical reason why it should not work.
So I decided to have a look. Turns out, it’s surprisingly easy to do.
There is an obvious problem: where to get the information from. If the slave is ahead of the master, the master can’t (yet) have the required information. It only arrives at a later time. What at first sight looks like an unsolvable problem is already solved in the world of physics. Physicists named this phenomenon “retrocausality”. Entangled particles can travel not only in space, but also in time. Details are explained in this article from 2014.
To make this work in PostgreSQL, you need a new block device in your OS (I have only tested it on Linux so far).
The major number for the new device must be negative, to tell the Linux kernel that data is coming from the future. The minor number specifies the number of seconds you want to peek into the future. Sorry, the seconds value is hardcoded as of now. The resulting device should look like this:
br--r----- 1 root tty -1, -30 Apr 2 00:00 /dev/quantum
There is a hardcoded limit of -255 seconds. It seems to be impossible to reach more than 255 seconds into the future; this must have something to do with the entanglement. It is also impossible to write to this device: any written data just disappears into the quantum world.
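To make the relationship between look-ahead seconds and the minor number concrete, here is a small sketch of a helper that turns a number of seconds into a minor number. The function name `quantum_minor` is made up for illustration, and both the mapping (30 seconds becomes -30, as in the listing above) and the lower bound of 1 second are assumptions read off this post, not a documented kernel interface.

```shell
#!/bin/sh
# Hypothetical helper: map "seconds into the future" to a minor number.
# Assumes minor = -seconds and a valid range of 1..255 seconds.
quantum_minor() {
    seconds="$1"
    # Reject empty input and anything that is not a plain decimal number.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "seconds must be a positive integer" >&2
            return 1
            ;;
    esac
    # Enforce the 255-second limit described in the text.
    if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 255 ]; then
        echo "seconds must be between 1 and 255" >&2
        return 1
    fi
    echo "-$seconds"
}
```

Calling `quantum_minor 30` prints the minor number used for the device shown above, while a value such as 256 is rejected with a non-zero exit status.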
Now that the block device is there, all you have to do is replace the WAL files coming from the master with 16 MB files extracted from /dev/quantum. The restore_command has to call a small script that does exactly that.
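A minimal sketch of what such a script could look like, assuming a POSIX shell and `dd`: it ignores the WAL file name and copies one 16 MB segment from the device to the path PostgreSQL asks for. The `QUANTUM_DEVICE` variable is a hypothetical addition so the script can be tried against another device; it is not part of the setup described here.

```shell
#!/bin/sh
# quantum_restore.sh: sketch of a restore_command helper.
# Usage: quantum_restore.sh <wal-file-name> <target-path>

# Hypothetical override for trying the script; defaults to the device above.
QUANTUM_DEVICE="${QUANTUM_DEVICE:-/dev/quantum}"

quantum_restore() {
    # $1 is the WAL file name requested by PostgreSQL (%f); it is ignored.
    # $2 is the path where PostgreSQL expects the segment (%p).
    target="$2"
    if [ -z "$target" ]; then
        echo "usage: quantum_restore.sh <wal-file-name> <target-path>" >&2
        return 1
    fi
    # Copy one 16 MB segment: 16 blocks of 1048576 bytes each.
    dd if="$QUANTUM_DEVICE" of="$target" bs=1048576 count=16 2>/dev/null
}

# Run only when called with arguments; the exit status of dd is passed on,
# so PostgreSQL sees a failure if the segment could not be read.
if [ "$#" -gt 0 ]; then
    quantum_restore "$@"
fi
```

The block size is written out in bytes because the suffix syntax for `bs` differs between `dd` implementations.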
The restore_command itself:
restore_command = '/home/ads/bin/quantum_restore.sh %f "%p"'
You see, we can ignore the source filename (the first parameter) in the script, because the data is already well known in the future. Only %p is of interest: that is the path at which PostgreSQL expects the file.
Future steps:
- Make this work with replication, not only with log shipping
- Make the number of seconds configurable and not bound to the minor device number
- Port it to all supported platforms