
PGSQL Phriday #002: PostgreSQL Backup and Restore

Thanks to Ryan Booz we now have the #PGSQLPhriday blogging series. The second edition is about backups. And of course also about restore.

Backups? Do you need that?

If your data is not important, you certainly don't need to bother about backups. Then again, why do you even store the data in the first place?

For anyone else, especially everyone who runs business critical or mission critical applications, having a backup is important. Equally important, but sometimes ignored, is the ability to restore the backup.

The requirements for backups are manifold:

  • How quickly must you be able to restore the backup (SLA)?
  • Do you need to be able to restore every transaction (capture all changes), or is a snapshot (backup frequency: hourly, daily, weekly, monthly) sufficient?
  • How can the backup integrity be verified?
  • Where do you store the backup, and how do you make sure that a disaster (for example: the loss of a data center) does not affect it?
  • Who can restore the backup, and what is the plan for doing so?
  • ...
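To illustrate the "verify the backup integrity" point above: this can be automated. The following is a minimal sketch, not a recommendation; it assumes PostgreSQL 13 or newer (for pg_verifybackup), and the host, user, and paths are placeholders. The commands are echoed rather than executed, so the script is a safe dry run.

```shell
#!/bin/sh
# Sketch only: host, user, and paths are placeholders; commands are
# echoed (dry run) -- remove the "echo" prefixes to actually run them.
set -eu

# One timestamped directory per backup run.
BACKUP_DIR="/backups/pg/$(date +%Y%m%d_%H%M%S)"

# pg_basebackup takes a physical copy of the whole cluster;
# --manifest-checksums records a checksum per file (PostgreSQL 13+).
echo pg_basebackup -h db.example.com -U replicator \
    -D "$BACKUP_DIR" --checkpoint=fast --manifest-checksums=SHA256

# pg_verifybackup re-checks every file against the manifest,
# catching silent corruption before you actually need the backup.
echo pg_verifybackup "$BACKUP_DIR"
```

Note that verifying the files is still weaker than actually restoring the backup and starting a server from it, which several of the questions below touch on.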

Your #PGSQLPhriday task

Describe how you do backups for your PostgreSQL databases.

Which tool(s) are you using, where do you store your backups, and how often do you take them?
Are there any recommendations you can give readers on how to improve their backups?
Any lesser-known features in your favorite backup tool?
Any new and cool features in a recently released version?

Bonus question: Is pg_dump a backup tool?
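As context for the bonus question: pg_dump produces a logical snapshot of a single database at one point in time, with no WAL and no point-in-time recovery. A minimal sketch of that workflow follows; the database name and file names are placeholders, and the commands are echoed as a dry run.

```shell
#!/bin/sh
# Sketch only: "appdb" and the file names are placeholders; commands
# are echoed (dry run) -- remove the "echo" prefixes to run them.
set -eu

STAMP=$(date +%Y%m%d_%H%M%S)
DUMPFILE="appdb_${STAMP}.dump"

# Custom format (--format=custom) is compressed and lets pg_restore
# do selective or parallel restores. It is a one-time logical
# snapshot of one database, not a physical or PITR-capable backup.
echo pg_dump --format=custom --file="$DUMPFILE" appdb

# The only real proof the dump works: restore it somewhere scratch.
echo createdb appdb_restore_test
echo pg_restore --dbname=appdb_restore_test --jobs=4 "$DUMPFILE"
```

Whether that qualifies as a "backup tool" is exactly the debate this question is meant to start.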


How does #PGSQLPhriday work?

  • Post your blog post about "Describe how you do backups for your PostgreSQL databases" by Friday, November 4th.
  • Include "PGSQL Phriday #002" in the title or first paragraph of the blog posting.
  • Link back to this blog posting, example: "ads asked in 'PGSQL Phriday #002' if pg_dump is a backup tool, and I strongly object, here's why".
  • Announce your blog post in one or any of these places:
  • Interact with other blog posts about this topic, maybe someone else has a different opinion about 'pg_dump' being a backup tool.

Looking forward to your blog posts!





CaptainKirkFan:

Okay, we use pg_dump -s --file=schema_YYYYMMDD_HHNNSS.sql as a quick backup of our source code. We are in development mode and have NOT pushed our database to a production server!

Then we run a program we wrote (PgDDLGen) that parses this SQL file into a directory structure of schema objects, changing "CREATE XXX" to "CREATE OR REPLACE XXX", etc., as appropriate. This program tracks deletions and does not overwrite any files that have not changed (yep, it reads them in and compares them, with special masking for sequences even), and only writes them if changed. Then the deleted files are REMOVED from source control and the changes committed to source control. FWIW, because of function overloading and long type names, an INI file allows us to set abbreviations so we have file names (ONLY when there are overloaded routines) that include the signatures (e.g., "set_text_from(t,t,ts).sql"). We chose this because we use this same setup for the source database, have 20 years of commit data, and have found it INVALUABLE for reviewing changes, repairing, or quickly bringing the CODE portion of the DATABASE up to date.

FWIW, this program also generates a CSV file that identifies EVERY object, its DROP command, and the \ir that would recreate the object. And it creates 2 more files: one that creates all constraints and one that creates all indexes, again using \ir. This final piece allows us to easily DROP all constraints, indexes, and data (using \gexec type queries), then RELOAD all the data from dump files and execute the commands required to constrain/index the data after the imports.

Would I recommend this for a production backup? (First of all, we are not grabbing ONE RECORD of data, so no!) Will this run in production and snapshot DAILY the changes made to the source code/table structures/etc.? YES! Is it a backup? Yes. But not a DATA backup. A STRUCTURE backup. And it's perfectly suited for that!
When we start "caring" about our data and running things in staging, we will be using DIFFERENT tools for DATA backup. We want (this is important: we define which features we need and want while choosing between tools):
- Continuous backups
- PITR recovery over 48 hours (maybe 72)
- The ability to restore a failover server to a specific time from the night before, and have it as a standby
- Eventually, to consider having a real-time standby server for backup

Why? Because we want the Sr. Developer to be able to request from IT that they restore the STAGING DB to a specific time on the staging server, so something can be reviewed. We may also want to request that it is done to a virtual machine copy and held without updates, so we can review when we get time, without impacting staging! Or simply to test the restore.

Our final CHECK will be that at least daily, the backups are tested against the backup server, and that weekly a random PITR is selected and processed without errors into a VM and then loosely validated.

The reality is this: if you have NEVER restored your BACKUP, then you do NOT know if you actually have BACKUPS! You are guessing. We've been at this for decades and have witnessed things you would NEVER believe, that created corrupted backups everyone thought were fine but were NOT restorable. It turns out you ONLY KNOW they are restorable if you can actually RESTORE them. (Also, wouldn't it be nice to know that some TINY % of backups fail to restore? Oh... all of a sudden that 48-72 hours of PITR logs becomes a way to manually rework a backup if you needed it. Say it was the EOM one that is kept for 13 months?)