PostgreSQL Lands Support For Incremental Backups

Written by Michael Larabel in Free Software on 21 December 2023 at 03:36 PM EST.

Support for incremental backups was merged yesterday into the Git code for the PostgreSQL database server.

A few months ago Robert Haas restarted work on incremental backup support for PostgreSQL, years after his original attempt at the feature went nowhere. This time around things panned out, and support for producing incremental backups has now been merged into PostgreSQL.


Haas explained in his proposal from June:

The basic design of this patch set is pretty simple, and there are three main parts. First, there's a new background process called the walsummarizer which runs all the time. It reads the WAL and generates WAL summary files. WAL summary files are extremely small compared to the original WAL and contain only the minimal amount of information that we need in order to determine which parts of the database need to be backed up. They tell us about files getting created, destroyed, or truncated, and they tell us about modified blocks. Naturally, we don't find out about blocks that were modified without any write-ahead log record, e.g. hint bit updates, but those are of necessity not critical for correctness, so it's OK.

Second, pg_basebackup has a mode where it can take an incremental backup. You must supply a backup manifest from a previous full backup. We read the WAL summary files that have been generated between the start of the previous backup and the start of this one, and use that to figure out which relation files have changed and how much. Non-relation files are sent normally, just as they would be in a full backup. Relation files can either be sent in full or be replaced by an incremental file, which contains a subset of the blocks in the file plus a bit of information to handle truncations properly.

Third, there's now a pg_combinebackup utility which takes a full backup and one or more incremental backups, performs a bunch of sanity checks, and if everything works out, writes out a new, synthetic full backup, aka a data directory.
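To make the "subset of the blocks plus a bit of information to handle truncations" idea more concrete, here is a toy Python model of how a full copy of a relation file and a chain of incremental files could be merged back together. This is only an illustrative sketch, not PostgreSQL's actual on-disk format or the pg_combinebackup algorithm: the `Incremental` class, its `length` and `blocks` fields, and both function names are hypothetical and exist purely for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incremental:
    """Toy stand-in for an incremental file (hypothetical layout)."""
    length: int  # number of blocks the file had when this backup was taken
    blocks: Dict[int, bytes] = field(default_factory=dict)  # only the changed blocks


def apply_incremental(base: List[bytes], incr: Incremental) -> List[bytes]:
    """Rebuild a file's block list from an older copy plus one incremental."""
    result = []
    for block_no in range(incr.length):
        if block_no in incr.blocks:
            # The block changed since the prior backup: take the newer copy.
            result.append(incr.blocks[block_no])
        elif block_no < len(base):
            # Unchanged block: reuse the copy from the older backup.
            result.append(base[block_no])
        else:
            # A block past the old end of file must come from the incremental.
            raise ValueError(f"block {block_no} missing from incremental")
    # Blocks at or beyond incr.length are dropped, which models a truncation.
    return result


def combine(full: List[bytes], incrementals: List[Incremental]) -> List[bytes]:
    """Apply a chain of incrementals, oldest first, on top of a full copy."""
    current = list(full)
    for incr in incrementals:
        current = apply_incremental(current, incr)
    return current
```

In this toy model a file that grew must carry its new trailing blocks in the incremental, while a file that shrank simply records a smaller `length`, so stale blocks from the older backup are never copied into the result.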

Simple usage example:

pg_basebackup -cfast -Dx
pg_basebackup -cfast -Dy --incremental x/backup_manifest
pg_combinebackup x y -o z
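For anyone wanting to script that three-step sequence, a minimal Python sketch might look like the following. It only assembles the same command lines shown above, using the flags that appear in the example (`-c fast`, `-D`, `--incremental`, `-o`). The function names `incremental_backup_commands` and `run_all` are made up for this illustration, and connection options, error handling, and the database activity that would normally happen between the full and incremental backup are left out.

```python
import subprocess
from typing import List


def incremental_backup_commands(full_dir: str, incr_dir: str, out_dir: str) -> List[List[str]]:
    """Build the argv lists for a full backup, an incremental backup, and the combine step."""
    # The incremental backup is taken against the manifest of the prior backup.
    manifest = f"{full_dir}/backup_manifest"
    return [
        ["pg_basebackup", "-c", "fast", "-D", full_dir],
        ["pg_basebackup", "-c", "fast", "-D", incr_dir, "--incremental", manifest],
        ["pg_combinebackup", full_dir, incr_dir, "-o", out_dir],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(incremental_backup_commands("x", "y", "z"))` would issue the same three commands as the example, assuming the PostgreSQL client tools are on the PATH and a server is reachable with default connection settings.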

This commit on Wednesday is what lands the PostgreSQL incremental backup support. It further explains the feature:

"To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior.

An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups."
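Going by the naming convention described in the commit message, a quick way to see which relation files in a backup directory were stored incrementally is to look for the `INCREMENTAL.` prefix. The Python sketch below does just that. The helper names `find_incremental_files` and `original_name` are hypothetical, and the sketch makes no attempt to parse the backup_label file or the contents of the incremental files themselves.

```python
import os
from typing import List

# File name prefix taken from the commit message: INCREMENTAL.${ORIGINAL_NAME}
PREFIX = "INCREMENTAL."


def find_incremental_files(backup_dir: str) -> List[str]:
    """Return sorted paths, relative to backup_dir, of files stored as incrementals."""
    found = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            if name.startswith(PREFIX):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, backup_dir))
    return sorted(found)


def original_name(incremental_path: str) -> str:
    """Strip the INCREMENTAL. prefix to recover the original file name."""
    name = os.path.basename(incremental_path)
    if not name.startswith(PREFIX):
        raise ValueError(f"not an incremental file name: {name}")
    return name[len(PREFIX):]
```

A directory for which `find_incremental_files` returns an empty list contains no files with the incremental prefix, though that alone does not prove the directory is a valid full backup.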

This is an exciting feature for the next major release, PostgreSQL 17. The stable debut of PostgreSQL 17 is tentatively planned for September 2024, so it will be interesting to see what other changes make it into this next version.
