Schedule - PGCon 2020

Ptrack 2.0: yet another block-level incremental backup engine

Date: 2020-05-27
Time: 13:00–13:45
Room: Stream 1
Level: Intermediate

Ptrack was developed a few years ago to track page-level changes in PostgreSQL database data. This information was used to implement block-level incremental backups, which were bundled with pg_probackup. However, that in-core implementation of the Ptrack engine had a number of major drawbacks:

Requirement to keep a lot of additional files (one extra fork per relation);
Extremely invasive in-core changes;
Tricky workarounds to avoid races when taking a backup.

In this talk I am going to discuss block-level incremental backups and present a new incarnation of Ptrack: Ptrack 2.0. Rewritten from scratch, it now uses a single shared hash table, which is mmap'ed into memory from a file on disk. All operations are performed using atomics, so the map is completely lockless during normal PostgreSQL operation. The Ptrack map is written to disk atomically, block by block, at the end of each checkpoint; a CRC32 checksum is calculated during the write and verified the next time the whole map is re-read after a crash or restart. Because the map has a fixed size, there may be false positives (a block is marked as changed without actually being modified), but never false negatives. This approach helps us build a simple yet durable and fast block-level incremental backup solution.
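The fixed-size map idea described above can be illustrated with a minimal Python sketch. It is not the Ptrack 2.0 implementation: the class `ChangeMap`, the slot count, the choice to store an LSN in each slot, and the on-disk layout are all assumptions made for illustration, and a plain Python list stands in for the mmap'ed, atomically updated map.

```python
import struct
import zlib

MAP_SLOTS = 1024  # hypothetical fixed map size, chosen only for this sketch


class ChangeMap:
    """Toy fixed-size change map: one 64-bit LSN per slot (an assumption)."""

    def __init__(self, slots=MAP_SLOTS):
        self.slots = [0] * slots

    def _slot(self, relation_id, block_no):
        # Hash (relation, block) into a fixed number of slots.
        # Distinct blocks may collide, which is where false positives come from.
        key = struct.pack("<IQ", relation_id, block_no)
        return zlib.crc32(key) % len(self.slots)

    def mark(self, relation_id, block_no, lsn):
        # Keep the largest LSN seen for the slot. A real engine would do this
        # with an atomic compare-and-swap; here it is a plain assignment.
        i = self._slot(relation_id, block_no)
        if lsn > self.slots[i]:
            self.slots[i] = lsn

    def maybe_changed_since(self, relation_id, block_no, start_lsn):
        # True means "possibly changed"; False means "definitely not changed".
        return self.slots[self._slot(relation_id, block_no)] >= start_lsn

    def to_bytes(self):
        # Serialize all slots and append a CRC32 of the body.
        body = struct.pack(f"<{len(self.slots)}Q", *self.slots)
        return body + struct.pack("<I", zlib.crc32(body))

    @classmethod
    def from_bytes(cls, data):
        # Verify the CRC32 before trusting the map contents.
        body = data[:-4]
        (stored_crc,) = struct.unpack("<I", data[-4:])
        if zlib.crc32(body) != stored_crc:
            raise ValueError("change map checksum mismatch")
        count = len(body) // 8
        restored = cls(count)
        restored.slots = list(struct.unpack(f"<{count}Q", body))
        return restored
```

In this sketch, a block that was never modified can still be reported as changed if it shares a slot with a modified block, but a modified block is always reported, because `mark` never lowers a slot's LSN.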
