Room: DMS 1160
In PostgreSQL, recovery works by replaying WAL records against the database files. To keep the replay consistent, it has so far been done by a single thread, in serial order. This makes recovery of very large databases slow, which leads to long downtime and substantial lag in log-shipping replication, especially under write-intensive workloads.
This talk presents an approach to running WAL replay in parallel workers, to shorten both recovery time and replication lag.
We cannot simply apply WAL records in parallel. A number of rules must be followed to keep recovery consistent. For example, for any given page, WAL records have to be applied in the order they were written, and all WAL records belonging to a transaction have to be applied before its commit, abort, or prepare record is applied.
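The two example rules above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the design presented in the talk: `WalRecord`, `parallel_replay`, the routing of records by `page % n_workers`, and the per-transaction counter are all hypothetical names and choices made for illustration. It also assumes that a transaction's commit or abort record is its last record in the log.

```python
import threading
from collections import defaultdict, namedtuple
from queue import Queue

# Hypothetical, simplified WAL record; kind is "data", "commit" or "abort".
WalRecord = namedtuple("WalRecord", "lsn xid kind page")


def parallel_replay(records, n_workers=4):
    """Replay records on n_workers threads; return them in applied order."""
    queues = [Queue() for _ in range(n_workers)]
    pending = defaultdict(int)    # xid -> data records not yet applied
    cond = threading.Condition()  # guards `pending` and `applied`
    applied = []

    # Assumption: the commit/abort record is the last record of its xid,
    # so counting all data records up front cannot make it wait forever.
    for rec in records:
        if rec.kind == "data":
            pending[rec.xid] += 1

    def worker(q):
        while True:
            rec = q.get()
            if rec is None:       # sentinel: no more records for this worker
                return
            with cond:
                if rec.kind == "data":
                    pending[rec.xid] -= 1
                else:
                    # Rule 2: apply commit/abort only after every data
                    # record of the same transaction has been applied.
                    cond.wait_for(lambda: pending[rec.xid] == 0)
                applied.append(rec)   # stands in for the real redo work
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()

    # Rule 1: dispatch in WAL order and route every record of a page to
    # the same FIFO queue, so per-page order is preserved.
    for rec in records:
        key = rec.page if rec.kind == "data" else rec.xid
        queues[key % n_workers].put(rec)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return applied
```

In this toy version the "apply" step is just an append performed under a single lock, so nothing actually runs concurrently; a real implementation would perform the redo work outside any global lock and would have to cover the remaining rules that the talk describes.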
The talk will give the full set of such rules and present the implementation architecture, including worker process configuration, assignment of WAL records to workers, synchronization among workers, and the current status of the code development. It will also present the results of experimental runs and show the potential performance gain.