PGCon 2018
The PostgreSQL Conference

Masahiko Sawada
Day Talks - Day 1: Thursday - 2018-05-31
Room DMS 1110
Start time 15:15
Duration 00:45
ID 1202
Event type Lecture
Track Hacking
Language used for presentation English

VACUUM more efficient than ever

VACUUM is an important PostgreSQL feature that reclaims old row versions. The PostgreSQL community has made great progress in improving VACUUM, but large installations that require stable, good performance still need it to do better. For instance, running VACUUM on a whole large, heavily updated table can take a long time to complete, which means holding the table lock for a long time. It might also be canceled halfway through. In this talk, I'll explain the current state of VACUUM and the recent changes to it, and then present ideas for making VACUUM more efficient, along with their results.

PostgreSQL uses MVCC to implement transaction isolation, and VACUUM is the feature that gets rid of the resulting dead row versions. Without it, the number of dead row versions would grow without bound, and so would the database size. PostgreSQL therefore has to run VACUUM periodically, even while transactions are being processed. Making VACUUM more efficient is very important for anyone who wants stable, good database system performance. The PostgreSQL community has made great progress in improving VACUUM over the years, for example by introducing autovacuum, cost-based vacuum delay, and the visibility map. With the latest PostgreSQL, small systems do not have to worry about VACUUM very much, and even larger systems have fewer things to worry about than with earlier versions.

However, now that many systems requiring stable, good performance have started using PostgreSQL, VACUUM still needs to evolve to meet their performance requirements. For instance, because vacuum is a bulk operation, vacuuming a whole very large table can take a long time to complete. It seems worth considering whether VACUUM of a large table could be performed in parallel, or could process only the portions of the table that are likely to contain a lot of garbage. Also, recent VACUUM improvements were mainly related to the heap, and not much has been done for index vacuuming. Index vacuuming can be improved by reducing unnecessary page scans.
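To make the idea of skipping unnecessary pages concrete, here is a minimal toy model of how a visibility map lets a vacuum pass avoid pages that hold no dead row versions. It is only a sketch under simplifying assumptions, not PostgreSQL's implementation: the names `Page`, `vacuum`, and `update_row` are hypothetical, a page is just a list of booleans, and the all-visible flag is treated as equivalent to "no dead row versions on this page".

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A toy heap page: each entry is True for a dead row version."""
    rows: list = field(default_factory=list)


def update_row(pages, all_visible, page_no, row_no):
    """Model an UPDATE: the old version becomes dead, a new one is added."""
    pages[page_no].rows[row_no] = True    # old row version is now dead
    pages[page_no].rows.append(False)     # new version (same page, for simplicity)
    all_visible[page_no] = False          # page may now contain garbage


def vacuum(pages, all_visible):
    """Remove dead row versions, skipping pages flagged all-visible.

    Returns (pages_scanned, dead_rows_removed).
    """
    scanned = removed = 0
    for i, page in enumerate(pages):
        if all_visible[i]:
            continue                      # nothing to reclaim, skip the page
        scanned += 1
        live = [dead for dead in page.rows if not dead]
        removed += len(page.rows) - len(live)
        page.rows = live
        all_visible[i] = True             # only live rows remain on this page
    return scanned, removed
```

In this model a table of four pages with a single updated row costs one page scan instead of four, and a second pass right afterwards scans nothing.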

In this talk, I'll share the current state of VACUUM and the recent changes to it. I'll also propose some ideas to enhance VACUUM further: parallel vacuum, reducing the scanning of unnecessary pages during index vacuuming, and I/O cost-effective VACUUM for very large tables. For each idea, I'll also share results obtained with a proof-of-concept patch.