PGCon2017 - 20180510

PGCon 2017
The PostgreSQL Conference

Speakers
Markus Nullmeier
Schedule
Day Talks - Day 2 - 2017-05-26
Room DMS 1110
Start time 14:00
Duration 00:45
Info
ID 1080
Event type Lecture
Track Hacking
Language used for presentation English

OUZO for indexing sets

Accelerating queries to sets with GIN, GiST, and custom indexing extensions

Sets are apparently a useful data type for many kinds of applications. In PostgreSQL, sets may be emulated to some degree with its built-in array and JSONB data types. Corresponding containment (subset) queries can be already handled with built-in features of the GIN index type. Next to that, we will explore the performance gains enabled by custom set data types, and especially by custom operator classes for the GIN and GiST index types.

Then, we will introduce a specialized set data type using a run-length encoding scheme -- it is motivated by spatial tessellations used in astronomy. As neither GIN nor GiST offer a road to efficient indexing in this case, we use the pluggable access method machinery of PostgreSQL 9.6+ to create a new specialized index type, called "OUZO". This is essentially a boldly modified version of the pre-existing pluggable RUM access method, which in turn is an improved implementation of GIN. Finally, we discuss possible future improvements to OUZO.

Sets are apparently a useful data type for many kinds of applications. While PostgreSQL offers no built-in set data type, sets may be emulated to some degree with its built-in array and JSONB data types. Also, acceleration of respective containment (subset) queries is readily available as a built-in feature of the GIN index type. Starting with the above, we will then explore the performance gains enabled by custom set data types, and especially by custom operator classes for the GIN and GiST index types.

Then, we will introduce a specialized set data type implementation, where subsequent integers are compressed via run-length encoding -- it is motivated by spatial tessellations used in astronomy. There, neither GIN nor GiST as such may be used for efficient indexing. However, since PostgreSQL 9.6, it is possible to write your own full-featured index types, which may be plugged into the system as a regular PostgreSQL extension.

One such extension that was already openly available is RUM -- an improved implementation of GIN. We will present our modifications to RUM, called "OUZO", where the handling of the key (i. e., set element) data type allows for run-length encoding. This enables useful indexing of the above compressed-set data type. We also discuss the property of GIN, RUM, and OUZO (for the time being) that key items are never deleted, in view of its performance implications for OUZO, plus other possible future improvements to OUZO.