PGCon2018 - 2.5

PGCon 2018
The PostgreSQL Conference

Speakers
Teodor Sigaev
Schedule
Day Talks - Day 2: Friday - 2018-06-01
Room DMS 1110
Start time 16:00
Duration 00:45
Info
ID 1169
Event type Lecture
Track New Features
Language used for presentation English

Jsonb flexible indexing

Smart indexing with parameterized access methods operator classes.

Jsonb is a popular data type in PostgreSQL, it provides the web developers an ability to work with ubiquitous json inside the database and use all the power of proven relational database. Fast querying of jsonb data is a challenge for database and PostgreSQL provides several options for indexing jsonb. We present the new way of efficient indexing of jsonb, based on improvement of indexing infrastructure.

It's known, that json is a greedy data type, it may contains many auxiliary data not interesting for searching and that affects the size of index. Partial index will not helps, since it filters the rows before indexing, while we are interested in extracting of parts of jsonb. Functional indexes on specific keys could introduce too big overhead. We present an improvement of indexing infrastructure, which allows to control the index behaviour by passing parameters to operator class at index creation. For example, to index a user-defined subset of jsonb it is possible to pass to operator class the powerful path expression (either jsonpath of upcoming sql/json or jspath from jsquery extension), which can be used to extract the parts of jsonb tree. That makes index more effective and reduces the overhead of its maintaining.

Another use of parameterized operator classes is to allow a user to specify parameters instead of hard coding them, for example, the GiST signature size is currently hard coded inside the implementations of several opclasses (tsvector, hstore, intarray, pg_trgm, ltree), while it is natural to use different signature length for different data to have optimal size of index and its performance.

Full text search on parts of document can be improved by passing labels to the operator class and letting him index only specified parts of document, that allow to avoid currently used recheck of the rows returned by the index and consequently speedup query.