PGCon2015 - final

PGCon 2015
The PostgreSQL Conference

Speakers
Alexander Korotkov
Konstantin Knizhnik
Oleg Bartunov
Schedule
Day Talks - Day 3 - Thursday - 2015-06-18
Room DMS 1160
Start time 13:00
Duration 00:45
Info
ID 784
Event type Lecture
Track Hacking
Language used for presentation English

Rethinking JSONB

PostgreSQL 9.4 has introduced JSONB, a structured format for storing JSON, which provides many users with the new opportunity: an effective storing and querying JSON documents inside ACID relational database. While users have notice a great jsonb performance, their feedback also reveals some hidden problems with current jsonb implementation. We want to discuss different approaches to resolve aforementioned problems and present several proof-of-conceps, so we could rethink jsonb for 9.6.

The first problem of jsonb is its size overhead (4-5 times) in comparison with storing decomposed json data in multiple plain tables, whereas binary format of jsonb has very low storage overhead with regard to a plain text (<4%). The overhead comes mainly from redundant storing of keys names, which can be quite long, in each document. One possible solution could be a persistent key dictionary cached in shared memory. Such solution has several infrastructure problems and it wouldn't work in general case. For example, one can use keys names as values and dictionary may wouldn't fit shared memory because of high cardinality of keys.

The second problem of jsonb is its querying. According to current SQLfacilities in PostgreSQL user can search for array element inside using subselect and jsonbarrayelements function. However, such queries are quite awkward and lack indexing support. Another way for querying jsonb documents is contains (@>) operator which is compact and has indexing support. Also jsquery is very suitable using in check constraints over jsonb, which could validates document schema. For example, constraint "CHECK(jb @@ 'a is numeric and b is string and c = *'::jsquery)" insures that value of key "a" is integer, value of key "b" is string and key "c" exists.

However, jsquery is not extendable, while we need an elegant and extendable way to query json documents with indexing support. We propose to solve this problem by introducing special SQL constructions (ANYELEMENT, ANYVALUE, ANYKEY, ALL) to query json documents. Extendability comes from the ability to use any SQL-expression inside proposed constructions. Indexing suport for such queries is a challenge for PostgreSQL infrastructure, but we think it's feasible.

Another missing feature of jsonb is lack of suitable way to update it. Users have to implement kluges to do it. It's possible to develop a set of functions to provide flexible way to update jsonb or extend current SQL syntax to implement elegant syntax similar to ones for array update.

Our goal is to discuss different approaches to resolve aforementioned problems and present several proof-of-concepts, so we could rethink jsonb for 9.6.