PGCon2014 - 20140704

PGCon 2014
The PostgreSQL Conference

Speakers
Alexander Korotkov
Heikki Linnakangas
Oleg Bartunov
Schedule
Day Talks - Day 2 - Fri May 23 - 2014-05-23
Room Morisset 256
Start time 16:15
Duration 00:45
Info
ID 698
Event type Lecture
Track 9.4 Features
Language used for presentation English

GIN - Stronger than ever

in 9.4 and further

This talk presents a set of GIN advances in PostgreSQL 9.4 and beyond that bring GIN to a new level of performance and extensibility. The most important advances are posting list compression, the fast-scan algorithm, storage of additional information in posting lists, and index-based ranking.

The talk covers the following GIN advances:

  • Posting list compression. Indexes become 2 times smaller without any changes in the opclass. pg_upgrade is supported; old indexes are recompressed on the fly.
  • Fast-scan algorithm. Fast scan allows GIN to skip parts of large posting trees during an index scan. It dramatically improves the performance of hstore and json search operators, as well as the FTS "frequentterm & rareterm" case. To use this improvement, the opclass "consistent" method must support three-state logic.
  • Storing additional (opclass-defined) information in posting lists. Using this additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (finding tsqueries that match a tsvector), inverse regex search (finding regexes that match a string), and better string similarity using positioned n-grams.
  • Index-based ranking. This improvement allows GIN to return results in an opclass-defined order. The most important application is returning FTS results in relevance order, which dramatically reduces I/O load. There are other applications as well, such as returning arrays in similarity order.
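
To give a feel for why posting list compression can shrink an index without opclass involvement, here is a minimal sketch of delta plus variable-byte encoding over a sorted list of item identifiers. It is an illustration under simplifying assumptions, not the GIN on-disk format: real item pointers are (block, offset) pairs of 6 bytes each, while this sketch models them as plain non-negative integers, and the function names are hypothetical.

```python
def encode_posting_list(ids):
    """Delta + varbyte encode a sorted list of non-negative integer ids."""
    out = bytearray()
    prev = 0
    for item in ids:
        delta = item - prev
        if delta < 0:
            raise ValueError("ids must be sorted in ascending order")
        prev = item
        # Emit 7 bits per byte; a set high bit means "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_posting_list(data):
    """Invert encode_posting_list, returning the original sorted ids."""
    ids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation byte: keep accumulating
        else:
            prev += value       # final byte of this delta: rebuild the id
            ids.append(prev)
            value = 0
            shift = 0
    return ids


ids = [1000, 1001, 1005, 1300, 70000]
encoded = encode_posting_list(ids)
print(len(encoded), "bytes instead of", 6 * len(ids))
```

Because neighbouring entries in a posting list tend to be close together, most deltas fit in one or two bytes, which is where the space saving comes from.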
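
The "frequentterm & rareterm" case can be sketched as an intersection driven by the short posting list, skipping ahead in the long one rather than reading it entry by entry. This is only an illustration of the skipping idea: in GIN the skips happen across posting tree pages, whereas here a sorted Python list and binary search stand in for the tree, and the function name is hypothetical.

```python
from bisect import bisect_left


def intersect_fast_scan(rare, frequent):
    """Return ids present in both sorted lists.

    The scan is driven by the short (rare) list; for each of its entries
    we jump forward in the long (frequent) list with a binary search
    instead of walking over every entry in between.
    """
    result = []
    pos = 0
    for item in rare:
        # Skip everything in `frequent` that is smaller than `item`.
        pos = bisect_left(frequent, item, pos)
        if pos == len(frequent):
            break               # long list exhausted, no more matches possible
        if frequent[pos] == item:
            result.append(item)
    return result


frequent = list(range(0, 1_000_000, 2))   # a very common term
rare = [10, 11, 500_000, 999_998]         # a rare term
print(intersect_fast_scan(rare, frequent))
```

The work is proportional to the length of the rare list times the cost of a skip, not to the length of the frequent list, which is why a conjunction with one rare term benefits most.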

We present FTS benchmark results for the PostgreSQL and Sphinx full-text search engines on several datasets (6 M and 15 M documents) under real-life load, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.