PGCon2012 - 2012.02.26

PGCon 2012
The PostgreSQL Conference

Speakers: Alexander Korotkov

Schedule
  Day: Talks - 2 - 2012-05-18
  Room: DMS 1140
  Start time: 15:00
  Duration: 01:00

Info
  ID: 383
  Event type: Lecture
  Track: Advanced Features
  Language used for presentation: English

Index support for regular expression search

Regular expressions (regex) are a powerful tool for text processing. When dealing with large string collections, it is important to be able to search them quickly, i.e. by using an index. Indexing for regex search is quite a hard task. This talk presents a novel technique (and a WIP patch [1] for PostgreSQL implementing it) for regex search using trigram indexes. The proposed technique extracts trigrams from a regex more comprehensively than comparable approaches and therefore offers higher performance.
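To make the idea of a trigram index concrete, here is a minimal sketch in Python. It is an illustration only, not the pg_trgm implementation: the function names `trigrams`, `build_index` and `candidates` are hypothetical, and the normalisation and padding that the real module applies to words are omitted.

```python
from collections import defaultdict


def trigrams(s: str) -> set[str]:
    """All contiguous 3-character substrings of s (no padding, case-sensitive)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(strings: list[str]) -> dict[str, set[int]]:
    """Inverted index mapping each trigram to the ids of strings containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, s in enumerate(strings):
        for t in trigrams(s):
            index[t].add(doc_id)
    return index


def candidates(index: dict[str, set[int]], required: set[str], n_docs: int) -> set[int]:
    """Ids of strings that contain every trigram in `required`."""
    result = set(range(n_docs))
    for t in required:
        result &= index.get(t, set())
    return result


strings = ["foobar", "barfoo", "foobaz"]
index = build_index(strings)
# Only strings containing both "foo" and "oob" can contain the substring "foob".
print(candidates(index, trigrams("foob"), len(strings)))
```

The index only narrows the search to candidate strings; each candidate still has to be rechecked against the actual query, because containing the required trigrams does not guarantee a match.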

Existing techniques for index-based regex search are mostly based on extracting contiguous text fragments from the regex and performing a substring search for them [2]. This talk presents a novel technique of regex analysis that is based on automata transformation rather than on analysis of the original regex. The superiority of the proposed technique will be demonstrated by examples and tests.
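The fragment-based approach described above can be sketched roughly as follows. This is a deliberately simplified illustration of the existing technique, not the method proposed in the talk: the helper names are hypothetical, only a tiny regex subset is handled (plain characters, `.`, and the quantifiers `*`, `?`, `+` on a single character), and a linear scan stands in for the index lookup.

```python
import re

UNSUPPORTED = set("|()[]{}\\^$")  # syntax this sketch does not try to analyse
QUANTIFIERS = set("*?+")


def trigrams(s: str) -> set[str]:
    """All contiguous 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def literal_fragments(pattern: str) -> list[str]:
    """Contiguous literal runs that every match of `pattern` must contain.

    Returns [] (meaning: no filtering possible) for unsupported syntax.
    """
    if any(ch in UNSUPPORTED for ch in pattern):
        return []
    fragments, current = [], ""
    for i, ch in enumerate(pattern):
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch in QUANTIFIERS:
            continue  # already accounted for when the preceding char was seen
        if ch == "." or nxt in QUANTIFIERS:
            # A wildcard or a quantified character ends the current literal run.
            if current:
                fragments.append(current)
            current = ""
        else:
            current += ch
    if current:
        fragments.append(current)
    return fragments


def regex_search(pattern: str, strings: list[str]) -> list[str]:
    """Filter by required trigrams, then recheck candidates with the real regex."""
    required: set[str] = set()
    for fragment in literal_fragments(pattern):
        required |= trigrams(fragment)
    regex = re.compile(pattern)
    return [s for s in strings if required <= trigrams(s) and regex.search(s)]


print(literal_fragments("foo.*bar"))
print(regex_search("foo.*bar", ["foo and bar", "barfoo", "foobar"]))
```

In this sketch, "barfoo" passes the trigram filter but fails the recheck, which shows why the filter alone is not sufficient. Fragments shorter than three characters contribute no trigrams, and any pattern with alternation or character classes yields no fragments at all, so the filter then degenerates to a full scan.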

The talk is organized as follows:

  • Introduction
    • Regular expressions
    • Finite automata
    • pg_trgm contrib module
  • Existing techniques for index-based regular expression search
  • Proposed technique
    • Description
    • Examples
    • Comparison with analogues
    • Limitations
  • Performance results