PGCon2017 - 0626

PGCon 2017
The PostgreSQL Conference

Oleg Ivanov
Day Talks - Day 1 - 2017-05-25
Room DMS 1160
Start time 10:00
Duration 00:45
ID 1086
Event type Lecture
Track Hacking
Language used for presentation English

Adaptive query optimization in PostgreSQL

The talk is devoted to the proposed improvement of PostgreSQL query optimizer. We call it adaptive query optimization (AQO), because it adapts to the data in the database and to the queries in the workload. The main idea is to analyze query execution statistics using machine learning methods. This allows to avoid repetition of query optimization mistakes. During the talk we also discuss PostgreSQL query optimizer.

SQL is known as a declarative language. That means that in a query user defines what operation to perform and the properties of data for this action. But the particular algorithm for performing this operation is constructed by PostgreSQL. This algorithm is called a plan, and the part of PostgreSQL which constructs it is called query optimizer. Since execution time of different paths of the same query may differ by many orders, the quality of query optimizer has crucial impact on the performance of PostgreSQL. We show that PostgreSQL query optimizer sometimes chooses by far not the fastest path.

Cardinality is the number of tuples proceeded in path node. We determined that the most significant query optimizer mistakes are caused by bad cardinality estimation. Currently one-dimensional histograms are used in PostgreSQL for cardinality estimation. Nevertheless, it is mathematically impossible to correctly handle data with correlated columns or dependent clauses in query using only these histograms. The subject of the presentation is our novel approach to cardinality estimation which uses machine learning methods. The approach is general, flexible, and easy to implement. Experimental evaluation shows that our approach significantly increases quality of cardinality estimation, and therefore increases PostgreSQL performance for some complicated queries. For example, some queries from TPC-DS qualification benchmark are executed 95 times faster with our method. In the presentation we also discuss advantages and disadvantages of our method, the ways to overcome disadvantages, and road-map for the further method improvement.