PGCon2017 - 0626

PGCon 2017
The PostgreSQL Conference

Rafia Sabih
Day Talks - Day 2 - 2017-05-26
Room DMS 1160
Start time 10:00
Duration 00:45
ID 1040
Event type Lecture
Track Hacking
Language used for presentation English

Multi-objective Query Optimization in PostgreSQL

Tradeoff between execution-time and resource-requirements!

In current design of PostgreSQL, the paradigm is to select the plan with minimum expected execution time. However, there are many cases where we may want to know the resource requirements of the query with respective performance graph and allocate them accordingly. For instance, if we are to rent a Virtual Machine (VM) on Infrastructure-as-a-service model of cloud platform and we knew that our query gives less than 5% performance improvement when provided with twice the resources, then we wouldn’t be willing to allocate such higher resources for that query, and would rent the VM accordingly.

Specifically, we consider how the traditional metric response time, can be augmented to incorporate hardware resources —like RAM (specifically workmem for PostgreSQL to decide the amount of per-node memory), number of cores (maxparallelworkersper_gather to decide the parallelism in query), etc. in the plan selection process. The challenge here is that on decreasing the resources, execution time of the query increases and vice-versa. In a nutshell, there is a tradeoff between execution-time and resource required, and our goal therefore is to find a sweet-spot between them, which we term as the ‘knee’ of the plot. In our study, we profile the behaviour of time versus resources and define knee as the location on the profile that has the minimum Euclidean distance to the origin.

In our prototype implementation we modified the query-optimizer module to keep a list of a carefully selected set of paths instead of only the cheapest path, and finally selecting the ‘knee’ path at the root node. It might seem like carrying a list of paths till the higher nodes would be quite costly, however, our experiments suggest otherwise since this set is found to be small at most of nodes. Particularly, if we consider the saving in resources that can be achieved with this modification compared to the increase in planning time, then it appears to be a minuscule price to pay for the benefits gained.