PGCon2010 - Final Release

PGCon 2010
The PostgreSQL Conference

Jan Urbański
Day Talks - 2 - 2010-05-21
Room DMS 1150
Start time 15:00
Duration 01:00
ID 211
Event type Lecture
Track Advanced Features
Language used for presentation English

Replacing GEQO

Join ordering via Simulated Annealing

Finding the optimal join order for an arbitrary number of relations is an NP-hard problem. For small queries applying exhaustive search is feasible, but the runtime and memory consumption make that approach impractical in many real-life applications.

PostgreSQL's answer to that problem is GEQO: the genetic query optimizer. It employs heuristics similar to those commonly used for solving the Travelling Salesman Problem. However, recent studies suggest other randomized algorithms could yield better results in shorter time.

One such approach, called Simulated Annealing, will be presented, along with a prototype implementation that you can load and try against your most monstrous queries.

A way of determining the join order for a query is picking a random solution of the problem and transforming it into similar solutions, also generated randomly, accepting only those that improve the quality of the final execution plan. This method has a severe drawback of being easily trapped in local minima, thus delivering an inferior plan to one that a different algorithm could possibly find.

Simulated Annealing (SA) tries address this issue by mimicking the annealing process used in metallurgy to increase the durability of materials. The idea is to heat the system to a certain temperature, then slowly cool it down, allowing it to settle in a state of minimum energy.

Practically, it means choosing a random solution and generating its neighbours, but also accepting plans that are worse than their ancestors, with probability dependant on the current system's "temperature". With the constant decrease of the temperature, this algorithm hopes to finally arrive at a state that has a reasonably low cost, while avoiding being stuck in a local minimum.

Applying Simulated Annealing in PostgreSQL means dealing with several issues:

  • finding a good starting plan
  • randomly generating valid plans in the presence of join order restrictions
  • comparing the costs of each generated plan
  • choosing the starting temperature
  • adapting the speed of temperature reduction

The proposed experimental module offers solutions to these issues and eventually intents to replace GEQO with a SA-based approach, while still being able to decently optimize even the most nightmarish queries out there.