Conference Schedule - PGCon 2022

Highly efficient interconnection for distributed PostgreSQL

Date: 2022-05-26
Time: 10:00–10:45
Room: Stream 3
Level: Intermediate

There are several ways to build a sharded database on top of distributed postgres instances. One of the most interesting and general approach is a built-in support for sharding. Historically postgres has fdw and partitioning features that can be used together to build a sharded database. There were concerns in the past about adoption them for a complete solution. We will review the current state of postgres fdw along with patches, that fix some significant bottlenecks and demonstrate the latest results in TPC-C and TPC-H benchmarks with comparison to existing sharding solutions.

Despite the promising results, our experiments with postgres fdw revealed the fundamental issues that are still exist and make it hard to build an efficient system for most of the workloads. We'll discuss these issues and show a general approach that solves them for a cluster of homogeneous postgres instances. In the same time it is based on fdw and partitioning and most of the changes are implemented as extensions. It consists of two independent components. The first part is a transport that allows to use only single connection between each of the nodes. It leads to M+N connections in the cluster in total instead of M*N where M is a number of client connections and N is a number of nodes. We'll show the implementation of such a multiplexing transport that achieves performance of 1 million pings/s between nodes as a single background worker process. The second part is an integration of postgres planner, executor and transaction support with the new transport to allow execution for all types of the queries. The implementation provides low latency for transactional queries. We achieved a single instance performance on two node cluster for simple queries with near linear scalability. The unmodified postgres fdw setup gives a single instance performance only on 8 nodes cluster in this test. Our approach also allows efficient execution of distributed analytics queries. This work is in progress and some results will be demonstrated in the talk. Both extensions will be open sourced and published on github.

Speaker

Dmitry Ursegov
Teodor Sigaev