PGCon2017 - 20180510

PGCon 2017
The PostgreSQL Conference

Speakers
Amit Langote
Etsuro Fujita
Kyotaro Horiguchi
Masahiko Sawada
Schedule
Day: Talks - Day 2 - 2017-05-26
Room: DMS 1160
Start time: 13:00
Duration: 00:45
Info
ID: 1069
Event type: Lecture
Track: Hacking
Language used for presentation: English

Towards Built-in Sharding in Community PostgreSQL

Built-in sharding will be an enabling technology for increasing the adoption of community PostgreSQL in environments that have very large data sets and/or need high read/write scaling. In this talk we'll first state our approach to implementing built-in sharding in community PostgreSQL and show how to build basic sharding with in-core features. We'll then describe the distributed query processing features we are working on to improve sharding. Finally, we'll discuss future directions for more advanced built-in sharding in community PostgreSQL.

Built-in sharding will be an enabling technology for increasing the adoption of community PostgreSQL in environments that have very large data sets and/or need high read/write scaling. To provide built-in sharding in community PostgreSQL, we have proposed enhancing existing features such as inheritance and Foreign Data Wrappers (FDW), drawing on the knowledge gained from developing Postgres-XC. So far, we have allowed foreign tables to participate in inheritance in PostgreSQL 9.5 and introduced declarative partitioning on top of that in PostgreSQL 10. Together these enable basic sharding with the PostgreSQL FDW: a partitioned table splits the data into partitions that are foreign tables referring to remote tables on multiple PostgreSQL servers. We also improved the PostgreSQL FDW to support remote joins, sorts, and updates in PostgreSQL 9.6 and remote aggregates in PostgreSQL 10, which will greatly improve the performance of queries on sharded tables.
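As a rough illustration of the basic sharding setup described above, the following Python sketch generates the kind of DDL involved: a range-partitioned parent table whose partitions are foreign tables on different servers. The table name `measurements`, the column layout, the server names `shard1` and `shard2`, and the helper `sharding_ddl` are all hypothetical. The sketch assumes the foreign servers and user mappings have already been defined for the PostgreSQL FDW and that a table with the same name exists on each remote server.

```python
from typing import List, Tuple


def sharding_ddl(parent: str, key: str,
                 shards: List[Tuple[str, int, int]]) -> List[str]:
    """Return DDL that range-partitions `parent` on `key` across foreign servers.

    Each shard is (server_name, lower_bound, upper_bound); the upper bound
    is exclusive, matching FOR VALUES FROM ... TO ... semantics.
    """
    # Parent table: holds no data itself, only the partitioning rule.
    # The column list is a hypothetical example schema.
    stmts = [
        f"CREATE TABLE {parent} ({key} int NOT NULL, payload text) "
        f"PARTITION BY RANGE ({key});"
    ]
    # One foreign-table partition per shard, each pointing at a remote table.
    for i, (server, lo, hi) in enumerate(shards):
        stmts.append(
            f"CREATE FOREIGN TABLE {parent}_p{i} PARTITION OF {parent} "
            f"FOR VALUES FROM ({lo}) TO ({hi}) "
            f"SERVER {server} OPTIONS (table_name '{parent}');"
        )
    return stmts


# Hypothetical layout: two shards, each covering 1000 key values.
ddl = sharding_ddl("measurements", "id",
                   [("shard1", 0, 1000), ("shard2", 1000, 2000)])
for stmt in ddl:
    print(stmt)
```

The generated statements are meant to be run on the server that holds the parent table; the helper does no validation of bounds, so overlapping ranges would only be rejected by PostgreSQL itself.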

In this talk we'll first state our approach to implementing built-in sharding in community PostgreSQL and show how to build basic sharding with in-core features. We'll then describe the distributed query processing features we are working on to improve sharding: partitioning-aware query optimization, asynchronous FDW execution, and distributed transaction support. Partitioning-aware query optimization will make it possible to support more remote operations, such as partition-wise joins and partition-wise aggregates. Asynchronous FDW execution will further improve query performance by enabling remote queries on multiple PostgreSQL servers to be initiated and processed independently. Distributed transaction support will provide atomic commit across multiple PostgreSQL servers (i.e., either all of the remote queries involved in a transaction commit or all of them abort) by using the two-phase commit protocol. Finally, we'll discuss future directions for more advanced built-in sharding in community PostgreSQL.
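The all-or-nothing guarantee of two-phase commit mentioned above can be sketched with a small in-memory simulation. This is not the implementation discussed in the talk; the `Participant` class and the `two_phase_commit` function are hypothetical stand-ins for remote servers and a coordinator. On real PostgreSQL servers the corresponding steps would be issued with `PREPARE TRANSACTION`, `COMMIT PREPARED`, and `ROLLBACK PREPARED`. The sketch ignores coordinator crashes and network failures, which a real implementation has to handle.

```python
class Participant:
    """In-memory stand-in for one remote server taking part in a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulates a server that refuses to prepare
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A real server would durably record the prepared
        # transaction before answering yes.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on every participant or abort on every one; True if committed."""
    # Phase 1: ask each participant to prepare. all() stops at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2: everyone voted yes, so the decision is commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2: at least one participant refused, so everyone rolls back,
    # including those that had already prepared.
    for p in participants:
        p.rollback()
    return False


# All participants can prepare: the transaction commits everywhere.
ok = [Participant("shard1"), Participant("shard2")]
committed = two_phase_commit(ok)

# One participant refuses: the transaction aborts everywhere.
bad = [Participant("shard1"), Participant("shard2", can_prepare=False)]
aborted = not two_phase_commit(bad)

print(committed, [p.state for p in ok])
print(aborted, [p.state for p in bad])
```

The key property is that no participant commits until every participant has promised it can; a single refusal in the first phase turns the outcome into a rollback for all of them.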