PGCon2016 - 20180510

PGCon 2016
The PostgreSQL Conference

Speakers
Matthew Kelly
Schedule
Day Talks - Day 1 - 2016-05-19
Room DMS 1110
Start time 11:00
Duration 00:45
Info
ID 966
Event type Lecture
Track Scaling Out
Language used for presentation English

Unveiling pg_global_kv

A distributed key value store built on top Postgres

pgglobalkv is a distributed, multimaster, eventually consistent, UPSERT supporting, key value document store with expiry built on stock 9.3 using just PL/pgSQL, json, and postgres_fdw.

pgglobalkv is the perfect drop in solution for implementing session storage, and site personalization efficiently when reads and writes are coming from globally distributed data centers.

pgglobalpv also has support for doing atomic numeric operations which makes it a functional platform for doing online machine learning.

pgglobalkv was born out of a need to replace our existing site personalization infrastructure that was built on top of a certain NoSQL solution so we turned to Postgres, our datacenter workhorse, to solve this problem. We needed a multimaster replication that was far faster and less resource intensive than trigger based replication and we weren't comfortable running experimental versions of BDR in production yet. We wanted something built on top of a mature version of Postgres using only well tested core functionality.

Using only PL/pgSQL, the Postgres Foreign Data Wrapper, and a light weight daemon that runs next to Postgres, we were able to build a replication based on timestamps which could keep up with whatever the underlying Postgres server could handle. In our case we were able to push 15,000 writes per second (and far more reads) per machine. We've never needed more than one machine per datacenter per instance but pgglobalkv is topology agnostic in terms of it's performance characteristics and this allows it's performance to scale linearly with the number of machines.

Since we put our first pgglobalkv instance into production 7 months ago, it has taken 19.3 billion document writes while maintaining 660 million documents without a single production incident and we have plans of building more production critical infrastructure on top of it.

When open sourcing pgglobalkv, we are open sourcing all of the pieces on the Postgres side: all the replication code, the PL/pgSQL UPSERT plus conflict resolution, row expiry daemon, the server setup and configuration, and monitoring infrastructure. All that is missing is the application side code that hashes keys to pick an appropriate shard.

Obviously, we needed to make a number of tradeoffs to get the performance that we did. This talk will cover the strengths and weaknesses of pgglobablkv as a solution, as well as what features are currently missing to make it fully featured for every use case.