PGCon 2019
The PostgreSQL Conference

Speakers
Min Wei
Schedule
Day Talks - Day 1 - 2019-05-30
Room DMS 1120
Start time 15:00
Duration 00:45
Info
ID 1377
Event type Meeting
Track Scaling Out
Language used for presentation English

VeniceDB

A petabyte-scale, real-time analytics service running Postgres on Azure

VeniceDB is a large-scale OLAP service running a custom build of Postgres/Citus on Azure. Since the public talk at PostgresSV2018, the cluster has grown to 1 PB to support the Microsoft CoreOS executive decision dashboard, which hosts all measures, from edge devices to the cloud OS. VeniceDB has been in production for about a year and has gone through a few major revisions while the service kept running. In each revision, we tuned the data model and indexing to improve data ingestion and query performance. We also migrated from Postgres 10 to Postgres 11, which helped reduce the cluster cost by 30%.

This talk covers why we chose Postgres and Citus as the foundation and how we built unified storage to serve a variety of measure data needs. Along the way, we replaced not only traditional MapReduce-based cubing jobs but also a columnar storage cluster. We will share our perspective on row storage versus column storage, how VeniceDB compares with other OLAP solutions such as Druid, and our wish list for future Postgres improvements.