Firefly: PGCon 2022, 2022-05-24 to 2022-05-27
https://www.pgcon.org/events/pgcon_2022/schedule/

10:00 (01:00), Stream 1: Unconference - Session Pitches and Scheduling
Coffee & light snacks will be available.
Please read the links below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 1](https://www.pgcon.org/2022/stream1.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/328/

11:15 (01:00), Stream 2: Unconference Room #2
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 2](https://www.pgcon.org/2022/stream2.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/340/

11:15 (01:00), Stream 1: Unconference Room #1
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 1](https://www.pgcon.org/2022/stream1.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/339/

11:15 (01:00), Stream 3: Unconference Room #3
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 3](https://www.pgcon.org/2022/stream3.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/341/

12:15 (01:00), Stream 1: Unconference Lunch
The room will remain open for the hallway track.
https://www.pgcon.org/events/pgcon_2022/schedule/session/332/

12:15 (01:00), Stream 3: Unconference Lunch
The room will remain open for the hallway track.
https://www.pgcon.org/events/pgcon_2022/schedule/session/342/

12:15 (01:00), Stream 2: Unconference Lunch
The room will remain open for the hallway track.
https://www.pgcon.org/events/pgcon_2022/schedule/session/343/

13:15 (01:00), Stream 1: Unconference Room #1
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 1](https://www.pgcon.org/2022/stream1.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/329/

13:15 (01:00), Stream 2: Unconference Room #2
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 2](https://www.pgcon.org/2022/stream2.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/330/

13:15 (01:00), Stream 3: Unconference Room #3
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 3](https://www.pgcon.org/2022/stream3.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/331/

14:30 (01:00), Stream 3: Unconference Room #3
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 3](https://www.pgcon.org/2022/stream3.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/335/

14:30 (01:00), Stream 1: Unconference Room #1
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 1](https://www.pgcon.org/2022/stream1.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/333/

14:30 (01:00), Stream 2: Unconference Room #2
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 2](https://www.pgcon.org/2022/stream2.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/334/

15:45 (01:00), Stream 3: Unconference Room #3
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 3](https://www.pgcon.org/2022/stream3.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/338/

15:45 (01:00), Stream 1: Unconference Room #1
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 1](https://www.pgcon.org/2022/stream1.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/336/

15:45 (01:00), Stream 2: Unconference Room #2
Please read the link below for full details.
[What is an unconference?](https://wiki.postgresql.org/wiki/PgConUnconferenceFAQ)
This session is live on [stream 2](https://www.pgcon.org/2022/stream2.php). The Zoom meeting details will be posted later.
https://www.pgcon.org/events/pgcon_2022/schedule/session/337/

09:00 (00:45), Stream 1: Advanced database testing in CI/CD pipelines
Usually, CI/CD pipelines involve stateless components – nobody likes to wait dozens of minutes or hours while a pipeline is running. So in most cases, the maximum volume of data generated or copied during testing is limited to gigabytes.
In this talk, we will explore an unusual but very promising land: testing using thin clones. Database Lab Engine (DLE), developed by Postgres.ai, is an open-source tool that makes it possible to:
- clone a Postgres database of any size in just a few seconds using copy-on-write provided by ZFS or LVM, and
- run dozens of thin clones on a single machine, performing dozens of independent experiments, development, and testing activities.
These abilities open new horizons. Now it is possible to:
- have realistic tests using "recycled" full-size database clones,
- perform regression testing to see if some business-critical SQL query is going to have performance degradation after a proposed change,
- reach 100% coverage for automated testing of all DB changes using full-size databases.
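The thin-clone capability above relies on copy-on-write: a clone shares every page with its parent snapshot until a page is modified, which is why creating one takes seconds regardless of database size. A toy model of that mechanism (illustrative only, not DLE's actual implementation):

```python
# Toy model of copy-on-write cloning (the mechanism ZFS/LVM provide to DLE).
# A clone initially shares every page with its parent snapshot; only pages
# that are written get private copies, so creating a clone is O(1) in data size.

class Snapshot:
    def __init__(self, pages):
        self.pages = pages  # shared mapping: page number -> bytes

class Clone:
    def __init__(self, snapshot):
        self.base = snapshot
        self.delta = {}  # privately modified pages only

    def read(self, page_no):
        return self.delta.get(page_no, self.base.pages[page_no])

    def write(self, page_no, data):
        self.delta[page_no] = data  # copy-on-write: parent stays untouched

base = Snapshot({0: b"alice", 1: b"bob"})
c1, c2 = Clone(base), Clone(base)
c1.write(0, b"carol")

print(c1.read(0))     # b'carol' (private copy)
print(c2.read(0))     # b'alice' (still shared with the snapshot)
print(len(c1.delta))  # 1 page copied, regardless of database size
```

Each clone only pays for what it changes, which is what makes "dozens of thin clones on a single machine" affordable.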
I will present the most recent findings and experience of building CI/CD pipelines involving advanced DB testing in a fully automated fashion. I will also provide practical advice, covering various aspects of observability, security, and change management workflow.
https://www.pgcon.org/events/pgcon_2022/schedule/session/267/ (Nikolay Samokhvalov)

09:00 (00:45), Stream 3: Performance improvement of filtering foreign tables using subquery for PostgreSQL
There are many cases where filtering is performed on one table based on the search results of another table.
Such queries can be written with a JOIN clause. However, if each of the tables is a foreign table, all of the data is fetched from the remote servers. Because unnecessary data is also retrieved, the expected performance cannot be obtained.
Therefore, I built a prototype that generates a query plan executing the subquery first when the join condition is specified via an IN clause with a subquery expression, instead of generating a simple join plan such as a hash join. I confirmed speedups of up to several tens of times.
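The speedup comes from shrinking what crosses the wire: evaluating the subquery first lets the remote scan be filtered before rows are fetched. A toy model of the effect (made-up table shapes, not the prototype's actual code):

```python
# Toy illustration of the optimization: filter the remote scan with the
# subquery's result instead of fetching everything and joining locally.
# Table names and sizes are invented for this example.

remote_orders = [{"id": i, "customer_id": i % 100} for i in range(10_000)]
remote_vips = [{"customer_id": c} for c in range(5)]  # small driving table

# Naive plan: fetch both foreign tables in full, join locally.
fetched_naive = len(remote_orders) + len(remote_vips)

# Subquery-first plan: evaluate the IN-subquery, then push the resulting
# keys into the remote scan's filter so only matching rows travel.
vip_ids = {r["customer_id"] for r in remote_vips}
matching = [o for o in remote_orders if o["customer_id"] in vip_ids]
fetched_pushed = len(remote_vips) + len(matching)

print(fetched_naive, fetched_pushed)  # 10005 vs 505
```

The ratio between the two fetch counts is the kind of headroom the "tens of times" speedup comes from.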
In this session, I will explain the specific problems, solutions and changes, performance, and future work.
https://www.pgcon.org/events/pgcon_2022/schedule/session/273/ (Shigeo Hirose)

09:00 (00:45), Stream 2: Database Disasters and How to Find Them
You get the call at three in the morning: "The application is throwing 500 errors. We think the database is down." What do you do?
Database problems come in a nearly infinite range of types. We don't have infinite time, but we can cover the most common ones and go through a step-by-step process for diagnosing them, repairing them, and bringing the system back up in record time. A careful and methodical approach is essential to avoid making a bad situation worse and to get the database back on all four feet quickly.
We'll cover different kinds of service unavailability, data corruption, underlying host failures, and how to react to different scenarios. Use this advice to help build your run-book of how to react to those early-morning texts.
https://www.pgcon.org/events/pgcon_2022/schedule/session/262/ (Christophe Pettus)

10:00 (00:45), Stream 3: Highly efficient interconnection for distributed PostgreSQL
There are several ways to build a sharded database on top of distributed Postgres instances. One of the most interesting and general approaches is built-in support for sharding. Historically, Postgres has fdw and partitioning features that can be used together to build a sharded database. There were concerns in the past about adopting them for a complete solution. We will review the current state of postgres_fdw along with patches that fix some significant bottlenecks, and demonstrate the latest results in TPC-C and TPC-H benchmarks in comparison to existing sharding solutions.
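One well-known bottleneck in fdw-based sharding is connection fan-out: with a dedicated connection from every backend to every node, M client connections across N nodes need on the order of M*N links, while a multiplexed transport needs only about M+N. A back-of-the-envelope with illustrative numbers:

```python
# Back-of-the-envelope: per-backend fdw connections vs a multiplexed
# transport. M = client connections, N = nodes (illustrative numbers).

def per_backend_links(m, n):
    return m * n  # every backend opens its own connection to every node

def multiplexed_links(m, n):
    return m + n  # one shared link per node plus the client connections

m, n = 1000, 8
print(per_backend_links(m, n))  # 8000
print(multiplexed_links(m, n))  # 1008
```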
Despite the promising results, our experiments with postgres_fdw revealed fundamental issues that still exist and make it hard to build an efficient system for most workloads. We'll discuss these issues and show a general approach that solves them for a cluster of homogeneous Postgres instances. At the same time, it is based on fdw and partitioning, and most of the changes are implemented as extensions. It consists of two independent components. The first part is a transport that allows using only a single connection between each of the nodes. This leads to M+N connections in the cluster in total instead of M*N, where M is the number of client connections and N is the number of nodes. We'll show the implementation of such a multiplexing transport that achieves a performance of 1 million pings/s between nodes as a single background worker process. The second part is an integration of the Postgres planner, executor, and transaction support with the new transport to allow execution of all types of queries. The implementation provides low latency for transactional queries. We achieved single-instance performance on a two-node cluster for simple queries, with near-linear scalability. The unmodified postgres_fdw setup reaches single-instance performance only on an 8-node cluster in this test. Our approach also allows efficient execution of distributed analytics queries. This work is in progress, and some results will be demonstrated in the talk. Both extensions will be open sourced and published on GitHub.
https://www.pgcon.org/events/pgcon_2022/schedule/session/239/ (Dmitry Ursegov, Teodor Sigaev)

10:00 (00:45), Stream 1: Logical replication in PostgreSQL is moving forward
PostgreSQL 15 and beyond will see a variety of new features such as row filtering, sequence replication, and improved error handling. Many more features are in progress for future versions. In this presentation, I want to lay out and discuss our roadmap for logical replication in PostgreSQL.
Topics include:
- row filtering
- column filtering
- sequence replication
- schema syncing
- error handling
- skipping transactions
- less reliance on superuser
- logical failover
- conflict handling
- perspectives on pglogical and BDR
Depending on progress in the intervening months, some of these features will be in PostgreSQL 15 by the time PGCon happens, so we can preview them at that time. In any case, there will be a lengthy list of things we plan to work on beyond PostgreSQL 15, and of course we are looking for other community members to join these efforts.
https://www.pgcon.org/events/pgcon_2022/schedule/session/226/ (Peter Eisentraut)

10:00 (00:45), Stream 2: Logical Change Records, the logical WALs
Unlike physical replication, which does not understand a "bit" of a WAL record, logical replication interprets WAL records and sends changes to the replica in a "logical" form. The process of converting a WAL record into a Logical Change Record (LCR for short) is time- and resource-consuming. Each WAL sender publishing from a given set of publications repeats this process, producing the same LCR stream for a given WAL stream. LCRs, when saved to disk, can be used by many WAL senders, thus reducing CPU and memory consumption.
In this presentation, we will discuss everything related to LCRs and how they help BDR support hundreds of WAL senders.
https://www.pgcon.org/events/pgcon_2022/schedule/session/231/ (Ashutosh Bapat)

12:00 (00:45), Stream 3: CREATE INDEX CONCURRENTLY implementation details
From the very beginning of 2021 until late summer, I was fixing a bug in CREATE INDEX CONCURRENTLY. This bug manifests extremely seldom. I could not find any traces of it in the mailing lists since the introduction of the feature in 8.2. Yet the bug affected our systems, so I decided to fix it. This is how the detective bug-hunting story started. In this story, I'll share details of the implementation of transactions and lock mechanics. This knowledge may be of interest to DBAs or hackers who want to go deeper into the source code.
https://www.pgcon.org/events/pgcon_2022/schedule/session/256/ (Andrey Borodin)

12:00 (00:45), Stream 2: Neon, cloud-native storage backend for PostgreSQL
Neon is a new storage backend for PostgreSQL. It separates storage from compute and leverages cheap cloud storage. It enables features like immediate restore to any point in time, branching, and running multiple read-only nodes against the same shared storage.
Neon is a work in progress. In this presentation, I will walk through the architecture, discuss its characteristics and the new capabilities it enables, and cover the current status.
https://www.pgcon.org/events/pgcon_2022/schedule/session/236/ (Heikki Linnakangas)

12:00 (00:45), Stream 1: Benchmarking your commits: The Postgres Performance Farm
This talk presents one of the newest additions to the Postgres project: the Performance Farm, a full-stack infrastructure running benchmarks on Postgres commits. The application consists of a script that automatically clones and builds Postgres, then runs benchmarks and uploads the results to a website, testing different branches and commits. Its aim is to help developers understand performance changes and behavior after each modification to the code base, providing an overview of different indicators and trends. Started as a Google Summer of Code idea and developed over the span of three years, the infrastructure is now finally ready, and its functioning and results can be presented - come and see how it works!
https://www.pgcon.org/events/pgcon_2022/schedule/session/241/ (Ilaria Battiston)

13:00 (00:45), Stream 2: PostgreSQL HA with Pgpool-II and what's been happening in Pgpool-II lately
Pgpool-II has been around to complement PostgreSQL for over a decade and provides many features like connection pooling, failover, query caching, load balancing, and high availability (HA). HA is critical to most enterprise applications: clients need the ability to automatically reconnect to a secondary node when the primary node goes down.
This is where the Pgpool-II watchdog comes in: the watchdog is the core feature of Pgpool-II that provides HA by eliminating the single point of failure (SPOF). It has been around for a while, but it went through a major overhaul and enhancements in recent releases. This talk aims to explain the watchdog feature and the recent enhancements that went into it, and to describe how it can be used to provide PostgreSQL HA and automatic failover.
Finally, I will summarize the major features that have been added in the most recent major release of Pgpool-II and what's in the pipeline for the next major release.
https://www.pgcon.org/events/pgcon_2022/schedule/session/252/ (Muhammad Usama)

13:00 (00:45), Stream 3: Common DB schema change mistakes
One of the easiest ways to "create a heavy load from nothing" is to write a couple of DDL commands and deploy them to production without thorough analysis and testing.
In this talk, we will discuss several widespread mistakes that many developers and DBAs regularly make during active application development, when there is a need to change the DB schema – from adding new database objects or columns to refactoring and optimizing the existing schema.
https://www.pgcon.org/events/pgcon_2022/schedule/session/268/ (Nikolay Samokhvalov)

14:00 (00:45), Stream 2: Simplifying the TPC Benchmark C, an OLTP Workload
Several fair-use open source kits exist that emulate the venerable TPC Benchmark(TM) C (TPC-C). Running a full-scale, specification-compliant TPC benchmark is not a trivial task, but it is possible to simplify how the benchmark is executed. Yet it is still not necessarily straightforward how these kits should be used to get the most meaningful results from them.
The way this benchmark is meant to be run is not necessarily workstation- or budget-friendly. A properly executed large-scale TPC-C benchmark can require a significant amount of hardware resources to emulate 100,000, 50 million, or 500 million users.
As people working on improving PostgreSQL and evaluating its capabilities, we would all like something easier to run and less expensive to use.
We can cut down the demands for resources by driving the workload differently, but driving the workload differently creates a different system usage profile. In order to run a meaningful test we need to understand how this changes the characteristics of the system behavior.
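For a sense of why the full benchmark is so demanding, the TPC-C specification fixes the ratios: 10 terminals (emulated users) per warehouse, and a throughput ceiling of about 12.86 tpmC per warehouse imposed by the mandated keying and thinking times. A quick calculation under those assumptions:

```python
# Rough TPC-C sizing from the specification's fixed ratios:
# 10 terminals (emulated users) per warehouse, and throughput capped at
# ~12.86 tpmC per warehouse by the mandated keying and thinking times.

def tpcc_scale(warehouses):
    terminals = warehouses * 10
    max_tpmc = warehouses * 12.86
    return terminals, max_tpmc

for w in (10, 10_000, 10_000_000):
    users, tpmc = tpcc_scale(w)
    print(f"{w:>12,} warehouses -> {users:>13,} users, <= {tpmc:,.0f} tpmC")
```

Reaching higher throughput at full compliance means more warehouses, more users, and therefore more hardware, which is exactly the pressure that motivates driving the workload differently.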
Learn what it means to eliminate keying and thinking time, to touch only specific parts of the database, and other ways to use this workload.
https://www.pgcon.org/events/pgcon_2022/schedule/session/261/ (Mark Wong)

14:00 (00:45), Stream 1: Develop an Oracle-compatible database based on Postgres
Many users need to migrate their applications from Oracle to the open source Postgres, but in order to support the new database, customers often need to redevelop the application, which is inconvenient. If there were a database based on Postgres and compatible with most Oracle syntax and functions, it would be very convenient for customers. However, official Postgres will not accept this kind of code submission. After all, Postgres is Postgres, and Oracle is Oracle.
So, let's make an Oracle-compatible database.
This talk will introduce how to build a database compatible with Oracle syntax based on PG, and will introduce the IvorySQL project.
The project is open source (Apache 2.0), is led by Highgo Software, and has currently released version 1.x, based on PG 14.
Everyone is welcome to join in developing this open source Oracle-compatible database, IvorySQL, based on Postgres.
https://www.pgcon.org/events/pgcon_2022/schedule/session/258/ (Grant Zhou)

15:00 (00:45), Stream 1: Breaking away from FREEZE and Wraparound
Vacuum is one of the most important features of the PostgreSQL system. This is because it not only recovers garbage space, but also manages transaction IDs and multitransaction IDs through a process called freezing, which allows the 32-bit XID space to be reused.
PostgreSQL databases are protected from XID wraparound thanks to vacuum's freezing. On the other hand, although the Freeze Map, the INDEX_CLEANUP option, the failsafe mode, and many other features have been introduced to improve freezing, I've still seen that freezing remains a source of annoyance to users, and cases where it consumes disk I/O or the system becomes read-only due to XID wraparound.
This kind of problem has been discussed for a long time, along with various proposals such as the use of 64-bit XIDs. To resolve it, we need to weigh each solution against possible side effects, for example on performance and on-disk compatibility. Also, are freezing and 32-bit XIDs really bad ideas? Are there better ideas?
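For context, the reason a 32-bit XID can work at all on a wrapping counter is modular comparison, as in Postgres's TransactionIdPrecedes. A simplified sketch (real Postgres also special-cases permanent XIDs below FirstNormalTransactionId):

```python
# How 32-bit XID comparison stays meaningful on a wrapping counter:
# Postgres compares XIDs with modular arithmetic, so each XID can only
# "see" about 2 billion XIDs into the past - hence the need for freezing.

MASK = 0xFFFFFFFF

def xid_precedes(a, b):
    """True if XID a is logically older than b, modulo 2^32."""
    diff = (a - b) & MASK
    return diff > 0x7FFFFFFF  # negative as signed 32-bit => a is older

print(xid_precedes(100, 200))        # True: 100 is older
print(xid_precedes(0xFFFFFFF0, 10))  # True: counter wrapped, still older
print(xid_precedes(10, 0xFFFFFFF0))  # False
```

Because the "visible past" is only half the 32-bit space, old tuples must eventually be frozen before the counter laps them, which is the constraint the talk revisits.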
In this session, I'll redefine the problems of freezing and XID wraparound in practice and explain feasible proposals with experimental results.
https://www.pgcon.org/events/pgcon_2022/schedule/session/245/ (Masahiko Sawada)

15:00 (00:45), Stream 3: Citus Columnar
Columnar storage is crucial for analytic workloads because it improves scan speed, reduces I/O, and has a smaller storage footprint - often by 10x!
Citus Columnar is open source and brings columnar storage to Postgres. Even better, it's fully integrated into Postgres as a pure extension using the table AM API. That means a columnar table is fully recognized by Postgres as a "normal" table.
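The I/O savings follow directly from reading only the referenced columns. A toy byte count (schema and sizes made up for illustration, and ignoring compression, which typically widens the gap further):

```python
# Toy byte accounting for why a scan touching one column reads far less
# from columnar storage than from row storage (sizes are invented).

n_rows = 100_000
row_width = 200   # bytes per row in a row store (all columns together)
col_width = 8     # the single column an aggregate actually touches

row_store_bytes = n_rows * row_width  # row store: whole rows pass the scan
columnar_bytes = n_rows * col_width   # column store: only that column

print(row_store_bytes // columnar_bytes)  # 25x less data scanned
```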
This talk will address use cases and limitations, and there will be a live demo!
https://www.pgcon.org/events/pgcon_2022/schedule/session/270/ (Jeff Davis)

11:15 (01:00), Stream 2: Triggers: How It Works in PostgreSQL Internals
Triggers are a basic feature of database systems and can be used for audit logging, automatically populating other tables, checking or enforcing complex constraints, and so on. In fact, the foreign key constraint in PostgreSQL relies on triggers internally. This talk will explain how triggers work in PostgreSQL internals. I will cover how triggers are managed, how trigger functions are fired, how triggers are used internally, and their relationship to constraints. This will be helpful for understanding PostgreSQL's triggers in more detail.
https://www.pgcon.org/events/pgcon_2022/schedule/session/277/ (Yugo Nagata)

11:15 (01:00), Stream 3: Does anyone really need RAC?
One of the main reasons we hear for not being able to migrate from Oracle to PostgreSQL is the fact that “We absolutely, definitively must have RAC”.
Despite its cost and complexity, RAC is widely used in response to a variety of (real or imagined) performance, high availability and/or scalability requirements. PostgreSQL does not have an equivalent to RAC. But does anyone really need it?
In his 2003 white paper “You probably don’t need RAC”, Mogens Norgaard asserts that “If you have a system that needs to be up and running a few seconds after a crash, you probably need RAC. If you cannot buy a big enough system to deliver the CPU power and or memory you crave, you probably need RAC… Otherwise, you probably don’t need RAC. Alternatives will usually be cheaper, easier to manage and quite sufficient”.
Does this statement still hold true, and what alternatives are available to us in PostgreSQL?
https://www.pgcon.org/events/pgcon_2022/schedule/session/198/ (Karen Jex)

11:15 (01:00), Stream 1: Developing Postgres' Prefetching Algorithm
Learn more about developing and refining an algorithm for application storage prefetching.
Storage prefetching is a necessary component of Postgres' proposed new IO paradigm: direct and asynchronous IO. Though prefetching is already in use in limited contexts in Postgres (e.g. bitmap heap scan), without kernel readahead, Postgres must implement additional prefetching of its own.
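One common shape for such an algorithm, sketched here as a hedged illustration rather than the patch set's actual logic, is to grow the prefetch distance while requests stay sequential and reset it when the pattern breaks:

```python
# A generic prefetch-distance heuristic (illustrative only, NOT the
# algorithm from the Postgres AIO patch set): double the lookahead while
# requested blocks keep arriving sequentially, reset on a random jump.

class Prefetcher:
    def __init__(self, max_distance=64):
        self.max_distance = max_distance
        self.distance = 1
        self.last_block = None

    def on_request(self, block):
        if self.last_block is not None and block == self.last_block + 1:
            # Sequential access confirmed: speculate further ahead.
            self.distance = min(self.distance * 2, self.max_distance)
        else:
            self.distance = 1  # pattern broken: stop speculating
        self.last_block = block
        return list(range(block + 1, block + 1 + self.distance))

p = Prefetcher()
p.on_request(10)              # first request: distance stays 1
print(p.on_request(11))       # sequential -> distance 2 -> [12, 13]
print(len(p.on_request(12)))  # 4 blocks ahead
print(len(p.on_request(500))) # jump -> back to 1
```

Ramping up avoids wasting I/O on random access while still reaching a deep pipeline for sequential scans.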
This talk will cover the basics of storage prefetching, the process of developing and refining a prefetching algorithm, and the details of the prefetching algorithm proposed with the direct and asynchronous IO patch set.
https://www.pgcon.org/events/pgcon_2022/schedule/session/274/ (Melanie Plageman)

12:15 (01:00), Stream 2: Practical use case for OrioleDB
OrioleDB is a new engine for PostgreSQL providing a solution to its long-term problems, including but not limited to vacuum, XID wraparound, write amplification, and poor scalability.
This talk covers the experience of using OrioleDB in the gaming industry. We will talk about real-life performance gains, caveats and gotchas during migration to the new engine, and more.
https://www.pgcon.org/events/pgcon_2022/schedule/session/282/ (Alexander Korotkov, Aliaksei Ramanau)

12:15 (01:00), Stream 1: Cloning Postgres databases on Kubernetes: how we designed database forks in a managed database-as-a-service
Database forks, also known as clones, provide a simple way to copy a production database into staging or use the data for testing queries, allow selective point-in-time data recovery without disrupting the primary database, and provide a way to experiment with production database configuration without downtime. In this talk, I’ll describe how we designed and built this feature based on open-source tools well known in the PostgreSQL community:
I’ll start by describing how we run PostgreSQL with TimescaleDB on Kubernetes, and how we used pgBackRest, Patroni, and various AWS/Kubernetes objects to implement database forking.
I’ll dive deep into our motivation to implement forks and talk about various types of forks - i.e point in time recovery, follower forks, snapshot-based forks, and how they can be implemented based on underlying open-source technologies.
Lastly, I’ll describe our approach to testing backup recoverability and recovering PostgreSQL clusters from storage failures.
https://www.pgcon.org/events/pgcon_2022/schedule/session/279/ (Oleksii Kliukin)

13:15 (01:00), Stream 3: I love them, you'll love them: the connection strings!
PostgreSQL allows you to use a connection string to describe how to connect to your database. Well, you might have already known that... But did you know that, depending on who wrote the different PostgreSQL client applications provided in the client package, you will need to use a different syntax?
After explaining what connection strings are and why you really need them, we'll go through how to use them with every PostgreSQL client application, so everyone can conclude it doesn't make sense. Then, we'll explore how we could make it better for Postgres > 16!
https://www.pgcon.org/events/pgcon_2022/schedule/session/222/ (Lætitia Avrot)

13:15 (01:00), Stream 2: Improving PostgreSQL's mysterious SLRU subsystem
Alongside relation data (tables and indexes), PostgreSQL also manages 7 different sets of "SLRU" files. This talk will go into detail about the most important ones for transaction management. Topics will include:
* Overview of the 7 SLRUs
* The commit log's design and history
* The multixact log's design and history
* Why don't other databases have those things?
* Access patterns
* Format, I/O and buffering
* How can we make all of this faster and more reliable?
* Proposals to unify with the main buffer pool
* Experimental results
https://www.pgcon.org/events/pgcon_2022/schedule/session/287/ (Thomas Munro)

14:30 (01:00), Stream 2: How to Build a Distributed & Secure Database Ecosystem with PostgreSQL
As the most popular open source relational database in the world, PostgreSQL keeps attracting the significant attention it deserves. With ever-increasing data storage and query requirements, new challenges arise for horizontal elastic expansion and security of the PostgreSQL database.
How to provide existing PostgreSQL databases with incremental capabilities such as data sharding, data encryption and other functions is of great concern to many PostgreSQL users.
This session will focus on how to empower PostgreSQL through the ecosystem provided by Apache ShardingSphere: an open source distributed database, plus the ecosystem users and developers need to give their database a customized and cloud-native experience. ShardingSphere doesn't quite fit into the usual industry mold of a simple distributed database middleware solution; it recreates a distributed, pluggable system, enabling actual user implementation scenarios to thrive and contributing valuable solutions to the community and the database industry.
The aim of ShardingSphere is captured by the Database Plus concept.
Database Plus sets out to build a standard layer and an ecosystem layer above fragmented basic database services. A unified and standardized database usage specification is provided for upper-level applications, and the challenges businesses face due to underlying database fragmentation are minimized as much as possible. To link databases and applications, it uses traffic and data rendering and parsing. It provides users with enhanced core features such as a distributed database, data security, a database gateway, and stress testing.
ShardingSphere uses a pluggable kernel architecture for Database Plus. That means there's modularity, which provides flexibility for the user. Demos and notable use cases from production environments at the Asian equivalents of FAANG (Facebook, Amazon, etc.) will be used to introduce the use and implementation of these functions for PostgreSQL databases.
https://www.pgcon.org/events/pgcon_2022/schedule/session/309/ (Trista Pan)

14:30 (01:00), Stream 1: PostGIS family of Extensions
This talk will cover the various extensions packaged with PostGIS (https://postgis.net). You'll learn what kinds of problems each extension is designed to solve, with some live demos.
Extensions covered:
* postgis - examples of geometry processing, proximity analysis
* postgis_raster - using raster in conjunction with vector data
* postgis_topology - cleaning up data, maintaining pristine data sets
* address_standardizer - parsing addresses into individual elements
* postgis_tiger_geocoder - geocoding using US Census tiger data
* postgis_sfcgal - working with 3D geometries and 3D processing
If time is available, I will showcase features coming in PostGIS 3.3.
https://www.pgcon.org/events/pgcon_2022/schedule/session/299/ (Regina Obe)

14:30 (01:00), Stream 3: Will Postgres Live Forever?
This presentation explains how open source software can live for a very long time, and covers the differences between proprietary and open source software life cycles. It also covers the increased adoption of open source, and many of the ways that Postgres is innovating to continue to be relevant.
https://www.pgcon.org/events/pgcon_2022/schedule/session/313/ (Bruce Momjian)

15:45 (01:00), Stream 2: Moving pg_basebackup Forward
Assuming no patches are reverted before release, PostgreSQL 15 will feature significant improvements to pg_basebackup and the server-side BASE_BACKUP command which implements it, including server-side compression and backup targets. In this presentation, I'll talk about the work I and others have done to improve the state of pg_basebackup for PostgreSQL 15, and give an overview of possibilities for the future. In particular, I'll give my opinion on the prospects for parallel and incremental backup: how likely is it that we will get such features, and what will they look like if we do?
https://www.pgcon.org/events/pgcon_2022/schedule/session/301/ (Robert Haas)

15:45 (01:00), Stream 3: New TOAST in Town. One TOAST FITS ALL. - part 2
Postgres handles long attributes in a table by compressing them and slicing them into chunks, which are stored in a hidden companion relation; this technology is called TOAST. Any access to such an attribute requires looking up the chunks using a B-tree index and then implicitly joining to the hidden relation, which can cause a catastrophic slowdown of all queries on such attributes. Nowadays, users prefer JSON because of its flexible nature and ubiquity; they often operate on long JSON values and have real performance issues.
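As a sense of scale for the chunking just described, assuming the usual default of TOAST_MAX_CHUNK_SIZE = 1996 bytes with 8 KB pages:

```python
# Rough chunk accounting for stock TOAST: an oversized value is sliced
# into chunks of at most TOAST_MAX_CHUNK_SIZE bytes, each stored as a row
# in the hidden toast relation and found again via a B-tree index lookup.

import math

TOAST_MAX_CHUNK_SIZE = 1996  # bytes; default with 8 KB pages

def toast_chunks(value_size):
    return math.ceil(value_size / TOAST_MAX_CHUNK_SIZE)

for size in (10_000, 1_000_000):
    print(f"{size:>9} bytes -> {toast_chunks(size)} chunks")
```

A 1 MB JSONB value thus becomes hundreds of chunk rows, and even reading a single key means reassembling all of them, which is the cost the pluggable TOAST work targets.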
Unfortunately, the current TOAST is too universal with regard to data types: it has no knowledge of the internal structure of an attribute and treats it as a black box. We propose a new pluggable TOAST API, which allows per-data-type optimization of storage for long attributes. I will also show several examples of how to use the new TOAST and what benefits it brings, including streaming into a table using appendable bytea and a very efficient JSONB, which demonstrates orders-of-magnitude performance gains on selects and updates.
https://www.pgcon.org/events/pgcon_2022/schedule/session/229/ (Oleg Bartunov)

15:45 (01:00), Stream 1: Thinking about the logical database
When it comes to the design of the internal components of PostgreSQL, history matters. Earlier designs create ripples that affect later designs.
Extensible indexing created the need for VACUUM, since without VACUUM it is far from obvious how transaction rollback could ever work, at least with GiST and GIN indexes. Transaction rollback that is decoupled from the physical representation of data (compared to traditional designs based on two-phase locking) was necessary even before Postgres added multi-version concurrency control.
This talk will describe a conceptual framework for discussing whether something is an essential part of storing data transactionally from the point of view of users, or whether it is an inessential implementation detail of transaction management and storage, that could in principle be implemented in many different ways. The former can be categorized as belonging to the logical database, while the latter can be categorized as belonging to the physical database.
Recent improvements in how the standard B-Tree index access method performs garbage collection to control MVCC version bloat (authored by the speaker) drew upon these concepts. But almost any improvement to the on-disk representation of either tables or indexes has some kind of tension between the logical and physical database. The talk explores the "logical database, physical database" concepts by discussing this recent work, as well as pending work on free space management in the standard heap table access method.
https://www.pgcon.org/events/pgcon_2022/schedule/session/308/ (Peter Geoghegan)