PGCon 2010: The PostgreSQL Conference
University of Ottawa, Ottawa, 2010-05-18 to 2010-05-22 (5 days). Final Release III.

09:00 (3 hours), DMS 1110
No More Waiting: A Guide To PostgreSQL 9.0 (workshop, en)
Another year, another PostgreSQL release, and once again this release is packed full of new features. This talk will give an overview of the new features available in 9.0, and give you pointers to talks during the rest of the conference you'll want to focus on to get the most out of 9.0. Whether you develop apps inside or outside of the database, or you're the one who has to keep them running, this should be your first stop on the road to 9.0. Full description pending a stable tree, but we'll have something for sure.
Robert Treat
http://xzilla.net
http://omniti.com
13:00 (3 hours), DMS 1110
Server Health Check: Give your Postgres server a checkup (workshop, en)
Have you given your PostgreSQL database server a checkup lately? If not, you probably should. "Sick" database servers are easy to prevent if you take a few simple steps <i>before</i> your server comes down with something. Just as regular doctor's checkups help spot disease and chronic health problems in people before they become acute, a "health check" of your PostgreSQL servers will uncover, and sometimes prevent, database server problems before your website goes down. Veteran database server medic Josh Berkus will go through the various steps PostgreSQL Experts normally takes to make sure customer database servers are "healthy".
A regular health check generally includes:
* Configuration checkup
* Hardware checkup
* Resource usage checkup
* OS Checkup
* Application checkup
* Slow query checkup
After we've determined the server is "healthy" we then proceed with monitoring and change control setup to make sure it stays healthy, including:
* monitoring the database server
* monitoring the OS
* configuration change management
This tutorial will include the details, scripts and tools for all of the above checkups. Become a database medic yourself! Make your database servers healthy.
Josh Berkus

09:00 (3 hours), DMS 1110
Realistic Load Testing: HOWTO set up a realistic testing environment using PostgreSQL functions and Python (workshop, en)
Applications and databases need testing. But how can you get valid results for a fully integrated system flight-check test at realistic loads? This tutorial addresses the many challenges that arise in application or database development, to give you and your customers confidence in presenting a production-ready product. After running into many obstacles in proving that a recent enterprise product launch could handle the expected loads of our customers, Digitec, Inc. invested time in writing realistic flight-check tests using PostgreSQL functions and Python. The results of these tests gave the engineers, the developers, and our customers confidence that the entire system would be able to perform as designed.
Topics covered during this tutorial include:
* Shortfalls of FLOSS benchmark tests
* Identifying the Project Test Components
* Identifying Realistic Loads
* Identifying Historical Data
* Developing Tests and Procedures
* PostgreSQL Functions for Tests
* Python Scripts for Tests
* Helpful Tools
Zach Conrad

13:00 (3 hours), DMS 1110
PostgreSQL Access Controls (AuthN, AuthZ, Perms): Controlling access to your database: roles; Kerberos, LDAP, SSL, RADIUS(!); database permissions (workshop, en)
An introduction and thorough review of access control in PostgreSQL. All access control will be covered, but special attention will be paid to new features and changes in 9.0. This will include both system administrator configuration specifics (pg_hba.conf) and database administrator permissions (the GRANT system). PostgreSQL offers many options for controlling access, from authentication and login to the role system and finally the hierarchy of authorization to specific resources. System administrators and database administrators need to understand these complexities to ensure their systems are both robust and secure. In 9.0 there have been some changes to existing options as well as new capabilities (RADIUS support). We will go through all of the authentication options that PostgreSQL offers, focusing on RADIUS (new in 9.0) and enterprise-wide authentication schemes (Kerberos, LDAP, SSL), then walk through setting up roles following best practices and privilege separation, and finally go through the privilege system from the database level down to the column level.
Stephen Frost

15:00 (3.5 hours), Royal Oak
Registration pickup: The social way to register: at the pub (other, en)
Pick up your registration pack. Stop by the Royal Oak Pub and get your registration pack. You'll help us avoid long line-ups on Friday morning, and you get to have a beer and chat with your fellow attendees. We guarantee you'll spot someone famous.
Dan Langille

10:00 (1 hour), DMS 1140
Built-in replication in PostgreSQL 9.0 (lecture, en)
An introduction to the new built-in replication features in PostgreSQL 9.0: Hot Standby and Streaming Replication. A quick walk-through of setting up a hot standby server with streaming replication, and the options available to control it. Discussion of various trade-offs and pitfalls with Hot Standby.
Heikki Linnakangas
audio
11:30 (1 hour), DMS 1140
Efficient k-NN search with GiST and other developments (lecture, en)
We present an implementation of a new GiST tree traversal strategy and efficient k-NN search based on this strategy. We would also like to discuss a new signature-file-based index (the bloom index), its implementation and possible improvements. There are many applications where efficient k-NN (k nearest neighbours) search is badly needed, for example GIS and multimedia search. Currently, k-NN search in PostgreSQL is usually emulated using repeated searches, changing the "radius" of the query until the number of rows in the result satisfies the query. We introduce a new strategy of GiST tree traversal (in addition to the original depth-first one), based on a priority queue, which allows a native implementation of efficient k-NN search. On a test database of POIs (points of interest) with 1034170 spots, we got about a 300x performance gain from k-NN search.
The new GiST feature doesn't introduce any incompatibilities; the only visible change is that the consistent user-defined method can now return not just TRUE/FALSE, but:
- a negative value, which means the tuple doesn't match the query (like FALSE in the old implementation)
- 0.0, which means one of:
  - a zero distance (an exact match)
  - a match for a filtering clause, like <@ or @> for points
- a positive value, which means the method returns a distance. In this case keyRecheck should be false, since it's impossible to produce the right ordering with lossy values.
GiST was taught to recognize which tree traversal algorithm to use (depth-first, or the distance-based priority queue).
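The best-first traversal described above is easy to illustrate with a small, self-contained Python sketch. This is not the GiST implementation; the tree layout and helper names here are invented for illustration. The key idea carries over: inner entries enter the priority queue with a lower-bound distance, leaves with their exact distance, so entries pop in true distance order and the first k leaves popped are the k nearest.

```python
import heapq
import itertools
import math

# A toy GiST-like tree: a node is either ("leaf", point) or
# ("inner", bbox, children), where bbox = (minx, miny, maxx, maxy).

def point_dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def bbox_dist(bbox, q):
    # Lower bound on the distance from q to anything inside the box
    # (0.0 if q lies inside it): the "consistent returns a distance" idea.
    minx, miny, maxx, maxy = bbox
    dx = max(minx - q[0], 0.0, q[0] - maxx)
    dy = max(miny - q[1], 0.0, q[1] - maxy)
    return math.hypot(dx, dy)

def knn(root, q, k):
    """Best-first traversal: always pop the globally closest pending entry;
    because inner entries carry lower bounds, a popped leaf is final."""
    tie = itertools.count()          # tiebreaker so nodes never get compared
    heap = [(0.0, next(tie), root)]
    out = []
    while heap and len(out) < k:
        dist, _, node = heapq.heappop(heap)
        if node[0] == "leaf":
            out.append(node[1])      # exact distances pop in true order
        else:
            for child in node[2]:
                if child[0] == "leaf":
                    d = point_dist(child[1], q)
                else:
                    d = bbox_dist(child[1], q)
                heapq.heappush(heap, (d, next(tie), child))
    return out
```

Unlike the old radius-guessing emulation, this visits only the subtrees whose bounding boxes could still contain one of the k closest points.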
In addition, we'd like to present and discuss our new signature-file-based bloom index. This index is useful when a table has many attributes and queries can include arbitrary combinations of them. A traditional btree index is faster than a bloom index, but it would take too many indexes to support all possible queries, whereas a single bloom index suffices. The bloom index supports only equality comparison. Since it's a signature file, not a tree, it always has to be read in full, but sequentially, so search performance is constant and doesn't depend on the query. The Bloom filter implementation (http://en.wikipedia.org/wiki/Bloom_filter) allows fast exclusion of non-candidate tuples. Since the signature is a lossy representation of all the indexed attributes, search results have to be rechecked using heap information.
Oleg Bartunov, Teodor Sigaev
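The fast-exclusion property of the Bloom filter mentioned above is easy to demonstrate with a toy Python version. This has nothing to do with the index's on-disk format, and the sizes and hash scheme are chosen arbitrarily; it only shows why a "no" answer is definitive while a "yes" must be rechecked.

```python
import hashlib

class ToyBloomFilter:
    """A lossy signature: 'no' answers are definitive, 'maybe' answers
    must be rechecked against the real data, just as the bloom index
    rechecks candidate tuples against the heap."""
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0                # the whole signature as one big int

    def _positions(self, key):
        # Derive nhashes bit positions from the key.
        for i in range(self.nhashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode()).hexdigest()
            yield int(digest, 16) % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # All k bits set: key *may* be present (false positives possible).
        # Any bit clear: key is definitely absent.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

Because the filter can err only toward "maybe", it can safely exclude most non-candidate tuples before any heap access.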
First announcement of knngist
K-nn search in PostgreSQL
Bloom index
K-nn search on commitfest
audio
13:30 (1 hour), DMS 1140
pg_statsinfo: More useful statistics information for DBAs (lecture, en)
NTT has developed "pg_statsinfo", which collects database activities and statistics automatically and presents the information to DBAs in user-friendly form. pg_statsinfo can also collect statistics from multiple databases, making it much easier to monitor the status of many database servers. PostgreSQL provides many useful statistics about database activity and conditions via system views and contrib modules, but for many DBAs it is difficult to tell from the raw statistics whether a database has problems or not.
This presentation will cover the following topics.
- How pg_statsinfo collects statistics from PostgreSQL
- Architecture of the reporting tool
- Requirements for PostgreSQL core to collect more useful information
Tatsuhito Kasahara
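The reporting idea underlying a tool like pg_statsinfo is worth spelling out: PostgreSQL's statistics views expose cumulative counters, so a useful report is the delta between two periodic snapshots. A hypothetical sketch (the field names are illustrative, borrowed from pg_stat_database; this is not pg_statsinfo's actual schema or code):

```python
def snapshot_delta(prev, curr):
    """Activity during an interval = difference between two snapshots of
    cumulative counters (as reported by views like pg_stat_database).
    Counters absent from the earlier snapshot are treated as starting at 0."""
    return {name: curr[name] - prev.get(name, 0) for name in curr}

# e.g. two snapshots taken ten minutes apart
before = {"xact_commit": 1000, "blks_read": 500}
after = {"xact_commit": 1600, "blks_read": 530, "deadlocks": 2}
report = snapshot_delta(before, after)
```

A collector daemon then only has to persist snapshots on a timer; every report, for one server or many, is a pairwise delta like the one above.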
audio
15:00 (1 hour), DMS 1140
LAPP/SELinux: A secure web-application stack using SE-PostgreSQL (lecture, en)
Nowadays, many web applications are closely combined with database systems, using the database to provide various kinds of dynamic content. In these environments, you cannot just focus on individual applications, databases and operating systems; you need to consider the whole system.
This session describes why you should apply a consistent and centralized access control policy, and how SE-PostgreSQL can be utilized to improve web application security, and shows a working example of the stack, named LAPP/SELinux. There are two major issues in web application security that can be improved by using an approach like LAPP/SELinux.
In most cases, a web system consists of multiple layers called a stack, such as LAPP, and each layer of the stack has its own individual access control facilities. It is hard to maintain each of them so that consistent access control decisions are always applied without a centralized reference.
SE-PostgreSQL applies its own access controls based on the SELinux security policy, in addition to the default PG privilege checks. Those policies are also applied on access attempts to the filesystem and other OS resources, so you will always see consistent access control decisions across the system.
Another issue is the privileges of web application instances. When these are launched by a web server, they inherit the privileges of the server process. That makes it hard to enforce meaningful access controls, because the OS and DB cannot distinguish among individual users. This leaves you very exposed to bugs and vulnerabilities in your web applications. The Apache/SELinux plus module launches web application instances with individual privileges based on http authentication. Unlike application level checks, these are always applied prior to invocation of the web application, so you can't bypass them. Then when the application requests resources from the OS and DB, they can make their access control decisions based on the privileges assigned.
We call this stack LAPP/SELinux. It enables web applications to run with the minimal privilege set appropriate for individual users/groups.
We assume audiences are interested in security issues and have basic knowledge of access controls. We do not expect any previous knowledge of SELinux.
KaiGai Kohei

16:30 (1 hour), DMS 1140
To ORM or not to ORM (that is the question): Exploring both the DBA's and the programmer's point of view (lecture, en)
ORMs (object-relational mappers) are a must for programmers, while they are usually a nightmare for DBAs. At the same time, they are large and complex, yet underpowered compared to the database itself. It's time to rethink ORMs, and let programmers receive input from the database community in a new strategy of collaboration, in which a new interface (call it an "API") between the two is designed.
Recently, ORMs (object-relational mappers) have become controversial. Most programmers simply can't live without them, and argue that handwritten SQL is cumbersome and error-prone. On the other hand, DBAs cry about the terrible performance and inefficiencies they can induce in the database. Even some programmers state that ORMs are not able to fully exploit the power of the database.
Worse, most ORMs are becoming increasingly large and complex, yet they fail to deliver (at least at the ORM abstraction level) what may be considered basic to intermediate database capabilities, such as triggers, roles or table constraints. This failure of ORMs is also eroding databases' prestige, which in turn feeds a growing community advocating for eliminating SQL altogether.
So, who's right? What is the future of ORMs? How should they evolve, if not disappear?
We don't need to rethink the SQL-relational model --it simply works. What we need to rethink is the DBA-programmers interface (as if it were an API) so that ORMs may fully work.
Time is running fast. We have to react. This talk may be best viewed as a call for collaboration between DBAs and programmers. It's a starting point to re-think ORMs and help save the SQL-relational world!
(And the PostgreSQL community should have a lot to say about this, so let's do it!)
Álvaro Hernández Tortosa
audio
10:00 (1 hour), DMS 1150
Application-level Authorization via SET ROLE: Working around connection pooling for permissions (lecture, en)
Discussing why integrating application authorization with your database is a good idea, the downsides to such integration, implementation gotchas, and finally an example implementation. In this talk, we'll cover handling application-level authorization structures using Postgres' built-in authorization and coarse access control, and how to build a tiered, structured authorization tree in Postgres, covering both the tables and the stored procedures that manipulate those tables.
We will also be discussing pitfalls and limitations to using Postgres as an authorization provider, as well as the advantages to doing so.
Finally, we will look at Vertically Challenged, an implementation of these ideas using the Python WSGI stack, and how VC uses Postgres to achieve its authorization requirements.
Aurynn Shaw
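One way to picture a "tiered, structured authorization tree" is as role membership with inheritance, the way Postgres resolves privileges granted through `GRANT role TO role` and adopted via `SET ROLE`. This toy resolver is only an illustration of that resolution; the grant table and names are invented, not taken from the talk or from Postgres internals:

```python
def effective_roles(direct_grants, role):
    """Walk role membership transitively: the set of roles whose privileges
    a session acting as `role` inherits. `direct_grants` maps a role to the
    roles it is directly a member of."""
    seen = set()
    stack = [role]
    while stack:
        r = stack.pop()
        if r in seen:
            continue                      # grants may form a diamond; visit once
        seen.add(r)
        stack.extend(direct_grants.get(r, ()))
    return seen

# A three-tier hierarchy: admin is a member of editor, editor of viewer.
GRANTS = {"admin": ["editor"], "editor": ["viewer"]}
```

An application that runs `SET ROLE` per request then gets exactly the effective set for that user, regardless of which pooled connection served the request.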
audio
11:30 (1 hour), DMS 1150
PostgreSQL in Mission-Critical Financial Systems: A case study of PostgreSQL in the Multicanal project of the Brazilian bank Caixa Economica Federal (lecture, en)
A case study of PostgreSQL in the "Multicanal" project of the Brazilian government bank Caixa Economica Federal: the day-to-day activities, challenges, solutions proposed and approved, and high availability and high performance in a Free Software deployment unique in this type of public institution. An introduction to the "Multicanal" project: how it works, what it does, and its importance for Caixa and the Brazilian people.
The inclusion of PostgreSQL and other Free and Open-Source Software in the project and at Caixa as a whole.
The role of 4Linux and Caixa in the project.
Day-to-day challenges in database administration.
Database production and maintenance challenges.
PostgreSQL tuning techniques involved in the systems.
Tested and approved high availability techniques for PostgreSQL.
Today's and future needs.
Planned and achieved objectives.
Flavio Gurgel
audio
13:30 (1 hour), DMS 1150
Not Just UNIQUE: Exclusion Constraints (lecture, en)
UNIQUE is no longer unique among constraints. I authored "Exclusion Constraints" for PostgreSQL 9.0: a more general constraint mechanism that can enforce constraints such as "non-overlapping" as well as uniqueness, and can enforce constraints on GiST or hash indexes as well as btree. See why other constraint mechanisms are unsuitable for common business requirements, like handling schedule conflicts, and how the problems are solved by using Exclusion Constraints. Exclusion Constraints are a more general constraint enforcement mechanism than UNIQUE, new in PostgreSQL 9.0. The constraints specify the conditions under which two tuples conflict, and concurrent updates are resolved with the same semantics as UNIQUE.
The existing UNIQUE constraints are a special case of Exclusion Constraints in which the two tuples conflict if all columns in the constraint are equal. Exclusion Constraints allow other operators to be specified. For instance, a reservation system may require that two tuples conflict if the room numbers are equal and the reservation periods overlap (as part of the demonstration, I make use of a user-defined PERIOD data type). Any operator can be specified as long as it is binary, boolean, commutative, and there's an operator class for the required index search (which is used to check for conflicts, much like the existing UNIQUE constraint mechanism).
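The reservation example translates directly into a conflict test. The real feature is declared in SQL (something along the lines of `EXCLUDE USING gist (room WITH =, period WITH &&)`); this Python sketch only mirrors the conflict semantics, with periods modeled as hypothetical half-open `(start, end)` pairs rather than the talk's PERIOD type:

```python
def overlaps(p, q):
    # Half-open intervals [start, end) overlap iff each starts before the
    # other ends; the binary, commutative test playing the role of &&.
    return p[0] < q[1] and q[0] < p[1]

def try_book(bookings, room, period):
    """Reject the new row if any existing row has an equal room AND an
    overlapping period: the two-tuple conflict condition of the constraint.
    (The real mechanism checks this via an index probe, not a full scan.)"""
    for r, p in bookings:
        if r == room and overlaps(p, period):
            raise ValueError("conflicting reservation for room %s" % room)
    bookings.append((room, period))
```

The point of the SQL-level feature, of course, is that this check is enforced atomically under concurrency, which application-side code like the above cannot guarantee.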
Exclusion Constraints are important because they are easy, scalable, flexible and general to many different business needs. See why alternatives and workarounds all have serious problems and limitations, and how they are solved by using Exclusion Constraints.Jeff Davis
audio
15:00 (1 hour), DMS 1150
Forensic Analysis of Corrupted Databases: What to do when things really hit the fan (lecture, en)
A look at some of the typical symptoms of corrupted databases, the usual culprits that cause problems, and a survey of strategies for correcting them. Inspired by real reports of corruption on the Postgres mailing lists, with demonstrations of manually introducing and correcting corruption.
Greg Stark
audio
16:30 (1 hour), DMS 1150
The PostgreSQL Query Planner (lecture, en)
Why does my query need a plan? Sequential scan vs. index scan. Join strategies. Join reordering. Using EXPLAIN. Row count and cost estimation. Things the query planner doesn't understand. Things that are nearly always slow. Redesigning your schema. Upcoming features and future work.
Robert Haas
audio
09:00 (1 hour), DMS 1160
Perspectives on NoSQL: What NoSQL means to PostgreSQL, and why PostgreSQL is YesQL (lecture, en)
The NoSQL movement has captured the attention of many web developers, often propelled by the myth that SQL databases like PostgreSQL do not scale as well as newer technologies. We will examine many of the more popular key/value store databases and illustrate the pros and cons of using a "NoSQL" database, examining the features of the more popular NoSQL alternatives in comparison to PostgreSQL. More importantly, we will address the impact of NoSQL technology at scale as it compares to PostgreSQL, and ultimately discover why PostgreSQL is the YesQL alternative to today's upstart database technologies. Key/value database stores are not new technology, but 2009 was the year of alternative "NoSQL" databases. In this talk, alternative databases such as CouchDB, Tokyo Tyrant, Redis, MongoDB, Cassandra and Project Voldemort will be covered, providing feature and performance comparisons to PostgreSQL. We will examine the technical and business impact of using alternative database technologies such as those listed, and review their technical strengths and weaknesses.
While the PostgreSQL content itself will be limited to how PostgreSQL compares to these technologies, the content should satisfy both developer and DBA interest in the subject. In summary, we will review what PostgreSQL can learn from these newer projects, and what needs to be done to broaden PostgreSQL's appeal to the many web development communities.
Gavin M. Roy
audio
Slides (web)
10:00 (1 hour), DMS 1160
Postgres for non-Postgres people: Getting to know the Postgres way (lecture, en)
Experience with one database system does not always make learning another one easy. Although Postgres is more SQL-compliant than just about anything else, there are plenty of quirks, features, and gotchas that you should be aware of. Postgres has an active and thriving community. We'll explore what makes the project unique, from developer philosophy to SQL features to advocacy, and everything in between. If you are coming from another database system, this will get you up to speed on the important differences between Postgres and everything else.
Greg Sabino Mullane
audio
11:30 (1 hour), DMS 1160
Hypothetical Indexes: towards self-tuning in PostgreSQL (lecture, en)
We propose adding hypothetical (or virtual) indexes in order to offer both what-if querying and automatic index tuning. Hypothetical indexes are simulated index structures created solely in the database catalog. This type of index has no physical extension and therefore cannot be used to answer actual queries. The main benefit is to provide a means of simulating how query execution plans would change if the hypothetical indexes were actually created in the database. This feature is quite useful for database tuners and DBAs.
Index selection tools, such as Microsoft's SQL Server Index Tuning Wizard, make use of hypothetical (or virtual) indexes in the database server to evaluate candidate index configurations.
We have made some server extensions to PostgreSQL 8.* to include the notion of hypothetical indexes in the system. We have introduced three new commands: create hypothetical index, drop hypothetical index and explain hypothetical.
After implementing the server extensions for hypothetical indexes, we can use them for future automatic indexing in PostgreSQL, besides simple, yet useful, what-if queries.
Sergio Lifschitz
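The essence of a hypothetical index is that only the planner's cost arithmetic has to believe it exists. This toy illustration uses a deliberately made-up cost model (nothing like PostgreSQL's real costing, and not the authors' code) just to show the what-if comparison:

```python
import math

def seq_scan_cost(pages, rows):
    # Read every page, examine every row (toy constants).
    return pages + 0.01 * rows

def index_scan_cost(rows, selectivity):
    # Descend a btree, then fetch only the matching rows (toy constants).
    matching = rows * selectivity
    return math.log2(max(rows, 2)) + 2.0 * matching

def what_if(pages, rows, selectivity):
    """Compare the plans as if the index existed; no index is ever built,
    which is exactly what 'explain hypothetical' exploits."""
    seq = seq_scan_cost(pages, rows)
    idx = index_scan_cost(rows, selectivity)
    return "hypothetical index scan" if idx < seq else "seq scan"
```

An index advisor loops a comparison like this over many candidate indexes and recommends only those that actually change the winning plan.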
http://www.inf.puc-rio.br/~postgresql/
audio
13:30 (1 hour), DMS 1160
Exposing PostgreSQL Internals with User-Defined Functions: Easing into PostgreSQL hacking (lecture, en)
User-defined functions are one of the easiest ways to get started hacking on the PostgreSQL codebase and produce something useful in a short time. Watch a whole new trivial feature get added with one, and learn how to step over some of the more common confusing parts of the codebase along the way. PostgreSQL's user-defined function (UDF) mechanism is useful for all sorts of things. You can add your own custom C code to the database, for performance or extensibility reasons. You can expose database internals that you wouldn't otherwise be able to access. And it's a great way to get started hacking on PostgreSQL with quick results.
This talk leads you through a quick tour of creating a UDF that exposes a useful bit of information about how you're using the shared_buffers cache on your system. Consider it a "hello, world" for writing a PostgreSQL patch that adds a tiny feature as a function. You'll also learn some tricks for finding useful code to borrow. Knowing where some simple examples are is most of the battle when getting started here.
The material is based on several conversations about the most confusing PostgreSQL hacking basics with those completely new to that area, in hopes that you won't have to get stuck on the same things they did. You'll need a basic understanding of coding in C or similar languages to follow the examples, but not any previous exposure to the PostgreSQL code.
Greg Smith
audio
15:00 (1 hour), DMS 1160
PgFincore and the OS Page Cache: Is my table in memory? (lecture, en)
While PostgreSQL can see the contents of shared buffers, it does not know about the OS page cache, which determines which pages are actually in memory.
PgFincore provides this information, which allows us to:
- Preload the exact pages that PostgreSQL will probably want in order to respond more quickly to the first queries on server restart.
- Try to improve planner choice and cost estimation.
It also suggests ideas for how to:
- Keep pg_dump from trashing the OS Page Cache
- Explicitly ask for a non-cached sequential scan.
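The underlying question, "is this file's data in the OS page cache?", is answered by the kernel's mincore(2) interface, which is what PgFincore builds on. Below is a rough, Linux-only Python sketch of that idea; it is not PgFincore's actual code (which is C), and the ACCESS_COPY trick is just a convenience so ctypes can take the buffer's address:

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def cached_pages(path):
    """Return one bool per page of the file: True if that page is currently
    resident in the OS page cache, per mincore(2). Linux only."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    pagesize = mmap.PAGESIZE
    npages = (size + pagesize - 1) // pagesize
    with open(path, "rb") as f:
        # ACCESS_COPY gives a writable private mapping, which lets ctypes
        # export the buffer; reads still reflect the file's cached pages.
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    buf = (ctypes.c_char * size).from_buffer(mm)
    vec = (ctypes.c_ubyte * npages)()          # one status byte per page
    rc = libc.mincore(ctypes.c_void_p(ctypes.addressof(buf)),
                      ctypes.c_size_t(size), vec)
    if rc != 0:
        raise OSError(ctypes.get_errno(), "mincore failed")
    resident = [bool(b & 1) for b in vec]      # low bit = resident
    del buf                                    # release the exported buffer
    mm.close()
    return resident
```

Run against a relation's underlying files, a ratio of `sum(resident)/len(resident)` tells you how much of the table is really in RAM; the preload direction uses the complementary posix_fadvise(WILLNEED) call.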
PgFincore also provides information about how the data in the OS page cache is distributed. You are a DBA; you know your database and your hardware. You can already get a lot of information about shared buffers, but you still don't know whether your tables or indexes are in RAM.
Perhaps you have an application which needs to get the best performance from PostgreSQL as soon as it starts.
Or you totally trash your OS page cache with pg_dump, or with a large sequential scan (that is, many of the interesting pages in memory are replaced by those read for the pg_dump).
Since PostgreSQL cannot know exactly what happens in the OS page cache, the planner may sometimes choose a bad plan, and you might wish you could use a planner hint.
Maybe we can optimise and improve those points.
Cedric Villemain
audio
PgFincore Project Homepage
16:30 (1 hour), DMS 1160
The Illustrated Elephant: Literary modeling and automatic documentation in PostgreSQL (lecture, en)
Users of proprietary database management systems are usually bound to
graphical modeling tools with an emphasis on drawing diagrams and
generating SQL DDL code from them. The process is generally
error-prone and cumbersome, being based on mediocre user interfaces
and generating bad SQL representing poor data models, as diagrams can
hardly represent the full richness of SQL data models, especially in
an SQL flavor like PostgreSQL's. Many databases are
reverse-engineered into entity-relationship diagrams, losing much
information coded into SQL features not directly supported by the
diagramming tools, or in the SQL DDL source code which originally
created the reverse-engineered database.
A well-kept but open secret of many database administrators is
reliance on source code and automatic diagramming tools. Breaking
free from the misconception that all information should be graphically
represented, or even that it should be graphically created, and from
the mistaken identification of modeling and drawing, such data
modelers are free to use the full power of both SQL and their
well-proven, flexible source code tools, all the while generating all
the graphics and web pages they could possibly want automatically,
using simple, fast programs which can lay out diagrams much better
than most drafters.
SQL DDL coding can also be nicely combined with literate programming
tools, in what we call ‘literary modeling’: interspersing SQL DDL
statements in a full text explanation of the model and the reasoning
behind it, we can generate both text files for database schema
creation, and nicely formatted documents for reading, browsing and
reference, both printed and online. These documents can, and
typically will, include graphics generated automatically from either
the SQL DDL or the database schema itself.
It is our tested conviction that this process is much more pleasurable
and efficient than the traditional diagram-based one. We intend to show how, in the course of the normal life cycle of a
database, data architects, administrators, programmers and users are
stifled and unnerved by the reliance on graphical, diagram-based
modeling tools, the cumbersome processes they require, the problems
they engender, the poor models they all but enforce, and the bad code
they generate.
We will introduce, or rather recall, a few of the several tools that
can and do ease the productive work on rich, maintainable database
models, helping generate databases that use the full range of
possibilities enabled by PostgreSQL and its SQL dialect. Focusing on
the most formal and familiar representation of SQL database schemas,
namely SQL DDL code itself, we profit from the full range of traditional
text-based coding tools, such as text editors and utilities, markup
and formatting languages, source code versioning and control, build
systems, scripts, hypertext and automatic layout of diagrams.
Using only tried and tested tools such as NoWeb, LaTeX, HTML, Autodoc,
SQL::Fairy and SchemaSpy, and the basic ideas bequeathed by people such
as Codd (the relational model itself) and Knuth (literate
programming), we present a flexible skeleton process that relies not
on secrets, methodologies or black box tools and file formats, but on
standards and on well-known interfaces. As we present the tools’
capabilities and possible combinations thereof, both novice and expert
users will recognize some of their own practices and, hopefully, a few
new ones.
We hope to inspire some interest in further implementation of the SQL
standard in PostgreSQL, of the relational model aspects in SQL, and in
development of the presently existing tools.
Early versions of this talk were met with interest by the Brazilian
free software community in PgCon BR 2008, PgDay SP 2009, and FISL 10
(2009).
Leandro Guimarães Faria Corcete DUTRA
audio
17:30 (1 hour), DMS 1160
Lightning talks: Short, sharp descriptions of short topics (lightning, en)
As is now a regular feature, PGCon will have a lightning talks session, with presentations on diverse topics. The format remains essentially the same: in a one-hour period, audiences are entertained and informed by a rapid-fire series of short talks on interesting new or ongoing work by individuals or groups. Slides are permitted, but not obligatory; pictures are highly recommended. Topic areas include new open source software projects, works in progress for future releases of existing projects, student projects, etc. Lightning talk topics this year may make good conference papers next year!
The number of slots is limited, and experience suggests there will be more takers than slots. Sign up well in advance to be assured a spot. Please e-mail <light@pgcon.org> to sign up. Send a one- or two-paragraph summary of the topic to be presented, and the names of the person(s) presenting it. Also, please give a time estimate; typically talks will be one to five minutes. The time limit will be strictly enforced: you will be cut off if you try to run over! The lightning talks e-mail registration deadline is May 16, after which remaining slots (if any) may be signed up for in person. Any slides must be received by the session chair by, at latest, May 19 at 11:59pm GMT. The session chair this year is Selena Deckelmann.
--
The speakers were:
* PostgreSQL Developer Meeting in Five Minutes – Bruce Momjian
* Slony 1 => 2.0 – Steve Singer
* PostgreSQL and Twisted – Jan Urbanski
* The FlyMine Project – Matthew Wakeling
* Enhanced Tools for PostgreSQL – Tomonari Katsumata
* Servoy – Christophe Pettus
* Tail_n_mail – Greg Sabino Mullane
* GSOC – Robert Treat
* Pg Staging – Dimitri Fontaine
* Serializable Transaction Isolation – Kevin Grittner
* 10 ways to wreck your database – Josh Berkus
Magnus Hagander, Selena Deckelmann
audio
List of talks + slides
19:00 (3.5 hours), The Velvet Room
PGCon 2010 Major Social Event! Sponsored by EnterpriseDB (other, en)
Come and join us for the major social event of PGCon 2010. EnterpriseDB is sponsoring this evening for PGCon 2010 attendees. Dinner and drinks will be provided. See the map on the website for directions to the venue.
Dan Langille
Map
10:00 (1 hour), DMS 1140
Enova: Online financial services & Postgres: Staking our claim on open source technologies (lecture, en)
Enova Financial provides online financial services to under-served consumers in the United States, Great Britain, Australia and Canada. What originally started with a handful of people and a small application running on MySQL has grown into a multi-million-dollar business running on Postgres. In an environment where downtime costs hundreds of thousands of dollars a minute, we continue to stake our claim on open source technologies, including Postgres, Skytools, Ruby on Rails and Linux.
In this talk, we will explain
- How we're using Postgres
- Why we switched from MySQL
- Why we chose Postgres
- What challenges we face using Postgres
Jim Nasby
audio
11:30 (1 hour), DMS 1140
PostgreSQL as a secret weapon for high-performance Ruby on Rails applications (lecture, en)
This session will cover lessons learned about Ruby on Rails development using PostgreSQL. From a database-centric view, the session will explain Rails best practices: taking advantage of RoR's strong points, dealing with its weak points, PostgreSQL's strong and weak points, and using advanced SQL features in web applications. The session will demonstrate how PostgreSQL is used to speed up Rails code, making slow things in your web application fast and impossible things possible. The session will also discuss the peculiarities of complex enterprise apps and show that PostgreSQL is an ideal open source match for their development.
Topics covered will include:
Our experience with Ruby on Rails and PostgreSQL combo:
- performance characteristics of Ruby applications
- Rails advantages
- PostgreSQL advantages
How to optimize Rails with PostgreSQL:
- doing as much as possible in SQL
- preloading attributes and associations
- using Postgres' arrays for even faster preloading
- generating and executing SQL queries instead of manipulating data with ORM
How to optimize Rails application by moving logic to the PostgreSQL database:
- efficient trees
- efficient pagination
- efficient access control system with roles and privileges
- efficient data analysis and aggregation
How to deal with PostgreSQL limitations:
- optimizer forcing subselects for the whole result set despite limit/offset
- optimizer not being able to estimate the resulting set size of the generate_series() function call
- "in" in where conditions forcing joins
- need for pushing down conditions in certain cases
- avoiding on-disk sorts
- selecting records holding group-wise maximum without windowing functions (pre 8.4)
- using "not exists" as a cure to bad performance of "not in" conditions
How to make your database faster:
- improving shared database performance under severe memory restrictions
- realistic explain analyze
- Postgres-specific performance tips
How to keep your database and application robust:
- getting the right compromise between ORM and the database
- best practices for database schema development and maintaining data integrity
- performance testing and benchmarking
- performance monitoring
Gleb ArshinovOleksandr Dymo
audio
13:3001:00DMS 1140Probing PostgreSQL with DTrace and SystemTaplectureenOperating system developments in recent years have provided administrators with new and powerful ways of peeking into live, production applications to investigate behaviors and solve problems in real time without significant system impact. PostgreSQL provides several probe points allowing these dynamic tracing tools access to running applications that was formerly available only with a debugger. In this discussion we will explore the DTrace and SystemTap applications and some of their capabilities, with specific focus on PostgreSQL.Ever wished an application's logging were just a bit more detailed? Ever wanted an application to spit out some internal variable, or count the iterations of a section of code? Ever wanted to see exactly what values resulted from some convoluted calculation? Ever been unable to provide these details because you couldn't recompile the application to add the code you needed? Dynamic tracing can provide all those details, on a running, production application, and -- theoretically, at least -- do it all without your users noticing.
PostgreSQL provides a wide array of tracing probes which allow users to gather data regarding parsing, planning, execution, storage, locking, and many other back-end behaviors, all without recompiling the application or adding new PostgreSQL code. We'll examine these probes and the capabilities of systems like DTrace and SystemTap to gather and analyze the data they provide.Joshua Tolley
audio
15:0001:00DMS 1140Secure PostgreSQL DeploymentlectureenPostgreSQL supports several options for securing communications and access when deployed outside the typical webserver/database combination. This talk will discuss the features that make this possible, with some extra focus on the changes in 8.4 and 8.5.PostgreSQL supports several options for securing communications when deployed outside the typical webserver/database combination. This talk will go into some details about the features that make this possible, with some extra focus on the changes in 8.4. The main areas discussed are:
* Securing the channel between client and server using SSL, including an overview of the threats and how to secure against them
* Securing the login process with methods including LDAP, Kerberos or SSL certificates
The talk will not focus on security and access control inside the database once the user is connected and authenticated.Magnus Hagander
audio
16:3001:00DMS 1140Using Git to work with PostgreSQLlectureenThe talk will explore using Git to work with PostgreSQL in various roles, including: tester, reviewer, developer, committer and buildfarm owner.It will also explain the issues that were encountered in converting the PostgreSQL Buildfarm client to be able to use Git as well as CVS.Andrew Dunstan
audio
18:0002:00DMS 1140bof3We have money, do you have the time?How commercial companies can fund Postgres development.meetingenPostgres has grown to the point where it is very difficult to add missing features without financial support from commercial users and cooperation between companies providing Postgres support as well as general community members.
But there are many open questions, such as how should fundraising work and how should those funds be distributed within the community.
Jim Nasby10:0001:00DMS 1150PgMQPgMQ: Embedding messaging in PostgreSQLenEmbedded Messaging with PgMQ, the PostgreSQL Message Queueing add-on
PgMQ embeds messaging directly into PostgreSQL so that committed transactions can be published to message queues via various popular messaging protocols (AMQP, STOMP, OpenWire). It supports ActiveMQ (STOMP, OpenWire) and any transport supporting AMQP (such as RabbitMQ). PgMQ easily enables "eventually consistent" replication and/or sharding along customized data boundaries. PgMQ also introduces an index extension that enhances temporal data types (timestamp, date, etc.) by firing an event (trigger) when the value is equal to current_time, an aid to replication and partitioning. This lecture will show how to set up and configure PgMQ, with real-time examples.PgMQ (PostgreSQL Message Queueing) is an add-on that embeds a messaging client inside PostgreSQL. It supports the AMQP, STOMP and OpenWire messaging protocols, meaning that it can work with all of the major messaging systems such as ActiveMQ and RabbitMQ. PgMQ enables two replication capabilities: "Eventually Consistent" Replication and sharding. PgMQ has been developed at Etsy.com, a top 50 internet site, as a solution to its own replication challenges.
"Eventually Consistent" Replication means that slaves are not guaranteed to be in sync with the master at any moment in time because of latency in the replication process. However, the data is guaranteed to arrive and eventually (in practice, a very short time) the slave and master are in sync. For many applications, this is acceptable. With PgMQ, "Eventually Consistent" replication is easily done by publishing data commits (insert/update/delete) to a message queue to which slave subscribes. In this scenario, the slave is "eventually consistent" with the master. Most replicated data can be probably work with an "Eventually Consistent" model.
PgMQ operates much like a per-row "after" trigger: committed tuples (insert/update/delete) are published to a message queue. Configuration is very simple, and PgMQ has enough granularity that commits on any table can be published to any number of queues, in any of the supported protocols. This enables master data to easily be "sharded" to various slaves, simply by publishing the sharded data to different queues, to which the sharded data servers subscribe.
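The publish-then-apply flow described above can be sketched in miniature. This is a hypothetical illustration, not the PgMQ API: an in-process queue stands in for the message broker, a dict for each database, and the function names (`publish_commit`, `apply_pending`) are made up for the example:

```python
import queue

# Sketch of "eventually consistent" replication via a message queue:
# the master applies each row change and publishes it (like a per-row
# AFTER trigger); a slave drains the queue later and converges.
events = queue.Queue()
master, slave = {}, {}

def publish_commit(op, key, value=None):
    """Apply a change on the master, then publish it to the queue."""
    if op == "delete":
        master.pop(key, None)
    else:
        master[key] = value
    events.put((op, key, value))

def apply_pending():
    """Drain the queue on the slave; afterwards slave == master."""
    while not events.empty():
        op, key, value = events.get()
        if op == "delete":
            slave.pop(key, None)
        else:
            slave[key] = value

publish_commit("insert", 1, "alice")
publish_commit("insert", 2, "bob")
publish_commit("delete", 1)
assert slave != master          # the slave lags: consistent eventually, not yet
apply_pending()
assert slave == master == {2: "bob"}
```

Sharding, as described above, would amount to routing each change to one of several queues based on the row's shard key.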
The presenter is the principal author of PgMQ, and a database engineer at Etsy.com. The presentation will give a brief technical description of PgMQ, then cover how PgMQ can be used to set up "Eventually Consistent" replication. Real examples from the experience of using PgMQ in a production environment will be used. This will be a very good presentation for anyone who has faced the issue of replication with PostgreSQL.
Chris Bohn
audio
11:3001:00DMS 1150Replication Panelen
audio
13:3001:00DMS 1150Check Please!What Your Postgres Database Wishes You Would MonitorlectureenCompared to many proprietary systems, Postgres tends to be pretty straightforward to run. However, if you want to get the most from your database, you shouldn't just set it and forget it; you need to monitor a few key pieces of information to keep performance going. This talk will review several key metrics you should be aware of, and explain under which scenarios you may need additional monitoring.Robert Treat
audio
http://omniti.com
http://xzilla.net
15:0001:00DMS 1150Replacing GEQOJoin ordering via Simulated AnnealinglectureenFinding the optimal join order for an arbitrary number of relations is an NP-hard problem. For small queries applying exhaustive search is feasible, but the runtime and memory consumption make that approach impractical in many real-life applications.
PostgreSQL's answer to that problem is GEQO: the genetic query optimizer. It employs heuristics similar to those commonly used for solving the Travelling Salesman Problem. However, recent studies suggest other randomized algorithms could yield better results in shorter time.
One such approach, called Simulated Annealing, will be presented, along with a prototype implementation that you can load and try against your most monstrous queries.One way of determining the join order for a query is to pick a random solution and repeatedly transform it into similar, randomly generated solutions, accepting only those that improve the quality of the final execution plan. This method has a severe drawback: it is easily trapped in local minima, delivering a plan inferior to one that a different algorithm could find.
Simulated Annealing (SA) tries to address this issue by mimicking the [annealing](http://en.wikipedia.org/wiki/Annealing_%28metallurgy%29) process used in metallurgy to increase the durability of materials. The idea is to heat the system to a certain temperature, then slowly cool it down, allowing it to settle in a state of minimum energy.
In practice, this means choosing a random solution and generating its neighbours, but also accepting plans that are worse than their ancestors, with a probability dependent on the system's current "temperature". As the temperature steadily decreases, the algorithm hopes to finally arrive at a state with a reasonably low cost, while avoiding getting stuck in a local minimum.
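The acceptance rule at the heart of this process is compact enough to sketch. The following is an illustrative snippet, not the prototype module's code; costs and temperatures are made-up values:

```python
import math, random

# The classic Simulated Annealing acceptance rule: always accept a
# cheaper plan; accept a worse one with probability
# exp(-(new_cost - old_cost) / temperature), so bad moves become rare
# as the system cools.
def accept(old_cost, new_cost, temperature, rng=random.random):
    if new_cost <= old_cost:
        return True
    return rng() < math.exp(-(new_cost - old_cost) / temperature)

# At high temperature, worse plans are often accepted, which lets the
# search escape local minima; near zero temperature the rule
# degenerates to greedy hill-climbing.
assert accept(10.0, 5.0, 1.0)                          # improvement: always
assert accept(10.0, 11.0, 1000.0, rng=lambda: 0.0)     # hot: nearly always
assert not accept(10.0, 20.0, 0.001, rng=lambda: 0.5)  # cold: effectively never
```

The cooling schedule (how fast the temperature drops) is exactly the tuning problem the bullet list below refers to.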
Applying Simulated Annealing in PostgreSQL means dealing with several issues:
* finding a good starting plan
* randomly generating valid plans in the presence of join order restrictions
* comparing the costs of each generated plan
* choosing the starting temperature
* adapting the speed of temperature reduction
The proposed experimental module offers solutions to these issues and eventually intends to replace GEQO with an SA-based approach, while still being able to decently optimize even the most nightmarish queries out there.Jan Urbański
audio
SA module for PostgreSQL (source)
16:3001:00DMS 1150Postgres-XC, Write-scalable, synchronous multi-master PostgreSQL cluster with shared nothing approachlectureenWe will present a new PostgreSQL cluster called Postgres-XC (Extensible Cluster) developed by NTT and EnterpriseDB. Postgres-XC's performance is write-scalable. It also provides synchronous multi-master capability. Updates through one master are visible from any other masters immediately after the commit.
At present, Postgres-XC is focusing on the transactional use case of the cluster. With a DBT-1-based benchmark, Postgres-XC has achieved a scalability factor of 3.4 with five servers and 6.4 with ten servers. We will explain the main features of Postgres-XC, its key algorithms and implementation, as well as the latest performance achievements. Postgres-XC is going to be an open source project. Further technical issues and the future plan will also be presented.Koichi SuzukiMason Sharp
audio
Postgres-XC Wiki
18:0002:00DMS 1150clusterCluster-Hackers BOFcluster like you mean itmeetingenAnyone interested in working on clustering solutions for PostgreSQL, please attend this BOF. Topics covered will include clustering features in core Postgres, documenting clustering solutions, and projects under current development, including Postgres-XC.
Josh Berkus10:0001:00DMS 1160PL/ParrotYep, there's actually code now!lectureenCalling functions written in one PL from another shouldn't be painful, and with PL/Parrot, it won't be.Parrot is a virtual machine built explicitly to serve the needs of dynamic languages. Using it as a basis for PLs will make interoperability automatic.David FetterJonathan Leto
audio
Jonathan Leto's Repository
11:3001:00DMS 1160Monitoring PostgreSQL Buffer Cache InternalsWatching disk caching inside the databaselectureenWhen you give your database server memory, you expect it's going to use it. But for what? A look inside PostgreSQL's buffer cache can tell you exactly what that memory is doing for you. Every systematic database tuning effort should include a look at this critical resource. When it comes to optimization work, profiling beats guessing every time.PostgreSQL keeps most of its working data inside a block of shared memory allocated when the server starts, used for caching disk reads and writes. Looking at the contents of that cache can give you valuable clues to how your database application really works. The best ways to handle many types of optimization tasks involve carefully measuring the variables you're changing, but most people change the size of this cache without any plan for measuring the impact. Information about data moving in and out of the cache is useful for performance tuning, query optimization, system monitoring, and even predicting the future!
This presentation aims to describe the basics of how the cache is organized, how to query its contents, and how to interpret the results of those queries. By monitoring what goes in and out of the cache, you get a unique window into what's really happening inside your database when it's running your application.Greg Smith
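One simple metric derived from cache counters is the buffer cache hit ratio. In PostgreSQL the raw numbers come from statistics views such as pg_statio_user_tables (heap_blks_hit, heap_blks_read); the sketch below just shows the arithmetic on made-up sample values:

```python
# Buffer cache hit ratio: the fraction of block requests served from
# shared memory rather than read from disk (or the OS cache).
def hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from the buffer cache."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

# A ratio near 0.98+ is common for a well-cached OLTP workload; a low
# ratio suggests the working set exceeds shared_buffers or the
# workload is dominated by large sequential scans.
print(round(hit_ratio(980_000, 20_000), 2))  # 0.98
```

Note the counters only distinguish shared-buffer hits from reads; a "read" may still be served from the OS page cache, which is one reason a look inside the buffer cache itself is so informative.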
audio
13:3001:00DMS 11602 years of londistePostgreSQL usage at Hi-MedialectureenHi-Media's online services all run atop PostgreSQL, and use some form of replication. This talk will present what problems we solve with replication, and how.As we only use Skytools (Londiste) for replicating data, the talk will summarize what we've found in this project after using it for two years in production: from the community aspects to failure experiences and the impact on database management (rollouts, etc.).Dimitri Fontaine
audio
15:0001:00DMS 1160Reconciling and comparing databases reduxDeploying and testing triggers and functions in multiple databaseslectureenThe Millburn Corporation is a hedge fund which uses complex data-driven trading models based on the daily prices of various commodities, currencies and other inputs. As part of our application development process, we use independent staging and development instances of our production database to let us have a smoother and less mistake-prone deployment of new models and price streams.
Last year, I presented a talk at PgCon 2009 that examined in broad detail how we make heavy use of different schemas to compare and reconcile data between our different environments. In this talk, I'll examine in detail our attempt to solve a problem we face in our database environment: how to test complex triggers and functions before they're deployed; and how to reconcile and track changes in these functions as they move from our development to our staging and production environments. Specifically, I'll describe how we make use of subversion, pg_dump and pgTAP to roll trigger and function changes to our development environment nightly; and how we test the integrity of the functions and trigger functions on our staging and production environments nightly using pgTAP.
Last year's talk covered how we use simple cross-schema queries and DBI-Link (for cross-database queries) to reconcile data between different databases. In more complex cases, we also use inherited tables with non-overlapping sequences and custom accessor functions to access different data streams in different schemas.
Function comparison and trigger comparisons across schemas and databases are more complex. I will examine how we use MD5 hashes to compare function bodies; walk through most of the common system catalog queries we use to verify function arguments and return values -- stealing ideas from Greg Sabino Mullane's check_postgres script; and then show the testing and deployment framework we've developed at Millburn to test triggers and functions using temporary schemas and fixture files (canned data) to regression test our functions.
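The MD5-based comparison mentioned above can be sketched briefly. This is an illustrative snippet, not Millburn's actual tooling: in practice the function bodies would come from pg_proc.prosrc in each environment, while the strings and the helper name here are made up:

```python
import hashlib

# Compare function bodies across environments by hashing them,
# so a diff report only needs short fingerprints, not full source.
def body_digest(src):
    """Normalize surrounding whitespace, then MD5 the function body."""
    return hashlib.md5(src.strip().encode("utf-8")).hexdigest()

dev  = "BEGIN RETURN NEW; END;"
prod = "BEGIN RETURN NEW; END;\n"   # differs only by a trailing newline

# After normalization the two environments agree:
assert body_digest(dev) == body_digest(prod)
print(body_digest(dev)[:8])  # short fingerprint for a reconciliation report
```

The normalization step matters: without it, insignificant whitespace differences between environments would flag every function as changed.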
Difficulties to be discussed: how to keep the same trigger code in different environments, even if the trigger must use different search paths in different environments; pitfalls of function caching; many other issues tk...Norman Yamada
audio
16:3001:00DMS 1160PL/Perl - new features in 9.0lectureenFind out all you need to know about the new PL/Perl features in PostgreSQL 9.0New features include:
- New utility functions: `quote_literal`, `quote_nullable`, `quote_ident`, `encode_bytea`, `decode_bytea`, `looks_like_number`, `encode_array_literal`, `encode_array_constructor`.
- The `use` and `require` commands can be used in plperl for modules that have already been loaded, like `strict` and `warnings`.
- The `feature` pragma is pre-loaded for perl 5.10+.
- Better integration with tools like [Devel::NYTProf](http://search.cpan.org/perldoc?Devel::NYTProf).
- `END` blocks and object destructors are run at session end.
- Added `plperl.on_init`, `plperl.on_plperl_init` and `plperl.on_plperlu_init` GUCs for DBA use.
Your plperl functions can now use external Perl modules, if your DBA allows. I'll talk about the security implications of this.
I'll also demonstrate using [Devel::NYTProf](http://search.cpan.org/perldoc?Devel::NYTProf) to performance profile your PL/Perl functions.Tim Bunce
audio
17:3000:30DMS 1160Closing sessionsprizes, auctions, fun, gamesotherenThe Traditional Closing SessionWatch the video. We raised thousands for charity.Dan Langille
audio
18:0002:00DMS 1160bof1Testing BOFTesting your databasemeetingenCalling all database testers!Come to the PostgreSQL testing BOF. We'll discuss the latest developments in database testing, as well as the state of testing in core. Not to be missed, your future will be decided whether you come or not!
David E. Wheeler18:3005:00Royal OakpubPub Night!Last chance for social intercourse before the Touristy stuff tomorrowotherenThe last big social event...Be there or miss out. :)Dan Langille09:3005:00Out and abouttouristTourist stuffSpend some time exploringotherenExplore OttawaOttawa has a large number of great attractions. Spend some time looking around and explore. Spend as much time as you want with us, or leave early. We will walk everywhere we go. Wear sensible shoes. Bring your camera. We'll probably have lunch somewhere along the way. Consider the weather (sun block, rain coat, umbrella, swim suit).
This is one option for Saturday. You are free to come with us or take part in all or some activities. Or make your own way around...
The agenda for the tourist day is completely wrong.
The forecast: http://www.theweathernetwork.com/weather/CAON0512
Cora's - 8 AM - Breakfast
- 179 Rideau Street - http://bit.ly/dngpnV
- if you arrive much past 8:30, you probably won't get through in time to get to the next meeting point
- we will depart Cora's at 8:55
Rideau Center - 9:13
- south end of the Mall on the Mackenzie King bridge
- we are catching the #95 bus, direction Orleans, to BLAIR 1B
- from there, at 9:27, take bus route 129 (OC Transpo), direction Aviation
Aviation Museum
- we'll do a guided tour first, then wander around
- departing here at 1:30
Earl of Sussex Pub
- arriving at about 2:15
- 431 Sussex Drive
Ottawa, ON K1N 9M6
(613) 562-5544
earlofsussex.ca
All times after we leave Rideau Center are very subject to change.
Dan Langille