PGCon 2012

PGCon 2012 The PostgreSQL Conference University of Ottawa Ottawa 2012-05-15 2012-05-19 5 Final Release 09:00 00:30

09:00 03:00 MRT 212 Mastering PostgreSQL Administration DBA workshop
In this two-part course you will learn the essential details of PostgreSQL configuration, security, maintenance, monitoring, tuning, backups, and recovery. The course is designed for people with experience in database administration who are new to the Postgres platform. There will be a 1hr lunch break at noon.
The training will run for approximately two 3-hour sessions with several breaks during the class. Every student is required to bring their own laptop running OS X, Linux, or Windows. Downloading the PostgreSQL 9.x one-click installer beforehand is recommended. The training covers the following topics: Introduction, Installation, Configuration, Security, File structure, Maintenance, Backup, Monitoring, Disk space computations, Hot standby and Replication, Disaster Recovery.
Bruce Momjian Robert Treat http://momjian.us/main/presentations/overview.html#admin

13:00 03:00 MRT 212 Mastering PostgreSQL Administration afternoon session DBA workshop en
In this two-part course you will learn the essential details of PostgreSQL configuration, security, maintenance, monitoring, tuning, backups, and recovery. The course is designed for people with experience in database administration who are new to the Postgres platform. There will be a 1hr lunch break at noon.
The training will run for approximately two 3-hour sessions with several breaks during the class. Every student is required to bring their own laptop running OS X, Linux, or Windows. Downloading the PostgreSQL 9.x one-click installer beforehand is recommended. The training covers the following topics: Introduction, Installation, Configuration, Security, File structure, Maintenance, Backup, Monitoring, Disk space computations, Hot standby and Replication, Disaster Recovery.
Bruce Momjian Robert Treat http://momjian.us/main/presentations/overview.html#admin

09:00 03:00 MRT 212 Getting Hot and Streamy with Postgres Using Postgres' built in replication facilities DBA workshop en
An overview of Postgres' built-in replication system, starting from PITR and going to cascading replication, with demonstrations along the way. There will be more emphasis on the more recent and interesting technologies such as streaming replication and hot standby.
The tutorial starts with background on the WAL and how it enables all the technologies I am going to describe, then moves on to PITR, Warm Standby, Streaming Replication, Hot Standby, Synchronous Streaming Replication and Cascading Replication. I will have demonstrations along the way that build on each other as we get toward the more advanced methods. Most of the focus will be on the most practical ones, Streaming Replication and Hot Standby. The goal is that someone watching this tutorial will understand enough about how replication works in Postgres to implement and maintain it.
Phillip Sorber OmniPITR Config's from Demo

13:00 03:00 MRT 212 Tutorial - Configuring write-scalable PostgreSQL cluster Postgres-XC primer and more Scaling Out other en
Postgres-XC (simply XC) is a write-scalable PostgreSQL cluster, which will be generally available by early May, 2012.
So far, XC is the only write-scalable, symmetric database cluster solution available as open source. This tutorial covers almost all the topics needed to use Postgres-XC:
1) Postgres-XC, what it is and what it is not
2) Postgres-XC elements: Global Transaction Manager, Coordinator and Datanode
3) How to design a Postgres-XC cluster: cluster configuration and table design
4) Build and installation
5) How to configure Postgres-XC
6) How to test Postgres-XC
7) Cluster-wide backup and restore
8) High availability and component failure
9) Postgres-XC as a community, be a developer!
Ashutosh Bapat Koichi Suzuki Michael Paquier

15:00 04:00 Royal Oak registration Registration pickup The social way to register: at the pub Social other en
Pick up your registration pack.
Stop by the Royal Oak Pub on Laurier Street and get your registration pack. You'll help us avoid long lineups on Friday morning, and you get to have a beer and chat with your fellow attendees. We guarantee you'll spot someone famous.
Dan Langille

18:00 05:00 L152 hacker1 Hacker Lounge meet, greet, code, slack Social other en
A place to gather...
This is the place where many people will gather to work on their laptops, converse, code, hack, slack, and generally behave in cooperative ventures. The times will vary, depending on when people gather. Wifi is available, but bring power strips and extension cords to share the wealth. L152 is located in the [Residence](http://g.co/maps/8scp6), ground floor, just to the left as you pass by the front desk. Ask if you can't find it.
Dan Langille

09:30 01:00 MRT 212 Using PostgreSQL in modern enterprise web applications Using HTML5, JavaScript, NodeJS with PostgreSQL Applications lecture en
PostgreSQL's object-relational heritage makes it an outstanding choice for developing web applications that have a rich object model domain.
See how PostgreSQL's object-relational features allow for building database models to support a NodeJS data service feeding a 100% JavaScript web client.
Topics include:
* The benefits of a rich domain model in enterprise software
  - Comparing different architecture approaches: scripted model, table model and domain model
* The dissonance between relational databases and the object model
  - Why NoSQL databases seem attractive
  - Why relational is still the best choice for enterprise applications
  - What is an "object relational" database?
* Using Postgres' object-relational features to reduce the friction
  - Compound Types
  - Querying an object hierarchy
  - Object relational views
* Exploring an example
  - Define a rich domain in a JavaScript MVC web client
  - Build a data source in NodeJS using PostgreSQL JavaScript drivers
  - Defining models in Postgres
  - Extending models in Postgres
* Yes, it's got a Hemi
  - Use Google's plV8js language in PostgreSQL to run JavaScript directly in the database
  - Parse and process JSON payloads
John Rogelstad

11:00 01:00 MRT 212 MADlib An open source machine learning library on RDBMS for Big Data age Applications lecture en
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission is to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.
The library consists of various analytics methods including linear regression, logistic regression, k-means clustering, decision trees, support vector machines and more. That's not all; there is also a super-efficient user-defined data type for sparse vectors with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk covers the overall concept, with an introduction to the problems we are tackling and our solutions to them. It will also touch on parallel data processing, which is a very hot topic in both research and commercial settings these days.
Hitoshi Harada MADlib homepage

12:00 01:00 MRT 212 schema Schemaverse Learn more about the tournament format, available prizes, game mechanics or even simply discuss the idea DBA contest en
Compete against your fellow PostgreSQL users for prizes and the honor of the Schemaverse Champion title. If you would like to learn more about the tournament format, available prizes, or game mechanics, or simply discuss the idea further, we welcome you to join the game's creator, Joshua 'Abstrct' McDougall, for some pizza in Room MRT-212 at 12:30 on Thursday.
Josh 'Abstrct' McDougall The Schemaverse Schemaverse tutorial

13:00 01:00 MRT 212 Database Ops Easy and Effective Operation for production systems with PostgreSQL DBA workshop en
The NTT (Nippon Telegraph and Telephone) group has been working to introduce PostgreSQL into its large, mission-critical production systems. In doing so, we found that the lack of operational tools for PostgreSQL can be an obstacle. So we have developed tools for backup, data loading, and performance monitoring. In this talk, we will introduce these tools and show how to improve database operation using them.
The NTT group, the largest telecommunications carrier in Japan, serving more than 120 million subscribers, has been working to introduce PostgreSQL into the production systems that support its telecommunications services. When we apply PostgreSQL to a production system, there are many housekeeping chores: at the beginning, you have to load initial data into PostgreSQL; once operation starts, you have to take periodic backups to guard against data loss caused by media failure. Many proprietary DBMSs provide operational and management tools that make a DBA's work easier and more efficient. Such tools also provide an easy, standard way of performing operations, which lets less experienced engineers manage database systems well. For PostgreSQL, such tools are not sufficiently available, which can be an obstacle to introducing PostgreSQL into enterprise systems. So we have developed operational tools for taking backups, loading data and monitoring performance. We also provide technical know-how about appropriate operations, so that an engineer who is not familiar with PostgreSQL can manage it well. The talk will introduce our daily operational activities with the assistance of these tools: pg_rman for taking backups, pg_bulkload for data loading with data cleansing, and pg_statsinfo for performance monitoring. By covering the above, we hope to share PostgreSQL operational know-how with many DBAs.
Tetsuo Sakata

14:30 01:00 MRT 212 Range Types and Temporal: Past, Present, and Future 9.2 Features en
Range Types didn't exist before, so why do we need them now? How do they work? Why is "Temporal" important if we already have timestamps? How do we apply these concepts before deploying PostgreSQL 9.2? What's left to be done, and what solutions are in the works? I'll be asking the audience these questions, so -- Err... I mean: I will be answering these questions during the talk.
Extensions, changes to core PostgreSQL, and future ideas will be described in the context of solving a simple use case from 2006. These ideas build up to the larger point that powerful types are important, and database systems should do more to support them.
Jeff Davis

09:00 00:30 MRT 218 keynote Keynote Plenary lecture en
One thing is certain: databases in the future are not going to look much like they do today. At Heroku, we are host to hundreds of thousands of databases that back applications ranging from Berkeley class projects to Super Bowl ad campaigns. From this unusual position we have developed a unique perspective on how the data landscape is changing and can offer some thoughts on how PostgreSQL can change the way people build applications. Slides at: http://pgcon-2012-keynote.herokuapp.com
Peter van Hardenberg slides

09:30 01:00 MRT 218 9.2: Full Throttle Database New Feature Grand Prix 9.2 Features lecture en
Gentlemen, start your database engines! PostgreSQL 9.2 beta is here, and it's faster and more exciting than ever before. Come down to the track and join us for a high-speed tour of the next version like never before!
Starting at pole position, we will whip around the features of version 9.2, speeding through one demo after another, including:
* cascading replication
* enhanced vertical scalability
* improved performance
* index-only access
* range types
* JSON support
* better live DDL deployments
* new administrative views
It's the fastest PostgreSQL yet, and you have a shotgun seat!
Josh Berkus

11:00 01:00 MRT 218 "Cheap, Fast AND Good" .... CHECK A checklist for database replication shoppers.
Scaling Out lecture en
PostgreSQL's built-in replication has been available for a while now, yet all the previous solutions enjoy ongoing popularity. For the experts, this is hardly surprising: A) because none of the solutions was ever meant to replace anything else; B) because replication is a single term for several different attempts at solving a subset of many different problems. For IT decision makers this can be rather confusing. As with so many things, when looking for the right replication solution, people often don't know what they are really looking for. They look over the feature lists of products and try to determine from that which product best fits their needs. But unless they actually know what they need, how is the feature list going to help them? It works the other way around. We need to look at all the problems that can be solved with database replication in general, and identify which are or may become relevant in the case at hand. Then find the solution that solves most of them by priority. Only when we have a prioritized list of problems to solve does the feature list of products start making sense. This talk discusses high-level features of database replication systems and presents them in the form of use cases. These usage patterns are what will drive your decision when you look for the right replication solution to your problem(s).
Jan Wieck

13:00 01:00 MRT 218 Large Scale MySQL Migration to PostgreSQL! Case Studies en
Once a Top-10 internet audience site. 32 million users. Billions of photos and comments, more than 6TB of them. Migrating away from MySQL to PostgreSQL! This talk will share hindsight about the why and the how of that migration: what problems couldn't be solved without moving away, and how the solution now looks. It will cover the tools and methods used to migrate the data, and will detail the new architecture. And the new home, in the cloud!
On the technical side of things, we will be talking about MySQL, mysqltocsv, pgloader, pljava, Google Protocol Buffers, pgbouncer, plproxy, PostgreSQL, pghashlib, walmgr, and streaming replication. And Amazon hosting facilities too (EBS for starters).
Dimitri Fontaine

14:30 01:00 MRT 218 Unlocking the Postgres Lock Manager Advanced Features lecture en
Locking is critical for providing high concurrency for any database; you cannot fully utilize your hardware if locking is throttling its use. This talk explores all aspects of locking in Postgres by showing queries and their locks; covered lock types include row, table, shared, exclusive, and advisory locks. The high concurrency provided by Multiversion Concurrency Control (MVCC) is also covered.
Bruce Momjian http://momjian.us/main/presentations/internals.html#locking

16:00 01:00 MRT 218 lightning Lightning talks Short sharp descriptions of short topics Plenary lightning en
A regular feature, PGCon will have a Lightning talks session, with presentations on diverse topics.
The format remains essentially the same: in a one-hour period, audiences are entertained and informed by a rapid-fire series of short talks on interesting new or ongoing work by individuals or groups. Slides are permitted, but not obligatory; pictures are highly recommended. Topic areas include new open source software projects, works in progress for future releases of existing projects, student projects, etc. Lightning talk topics this year may make good conference papers next year!
The number of slots is limited, and experience suggests there will be more takers than slots. Sign up well in advance to be assured a spot. The session chair this year is yet to be decided. Our tentative list of talks is:
* Security, Why it's Awful and How To Fix It - David Fetter
* Finishing Your PostgreSQL Talk On time - Greg Smith
* PostgreSQL China PUG - Galy Lee
* pgBadger - Gilles Darold
* PostGIS 2.0 and CartoDB 1.0 - Javier de la Torre
* Seven Deadly Sins of Deployment - Josh Berkus
* pg_extractor - Keith Fiske
* Improving the PostgreSQL Experience on the Mac, One App at a Time - Mattt Thompson
* pg_stat_statements - Peter Geoghegan
* What to Do with a Cray Supercomputer - Stephen Frost
... however, this is subject to change up until the Lightning Talks actually begin.
Galy Lee Josh Berkus Magnus Hagander List of talks + slides

09:30 01:00 MRT 219 Writing a foreign data wrapper Experiences with Informix Hacking lecture en
Writing a foreign data wrapper (FDW) for PostgreSQL seems easy. However, there are many pitfalls. This talk will cover experiences from writing an FDW for Informix and will discuss problems with client libraries, data type mapping, optimizer support and performance-related topics. Interested attendees will get a short overview of what they can expect from an FDW and (hopefully) learn something that helps them do it better ;)
Bernd Helmle

11:00 01:00 MRT 219 Schemaless SQL The Best of Both Worlds Advanced Features lecture en
Schemaless databases are a joy to use because they make it easy to iterate on your app, especially early on. And to be honest, the relational model isn't always the best fit for real-world evolving and messy data. On the other hand, relational databases are proven, robust, and powerful. Also, over time as your data model stabilizes, the lack of well-defined schemas becomes painful. How are we supposed to pick one or the other? Simple: pick both.
Fortunately, recent advances in Postgres allow for a hybrid approach that we've been using at Heroku. The hstore datatype gives you key/value in a single column, and PLV8 enables JavaScript and JSON in Postgres. These and others in turn make Postgres the best document database in the world. We will explore the power of hstore and PLV8, explain how to use them in your project today, and examine their role in the future of data.
Will Leinweber

13:00 01:00 MRT 219 Finding Similar Effective similarity search in database Hacking lecture en
Finding similar objects is a ubiquitous task in the day-to-day work of developers of information services. We present a PostgreSQL extension that provides an effective way to find similar objects in a database, along with several usage examples. The extension provides several methods for calculating set similarity, and a similarity operator with index support based on the GiST and GIN frameworks.
Similarity search in large databases is an important issue in today's information services, such as recommender systems. A naive implementation is slow and resource-consuming. We developed a PostgreSQL extension called smlar, which provides several methods for calculating set similarity (all built-in data types are supported) and a similarity operator with index support based on the GiST and GIN frameworks. Set similarity means that smlar isn't about content similarity (it isn't interested in the nature of the objects); it is about the similarity of sets. One example is a recommender system, which produces a list of recommendations based on collaborative and/or content filtering (Amazon, one of the most popular e-commerce companies, provides recommendations based on item-item similarity).
Content filtering uses a set of discrete metadata about an object to build a recommendation list of additional objects with similar properties, while collaborative filtering uses information about a user's past behaviour and similar decisions made by other users to predict objects that the user may be interested in. The smlar extension was developed with collaborative filtering in mind. It provides several methods to compute similarity between sets: jaccard, cosine and tfidf. Experiments with generated and real data sets show a considerable advantage for the smlar extension compared with the brute-force approach.
Oleg Bartunov Teodor Sigaev

14:30 01:00 MRT 219 PL/R Tricks Server Monitoring with Predictive Analytics Applications lecture en
We will present the results of an investigation into the use of PostgreSQL and PL/R in conjunction with a server monitoring application to perform predictive analytics of server performance.
Usually server monitoring is reactive in nature. Some threshold is exceeded, and an alert is sent. By the time you receive the alert, something bad has already happened. Wouldn't it be nice to be able to foresee trouble before it rears its ugly head? We will investigate the feasibility of applying both well-established, relatively simple forms and more advanced forms of dynamic statistical analysis to server monitoring, to allow more proactive server management. The tools used will be PostgreSQL, R, and PL/R.
Jeff Hamann Joe Conway

18:00 03:00 Out and about socialouting Major Social Event! sponsored by EnterpriseDB Social other en
EnterpriseDB invites all PGCon attendees to a big evening with drinks, appetizers, dinner and music on Thursday May 17th at [My Condo](http://mycondoottawa.ca/), just minutes from the conference venue in the Byward Market. We have exclusive use of My Condo from 6:00 to 9:00 pm, meaning that all 4 floors, including the 4th floor patio, will be ours to move around, mingle and catch up with fellow attendees.
Food stations will be available on multiple floors, serving BBQ burgers, Jambalaya, Fettuccini Primavera and Honey Mustard Chicken, so come hungry! NOTE: Remember to bring your conference badge with you as it will be your ticket to entry at [My Condo](http://mycondoottawa.ca/). For directions, please consult the [official conference map](http://g.co/maps/rzxqq). MyCondo turns into a night club at 9:30pm, and everyone is welcome to stay free of charge and enjoy the evening - [check it out](http://mycondoottawa.ca/nights/)
Dan Langille Map

21:00 03:00 L152 hacker2 Hacker Lounge meet, greet, code, slack Social other en
A place to gather...
This is the place where many people will gather to work on their laptops, converse, code, hack, slack, and generally behave in cooperative ventures. The times will vary, depending on when people gather. Wifi is available, but bring power strips and extension cords to share the wealth. L152 is located in the [Residence](http://g.co/maps/8scp6), ground floor, just to the left as you pass by the front desk. Ask if you can't find it.
Dan Langille

09:00 01:00 MRT 205 WAL Internals Of PostgreSQL Hacking lecture en
Describes the internals of PostgreSQL's Write-Ahead Log (WAL) and improvements to the WAL system that could improve performance.
PostgreSQL uses WAL files to perform crash recovery, point-in-time recovery and streaming replication. This talk covers details of the WAL system in PostgreSQL, such as what kinds of WAL records are generated by DML operations, how WAL files are named, and what they contain. Asynchronous commit and how PostgreSQL protects against partial page writes using the WAL are also covered. Finally, it discusses some advantages and disadvantages relative to other RDBMSs, and improvements that could be made to the PostgreSQL WAL system to improve its performance.
Amit Kapila

10:00 01:00 MRT 205 Dear SQL Server, I'm filing for divorce.
Falling in love with the free spirit of Postgres Case Studies lecture en
Using the Stack Overflow datasets, we'll ditch all the drama of a Microsoft stack and convert from SQL Server to Postgres on Windows. Once we do that, we'll migrate our entire DB and Web App from Microsoft to Linux using Postgres and Mono with as few code changes as possible.
With the Stack Overflow dataset loaded into SQL Server and a mock Stack Overflow app in ASP.NET MVC3, we are going to show various ways to ETL into Postgres from SQL Server on Windows. Once that is done, we'll go over some basics of going from Postgres on Windows to Postgres on Linux as we attempt to migrate our app. Once we get our back-end moved, we'll show just how easily you can wire up ASP.NET MVC3 to Postgres and then move our entire stack to Linux using Nginx and Mono. Since I am a SQL Server DBA, I will also be adding lots of opinion on where Postgres really shines compared to SQL Server and where it doesn't. This session will be informative, entertaining and incredibly nerdy.
Rob Sullivan

11:30 01:00 MRT 205 Monitoring Ozone Levels with Postgresql Database Streaming Replication and Monitoring DBA lecture en
Postgres is used to manage data from the Ozone Monitoring Instrument aboard NASA's Aura spacecraft. The database implementation must handle large volumes of complex data transmitted continually from the satellite and generated by processing-intensive analyses performed by a team of atmospheric scientists. This talk will describe the architecture and some of the challenges faced. Focus will be given to our replication efforts, software developed for monitoring, and ongoing work to create a decentralized network of services communicating through a RESTful interface.
NASA and its international partners operate several Earth-observing satellites that closely follow one after another along the same orbital track.
This coordinated group of satellites is called the Afternoon Constellation, or "A-Train" (http://atrain.nasa.gov/) for short. Four satellites currently fly in the A-Train: Aqua, CloudSat, CALIPSO, and Aura. Each satellite has one or more observational instruments that are used together in the construction of high-definition three-dimensional images of the Earth's atmosphere and to monitor changes over time. Aura's instruments include the Ozone Monitoring Instrument (OMI). Data management and processing services for data harvested by OMI are provided by the OMI Science Support Team headquartered at Goddard Space Flight Center. Raw OMI data is received and initially processed at a ground station in Finland, then ingested into the system, where it is analyzed by scientists who submit processing jobs. Earth Science Data Types (ESDTs) are the products of these jobs, and one of the principal types of data managed in the database. Complex and abstract, ESDTs represent the interface between the raw science data and the data management system, and more than 900 are currently defined.
Our current database implementation includes 10 clusters, each running Postgres 9.0.4, divided into three production levels: development, testing, and operations. The central operations cluster handles on average about 200 commit statements per second, contains tables as large as 160 million rows, and is configured for streaming replication. New data is continually being added to the system, and the total quantity is increasing at a rate of about 60% per year. This influx of data, in addition to scientific analyses, can cause the load on the database to vary suddenly, and monitoring software has been developed to provide early warning of potential problems.
The latest implementation of our software architecture uses decentralized services communicating through a RESTful interface.
Databases are bundled together with their software component, and schema changes are managed using patch files. A utility has been created to apply the patches and ensure schema consistency as the databases are amended. Perl's Rose-DB is used as an object-relational mapper, and database queries via HTTP requests are supported by encoding the query information into JSON. The new platform uses a different data model, making it necessary to sync between the two representations, and causing some difficulty with data duplication.
Alex Ming Lai Marty Brandon

13:30 01:00 MRT 205 OLTP Performance Benchmarks Overview Performance lecture en
Learn about various OLTP performance benchmark kits, when to use them and what to know when comparing numbers with other databases.
pgbench is widely used for micro-benchmarking PostgreSQL. However, many customers use benchmarks based on the databases prevalent in their environment: MySQL uses sysbench; SQL Server uses TPC-C and DVDStore; Oracle uses TPC-C; and so on. In this session we look at OLTP benchmarks such as sysbench, dbt2 (TPC-C like), BenchmarkSQL (TPC-C like), DVDStore, etc., and see how to optimize PostgreSQL for these benchmarks and what to consider when comparing the results to other databases.
Jignesh K. Shah

15:00 01:00 MRT 205 Running libraries on PostgreSQL The Evergreen library system's (ab)use of PostgreSQL Case Studies en
Launched by the Georgia Public Library System in September 2006 to manage and circulate materials through a consortium of over 200 public libraries, Evergreen has since been adopted by over 1,000 public and academic libraries across more than 30 states and provinces. From the beginning, Evergreen's distributed architecture has bet heavily on PostgreSQL features, relying on custom functions, triggers and rules, full-text search, XML support, inheritance, and recently HSTORE to provide reliable high-performance support for the day-to-day operations of libraries.
One of the core developers of the Evergreen library system describes how the project tries to use PostgreSQL to its fullest, some of the lessons we have learned, and some of the challenges we face:
* Scalability success stories
* Replication then and now
* Normalizing metadata to the exacting/arcane rules of librarians
* TEXT vs. XML vs. MARC(XML|21)
* Full-text search challenges
* Searching across multiple languages
* Relevance, configurability, and performance
* Schema evolution and testing
Dan Scott Evergreen project home

16:30 01:00 MRT 205 Simple SQL Change Management with Sqitch Hacking lecture en
SQL change management has always sucked. This talk introduces Sqitch, the VCS-aware SQL change management application that doesn’t suck. Come see how it works, learn the few simple rules you need to get the most out of it, and liberate yourself from the suckitude.
SQL change management is hard. Most “migration”-style implementations require opaque naming conventions, prefer DSLs that cover a fraction of SQL, and require duplication of code for simple changes to existing functions. It does not have to be this way. And now it’s not. Introducing [Sqitch](http://sqitch.org/), simple SQL change management that doesn’t suck. Sqitch doesn’t care what programming language your app is written in. It has no opinions as to what database to use or what its schema should look like. And it doesn’t require sequentially-named migration scripts or the use of any DSL other than SQL. Sqitch lets you write SQL migration scripts that target *your* database, and provides a simple, unintrusive interface for specifying dependencies, so that it can run things in the proper order. Best of all, when used with a version control system (initially Git), you can even modify idempotent deployment scripts between releases. Sqitch recognizes such changes, and automatically knows how to revert to earlier versions if required.
And finally, Sqitch supports simple acceptance testing, so that you can be sure that your deployments are successful, and, if not, revert them. So come to this talk to learn all about Sqitch: How it works, where to get it, and how to get the most out of managing database deployments. David E. Wheeler Sqitch GitHub CPAN Tutorial 09:00 01:00 MRT 212 Making your own maps An introduction in using free Geospatial data GIS lecture en PostGIS is an extension to PostgreSQL that turns PostgreSQL into a superb spatial database. Storing spatial data in PostgreSQL is a great way to use up the space on your SSDs; however, using the data to make maps is much more fun. This talk is aimed at people with limited GIS experience and will talk about how to use OpenStreetMap data for map making. We will tell you how you can get free geo-spatial data from OpenStreetMap and how it can be loaded into a PostGIS database. Common methods of using and accessing your data will be discussed including: * Open Source desktop GIS software * Generating custom map tiles for use on your website * Making pretty paper maps. This talk will introduce common tools and techniques used with PostGIS when working with OpenStreetMap data. This is a user-focused talk suitable for people who have next to no GIS background. Steve Singer 10:00 01:00 MRT 212 On snakes and elephants Using Python with and in PostgreSQL Applications lecture en Python is one of the most popular application programming languages and there's a plethora of PostgreSQL libraries and utilities for Python. This talk will try to give an overview of the contemporary Python-PostgreSQL landscape in a way that's useful both for Python programmers starting on a PostgreSQL project and DBAs dealing with what those programmers wrote. We'll try to cover a slightly opinionated selection of libraries, frameworks and technologies and give some recommendations. The richness of the environment is sometimes confusing.
Python people starting with PostgreSQL often don't know which driver or ORM library they should be using. Sometimes they're not aware of all the things PostgreSQL can offer to a Python programmer and the tools available. On the other hand, DBAs sometimes need to debug Python programs (mis)using their database and PostgreSQL-savvy people join or consult on projects written in Python and need to have at least a basic understanding of how Python works, particularly on the database connection front. We'll try to make both of these groups a bit more comfortable when dealing with the other. The talk will cover available drivers, focusing especially on psycopg2 and some of its lesser-known features, and ORM libraries, focusing mainly on SQLAlchemy. We'll also discuss PL/PythonU, the possibilities it opens, along with some best practices and caveats. Jan Urbański 11:30 01:00 MRT 212 Hooks in PostgreSQL Advanced Features lecture en PostgreSQL's extensibility is well known. Most people have heard of user types, user operators, the new extension capability, and such. But few know about hooks in PostgreSQL. This talk will cover all kinds of hooks available in PostgreSQL, and will show some tools using them already. Since the 8.3 release, the PostgreSQL developers have added many hooks to PostgreSQL. Some extensions already make use of such hooks in the planner and in the executor. pg_stat_statements is one of the various examples available. This talk will give a broad overview of the hook system, and how to use it. We'll also see some of the extensions making use of them. Guillaume Lelarge 13:30 01:00 MRT 212 Improving foreign key concurrency To lock and not to block 9.2 Features lecture en Row locking is a mechanism that lets Postgres maintain strict consistency in certain database constraints, such as foreign keys. However, Postgres has historically only provided share and exclusive row locking, which I'll show to have significant drawbacks for concurrency.
To solve the concurrency problem, two new row lock types are being introduced in release 9.2: SELECT FOR KEY SHARE and SELECT FOR KEY UPDATE. In this talk I'll explain how this new locking came to be, how it works, and how it helps significantly improve concurrency in applications. Álvaro Herrera 15:00 01:00 MRT 212 Index support for regular expression search Advanced Features lecture en Regular expressions (regex) are a powerful tool for text processing. When dealing with large string collections, it's important to search those collections quickly (i.e., using an index). Indexing for regex search is quite a hard task. This talk presents a novel technique (and a WIP patch for PostgreSQL implementing it) for regex search using trigram indexes. The proposed technique provides more comprehensive trigram extraction than its analogues, and therefore higher performance. There are two existing approaches to index-based regex search. The FREE indexing engine extracts contiguous text fragments from the regex and performs a substring search. The Google Code Search approach presents a more sophisticated recursive analysis of the regex, extracting various regex attributes. This talk presents a novel technique of regex analysis which is based on automata transformation rather than analysis of the original regex. The superiority of the proposed technique will be demonstrated by examples and tests. The talk will be organized as follows: * Introduction.
* Regular expressions * Finite automata * pg_trgm contrib module * Existing techniques for index-based regular expression search * FREE indexing engine * Google Code Search * Proposed technique * Description * Examples * Comparison with analogues * Limitations * Performance results Alexander Korotkov WIP patch Junghoo Cho, Sridhar Rajagopalan "A Fast Regular Expression Indexing Engine" How Google Code Search Worked Video of talk 16:30 01:00 MRT 212 The PostgreSQL replication protocol, tools and opportunities DBA lecture en The new binary replication protocols and tools in PostgreSQL 9.0 and 9.1 are a popular new feature - but they can also be used for other things than just replication! PostgreSQL 9.1 includes server changes to allow standalone tools to request and respond according to the replication protocol. From these, tools like pg_basebackup allow a number of new possibilities. And the infrastructure put in place in 9.1 opens opportunities for further enhancements - some already on the drawing board and some just wild ideas so far. Magnus Hagander 09:00 01:00 MRT 218 The Horizontal Struggle Improving the Experience of Scale-Out Scaling Out lecture en Horizontal scale-out of applications using Postgres is typically a time-consuming, expensive, error-prone task. In spite of that, horizontal scale-out is achievable, even if not well-supported by the logical constructs Postgres exposes. This talk is intended to share what we've learned from both our experiences at Heroku and, more importantly, the litany of customers that we are privileged to talk to about their problems. From these, a few choice gaps in functionality are highlighted for improvement. The era of horizontal scale-out has long been upon us.
While some productively continue to outrun the problem by leveraging Moore's Law, others already have made the jump to fully distributed data management systems to achieve better scalability, availability, and latency around the globe. Some enthusiasts have even gone so far as to say that relational models will not gracefully survive and grow in the coming era. The author of this talk is skeptical of this prediction, but acknowledges there are painful gaps in functionality that exist in all known generally-available production-class relational database systems in the domain of enabling scale-out of applications. He also thinks those gaps are, in all likelihood, solvable, without a huge upheaval to the implementation of Postgres or the applications written against it. Here, he will attempt to draw attention to: * The relationship between relational models, ACID, and usability * The sacrifices in usability made by most distributed data management software not intrinsic to their advantages * The surprisingly few basic use-cases required by most people struggling with horizontal scalability * The current state of the art in using Postgres as a member in a distributed system * Choice weaknesses to make progress on, and sketches of mechanisms to address them Daniel Farina 10:00 01:00 MRT 218 A Batch of Commit Batching Scaling Out lecture en A database commit can be the most expensive single operation that its users have to wait for. Recent trends in the database industry have proven some applications are willing to accept durability loss, when it must be sacrificed to reach performance goals. And an inevitable downside of more durable approaches like Synchronous Replication is their impact on server commit speed. Some of the fundamental limitations here are physical ones: disk rotation, network performance, and the speed of light.
Recent performance improvements for PostgreSQL 9.2 aim at getting closer to the theoretical best possible behavior here in every situation. It's more important than ever to tell when the limit you're hitting is a physical one, and when it's something you can address with a software change. Controlling commit batch size and the number of concurrent clients is getting even more important as PostgreSQL is deployed onto cloud and other virtual hardware environments. Four of the fundamental factors going into how expensive a commit is are atomicity, consistency, isolation, durability, collectively referred to as ACID. PostgreSQL has always respected the durability aspects of ACID compliance. Extending that to reach onto multiple servers can significantly expand the suitability of the database for business critical applications. It will cost you though. The question isn't just how much durability you want; it's how much durability can you afford? The innovative design used in PostgreSQL doesn't force you to make this sort of decision at the database level. Every individual commit can specify its durability requirements at any time, even in the middle of a transaction. Being able to classify your need at such a fine level allows PostgreSQL an unprecedented range of options in this area. Mission critical data that needs multi-node synchronous commit can coexist with high volume/best effort data, with each transaction fine-tuned to its position in the reliability vs. speed trade-off spectrum. There's a second factor to consider too: client count. The Synchronous Replication implementation used for PostgreSQL 9.1 makes it possible to increase total aggregate commit throughput by scaling up the concurrent number of clients. Improvements in progress for PostgreSQL 9.2 take that basic idea and apply it more aggressively to local commits as well.
Carefully adjusting per-client commit behavior is becoming an increasingly important bottleneck to understand and design against. Topics covered will include: * Components of commit latency * Application batch commits * Benchmarking commit speed vs. client count * Local commit durability options and performance * Improvements in progress for PostgreSQL 9.2 group commit performance * Remote server commit latency * Synchronous Replication commit options and performance * Per-transaction commit durability Greg Smith Peter Geoghegan 11:30 01:00 MRT 218 Moving Day: Migrating Big Data from A to B Big Data lecture en In 2011 we moved the Mozilla crash reporting system from old creaky hardware in San Jose to a new shiny datacenter in Phoenix. This system contains more than 40TB of data in HBase, the Hadoop database, and PostgreSQL. The data collecting app has a requirement for close to 100% uptime. On top of that we have data processing, an API, and a webapp. After many months of work, the migration went seamlessly. In this session we’ll talk about: - The checklist manifesto, reprised, and understanding the critical path - How to move all that data in a reasonable timeframe - The importance of devops culture in success - Automating packaging and configuration and how it will save you - Understanding the difference between old and new platforms: correctness testing, load testing, and smoke testing Attendees should walk away with an outline of everything they’ll need to do to achieve a successful data center migration. Laura Thomson 13:30 01:00 MRT 218 PostgreSQL on AWS EC2 with somewhat reduced tears DBA lecture en Amazon Web Services (AWS) has become a very popular platform for deploying PostgreSQL-backed applications. But it's not a standard hosting platform. We'll talk about how to get PostgreSQL to run efficiently and safely on AWS. Among the topics covered will be: -- Selecting an EC2 instance size, and configuring it for PostgreSQL. 
-- Dealing with ephemeral instance storage: What is it good for? How much do you need? -- Elastic Block Store: How much do you need? How do you configure it for best performance? -- AWS characteristics and quirks. -- Why replication is not optional on AWS. -- Backups and disaster recovery. Christophe Pettus 15:00 01:00 MRT 218 Performance Improvements in PostgreSQL 9.2 Bigger servers, bigger problems Hacking lecture en The upcoming PostgreSQL 9.2 release features a large number of performance enhancements by many different authors, including heavyweight lock manager improvements, reduced lock hold times in key hot spots, better group commit, index-only scans, better write-ahead log parallelism, sorting improvements, and a userspace AVC for sepgsql. In this talk I'll give an overview of what was changed, how it helped, lessons learned, and the challenges that remain. Robert Haas 16:30 01:00 MRT 218 Big Bad "Upgraded" PostgreSQL Case Studies lecture en A few years ago, we started a project to upgrade our multi-terabyte database from 8.3 to 8.4. Along the way we encountered a number of different obstacles and roadblocks which caused us to postpone the project, but this past fall we finally made it through phase 1 of the project, which by now had become an upgrade from 8.3 to 9.1. The course of the talk will cover several tools and tactics we had to use to get pg_upgrade to complete successfully, including all the different ways that things blew up on us.
We'll also discuss some of the changes we saw after the upgrade, and discuss some of the improvements we've made using new 9.1 features. Long-time Postgres users may be familiar with the "Big Bad" series of talks, which discuss different ways we have had to bend Postgres to serve the needs of a high-transaction, multi-terabyte decision support system. Our talks have featured both technical highlights and lowlights, from innovative techniques to outright server meltdown, and all the good times in between. If you are using Postgres for mission critical applications, you'll enjoy this look inside the operations of a complex system that lives on the edge. Robert Treat Slide Info / Blog Slides 17:30 01:00 MRT 218 Closing sessions prizes, auctions, fun, games Plenary other en The Traditional Closing Session Watch the video. We raised thousands for charity. Dan Langille 19:00 02:00 Patty Boland's (upstairs) socialouting2 Major Social Event! sponsored by Heroku Social other en Come and join us for an evening of food and drink at Patty Boland's in the Market. Heroku is sponsoring this event for all PGCon attendees. Dinner and drinks will be provided. See [the map on the website](http://g.co/maps/a4ztx) for directions to the venue. NOTE: the time is 6:45pm to 8:45pm. :) NOTE: Bring your PGCon 2012 badge for admission. Dan Langille Map 21:00 03:00 L152 hacker3 Hacker Lounge meet, greet, code, slack Social other en A place to gather... This is the place where many people will gather to work on their laptops, converse, code, hack, slack, and generally behave in cooperative ventures. The times will vary, depending on when people gather. Wifi is available, but bring power strips and extension cords to share the wealth. L152 is located in the [Residence](http://g.co/maps/8scp6), ground floor, just to the left as you pass by the front desk. Ask if you can't find it.
Dan Langille 10:00 05:00 Out and about tourist Tourist stuff Spend some time exploring Social other en Explore Ottawa Ottawa has a large number of great attractions. Spend some time looking around and explore. Spend as much time as you want with us, or leave early. We will walk everywhere we go. Wear sensible shoes. Bring your camera. We'll probably have lunch somewhere along the way. Consider the weather (sun block, rain coat, umbrella, swim suit). This is one option for Saturday. You are free to come with us or take part in all or some activities. Or make your own way around... The agenda for the tourist day is completely wrong. The forecast: http://www.theweathernetwork.com/weather/CAON0512 Cora's - 8 AM - Breakfast - 179 Rideau Street - http://bit.ly/dngpnV - if you arrive much past 8:30, you probably won't get through in time to get to the next meeting point - we will depart Cora's at 8:55 Rideau Center - 9:13 - south end of the Mall on the Mackenzie King bridge - we are catching the #95 bus to direction Orleans to BLAIR 1B - from there, at 9:27, take Bus route 129 (OC Transpo) direction Aviation Aviation Museum - we'll do a guided tour first, then wander around - departing here at 1:30 Earl of Sussex Pub - arriving at about 2:15 - 431 Sussex Drive Ottawa, ON K1N 9M6 (613) 562-5544 earlofsussex.ca All times after we leave Rideau Center are very subject to change. Dan Langille National Memorial Residence Forum