PGCon2012 - Final Release

PGCon 2012
The PostgreSQL Conference

Speakers
Alex Ming Lai
Marty Brandon
Schedule
Day Talks - 2 - Friday - 2012-05-18
Room MRT 205
Start time 11:30
Duration 01:00
Info
ID 426
Event type Lecture
Track DBA
Language used for presentation English

Monitoring Ozone Levels with Postgresql

Database Streaming Replication and Monitoring

Postgres is used to manage data from the Ozone Monitoring Instrument aboard NASA's Aura spacecraft. The database implementation must handle large volumes of complex data transmitted continually from the satellite and generated by processing-intensive analyses performed by a team of atmospheric scientists. This talk will describe the architecture and some of the challenges faced. Focus will be given to our replication efforts, software developed for monitoring, and ongoing work to create a decentralized network of services communicating through a RESTful interface.

NASA and its international partners operate several Earth observing satellites that closely follow one after another along the same orbital track. This coordinated group of satellites is called the Afternoon Constellation, or "A-Train" (http://atrain.nasa.gov/) for short. Four satellites currently fly in the A-Train: Aqua, CloudSat, CALIPSO, and Aura. Each satellite has one or more observational instruments that are used together in the construction of high-definition three-dimensional images of the Earth's atmosphere and to monitor changes over time. Aura's instruments include the Ozone Monitoring Instrument (OMI). Data management and processing services for data harvested by OMI are provided by the OMI Science Support Team headquartered at Goddard Space Flight Center.

Raw OMI data is received and initially processed at a ground station in Finland, then ingested into the system, where it is analyzed by scientists who submit processing jobs. Earth Science Data Types (ESDTs) are the products of these jobs and are one of the principal types of data managed in the database. Complex and abstract, ESDTs represent the interface between the raw science data and the data management system; more than 900 are currently defined.

Our current database implementation includes 10 clusters, each running Postgres 9.0.4, divided into three production levels: development, testing, and operations. The central operations cluster handles on average about 200 commit statements per second, contains tables as large as 160 million rows, and is configured for streaming replication. New data is continually being added to the system, and the total quantity is increasing at a rate of about 60% per year. This influx of data, in addition to scientific analyses, can cause the load on the database to vary suddenly, and monitoring software has been developed to provide early warning of potential problems.
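
As an illustration of the kind of early-warning check described above, the following is a minimal Python sketch, not the team's actual monitoring software. It assumes that a cumulative commit counter is sampled periodically (for example, the xact_commit column of the pg_stat_database view) and flags a sample pair whose commit rate is well above the roughly 200 commits per second quoted above. The names Sample, commit_rate, and check_load, and the 2x threshold factor, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One reading of a cumulative commit counter (hypothetical structure)."""
    timestamp: float   # seconds, from any monotonically increasing clock
    xact_commit: int   # cumulative commits, e.g. read from pg_stat_database


def commit_rate(prev: Sample, curr: Sample) -> float:
    """Return the average commits per second between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.xact_commit - prev.xact_commit
    if delta < 0:
        # The counter went backwards, e.g. after a statistics reset.
        raise ValueError("commit counter decreased between samples")
    return delta / elapsed


def check_load(prev: Sample, curr: Sample,
               baseline: float = 200.0, factor: float = 2.0) -> Optional[str]:
    """Return a warning string if the rate exceeds baseline * factor, else None.

    The baseline of 200 commits/s comes from the text; the factor is an
    assumed threshold chosen only for illustration.
    """
    rate = commit_rate(prev, curr)
    if rate > baseline * factor:
        return f"WARNING: {rate:.0f} commits/s exceeds {factor:g}x baseline"
    return None
```

In practice the two samples would come from successive polls of the operations cluster, and a returned warning would be forwarded to whatever alerting channel is in use.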

The latest implementation of our software architecture uses decentralized services communicating through a RESTful interface. Databases are bundled together with their software component, and schema changes are managed using patch files. A utility has been created to apply the patches and ensure schema consistency as the databases are amended. Perl's Rose-DB is used as an object-relational mapper, and database queries made via HTTP requests are supported by encoding the query information into JSON. The new platform uses a different data model, making it necessary to sync between the two representations and causing some difficulty with data duplication.
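
The patch-file approach to schema changes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the utility described in the talk: it uses an in-memory SQLite database in place of Postgres so that it runs on its own, and the bookkeeping table schema_patch, the function apply_patches, and the patch and table names are all hypothetical.

```python
import sqlite3


def apply_patches(conn, patches):
    """Apply each (name, sql) patch in order, skipping ones already recorded.

    Returns the list of patch names applied during this call.
    """
    # Bookkeeping table (hypothetical name) listing patches already applied.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_patch (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_patch")}

    newly_applied = []
    for name, sql in patches:
        if name in applied:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_patch (name) VALUES (?)", (name,))
        conn.commit()  # persist the bookkeeping row before the next patch
        newly_applied.append(name)
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical patches; real ones would be read from patch files.
    patches = [
        ("001_create_granule", "CREATE TABLE granule (id INTEGER PRIMARY KEY)"),
        ("002_add_esdt", "ALTER TABLE granule ADD COLUMN esdt TEXT"),
    ]
    print(apply_patches(conn, patches))  # first run applies both patches
    print(apply_patches(conn, patches))  # second run finds nothing to apply
```

Recording applied patches in the database itself is one common way to keep several copies of a schema consistent: running the same patch list against any database brings it to the same state, and re-running it is harmless.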