PGCon2008 - Final - we hope

PGCon 2008
The PostgreSQL Conference

Wayne Schroeder
Day Talks - second day (2008-05-23)
Room B
Start time 11:30
Duration 01:00
ID 110
Event type lecture
Track Horizontal Scaling
Language en

iRODS - A Large-Scale Rule-Oriented Data Management System

RDBMS-based Data Grid, Persistent Archive, Digital Library

The integrated Rule-Oriented Data management System (iRODS) is an open source software system that implements data grids, persistent archives, and digital libraries. iRODS incorporates an RDBMS for storing and querying persistent information and a distributed Rule Engine to invoke micro-service workflows. This session will present an overview of iRODS, describe how it makes use of PostgreSQL, and compare PostgreSQL and Oracle iRODS instances.

The integrated Rule-Oriented Data management System (iRODS) is an open source data management system being developed by the Data Intensive Computing Environments (DICE) group at the University of California San Diego, with funding from the National Archives and Records Administration and the National Science Foundation. iRODS is the "Next Generation" product from DICE, building on the expertise developed through production use of the Storage Resource Broker (SRB). The SRB organizes more than 2 petabytes of distributed data into shared collections worldwide. iRODS 1.0 was released in January 2008.

iRODS functionality includes: global name spaces for identifying files, users, and storage resources; authentication and authorization controls; high performance WAN data transport; system and user-defined metadata; query-based data discovery and browsing; management of data distribution and replication; and checksum and synchronization mechanisms.

iRODS provides a flexible, adaptive, and customizable data management architecture built around a custom-designed Rule Engine at its core. The Rule Engine invokes 'micro-services', which are integrated into workflows to process requests and handle information. iRODS automates the execution of management policies by enforcing rules directly at each storage resource.
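
To make the idea of a rule engine invoking micro-service workflows more concrete, the following is a minimal sketch in Python. It is not the iRODS Rule Engine or its rule language: the class `RuleEngine`, the event name `on_put`, the micro-services `compute_checksum` and `replicate`, and the resource names are all hypothetical and exist only for illustration. The sketch assumes each micro-service takes a context dictionary and returns it, and that a rule is simply an ordered list of micro-services bound to an event.

```python
import hashlib
from typing import Callable, Dict, List

# A context dict carries request state from one micro-service to the next.
Context = Dict[str, object]
MicroService = Callable[[Context], Context]


def compute_checksum(ctx: Context) -> Context:
    """Hypothetical micro-service: record a SHA-256 checksum of the data."""
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()
    return ctx


def replicate(ctx: Context) -> Context:
    """Hypothetical micro-service: pretend to copy the data to a second resource."""
    ctx["replicas"] = [ctx["resource"], "backup-resource"]
    return ctx


class RuleEngine:
    """Toy rule engine: maps an event name to an ordered micro-service workflow."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[MicroService]] = {}

    def add_rule(self, event: str, workflow: List[MicroService]) -> None:
        self._rules[event] = list(workflow)

    def fire(self, event: str, ctx: Context) -> Context:
        # Run each micro-service in order; an unknown event is a no-op.
        for service in self._rules.get(event, []):
            ctx = service(ctx)
        return ctx


engine = RuleEngine()
engine.add_rule("on_put", [compute_checksum, replicate])
result = engine.fire("on_put", {"data": b"hello", "resource": "disk-1"})
print(result["checksum"], result["replicas"])
```

In this sketch a policy change means editing the list passed to `add_rule`, not the code that handles requests, which is the kind of separation the paragraph above describes.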

Metadata, both system-maintained and optional user-defined, are stored in and queried from an RDBMS, either PostgreSQL or Oracle. Requests go from the iRODS clients to the iRODS servers via the iRODS protocol, and then to the catalog library, which interfaces to the RDBMS via either ODBC or OCI.
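
As an illustration of user-defined metadata held in a relational catalog and used for discovery, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the two-table schema, the column names, and the helpers `build_catalog`, `register`, and `find_by_metadata` are hypothetical and do not reflect the actual iRODS catalog schema or catalog library. The standard-library `sqlite3` module stands in for PostgreSQL so that the sketch runs without a database server.

```python
import sqlite3
from typing import Dict, List


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_object (
            id           INTEGER PRIMARY KEY,
            logical_path TEXT NOT NULL UNIQUE,
            size_bytes   INTEGER NOT NULL
        );
        CREATE TABLE user_metadata (
            object_id INTEGER NOT NULL REFERENCES data_object(id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        CREATE INDEX user_metadata_attr_idx ON user_metadata (attribute, value);
        """
    )
    return conn


def register(conn: sqlite3.Connection, path: str, size: int,
             metadata: Dict[str, str]) -> int:
    """Insert one data object and its user-defined attribute/value pairs."""
    cur = conn.execute(
        "INSERT INTO data_object (logical_path, size_bytes) VALUES (?, ?)",
        (path, size),
    )
    object_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO user_metadata (object_id, attribute, value) VALUES (?, ?, ?)",
        [(object_id, k, v) for k, v in metadata.items()],
    )
    return object_id


def find_by_metadata(conn: sqlite3.Connection, attribute: str,
                     value: str) -> List[str]:
    """Return logical paths of objects tagged with attribute = value."""
    rows = conn.execute(
        """
        SELECT d.logical_path
        FROM data_object AS d
        JOIN user_metadata AS m ON m.object_id = d.id
        WHERE m.attribute = ? AND m.value = ?
        ORDER BY d.logical_path
        """,
        (attribute, value),
    )
    return [row[0] for row in rows]


conn = build_catalog()
register(conn, "/zone/home/alice/scan1.tif", 1024, {"project": "archive"})
register(conn, "/zone/home/bob/notes.txt", 10, {"project": "grid"})
matches = find_by_metadata(conn, "project", "archive")
print(matches)
```

The index on `(attribute, value)` is included because query-based discovery over very large holdings depends on such lookups not scanning the whole metadata table; whether the real catalog indexes this way is not stated in the abstract.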

Performance and scalability are critical, as the digital holdings can be massive in size, measured in hundreds of millions of files and petabytes of storage, and may be maintained for decades.