PGCon2009 - Final Release

PGCon 2009
The PostgreSQL Conference

Speakers: Terry Jones

Schedule
Day: Talks - second day - 2009-05-22
Room: DMS 1110
Start time: 15:00
Duration: 01:00

Info
ID: 176
Event type: Lecture
Track: Case Studies
Language used for presentation: English

The design, architecture, and tradeoffs of FluidDB

The database with the heart of a wiki.

FluidDB is a hosted database that Fluidinfo (http://fluidinfo.com) will launch in alpha early this year. In this talk I will describe: the aspects of FluidDB that make it novel, the reasoning behind this approach to working with data, and the architecture of FluidDB. The system is currently deployed on top of Amazon EC2 and S3, and we are using PostgreSQL as a key component in the architecture.

Heart of a wiki

We call FluidDB "the database with the heart of a wiki" because it is writable by any application. That is to say, when an application (or user) encounters an object in FluidDB, it can always add information to that object. No need to ask permission, no need for anyone to anticipate needs or to decide in advance how information should be structured, organized, or combined. There is no schema, and there is no distinction between metadata and data. Unlike a wiki, though, there are permissions on object attributes to prevent damage to existing content, and there is an extremely simple query language, which makes new information directly searchable.
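
As a rough illustration of "anyone can add, but existing content is protected", here is a minimal in-memory sketch in Python. It is not the FluidDB API: the class ToyStore, the "user/name" attribute naming, and the rule that only the first writer of an attribute may change it are all assumptions made for this example.

```python
class PermissionDenied(Exception):
    """Raised when a user tries to change an attribute they do not own."""


class ToyStore:
    """Hypothetical in-memory stand-in for a schemaless object store."""

    def __init__(self):
        # object id -> {attribute name -> (owner, value)}
        self.objects = {}

    def put(self, user, object_id, attribute, value):
        # Any user may add a new attribute to any object (no schema,
        # no permission needed). Assumed rule: only the user who first
        # wrote an attribute may overwrite it.
        attrs = self.objects.setdefault(object_id, {})
        owner = attrs.get(attribute, (user, None))[0]
        if owner != user:
            raise PermissionDenied(f"{user} may not overwrite {attribute}")
        attrs[attribute] = (user, value)

    def get(self, object_id, attribute):
        return self.objects[object_id][attribute][1]


store = ToyStore()
store.put("alice", "book-1", "alice/title", "Dune")
store.put("bob", "book-1", "bob/rating", 5)  # bob adds to alice's object

try:
    store.put("bob", "book-1", "alice/title", "Not Dune")
    overwrite_blocked = False
except PermissionDenied:
    overwrite_blocked = True
```

Here bob can attach his own rating to an object alice created, but his attempt to overwrite her title is refused.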

Wikis changed how people can create content online. FluidDB, I hope, will do a similar thing for applications and the people who use them. It has the wiki advantage of general writability and encouraging information sharing and augmentation, but without the main problems of wikis.

Why is this interesting?

Having a database that's fully writable in this way changes several important things:

  • It gives programmers the flexibility to easily change applications in unpredictable ways after they have been deployed.

  • It means that a successful application (e.g., Twitter) does not need to write a specialized, restrictive API for 3rd party apps; they use the FluidDB API just like the original app does.

  • It means 3rd party apps can store their data together with the data of the original app (imagine being able to add ratings to Twitter tweets, and then search based on rating and tweet content).

  • It allows us to build a family of apps that can share data. In this way it makes it simple to build mashups that add information to the world and then put it somewhere useful (i.e., onto the original objects), enabling more mashups instead of creating further hoops for subsequent programmers to jump through.
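
The tweet-rating example above can be sketched in a few lines of Python. This is only a toy model with plain dictionaries; the attribute names "twitter/text" and "raterapp/rating" and the search helper are hypothetical and do not reflect the real FluidDB API or query language.

```python
# Objects written by the original application (hypothetical data).
tweets = {
    "tweet-1": {"twitter/text": "PostgreSQL 8.4 beta is out"},
    "tweet-2": {"twitter/text": "Lunch was great"},
}

# A third-party app stores its ratings on the same objects,
# under its own attribute name, without touching the original data.
tweets["tweet-1"]["raterapp/rating"] = 5
tweets["tweet-2"]["raterapp/rating"] = 2


def search(objects, predicate):
    """Return the sorted ids of objects whose attributes satisfy predicate."""
    return sorted(oid for oid, attrs in objects.items() if predicate(attrs))


# One query combines the third-party rating with the original content.
hits = search(
    tweets,
    lambda a: a.get("raterapp/rating", 0) >= 4
    and "PostgreSQL" in a.get("twitter/text", ""),
)
```

Because both apps write to the same objects, a single search can mix their attributes; no export or join between separate stores is needed.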

But wait, there's more!

Allowing anyone or any application to add information to objects also has a big impact on control and organization of information.

  • Control: UIs and APIs provide people and programs with access to information. On the other hand, they also limit what we can do. FluidDB changes this because it allows search on anything - including whatever attributes a normal user or app has put into the system. So you might do a search for Amazon books that are out in paperback, and which have been read by Fred, and which have been mentioned in Slashdot, and which Sally has looked at on Amazon but not bought. That kind of search is impossible today because we do not have an underlying information architecture that lets us put all that disparate information in one place and search on it. In FluidDB no one can stop anyone from adding whatever they like to objects, or searching as they please, so this kind of thing is easy.

  • Organization: Programmers have traditionally used data structures and pointers to organize information. While very fast at runtime, that approach is very rigid. People, especially different people, want to organize things differently, or in multiple ways, and on-the-fly. Because FluidDB allows you to essentially tag anything you want, you can use unique tags and search instead of data structure fields and pointer following. For example: want to know what's in a folder? Search for objects tagged as belonging to the folder. You can build all data structures in this way, and these organizations can co-exist simultaneously without interfering with one another.
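
Both bullets can be illustrated with a small tag index in Python. This is a sketch under stated assumptions, not FluidDB's implementation or query language: the tag names (such as "terry/folder/work" and "sally/viewed-on-amazon") and the helpers tag and tagged are invented for the example, and queries are expressed as ordinary set operations.

```python
from collections import defaultdict

# tag name -> ids of the objects carrying that tag (hypothetical index)
index = defaultdict(set)


def tag(object_id, tag_name):
    """Attach a tag to an object; anyone may tag anything."""
    index[tag_name].add(object_id)


def tagged(tag_name):
    """Return a copy of the set of object ids carrying tag_name."""
    return set(index.get(tag_name, set()))


# Organization: a "folder" is just a tag, and one object can sit in
# several users' folders at once without the folders interfering.
tag("report.txt", "terry/folder/work")
tag("notes.txt", "terry/folder/work")
tag("notes.txt", "sally/folder/reading")
work_folder = tagged("terry/folder/work")

# Control: the book search combines tags written by unrelated parties.
for book in ("book-a", "book-b", "book-c"):
    tag(book, "amazon/paperback")
    tag(book, "fred/has-read")
for book in ("book-a", "book-b"):
    tag(book, "slashdot/mentioned")
    tag(book, "sally/viewed-on-amazon")
tag("book-b", "sally/bought")

result = (
    tagged("amazon/paperback")
    & tagged("fred/has-read")
    & tagged("slashdot/mentioned")
    & tagged("sally/viewed-on-amazon")
) - tagged("sally/bought")
```

Listing a folder is a single lookup, and the book query is an intersection of four tag sets minus a fifth; only book-a satisfies every condition in this sample data.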