PGCon2011 - Final (2011.06.11)

PGCon 2011
The PostgreSQL Conference

Luis Carvalho
Day Talks - 2 - 2011-05-20
Room DMS 1140
Start time 13:30
Duration 01:00
ID 332
Event type Lecture
Track Applications
Language used for presentation English

Doing Bioinformatics in PostgreSQL

We introduce and describe two modules that grew from the need to perform integrated and efficient Bioinformatics tasks in PostgreSQL: PostBio, a set of methods to store and query genomic sequences and features, and PostStat, a collection of statistical functions that allow for integrated statistical tests. A few practical examples will be presented to showcase the modules.

PostBio includes three data types: a GiST-indexable integer interval type for representing biological sequence features; a suffix tree type for finding maximal unique matches; and a compressed suffix array for fast exact matching of short sequences. In addition, PostBio provides a set of sequence utility routines.
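To make two of these ideas concrete, here is a minimal Python sketch of an interval overlap test and of exact matching over a plain (uncompressed) suffix array. It is not PostBio code: the function names, the closed-interval convention, and the naive construction are all assumptions chosen for illustration, and the module itself may define these operations differently.

```python
from bisect import bisect_left, bisect_right


def overlaps(a, b):
    """True if closed integer intervals a=(lo, hi) and b=(lo, hi) share a position.

    Closed endpoints are an assumption made for this sketch.
    """
    return a[0] <= b[1] and b[0] <= a[1]


def build_suffix_array(text):
    """Start positions of all suffixes of text in lexicographic order.

    Naive construction for illustration only; it is not compressed and
    would be far too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """All start positions where pattern occurs in text, in ascending order."""
    m = len(pattern)
    # Truncating sorted suffixes to length m keeps them sorted,
    # so binary search finds the block of suffixes starting with pattern.
    prefixes = [text[i:i + m] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"
    sa = build_suffix_array(genome)
    print(find_exact(genome, sa, "TTA"))
    print(overlaps((10, 20), (15, 30)))
```

An index such as GiST answers the same overlap question without scanning every stored interval, and a compressed suffix array trades some lookup time for a much smaller memory footprint than the plain list used here.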

PostStat comprises routines that compute a number of cumulative probability distributions, fit linear regressions, and run statistical tests, both parametric and non-parametric; the main motivation is to provide a way to test statistical hypotheses in simple models.
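As a rough illustration of the kinds of computation listed above, the following Python sketch implements a normal cumulative distribution function, an ordinary least-squares fit of a line, and a two-sided z-test. None of these names or signatures come from PostStat; they are hypothetical stand-ins, and the z-test assumes a known population standard deviation.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_sided_z_test(sample, mu0, sd):
    """Parametric test of H0: population mean == mu0, with known sd.

    Returns (z statistic, two-sided p-value).
    """
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sd / math.sqrt(n))
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    print(simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7]))
    print(two_sided_z_test([5.1, 4.9, 5.3, 5.2], mu0=5.0, sd=0.2))
```

Running such tests inside the database avoids exporting the data to a separate statistics package, which appears to be the integration the abstract has in mind.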