PGCon 2011
The PostgreSQL Conference

Luis Carvalho
Day Talks - 2 - 2011-05-20
Doing Bioinformatics in PostgreSQL

We introduce and describe two modules that grew from the need to perform integrated and efficient Bioinformatics tasks in PostgreSQL: PostBio, a set of methods to store and query genomic sequences and features, and PostStat, a collection of statistical functions that allow for integrated statistical tests. A few practical examples will be presented to showcase the modules.

PostBio includes three data types: a GiST-indexable integer interval used to represent biological sequence features; a suffix tree type to search for maximal unique matches; and a compressed suffix array for fast exact matching of short sequences. In addition, PostBio provides a set of utility routines for working with sequences.
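To make two of these ideas concrete, here is a minimal Python sketch of the underlying techniques: an overlap test on closed integer intervals, and exact pattern search over a plain (uncompressed) suffix array. It is an illustration only and does not reproduce PostBio's types or function names; `overlaps`, `build_suffix_array`, and `find_exact` are hypothetical helpers, and the naive construction below is only suitable for short strings.

```python
from bisect import bisect_left, bisect_right


def overlaps(a, b):
    """Return True if closed integer intervals a=(low, high) and b overlap.

    Assumption: features are modelled as inclusive (low, high) pairs.
    """
    return a[0] <= b[1] and b[0] <= a[1]


def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction (sorts full suffixes); fine for a sketch, not for genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, suffix_array, pattern):
    """Return all start positions where pattern occurs exactly in text.

    Suffixes truncated to len(pattern) stay in sorted order, so all matches
    form one contiguous run that binary search can locate.
    """
    m = len(pattern)
    # Materialising the prefixes keeps the sketch short; a real implementation
    # would compare against the text lazily during the binary search.
    prefixes = [text[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


genome = "GATTACAGATTACA"
sa = build_suffix_array(genome)
hits = find_exact(genome, sa, "GATTA")
```

In this toy example `hits` holds the 0-based start positions of both copies of `GATTA`. The interval test is the predicate an interval index would accelerate: two features overlap exactly when each one starts no later than the other ends.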

PostStat comprises routines that compute a number of cumulative probability distributions, linear regression, and statistical tests, both parametric and non-parametric; the main motivation is to provide a way to test statistical hypotheses in simple models.
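As a rough illustration of the kinds of computation listed above, the following Python sketch implements a normal cumulative distribution function, a simple least-squares line fit, and a two-sided z-test built on the CDF. These are textbook formulas written for this example, not PostStat's routines; the names `normal_cdf`, `linear_regression`, and `z_test_two_sided` are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mean == mu0, with known sigma (parametric)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Points lying exactly on y = 2x + 1.
slope, intercept = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
# A z statistic of 1.96 sits near the conventional 5% threshold.
p_value = z_test_two_sided(sample_mean=1.96, mu0=0.0, sigma=1.0, n=1)
```

The z-test shows how a cumulative distribution function turns a test statistic into a p-value, which is the basic step behind hypothesis tests in simple models.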