PGCon 2011
The PostgreSQL Conference
| Speakers |
|---|
| Luis Carvalho |
| Schedule | |
|---|---|
| Day | Talks - 2 - 2011-05-20 |
| Room | DMS 1140 |
| Start time | 13:30 |
| Duration | 01:00 |

| Info | |
|---|---|
| ID | 332 |
| Event type | Lecture |
| Track | Applications |
| Language used for presentation | English |
Doing Bioinformatics in PostgreSQL
We introduce and describe two modules that grew out of the need to perform integrated, efficient bioinformatics tasks in PostgreSQL: PostBio, a set of methods to store and query genomic sequences and features, and PostStat, a collection of statistical functions that make it possible to run statistical tests directly within the database. A few practical examples will be presented to showcase the modules.
PostBio includes three data types: a GiST-indexable integer interval used to represent biological sequence features; a suffix tree type used to search for maximal unique matches; and a compressed suffix array for fast exact matching of short patterns. In addition, PostBio provides a set of utility routines for sequences.
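The abstract does not show PostBio's interface, so the Python sketch below only illustrates two of the ideas behind these types; every function name in it is hypothetical. It assumes closed integer intervals (both endpoints included) for the overlap test, which is the kind of predicate an interval index answers, and it uses a plain, naively built, uncompressed suffix array for exact matching. A compressed suffix array encodes the same suffix ordering in much less space, and the suffix tree used for maximal unique matches is not shown.

```python
def overlaps(a, b):
    """Hypothetical helper: True if closed integer intervals a=(start, end)
    and b=(start, end) share at least one position (endpoints inclusive)."""
    return a[0] <= b[1] and b[0] <= a[1]


def build_suffix_array(text):
    """Hypothetical helper: naive construction that sorts suffix start
    positions lexicographically by the suffix they begin (O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Hypothetical helper: return the sorted start positions of every exact
    occurrence of pattern in text, using suffix array sa.

    Length-m prefixes of lexicographically sorted suffixes are themselves
    sorted, so all matches form one contiguous block of sa, located here
    with two binary searches (lower and upper bound)."""
    m = len(pattern)

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

For example, `find_exact("ACGTACGA", build_suffix_array("ACGTACGA"), "ACG")` returns `[0, 4]`, and `overlaps((10, 20), (20, 30))` is `True` because the closed intervals share position 20.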
PostStat comprises routines that compute a number of cumulative distribution functions, linear regressions, and statistical tests, both parametric and non-parametric; the main motivation is to provide a way to test statistical hypotheses in simple models.
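As a rough illustration of the kind of computation described here, the Python sketch below implements a normal cumulative distribution function, an ordinary least-squares line fit, and a two-sided z-test built on that CDF. The function names and signatures are hypothetical and are not PostStat's API; the z-test (which assumes a known population standard deviation) is simply one small parametric test chosen for illustration.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Hypothetical helper: CDF of the normal distribution N(mu, sigma^2),
    computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def linear_regression(xs, ys):
    """Hypothetical helper: ordinary least-squares fit of y = a + b*x.
    Returns the pair (intercept a, slope b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = my - b * mx
    return a, b


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Hypothetical helper: two-sided p-value for H0: population mean == mu0,
    assuming the population standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - normal_cdf(abs(z)))
```

For example, `linear_regression([0, 1, 2, 3], [1, 3, 5, 7])` returns `(1.0, 2.0)`, recovering the line y = 1 + 2x exactly.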