PGCon2012 - Slide release #12

PGCon 2012
The PostgreSQL Conference

Hitoshi Harada
Day Talks - 1 - Thursday - 2012-05-17
Room MRT 212
Start time 11:00
Duration 01:00
ID 404
Event type Lecture
Track Applications
Language used for presentation English


An open source machine learning library on RDBMS for Big Data age

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

The MADlib mission is to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. The library includes a variety of analytic methods, among them linear regression, logistic regression, k-means clustering, decision trees, and support vector machines. It also provides a highly efficient user-defined data type for sparse vectors, along with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk gives an overview of the concepts behind MADlib, introducing the problems we are tackling and our solutions to them. It will also cover some topics in parallel data processing, which is currently a very active area in both research and industry.