PGCon2009 - Final Release

PGCon 2009
The PostgreSQL Conference

Speakers
Euler Taveira de Oliveira
Schedule
Day Talks - first day - 2009-05-21
Room DMS 1160
Start time 13:30
Duration 01:00
Info
ID 154
Event type Lecture
Track Advanced Features
Language used for presentation English

pg_similarity

Functions and Operators for Executing Similarity Queries

The similarity query is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. pg_similarity is a tool that provides user-friendly functions and operators for similarity queries. More than a dozen functions are currently available.
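The split between functions and operators can be pictured with the minimal Python sketch below. It assumes that a similarity function returns a score in [0, 1] and that an operator is a boolean test of that score against a threshold. The names `jaccard_similarity` and `similar`, the word tokenizer, and the 0.6 threshold are hypothetical illustrations and not pg_similarity's SQL interface. In the database, such an operator would presumably appear in a `WHERE` clause.

```python
import re

# Hypothetical names: jaccard_similarity() stands in for a similarity
# *function* (returns a score), similar() for a boolean similarity
# *operator* (score compared against a threshold).

def word_tokens(s: str) -> set:
    """Lowercased runs of word characters."""
    return {t.lower() for t in re.findall(r"\w+", s)}

def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over word tokens; 1.0 when both have no tokens."""
    ta, tb = word_tokens(a), word_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Operator-style test: true when the score reaches the threshold."""
    return jaccard_similarity(a, b) >= threshold

# Filter "rows" the way a WHERE clause using a similarity operator might.
rows = [
    "PostgreSQL Global Development Group",
    "postgresql development group",
    "MySQL AB",
]
query = "PostgreSQL Development Group"
matches = [r for r in rows if similar(r, query, threshold=0.6)]
print(matches)
# ['PostgreSQL Global Development Group', 'postgresql development group']
```

The first row shares 3 of 4 distinct tokens with the query (score 0.75), and the second row matches exactly after lowercasing (score 1.0). Both pass the 0.6 threshold, while "MySQL AB" scores 0.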

There has been considerable interest in similarity queries in the research community recently.

pg_similarity is a set of functions and operators for matching similar strings. The following functions are available: Block Distance, Cosine, Dice, Euclidean, Hamming, Jaccard, Jaro, Jaro-Winkler, Monge-Elkan, Needleman-Wunsch, q-Gram, Smith-Waterman, Smith-Waterman-Gotoh, and Soundex. A set of auxiliary functions is also available. These functions give flexible control over the similarity threshold, the tokenizer, and the normalization of each function.
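The Python sketch below illustrates the effect of these auxiliary controls on a few of the listed measures. It makes two assumptions. First, the tokenizer decides which units (words or character q-grams) the set-based Jaccard, Dice, and Cosine measures compare. Second, normalization scales a raw result into [0, 1], shown here for Hamming distance. All function names and parameters are hypothetical, because the talk description does not specify pg_similarity's actual configuration interface.

```python
import math
import re
from typing import Callable

# Hypothetical sketch: tokenizer choice and the `normalized` flag are
# illustrative stand-ins, not pg_similarity's configuration interface.

Tokenizer = Callable[[str], set]

def word_tokenizer(s: str) -> set:
    """Tokens are runs of word characters."""
    return set(re.findall(r"\w+", s))

def qgram_tokenizer(q: int = 2) -> Tokenizer:
    """Build a tokenizer yielding overlapping character q-grams (no padding)."""
    def tokenize(s: str) -> set:
        if len(s) < q:
            return {s} if s else set()
        return {s[i:i + q] for i in range(len(s) - q + 1)}
    return tokenize

def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def dice(a: set, b: set) -> float:
    """2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

def cosine(a: set, b: set) -> float:
    """Set-based cosine: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return float(a == b)  # 1.0 only when both sets are empty
    return len(a & b) / math.sqrt(len(a) * len(b))

def token_similarity(measure, x: str, y: str,
                     tokenizer: Tokenizer = word_tokenizer) -> float:
    """Apply a set-based measure to the token sets produced by `tokenizer`."""
    return measure(tokenizer(x), tokenizer(y))

def hamming(x: str, y: str, normalized: bool = True) -> float:
    """Mismatch count, divided by the string length when `normalized` is true."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    mismatches = sum(cx != cy for cx, cy in zip(x, y))
    if not normalized:
        return float(mismatches)
    return mismatches / len(x) if x else 0.0

bigrams = qgram_tokenizer(2)
for measure in (jaccard, dice, cosine):
    print(measure.__name__,
          token_similarity(measure, "night", "nacht"),
          token_similarity(measure, "night", "nacht", bigrams))
print(hamming("karolin", "kathrin", normalized=False), hamming("karolin", "kathrin"))
```

The word tokenizer scores "night" and "nacht" as 0 under every measure. With bigrams, the two words share only "ht", giving 1/7 for Jaccard and 0.25 for Dice and Cosine. The choice of tokenizer can therefore matter as much as the choice of measure.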

It will soon be released under the BSD license on pgFoundry. The not-yet-released code can be downloaded from http://www.inf.ufrgs.br/~etoliveira/pg_similarity/