wu-manber-fuzzy-search

Wu-Manber approximate string matching

wu-manber index

The library can be used to search for a keyword/pattern in a body of text while allowing for spelling errors. We use Levenshtein distances as the notion of spelling errors, and the functions in the library take some error limit k and searches for substrings in the text with Levenshtein distance less than k from the pattern.

Library wu-manber

The entry point of this library is the module: Wu_Manber.

Wu and Manber Algorithm Variants

Two variants of the algorithm are provided.

  1. The original version from Wu and Manber's technical report.
  2. A right-leaning variant, where delete edits are skipped at the end of the pattern unless at the very end of text. This reports better edit distances is some circumstances.

The right-leaning variant is guaranteed to find a match if the original algorithm finds a match, and the error count reported by the variant is guaranteed to be no worse than the original. But the variant is a little harder to use since extra work is needed to check for matches at the end of the text.

High level Interface Example

We currently only provide the functions in Wu_Manber.StringSearch as a high-level interface for using the library.

The search function in this module stop after the first match, and returns the error count together with the number of characters of the text read by the algorithm.

# #require "wu-manber";;
# open Wu_Manber;;
# StringSearch.(search ~k:2 ~pattern:"abcd" ~text:"abcd" |> report);;
- : string = "Pattern matched with 2 errors at character 2 of text"
# StringSearch.(search ~k:2 ~pattern:"abcd" ~text:"abd" |> report);;
- : string = "Pattern matched with 2 errors at character 2 of text"
# StringSearch.(search_right_leaning ~k:2 ~pattern:"abcd" ~text:"abcd" |> report);;
- : string = "Pattern matched with 0 errors at character 4 of text"
# StringSearch.(search_right_leaning ~k:2 ~pattern:"abcd" ~text:"abd" |> report);;
- : string = "Pattern matched with 1 errors at character 3 of text"

Mid Level Interface Example

We currently only provide the functors in Wu_Manber.FirstMatch as a mid-level interface.

Examples of how this interface is used can be found in the code for Wu_Manber.StringSearch.

Low Level Interface Example

An example of how to use the low level interfaces of the library can be found in the module: Wu_Manber.FirstMatch.