We take inspiration from
for the main algorithm and ideas. However some parts are adapted
module type STRING =
This data structure is used to represent a list of result that is evaluated only as far as the user wants. If the user only wants a few elements, she doesn't pay for the remaining ones.
In particular, when matching a string against a (big) set of indexed
strings, we return a continuation list so that, even if there are many results,
only those actually asked for are evaluated.
[ `Cons of 'a * (unit -> 'a klist) | `Nil ]
val klist_to_list :
'a klist -> 'a list
The signature for a given string representation provides 3 main things:
edit_distancefunction to compute the edit distance between strings
automatontype that is built from a string
sand a maximum distance
n, and only accepts the strings
edit_distance s s' <= n.
Indexmodule that can be used to map many strings to values, like a regular string map, but for which retrieval is fuzzy (for a given maximal distance).
open Batteries;; let words = File.with_file_in "/usr/share/dict/english" (fun i -> IO.read_all i |> String.nsplit ~by:"\\n");; let words = List.map (fun s->s,s) words;; let idx = Spelll.Index.of_list words;; Spelll.Index.retrieve ~limit:1 idx "hell" |> Spelll.klist_to_list;;
Here we use Batteries
to read a dictionary file into a list of words; then we create an index that
maps every string to itself (a set of strings, really), and finally
we find every string at distance at most 1 from "hell" (including "hello"
module type S =
val debug_print :
Pervasives.out_channel -> automaton -> unit