Module Spelll

module Spelll: sig .. end

Levenshtein distance and index

We take inspiration from this blog for the main algorithm and ideas. However some parts are adapted



Abstraction over Strings

Due to the existence of several encodings and string representations we abstract over the type of strings. A string is a finite array of characters (8-bits char, unicode runes, etc.) which provides a length operation and a function to access the n-th character.
module type STRING = sig .. end

Continuation list

This data structure is used to represent a list of result that is evaluated only as far as the user wants. If the user only wants a few elements, she doesn't pay for the remaining ones.

In particular, when matching a string against a (big) set of indexed strings, we return a continuation list so that, even if there are many results, only those actually asked for are evaluated.

type 'a klist = [ `Cons of 'a * (unit -> 'a klist) | `Nil ] 
val klist_to_list : 'a klist -> 'a list
Helper.

Signature

The signature for a given string representation provides 3 main things:

A possible use of the index could be:
open Batteries;;

let words = File.with_file_in "/usr/share/dict/english"
  (fun i -> IO.read_all i |> String.nsplit ~by:"\\n");;

let words = List.map (fun s->s,s) words;;
let idx = Spelll.Index.of_list words;;

Spelll.Index.retrieve ~limit:1 idx "hell" |> Spelll.klist_to_list;;

Here we use Batteries to read a dictionary file into a list of words; then we create an index that maps every string to itself (a set of strings, really), and finally we find every string at distance at most 1 from "hell" (including "hello" for instance).

module type S = sig .. end
module Make: 
functor (Str : STRING) -> S with type string_ = Str.t and type char_ = Str.char_
include Spelll.S
val debug_print : Pervasives.out_channel -> automaton -> unit