# Introduction to Automated Theorem Proving with Logtk

jeu. 24 juillet 2014 by simonMy PhD work is centered around
automated theorem proving
in first-order logic.
This is obviously a very cool topic (otherwise I wouldn't have focused on it),
so this post is a crash course (but the program won't crash because
I use `OCaml`) on one of the most classic method to prove (some) theorems
automatically. I named... *resolution*!

The goal is to prove some (not too complicated) theorems automatically.
In other words, we want a program that reads a bunch of axioms and a formula
that we conjectured is a theorem following from the axioms,
and then tries to produce a proof of the theorem. In practice it's almost always
done by *refutation*: to prove that Γ ⊢ F (formula F is a theorem
under axioms Γ), we try to deduce ⊥ (false) from Γ, ¬F). The
applications for this kind of technology are multiple, but afaik
the prominent one is *software verification* — aims at formally proving that
a program satisfies a specification ("not crashing" is a good start).
There has been a lot of research in this area for decades, but the problem
is extremely hard (only *semi-decidable*: you will find a solution eventually
if the problem is a theorem, but might run forever otherwise).

In this post I will mostly present the code for a very simple (and naive, too)
theorem prover in OCaml, using my library
Logtk. I assume
the reader has some basic knowledge of first-order logic (quantifiers ∀ and
∃, logic connectives ¬, ∨, ∧ and ⇒, and the notion of **term**).
The code is available in Logtk itself
(raw version). If you
installed the most recent version (0.5.1) of `Logtk`, it should compile
using

```
$ ocamlbuild -use-ocamlfind -package logtk,logtk.parsers \
resolution1.native
```

You can test it on easy problems defined in the TPTP syntax, for instance some of the Pelletier Problems (some of which are too hard for the prover!). TPTP is also an archive with literally thousands of problems (from easy to very hard) in the common syntax described above.

## Preamble

Rename a few modules for convenience. `CCError` comes from
ocaml-containers.

```
module Err = CCError
module T = Logtk.FOTerm
module F = Logtk.Formula.FO
module Substs = Logtk.Substs
module Unif = Logtk.Unif
module Util = Logtk.Util
```

## Basic Blocks

We start with basic building blocks that are mostly provided by `Logtk`.
Resolution is a *clausal calculus*, that is, it deals with first-order
clauses. A clause is a disjunction of *literals* (atomic formulas
or negated atomic formulas). Let's see.

First, we define a global **signature**
(maps symbols such as f, parent_of or greater
to their type). Every symbol has exactly one type. The initial signature is
the `TPTP` signature (logic connectives)

```
let _signature = ref Logtk.Signature.TPTP.base
```

We do not have to do anything about terms, because they are already defined in
`Logtk.FOTerm` (which was renamed `T` above). Terms are either variables or
applications of a constant (symbol) to a list of sub-terms.

Some examples of terms would be (capitalized letter are variables):

- Y (a variable)
- the_universe (a constant)
- f(X, g(X,a)) (function applications)
- age_of(grandmother_of(frida))

### Literals

Then we have to represent literals, because `Logtk` doesn't (the
representation would be too specific). A literal is an atomic proposition
(term of type `$o` in `TPTP`, i.e. the type of
propositions), or its negation. We represent this as a pair of
type `FOTerm.t * bool` (term + sign).

Examples:

- older_than(obama, bieber)
- ¬ lives_in(paris, poutine)

```
module Lit = struct
type t = T.t * bool
(** We also define a few basic comparison and printing functions.
Comparison functions are used by many data structures;
Printing is useful for informing the user of results or
for debugging. *)
let compare = CCOrd.pair T.cmp CCOrd.bool_
let equal a b = (compare a b) = 0
let pp buf (t,b) =
Printf.bprintf buf "%s%a"
(if b then "" else "¬") T.pp t
end
```

### Clauses

A clause is a disjunction ("or") of literals. We will simply use a list of literals.

Examples:

- ¬ lives_in(paris, X) ∨ eats_baguette(X) (means "forall X, if X lives in Paris then X eats baguette")
- greater_than(successor(X), X) (property on integers)

The whole Peano arithmetic (excluding induction which is not first-order logic) would look like:

- nat(0)
- X = X
- ¬ (X = Y) ∨ Y = X
- ¬ (X = Y) ∨ ¬ (Y = Z) ∨ (X = Z)
- ¬ nat(X) ∨ ¬ (X = Y) ∨ nat(Y)
- nat(succ(N))
- ¬ (succ(N) = 0)
- ¬ (succ(M) = succ(N)) ∨ (M = N)

```
module Clause = struct
type t = Lit.t list
let make l = CCList.Set.uniq ~eq:Lit.equal l
let compare = CCOrd.list_ Lit.compare
let equal a b = compare a b = 0
let is_trivial c =
List.exists
(fun (t,b) ->
b &&
List.exists (fun (t',b') -> not b' && T.eq t t') c
) c
let apply_subst ~renaming subst c s_c =
let c = List.map
(fun (t,b) -> Substs.FO.apply ~renaming subst t s_c, b)
c
in make c
(** printing a clause: print literals separated with "|" *)
let pp buf c = CCList.pp ~sep:" | " Lit.pp buf c
(** Conversion from list of atomic formulas.
type: [Formula.t list -> clause] *)
let _of_forms c =
let _atom f = match F.view f with
| F.Not f' ->
begin match F.view f' with
| F.Atom t -> t,false
| _ -> failwith "unsupported formula"
end
| F.Atom t -> t, true
| _ -> failwith "unsupported formula"
in
make (List.map _atom c)
end
```

Some parts of this module introduce new concepts. First, **triviality**,
then, **substitutions**.

- A clause is trivial if it contains both a literal and its opposite. It means
the clause is tautological, that is, always true; we can dispose of it because
resolution is about
**refutation**(deduce ⊥ from hypothesis). The function`Clause.is_trivial`checks whether this simple criterion holds. - A substitution maps some variables to terms. Here the function
`Clause.apply_subst`will be used to**apply**the substitution to a clause — replace variables of the clause by their image in the substitution (or keep them unchanged if they do not appear in the substitution. Substitutions are pre-defined in Logtk, and applying a substitution to a term is defined too (the function`Subst.FO.apply`that applies a substitution to a first-order term)

## Managing the Proof State

We have defined basic types, so we are ready to deal with more serious
problems. The **resolution calculus** is based on **saturation**. It
means that, given some *inference rules*, that deduce clauses from other
clauses (deduction), we compute the least fix point of a set S of clauses
with respect to those rules.

In other words, every time we can deduce a new clause C using
inferences on the set S, we add C to S. The process stops
when we find the **empty clause** (equivalent to ⊥, or "false")
or when a fixpoint is reached (every clause we deduce is already
in the set S).

In practice, we use the so-called "given clause algorithm".
The *proof state* is composed of two disjoint sets:

- the
*active set*contains clauses that have been processed (they are "active clauses"). It means we already made all possible inferences between the active clauses. - the
*passive set*contains clauses that have not been processed yet. Initially it contains all the input clauses (those from the problem to solve).

The main loop will transfer clauses from the passive set, to the active set, one-by-one. The current clause is called "given clause" (hence the name).

### Utils

We need a few more types and modules to deal with the sets of clauses:

- A type
`Clause.t * int`is used to refer to a specific literal within a specific clause. We will see why later. See the module`ClauseWithPos`. - A
*term index*is used to query those literals by their term. Indexing is a crucial part of any real theorem prover. An index is basically a multimap from`FOTerm.t`to`Clause.t * int`. When we process a clause c, for each literal`(term,sign)`at position i in the clause c, we add the binding term → (c, i) into the index. Later we will be able to retrieve the pair (c,i) using any term that**unifies**with term.

```
module ClauseWithPos = struct
type t = Clause.t * int
let compare = CCOrd.pair Clause.compare CCInt.compare
end
module Index = Logtk.NPDtree.MakeTerm(ClauseWithPos)
(** Set of clauses. Easy to define thanks to {!Clause.compare} *)
module ClauseSet = Set.Make(Clause)
```

### Sets of Clauses

- We keep an index,
`_idx`, over every atomic term in the set of active clauses; - We also keep the set of those clauses to be able to check whether a new clause is already processed or not;
- Last, a queue is used for
*passive clauses*.

The exception `Unsat` is used for early exit, in case the empty clause
is found.

```
let _idx = ref (Index.empty())
let _active_set = ref ClauseSet.empty
let _passive_set = Queue.create()
exception Unsat
(** add [c] to the passive set, if not already present in
the active set nor it is trivial. *)
let _add_passive c =
if c = [] then raise Unsat
else if Clause.is_trivial c
then (
Util.debug 4 "clause %a is trivial" Clause.pp c;
)
else if not (ClauseSet.mem c !_active_set)
then (
Util.debug 4 "new passive clause %a" Clause.pp c;
Queue.push c _passive_set
)
(** When we process a clause [c], we put it into the
active set (set of processed clauses). That also
means every literal [(term,sign)]
at index [i] will go into the index, so we can
retrieve [c] by its literals later.
*)
let _add_active c =
_active_set := ClauseSet.add c !_active_set;
List.iteri
(fun i (t,_) -> _idx := Index.add !_idx t (c,i))
c
```

## The Resolution Calculus

### Inference rules: Explanations

Here we are at long last! Resolution, a very old calculus (back to the sixties,
when Robinson invented it), only requires two inference rules
to be *complete* (i.e., be able to **eventually** prove any theorem).
Those rules are **factoring** and **resolution**.

The **factoring** rule looks like:

A ∨ A' ∨ C --------------- σ (A' ∨ C) if σ(A) = σ(A')

It means means that if the clause has two positive literals `A` and `A'`
with some substitution σ, such that σ(A) = σ (A'),
then we can *factor* those literals into σ(A) provided we also
apply σ to the rest of the clause. This kind of rule
reads from top (premises) to bottom (conclusion).

The **resolution** rule between two clauses a ∨ C and ¬ a' ∨ D,
where a and a' are literals and C, D clauses, is

A ∨ C ¬A' ∨ D ------------------ σ(C ∨ D) if σ(A) = σ(A')

This rule "resolves" together two complementary literals in two clauses (assuming those clauses do not share variables).

Let us explain in the propositional case (ignoring variables), assuming
*a* = *a*’
. The idea is, roughly:

- We know that either a or either ¬ a is true (excluded middle)
- If a is true, it means that ¬
*a*’∨*D*can only be true if D is true (since a = a' = ⊤). Therefore D must be true. - If a is false, then
*a*∨*C*can only be true if C is true; therefore C holds. - By excluded middle one of those must be true, so in any case C ∨ D is true. Hence the conclusion.

For the first-order case, we compute the *most general unifier* of
a and a' (if it exists), and call this unifier substitution σ.
Then, the reasoning is the same as in the propositional case since
the literals are actually equal.

**Note**: the 0 and 1 are *scopes*, a trick I use to avoid actually
renaming variables in one of the clauses. More details can be found
in the documentation for `Substs` or in the talk I gave at PAAR 2014.

### Inference Rules: implementation

The corresponding code:

```
let _factoring c =
List.iteri
(fun i (t,b) ->
if b then List.iteri
(fun j (t',b') ->
(** Only try the inference if the two literals have
positive sign. The restriction [i < j] is used
not to do the same inference twice (symmetry).
*)
if i<j && b'
then try
let subst = Unif.FO.unification t 0 t' 0 in
(** Now we have subst(t)=subst(t'),
the inference can proceed *)
let c' = CCList.Idx.remove c i in
let renaming = Substs.Renaming.create() in
(** Build the conclusion of the inference (removing
one of the factored literals *)
let c' = Clause.apply_subst ~renaming subst c' 0 in
Util.debug 3 "factoring of %a ----> %a"
Clause.pp c Clause.pp c';
(** New clauses go into the passive set *)
_add_passive c'
with Unif.Fail -> ()
) c
) c
let _resolve_with c =
List.iteri
(fun i (t,b) ->
(** Retrieve within the index, mappings
[term -> (clause,index)]
such that [term] unifies with [t].
0 and 1 are again scopes. *)
Index.retrieve_unifiables !_idx 0 t 1 ()
(fun () _t' (d,j) subst ->
let (_,b') = List.nth d j in
(** We have found [_t'], and a pair [(d, j)] such
that [d] is another clause, and the
[j]-th literal of [d] is [_t', b']).
If [b] and [b'] are complementary we are in
the case where resolution applies.
*)
if b<>b'
then (
let renaming = Substs.Renaming.create() in
(** Build the conclusion clause, merging the
remainders [c'] and [d'] (which live respectively
in scope 1 and 0) of the clauses together after
applying the substitution. *)
let concl =
(let c' = CCList.Idx.remove c i in
Clause.apply_subst ~renaming subst c' 1)
@
(let d' = CCList.Idx.remove d j in
Clause.apply_subst ~renaming subst d' 0)
in
(** Simplify the resulting clause (remove duplicate
literals) and add it into the passive set,
to be processed later *)
let concl = Clause.make concl in
Util.debug 3 "resolution of %a and %a ---> %a"
Clause.pp c Clause.pp d Clause.pp concl;
_add_passive concl
)
)
) c
```

### Saturation Loop

Main saturation algorithm, a simple "given clause" loop. This is the outer loop of the resolution procedure: given an initial set of clauses S, the algorithm does:

- add all the clauses into the passive set
- while some passive clauses remain unprocessed, pick one of them,
call it C, and then do the following:
- add C into the active set
- perform inferences between C and the active set (including C itself)
- add the resulting new clauses to S.

- if at any point the empty clause ⊥ is found, then the initial set of clauses is unsatisfiable (absurd).
- otherwise, if the loop stops, we have computed a fixpoint of the initial clauses with respect to inferences without finding ⊥, which means the original set of clauses is satisfiable (admits a model)

```
let _saturate clauses =
List.iter _add_passive clauses;
try
while not (Queue.is_empty _passive_set) do
let c = Queue.pop _passive_set in
(** Is the clause [c] suitable for processing?
It must not be processed yet and
not be trivial either. *)
if not (Clause.is_trivial c) &&
not (ClauseSet.mem c !_active_set)
then (
Util.debug 2 "given clause: %a" Clause.pp c;
_add_active c;
_resolve_with c;
_factoring c;
)
done;
`Sat
with
| Unsat -> `Unsat
```

## Main, Options, and other Boring Stuff

We only need to define the glue code that reads a file, converts it
into clauses, and calls `saturate` to do the real job. Note the
use of an error monad. `Logtk` provides type inference and an algorithm
to transform arbitrary formulas to clauses ("CNF").

```
(** Read the problem to solve from the file [f],
(try to) solve it and return the result.
We use an error monad to make error handling easier (the
function [>>=] is a {i monadic bind}). *)
let process_file f =
Util.debug 2 "process file %s..." f;
let open Err in
let res =
(** parse the file in the TPTP format *)
Logtk_parsers.Util_tptp.parse_file ~recursive:true f
(** Perform type inference and type checking (possibly updating
the signature) *)
>>= Logtk_parsers.Util_tptp.infer_types (`sign !_signature)
(** CNF ("clausal normal form"). We transform
arbitrary first order formulas into a set of
clauses (see the {!Clause} module)
because resolution only works on clauses.
This algorithm is already implemented in {!Logtk}. *)
>>= fun (signature, statements) ->
let clauses =
Logtk_parsers.Util_tptp.Typed.formulas statements in
let clauses = Sequence.to_list clauses in
(** A way to create fresh symbols for {i Skolemization} *)
let ctx = Logtk.Skolem.create ~prefix:"sk" signature in
let clauses = Logtk.Cnf.cnf_of_list ~ctx clauses in
let clauses = CCList.map Clause._of_forms clauses in
_signature := Logtk.Skolem.to_signature ctx;
(** Perform saturation (solve the problem) *)
Err.return (_saturate clauses)
in
match res with
| `Error msg ->
print_endline msg;
exit 1
| `Ok `Sat -> print_endline "sat"
| `Ok `Unsat -> print_endline "unsat"
(** Parse command-line arguments, including the file to process *)
let _options = ref (
[] @ Logtk.Options.global_opts
)
let _help = "usage: resolution file.p"
let _file = ref None
let _set_file f = match !_file with
| None -> _file := Some f
| Some _ -> failwith "can only deal with one file"
let main () =
Arg.parse !_options _set_file _help;
match !_file with
| None -> print_endline _help; exit 0
| Some f -> process_file f
let () = main()
```

## Conclusion

I wrote this program in a short lapse of time, to illustrate
how `Logtk` could be used. The result is very naive and has no chance of
competing with real provers (such as E). Still, I hope
this post will shine some light on the domain of automated theorem
proving and maybe — who knows? — get some people interested in the domain.
I should point out that I wrote a more serious prover, Zipperposition,
using Logtk.

thanks to nicoo and Enjolras on freenode for their second reading.