# Simple Refinements Types for OCaml

sam. 03 octobre 2015 by simonFor more than one year, vulnerabilies in software (especially pervasive C software) have been disclosed at an alarmingly high rate. I love OCaml, which is definitely safer, but still has gaps left open. I believe formal verification, albeit a very powerful tool, is not mature enough for most programmers (too difficult to use, requires too much efforts), so I'm thinking about alternative solutions that would be more lightweight to define and use, but would still increase the confidence in source code.

So here is a draft of a relatively simple extension to OCaml, that allows to specify more invariants in types without requiring a full-fledged proof system. It's purely hypothetical and I did not implement it (nor don't I plan to do it, not before the end of my PhD). The purpose is to propose a conservative extension of OCaml with a type system that can represent more complex invariants, without putting too much of a burden on the user, in a way so adding annotations to existing programs incrementally is feasible and even easy.

The idea relies on *refinement types*, noted `{ ty | F }` where `ty` is
a regular OCaml type and `F` is a boolean formula (read "type `ty`
such that `F` holds). A regular type `ty` in a signature is short for `{ ty | true }`.

Now, the boolean formula `F` is inductively built from conjunction `and`,
disjunction `or`, negation `not`, and *atoms*. The atoms, for a
type `ty`, are OCaml values of type `ty -> bool` (actually, of a type
that can be specialized into `ty -> bool`, to handle polymorphism.
e.g. if `ty = int list`, and `p : 'a list -> bool`, then `{ ty | p }` is valid). The
predicates of type `ty -> bool` should be referentially transparent and
terminating (although we don't try to enforce this).

## Type Checking

Whenever `f : { ty | p } -> foo` is called on an expression `e : { ty | q }`,
a boolean proof obligation `q => p` is issued to a SAT-solver. If the
proof obligation fails, either an error is emitted, or a warning is
issued and some runtime check of the form `assert (p e); f e`
replaces `f e`.

Applying `f` to some "static" value `c` (e.g. the constant 2) that has
a bare type should, ideally, check `p c` at compile-time. A runtime check
can also be used if the compile-time check proves too complicated.

It would also be interesting to expose to the type checker some
logic properties about predicates, in the form of *rules*: given
two predicates `p : ty -> bool` and `q : ty -> bool`,
a rule would say `p => q` (for any `x : ty`, `p x` implies `q x`). The
type checker can use such affirmations to help
solving the proof obligations generated by functions calls. The rules
could be checked either by adding runtime checks, or using random testing,
or by formal proving.

## Type Inference

Given a function `let f x = <body>`, the type inference
infers the type `tau` of `x` normally, but it also collects
every points where a function of type `{tau | p}` is called on `x`
so the final type of x is `{ tau | p1 and p2 and ... and pn}` (in the
absence of any annotation, we find the empty conjunction, that is, `true`,
that is, `x : tau`. Hence the conservative extension).
The return value collects the annotations in a similar way, but joins
them using a disjunction.

In the `.mli`, the user can express more tight constraints on the
signature of functions and values, same as phantom types or
private aliases.

## Runtime Checking

The assertions that refine a type can be checked at runtime, if some compiler
flag is enabled or disabled (similar to `-no-assert`). Runtime checks
for a function `f : { ty1 | p1 } -> { ty2 | p2 }` look like this:

```
let f x =
assert (p1 x);
(* body of f *)
let res = ..... in
assert (p2 res);
res
```

## Usage

The refinement types as defined here are very simple, yet they can serve to encode interesting invariants in the large (proving programs in the small is left to more ambitious tools: extraction from Coq, why3, CFML etc.)

Examples:

- a
*balanced*predicate for AVL trees

```
module type AVL_SET = sig
type elt
type t = E | N of t * int * elt * t
val balanced : t -> bool
val empty : {t | balanced }
(* empty tree *)
val add : {t | balanced} -> elt -> {t | balanced}
val as_balanced : t -> {t | balanced} option
(* check the tree is balanced *)
val random : (Random.State.t -> elt) -> Random.State.t -> t
(* combine with as_balanced to generate balanced trees *)
end
```

- sorting arrays

```
val sorted : 'a array -> bool
val as_sorted : 'a array -> {'a array | sorted} option
(* check the array is sorted, and cast it *)
val sort : 'a array -> {'a array | sorted}
val binary_search : {'a array | sorted} -> 'a -> int option
(* Expects the array to be sorted *)
```

- encoding basic state machines (Ă la phantom types, but without redefining the type)

```
type t
val at_start : t -> bool
val at_middle : t -> bool
val at_stop : t -> bool
val start : unit -> {t | at_start}
val loop : {t | at_start or at_middle} -> {t | at_middle}
val stop : {t | at_start or at_middle} -> {t | at_stop}
```

### Combining Properties

It can work on third-party types without changing them: a function
accepting `int` also accepts any `{int | F}`. Some row polymorphism
is probably needed to *combine* properties. For instance, the predicate
`as_sorted: 'a array -> {'a array | sorted} option` should really
have the type `as_sorted : {'a array | 'p} -> {'a array | 'p and sorted} option`
not to lose any other properties the array already has.

Maybe `assert (p x); e` on a value `x: {ty | q}` should modify the
type of `x` to `{ty | p and q}` in `e`, or require
that x has type `{ty | p}` if `x` is an argument. This would make
the idiom `val as_foo : ty -> {ty | foo} option` redundant.

## Limitations

How to parametrize predicates with values, even known statically? Would be extremely useful for invariants parametrized by comparison function, for instance. Maybe the values can be existentially quantified in argument positions, and made opaque in return value positionâ€¦

## Conclusion

This is clearly just a rough idea, in dire need of refinement (no pun intended). However, I think it is both really simple on the type-checking/type inference side (compared to true formal verification) and easy to use, as a more powerful version of private aliases or phantom types to express simple invariants in type signatures. I would love to hear the opinion of people who have a good knowledge of OCaml's type-{checker, system}.

**Edit**: instead of a SAT-solver, a BDD library (binary decision diagrams) might
be simpler to use in the context of a type-checker.

I got many interesting pointers from people:

- (late) typestate in rust
- blame, coercions
- liquid haskell
- contract ocaml which looks very interesting in this perspective
- relation to abstract interpretation (the
*predicate domain*)