Simple Refinements Types for OCaml

Tue 10 March 2015

For more than one year, vulnerabilies in software (especially pervasive C software) have been disclosed at an alarmingly high rate. I love OCaml, which is definitely safer, but still has gaps left open. I believe formal verification, albeit a very powerful tool, is not mature enough for most programmers (too difficult to use, requires too much efforts), so I'm thinking about alternative solutions that would be more lightweight to define and use, but would still increase the confidence in source code.

So here is a draft of a relatively simple extension to OCaml, that allows to specify more invariants in types without requiring a full-fledged proof system. It's purely hypothetical and I did not implement it (nor don't I plan to do it, not before the end of my PhD). The purpose is to propose a conservative extension of OCaml with a type system that can represent more complex invariants, without putting too much of a burden on the user, in a way so adding annotations to existing programs incrementally is feasible and even easy.

The idea relies on refinement types, noted { ty | F } where ty is a regular OCaml type and F is a boolean formula (read "type ty such that F holds). A regular type ty in a signature is short for { ty | true }.

Now, the boolean formula F is inductively built from conjunction and, disjunction or, negation not, and atoms. The atoms, for a type ty, are OCaml values of type ty -> bool (actually, of a type that can be specialized into ty -> bool, to handle polymorphism. e.g. if ty = int list, and p : 'a list -> bool, then { ty | p } is valid). The predicates of type ty -> bool should be referentially transparent and terminating (although we don't try to enforce this).

Type Checking

Whenever f : { ty | p } -> foo is called on an expression e : { ty | q }, a boolean proof obligation q => p is issued to a SAT-solver. If the proof obligation fails, either an error is emitted, or a warning is issued and some runtime check of the form assert (p e); f e replaces f e.

Applying f to some "static" value c (e.g. the constant 2) that has a bare type should, ideally, check p c at compile-time. A runtime check can also be used if the compile-time check proves too complicated.

It would also be interesting to expose to the type checker some logic properties about predicates, in the form of rules: given two predicates p : ty -> bool and q : ty -> bool, a rule would say p => q (for any x : ty, p x implies q x). The type checker can use such affirmations to help solving the proof obligations generated by functions calls. The rules could be checked either by adding runtime checks, or using random testing, or by formal proving.

Type Inference

Given a function let f x = <body>, the type inference infers the type tau of x normally, but it also collects every points where a function of type {tau | p} is called on x so the final type of x is { tau | p1 and p2 and ... and pn} (in the absence of any annotation, we find the empty conjunction, that is, true, that is, x : tau. Hence the conservative extension). The return value collects the annotations in a similar way, but joins them using a disjunction.

In the .mli, the user can express more tight constraints on the signature of functions and values, same as phantom types or private aliases.

Runtime Checking

The assertions that refine a type can be checked at runtime, if some compiler flag is enabled or disabled (similar to -no-assert). Runtime checks for a function f : { ty1 | p1 } -> { ty2 | p2 } look like this:

let f x =
  assert (p1 x);
  (* body of f *)
  let res = ..... in
  assert (p2 res);
  res

Usage

The refinement types as defined here are very simple, yet they can serve to encode interesting invariants in the large (proving programs in the small is left to more ambitious tools: extraction from Coq, why3, CFML etc.)

Examples:

  • a balanced predicate for AVL trees
module type AVL_SET = sig
    type elt

    type t = E | N of t * int * elt * t

    val balanced : t -> bool

    val empty : {t | balanced }
    (* empty tree *)

    val add : {t | balanced} -> elt -> {t | balanced}

    val as_balanced : t -> {t | balanced} option
    (* check the tree is balanced *)

    val random : (Random.State.t -> elt) -> Random.State.t -> t
    (* combine with as_balanced to generate balanced trees *)
end
  • sorting arrays
val sorted : 'a array -> bool

val as_sorted : 'a array -> {'a array | sorted} option
(* check the array is sorted, and cast it *)

val sort : 'a array -> {'a array | sorted}

val binary_search : {'a array | sorted} -> 'a -> int option
(* Expects the array to be sorted *)
  • encoding basic state machines (Ă  la phantom types, but without redefining the type)
type t

val at_start : t -> bool
val at_middle : t -> bool
val at_stop : t -> bool

val start : unit -> {t | at_start}

val loop : {t | at_start or at_middle} -> {t | at_middle}

val stop : {t | at_start or at_middle} -> {t | at_stop}

Combining Properties

It can work on third-party types without changing them: a function accepting int also accepts any {int | F}. Some row polymorphism is probably needed to combine properties. For instance, the predicate as_sorted: 'a array -> {'a array | sorted} option should really have the type as_sorted : {'a array | 'p} -> {'a array | 'p and sorted} option not to lose any other properties the array already has.

Maybe assert (p x); e on a value x: {ty | q} should modify the type of x to {ty | p and q} in e, or require that x has type {ty | p} if x is an argument. This would make the idiom val as_foo : ty -> {ty | foo} option redundant.

Limitations

How to parametrize predicates with values, even known statically? Would be extremely useful for invariants parametrized by comparison function, for instance. Maybe the values can be existentially quantified in argument positions, and made opaque in return value position…

Conclusion

This is clearly just a rough idea, in dire need of refinement (no pun intended). However, I think it is both really simple on the type-checking/type inference side (compared to true formal verification) and easy to use, as a more powerful version of private aliases or phantom types to express simple invariants in type signatures. I would love to hear the opinion of people who have a good knowledge of OCaml's type-{checker, system}.

Edit: instead of a SAT-solver, a BDD library (binary decision diagrams) might be simpler to use in the context of a type-checker.

I got many interesting pointers from people:

Category: ocaml Tagged: ocaml types refinement


Batch Operations on Collections

Wed 11 June 2014

Some very common (and useful) operations, including the classic map, filter, and flat_map, traverse their whole argument and return another collection. When several such operations are composed, intermediate collections will be created and become useless immediately after. Languages like Haskell sometimes perform optimizations that merge together the operations so as ...

Category: ocaml Tagged: ocaml flat_map collections performance batch gadt

Read More

Representing Lazy Values

Fri 02 May 2014

A quick survey of Lazy in OCaml

I remember having heard that Jane Street had its own implementation of Lazy values, and that it was faster than the standard one. So I played a bit with several ways of managing lazy values in OCaml.

Definition of Lazy implementations

First we ...

Category: ocaml Tagged: ocaml lazy obj performance

Read More

Go, C++!!

Tue 18 December 2012

I've recently read an interesting article which shows an example of concurrency implemented in 3 differenet languages, namely Go, Erlang and C++. While the Erlang and Go examples seemed clear and concise, the C++ one looks long and hard to understand. The reason behind this complexity is that C ...

Category: concurrency Tagged: c++11 concurrency go

Read More

The inside of a curve

Wed 05 December 2012

Hi, I am shuba, and I'll be using this blog to dicuss various computer graphics topics I find interesting. I might also disgress on C++, which is my main programming language.

For my first article, I'll try and answer a question that's not as simple as it ...

Category: graphics Tagged: topology graphics 2D

Read More

Tail-recursive map in OCaml

Mon 03 December 2012

The List.map function of the OCaml standard lib is not implemented tail-recursively. The current version (as of 4.00 on my computer) is

let rec map f = function
    [] -> []
  | a::l -> let r = f a in r :: map f l

It's pretty efficient on small lists, but blows up ...

Category: ocaml Tagged: ocaml unsafe optimization

Read More
Page 1 of 2

Next »