OCaml Compiler Hacking: how to add a primitive

Thu 06 August 2015

I have been hacking on the OCaml compiler recently; in particular, I added some support for coloring warning/error messages. At some point during the discussion over this pull request, it became clear that colored output should only be enabled if stderr was an interactive terminal (in opposition to a regular file handle or whatnot). The compiler does not link with the Unix library, so I finally decided to add a primitive caml_sys_is_interactive. My purpose here is to explain how it works (from what I gathered) and how to do it.

I am not a compiler expert, some of the explanations here might be wrong or misleading. If you spot mistakes I will be happy to fix them.

What is a primitive?

OCaml allows C functions to be used directly as primitives, that is, as basic operations of the language. The stdlib is full of such functions.

A primitive (generally named caml_foo since C is not namespaced) is generally located in a C file in byterun/ (the directory containing the interpreter and runtime of OCaml). It is prefixed with CAMLprim and should take and return only C values of type value (that is, an OCaml value, as declared in byterun/caml/mlvalues.h).

Let's take an example: Sys.file_exists : string -> bool. The module sys.ml contains the following signature:

external file_exists : string -> bool = "caml_sys_file_exists"

and there is, in byterun/sys.c, the function caml_sys_file_exists (here in a simplified form):

CAMLprim value caml_sys_file_exists(value name)
    struct stat st;

    char *p = String_val(name);
    int ret = stat(p, &st);

    return Val_bool(ret == 0);

This function uses macros defined in byterun/caml/mlvalues.h to convert between OCaml values and C values, but this is not the point of this post.

note: I have no idea how CAMLprim works, but there is a lot of magical automation that extracts a list of all primitives, exports their names in C arrays, etc. A primitive is a C function shipped with the runtime or a library (such as Unix), whereas some other functions are CAMLexport or CAMLlocal (I don't know exactly what that means).

edit: @gasche points that all the CAMLfoo primitives are documented in the official manual. In particular, writing C "stubs" (C functions that can be used directly from OCaml via a external declaration) requires, in general, using CAMLparam and CAMLlocal to ensure that the GC is aware that some values exist (either as parameters to the function, or as local variables).

How to add a primitive: Bootstrap!

It is a bad™ idea to add a primitive to, say, byterun/sys.c and use it in the compiler immediately. I tried it, and it failed to compile. The correct way, as I learnt from Jérémie Dimino (@diml) and Thomas Refis, is as follows:

  1. add the primitive into the relevant .c file, but do not use it yet anywhere in the compiler.

  2. make world. This compiles the interpreter and bytecode compiler.

  3. make bootstrap. This updates the bytecode archives (in boot/) of ocamlc, ocamldep, and ocamllex.

    Then, commit those new archives, as they will be needed to compile the compiler.

  4. make changes that use the new primitive, such as fancy coloring system. Using the primitive first requires to declare it as external foo : a -> b = "caml_foo" in one or more files.

    In other projects, ctypes can be used instead of writing primitives by hand, depending on the programmers' preference.

  5. make world world.opt tests should now work properly.

  6. argue with @gasche about whether the primitive is useful or not ;-)

Category: ocaml Tagged: ocaml compiler primitive C

Simple Refinements Types for OCaml

Tue 10 March 2015

For more than one year, vulnerabilies in software (especially pervasive C software) have been disclosed at an alarmingly high rate. I love OCaml, which is definitely safer, but still has gaps left open. I believe formal verification, albeit a very powerful tool, is not mature enough for most programmers (too ...

Category: ocaml Tagged: ocaml types refinement

Read More

Batch Operations on Collections

Wed 11 June 2014

Some very common (and useful) operations, including the classic map, filter, and flat_map, traverse their whole argument and return another collection. When several such operations are composed, intermediate collections will be created and become useless immediately after. Languages like Haskell sometimes perform optimizations that merge together the operations so as ...

Category: ocaml Tagged: ocaml flat_map collections performance batch gadt

Read More

Representing Lazy Values

Fri 02 May 2014

A quick survey of Lazy in OCaml

I remember having heard that Jane Street had its own implementation of Lazy values, and that it was faster than the standard one. So I played a bit with several ways of managing lazy values in OCaml.

Definition of Lazy implementations

First we ...

Category: ocaml Tagged: ocaml lazy obj performance

Read More

Go, C++!!

Tue 18 December 2012

I've recently read an interesting article which shows an example of concurrency implemented in 3 differenet languages, namely Go, Erlang and C++. While the Erlang and Go examples seemed clear and concise, the C++ one looks long and hard to understand. The reason behind this complexity is that C ...

Category: concurrency Tagged: c++11 concurrency go

Read More

The inside of a curve

Wed 05 December 2012

Hi, I am shuba, and I'll be using this blog to dicuss various computer graphics topics I find interesting. I might also disgress on C++, which is my main programming language.

For my first article, I'll try and answer a question that's not as simple as it ...

Category: graphics Tagged: topology graphics 2D

Read More
Page 1 of 2

Next »