Landmarks is a simple profiling library for OCaml. It provides primitives to measure the time spent in portions of instrumented code. The code may be instrumented by hand, automatically, or semi-automatically using a PPX extension.
Published: 11 Dec 2018
Landmarks: A Simple Profiling Library
Landmarks is a simple profiling library for OCaml. It provides primitives to delimit portions of code and measure the performance of instrumented code at runtime. The available measures are obtained by aggregating CPU cycles (using the CPU's time stamp counter), applicative time (using Sys.time) and allocated bytes (with Gc.allocated_bytes). The code may be instrumented by hand, automatically, or semi-automatically using a PPX extension.
During the execution of your program, the traversal of instrumented code by the control flow is recorded as a "callgraph" that carries the collected measures. The results may be browsed directly on the console or exported to JSON.
This tool is intended to help you find where time is spent in your programs (rather than to benchmark independent pieces of code, as Core_bench does) while providing results that correspond only to the instrumented portions of your OCaml code (unlike tools such as gprof or perf, which work directly with the binary executable).
For more information, you may browse the API.
With opam:
opam install landmarks
or
opam install landmarks-viewer
to install the landmarks viewer.
git clone https://github.com/LexiFi/landmarks.git
cd landmarks
dune build @install
Use make uninstall to remove installed files.
Usage with dune
Simply use the library landmarks and the preprocessor landmarks.ppx to benchmark your executables and libraries. For instance, the following dune file builds the executable test using the landmarks library and its PPX.
(executable
 (name test)
 (libraries landmarks)
 (preprocess (pps landmarks.ppx)))
You can find a sample program in the example directory.
Usage with ocamlfind
Compiling and linking:
ocamlfind ocamlopt -c -package landmarks prog.ml
ocamlfind ocamlopt -o prog -package landmarks -linkpkg prog.cmx
You can replace "ocamlopt" by "ocamlc" to compile the program in bytecode.
With the PPX extension:
ocamlfind ocamlopt -c -package landmarks -package landmarks.ppx prog.ml
ocamlfind ocamlopt -o prog -package landmarks -linkpkg prog.cmx
Launching the viewer (when available):
x-www-browser "$(ocamlfind query landmarks-viewer)/landmarks_viewer.html"
There are three main primitives:
val register: string -> landmark
val enter: landmark -> unit
val exit: landmark -> unit
The register function declares new landmarks and should be used at the toplevel. The functions enter and exit are used to delimit the portion of code attached to a landmark. At the end of profiling, we retrieve for each landmark the aggregated time spent executing the corresponding piece of code. During execution, a trace of each visited landmark is also recorded in order to build a "callgraph".
open Landmark

let loop = register "loop"
let sleep = register "sleep"
let main = register "main"

let zzz () =
  enter sleep;
  Unix.sleep 1;
  exit sleep

let () =
  begin
    start_profiling ();
    enter main;
    enter loop;
    for _ = 1 to 9 do
      zzz ()
    done;
    exit loop;
    zzz ();
    exit main;
  end
(This file can be compiled with ocamlfind ocamlc -o prog -package landmarks -package unix -linkpkg prog.ml)
The induced callgraph is:
- 100.00% : main
|   - 90.00% : loop
|   |   - 100.00% : sleep
|   - 10.00% : sleep
which can be paraphrased as:
100% of time is spent inside the main landmark,
90% of time spent inside the main landmark is spent in the loop landmark,
10% of time spent inside the main landmark is spent in the sleep landmark,
100% of the time spent in loop is spent in the sleep landmark.
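The percentages above are relative to each node's parent, not to the whole run. The following is a minimal sketch of that arithmetic, assuming a hypothetical node record (it is not the library's internal representation) and using the seconds slept in the example program as the time unit.

```ocaml
(* Hypothetical tree type for illustration; not the library's own type. *)
type node = { name : string; time : float; children : node list }

(* Times are the seconds slept in the example: 9 in the loop, 1 outside. *)
let sleep t = { name = "sleep"; time = t; children = [] }
let loop = { name = "loop"; time = 9.0; children = [ sleep 9.0 ] }
let main = { name = "main"; time = 10.0; children = [ loop; sleep 1.0 ] }

(* Share of the parent's time spent in a child, in percent. *)
let percent parent child = 100.0 *. child.time /. parent.time

(* Print a node, then its children, each relative to its own parent. *)
let rec print indent parent node =
  Printf.printf "%s- %.2f%% : %s\n" indent (percent parent node) node.name;
  List.iter (print (indent ^ "|   ") node) node.children

(* The root is compared with itself, hence 100%. *)
let () = print "" main main
```

Run as a script, this prints the same four lines as the callgraph shown earlier.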
The library provides a binding to the high-performance cycle counter for 32-bit and 64-bit x86 architectures (note that you may use the landmarks-noc.cm(x)a archive to provide your own implementation). It is used to measure the time spent inside instrumented code.
The PPX extension point
To avoid writing boilerplate code, you may use the PPX extension distributed with this package. It allows the programmer to instrument expressions using annotations and to automatically instrument top-level functions.
expr [@landmark "name"] is expanded into
Landmark.enter __generated_landmark_1;
let r =
  try expr
  with e -> Landmark.exit __generated_landmark_1; raise e
in
Landmark.exit __generated_landmark_1;
r
and the declaration
let __generated_landmark_1 = Landmark.register "name"
is appended at the top-level.
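The point of the try ... with in this expansion is that the landmark is exited even when the instrumented expression raises. Here is a small sketch of that behaviour, assuming a stand-in module Mock_landmark that only logs events (the real Landmark module records measurements instead) and a hypothetical thunk argument in place of the inlined expression.

```ocaml
(* Stand-in for the Landmark module: it logs events and measures nothing. *)
module Mock_landmark = struct
  let log : string list ref = ref []
  let register name = name
  let enter lm = log := ("enter " ^ lm) :: !log
  let exit lm = log := ("exit " ^ lm) :: !log
end

let lm = Mock_landmark.register "name"

(* Hand-written analogue of the expansion; [expr] is passed as a thunk
   because a function cannot receive an unevaluated expression. *)
let instrumented (expr : unit -> 'a) : 'a =
  Mock_landmark.enter lm;
  let r =
    try expr ()
    with e -> (Mock_landmark.exit lm; raise e)
  in
  Mock_landmark.exit lm;
  r

let () =
  (* One normal evaluation, then one that raises. *)
  let v = instrumented (fun () -> 21 * 2) in
  (try instrumented (fun () -> failwith "boom") with Failure _ -> ());
  Printf.printf "value = %d\n" v;
  List.iter print_endline (List.rev !Mock_landmark.log)
```

Both evaluations leave a balanced enter/exit pair in the log, including the one that raised.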
It should be pointed out that this transformation does not preserve tail-recursive calls (and also prevents some polymorphism generalization). To get around these problems, it is recommended to use the other provided extension around let ... in and let rec ... in:
let[@landmark] f = body
which is expanded into:
let __generated_landmark_2 = Landmark.register "f"
let f = body
let f x1 ... xn =
  Landmark.enter __generated_landmark_2;
  let r =
    try f x1 ... xn
    with e -> Landmark.exit __generated_landmark_2; raise e
  in
  Landmark.exit __generated_landmark_2;
  r
where the arity n of f is obtained by counting the shallow occurrences of fun ... -> and function ... -> in body. Please note that when using this annotation with let-rec bindings, only entry-point calls will be recorded. For instance, in the following piece of code
let () =
  let[@landmark] rec even n = (n = 0) || odd (n - 1)
  and[@landmark] odd n = (n = 1) || n > 0 && even (n - 1)
  in Printf.printf "'six is even' is %b\n" (even 6)
the landmark associated with "even" will be traversed exactly once (and not three times!), whereas the control flow will not pass through the landmark associated with "odd".
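The reason only entry-point calls are recorded follows from the shape of the expansion: the wrappers shadow the original functions, but the recursive calls inside the original bodies still refer to the unwrapped definitions. The sketch below applies that expansion by hand, with two hypothetical counters standing in for the landmarks and with the exception handling omitted for brevity.

```ocaml
(* Counters standing in for the two landmarks; hypothetical, no timing. *)
let even_entries = ref 0
let odd_entries = ref 0

(* Step 1: the original recursive definitions, left untouched. *)
let rec even n = (n = 0) || odd (n - 1)
and odd n = (n = 1) || (n > 0 && even (n - 1))

(* Step 2: wrappers that shadow the originals. The recursive calls in the
   bodies above still point at the unwrapped functions. *)
let even n = incr even_entries; even n
let odd n = incr odd_entries; odd n

let () =
  (* The wrapped [odd] exists but is never called from the outside. *)
  ignore (odd : int -> bool);
  Printf.printf "'six is even' is %b\n" (even 6);
  Printf.printf "even entered %d time(s), odd entered %d time(s)\n"
    !even_entries !odd_entries
```

The counters end at 1 for even and 0 for odd, which matches the behaviour described above.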
The structure annotations [@@@landmark "auto"] and [@@@landmark "auto-off"] activate or deactivate the automatic instrumentation of top-level functions in a module. In automatic mode, all function declarations are implicitly annotated. Automatic instrumentation can be enabled or disabled for all files via the OCAML_LANDMARKS environment variable, as detailed below.
The OCAML_LANDMARKS environment variable
The environment variable OCAML_LANDMARKS is read at two different stages: when the ppx rewriter is executed, and when the landmarks module is loaded by an instrumented program. This variable is parsed as a comma-separated list of items; the options recognized at each stage are listed below.
During the ppx rewriter stage (at compilation time):
auto (no arguments): turns on the automatic instrumentation by default (behaves as if each module starts with the annotation [@@@landmark "auto"]).
threads (no arguments): tells the ppx extension to use the Landmark_threads module instead of the Landmark module.
When loading an instrumented program (at runtime):
format with possible arguments including json. It controls the output format of the profiling, which is either a console-friendly representation or a JSON encoding of the callgraph.
threshold with a number between 0.0 and 100.0 as argument (default: 1.0). If the threshold is not zero, the textual output will hide nodes in the callgraph below this threshold (in percent of time of their parent). This option is meaningless for other formats.
output with possible arguments including temporary and <file> (where <file> is the path of a file). It tells where to output the results of the profiling. With temporary, the results are written to a temporary file (the name of this file is printed on the standard error). You may also use temporary:<directory> to specify the directory where the files are generated.
debug with no argument. Activates a verbose mode that outputs traces on stderr each time the landmarks primitives are called.
time with no argument. Also collect Sys.time timestamps during profiling.
off with no argument. Disable profiling.
on with no argument. Enable profiling (default; may be omitted).
allocation with no argument. Also collect Gc.allocated_bytes data during profiling.
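To make the threshold option concrete, here is a rough sketch of hiding nodes that fall below a given percentage of their parent's time. The node record, the prune function and the sample timings are all hypothetical, and the exact comparison used by the library (strict or not) is an assumption.

```ocaml
(* Hypothetical callgraph node; not the library's actual data structure. *)
type node = { name : string; time : float; children : node list }

(* Keep a child only if it accounts for at least [threshold] percent of its
   parent's time, then apply the same rule to the surviving children. *)
let rec prune threshold node =
  let keep child = 100.0 *. child.time /. node.time >= threshold in
  let children =
    node.children |> List.filter keep |> List.map (prune threshold)
  in
  { node with children }

let leaf name time = { name; time; children = [] }

(* Made-up timings: two children sit at 0.5% of their parent. *)
let graph =
  { name = "main"; time = 100.0;
    children = [ leaf "parse" 0.5; leaf "solve" 99.0; leaf "print" 0.5 ] }

let () =
  let pruned = prune 1.0 graph in
  List.iter (fun n -> print_endline n.name) pruned.children
```

With a threshold of 1.0, only solve is printed; with a threshold of 0.0, every node is kept.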
Browsing the JSON export using the Web Viewer
You can either compile the web viewer on your computer or browse it online. Load the JSON files using the file picker, then click around to browse the callgraph.
Instrumenting with threads
The Landmark module is not thread-safe. If you have multiple threads, you have to make sure that at most one thread is executing instrumented code. For that you may use the Landmark_threads module (included in the landmarks-threads.cm(x)a archive), which prevents non-thread-safe functions from executing in all threads but the one that started the profiling.
Instrumenting with OCAMLPARAM
A way to blindly instrument a project is to use OCaml's experimental OCAMLPARAM feature, by setting the environment variable OCAMLPARAM to
I=$(ocamlfind query landmarks),cma=landmarks.cma,cmxa=landmarks.cmxa,ppx=$(ocamlfind query landmarks)/ppx_landmarks,_
The annotation on expressions may tamper with polymorphism (this is not the case for the let-binding annotation). For instance, the following piece of code will fail to compile:
let test = (fun x -> x)[@landmark "test"]
in test "string", test 1
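The failure comes from the value restriction: the expression annotation expands to a sequence of calls rather than a syntactic function, so its type is not generalized. The let-binding form yields an ordinary function definition, which stays polymorphic. The sketch below writes that second expansion by hand, with hypothetical enter and exit_ helpers standing in for Landmark.enter and Landmark.exit.

```ocaml
(* Stand-ins for Landmark.enter / Landmark.exit; hypothetical helpers that
   only count how many times the landmark is entered. *)
let calls = ref 0
let enter () = incr calls
let exit_ () = ()

(* Hand-written analogue of: let[@landmark] test = fun x -> x *)
let test x = x
let test x =
  enter ();
  let r = try test x with e -> (exit_ (); raise e) in
  exit_ ();
  r

(* [test] is generalized to 'a -> 'a, so both uses type-check. *)
let () =
  let s, n = (test "string", test 1) in
  Printf.printf "%s %d (calls: %d)\n" s n !calls
```

This compiles and prints "string 1 (calls: 2)", whereas the expression-annotated version above is rejected by the type checker.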
This 'Landmarks' package is licensed by LexiFi under the terms of the MIT license.