package memcad

  1. Overview
  2. Docs
The MemCAD analyzer

Install

Dune Dependency

Authors

Maintainers

Sources

memcad-v1.1.0.tar.gz
sha512=7e37933fb3c2b67d166906f1ac38855b6819d4ce8f015f84b98012a513cf254eea5081b5d37fe69689dead565f8a1973418a023c9cee76426a5bdb93815c681f

Description

MemCAD is an abstract interpreter for shape analysis. MemCAD can analyze C programs manipulating complex data structures.

Published: 22 Oct 2021

README

MemCAD

MemCAD is a static analyzer to carry out shape analysis of C programs. This file states the main principles underlying the analysis and shows its basic use. The reference manual provides more detailed information about the parameterization of the analysis.

Presentation

MemCAD stands for Memory Composite Abstract Domain. It is an experimental tool for program verification, that focuses mainly on memory properties.

MemCAD is a static analyzer for C programs, specialized in the inference and verification of memory properties. It performs a forward analysis by abstract interpretation. More precisely, it starts from abstract pre-conditions and computes abstract post-conditions. Unless specified otherwise, the pre-condition assumes an empty memory state (no variable is defined) The analysis is designed to be sound, which means that the invariants it computes should over-approximate the concrete behaviors of the analyzed program (though, note that MemCAD is offered with no guarantee, even though the design intent is to ensure its soundness ---e.g., the analyzer may not be free of implementation bugs). For instance, when a new local variable is defined and before it is uninitialized, it is assumed to hold any possible value. To allow for the verification of a code fragment under some assumption (e.g., that a function is called in a state where some structure are already built and stored in memory), the analyzer features specific assumption commands.

During the analysis, MemCAD attempts to prove the absence of memory errors and assertion violations. In particular, it attempts to prove:

  • the absence of memory errors, such as the dereference of null or invalid pointers;

  • the absence of accesses (read or write) outside the bounds of memory blocks (that includes array out of bound errors);

  • that regular assertions are satisfied;

  • that structural assertions (that are described in a specific language) are also satisfied, which means that certain structural invariants hold.

The main use of MemCAD consists in verifying the preservation of structural assertions such as the preservation of well-formed dynamic structures such as lists, trees, possibly nested into arrays.

The analysis is conservative and may fail to prove properties that hold in all concrete executions. Messages reporting the analyzer cannot prove the correctness of some operations do not necessarily mean that there exists a concrete execution where they do not hold, or even that the analyzer cannot prove them under different settings. In many cases, they stem for imprecision in the analysis, that could either be solved using other settings, or require some deeper refinements of the abstract interpreter implemenation (addition of an abstract domain, refinement of abstract transfer functions and analysis algorithms). In the current version of the MemCAD implementation, shape properties (well formedness of structures with respect to purely pointer properties) and basic numerical properties (linear constraints) can be specified and are handled rather precisely. Also, set abstraction allow to verify the preservation of shared data structures. The analysis handles basic relations between pointer properties, numerical properties and sharing properties, but cannot handle precisely all such properties (the current version is not able to generalize numerical invariants on red-black trees, for instance).

The abstract domain of MemCAD is highly parametric and modular. The analyzer relies on a library of abstract domains that can often be enabled or disabled independently. Some domains require some parameterization, which can be performed either automatically or manually.

The MemCAD analyzer inputs C programs, and optional user-supplied description of data-structures of interest. The purpose of the data-structure descriptions is to parameterize the abstract domain of the analyzer. The data-structure descriptions can be inferred automatically when the structures involve no sharing or pointer relations (as in doubly-linked lists or trees with parent pointers). In the other hand, basic structures such as singly-linked trees or binary trees can be inferred from their type definitions. Not all C features are supported and the programs to analyze must be compilable by the clang compiler. Moreover, MemCAD does not aim at handling all features of the C programming language. The following C features are not currently supported by MemCAD, although support of several of these could be added easily (the main purpose of MemCAD is to design, implement and assess memory abstractions, not numerical abstractions):

  • floating point variables;

  • inlined assembly code;

  • recursive functions;

  • function pointers;

  • pointers used as arrays (though core of the analyzer could handle this feature well, the frontend and typing phase currently do not).

Basic use

We consider a very simple program manipulating singly-linked lists to illustrate a first analysis example, and assume that this example program is stored in a file named "list.c" located in the current directory:

struct elt* next;
  int data;
} elt;

typedef elt* list;

void main() {
  list l;
  int i = 0;
  _memcad("add_inductive(l,elt)");
  while(l != 0){
    l->data = i++;
    l = l->next;
  }
  _memcad("check_inductive(l,elt)");
}

First, let us note that the meta-command _memcad is used to control the analysis, and state assumptions and proof goals using a meta-syntax specific to the analyzer. Two instances of this command are used here:

  • the first one consists of the meta-command add_inductive and requires the analyzer to assume at the point where it appears the value stored in variable l satisfies some inductive predicate in separation logic named elt (this predicate describes a structure that can either be inferred from type definitions or set manually; more details on this predicate will be provided below);

  • the second one consists of the meta-command check_inductive and requires the analysis to prove that the value stored in variable l still satisfies the inductive predicate named elt.

Essentially, those two commands are used here so as to specify an abstract pre-condition (to assume) and an abstract post-condition (to prove).

Then, the analysis is launched by the command below:

./analyze -a -auto-ind list.c

The -a option instructs the executable to perform the analysis. The -auto-ind option activates automatic inference of the set of inductive definitions to be used for the analysis from the type definitions in file "list.c". This generates one inductive definition elt to be used as a parameter of the abstract domain before actually launching analysis. Thus, the execution is split into two phases:

  1. collection of inductive definitions computed from the type definitions (under the assumption that no sharing is used), and instantiation of the abstract domain; essentially, this phase computes an inductive definition in separation logic by a.elt ::= (emp, a=0) / (a.n -> b * a.d -> c * n.elt, a!=0) (internally, the analysis takes the memory layout into account and utilizes numeric offsets rather than field names)

  2. forward abstract interpretation of the program using this inductive definition as a parameter of the abstract domain, and starting from the entry point of the program with an empty memory state (no variable, and no heap space); this phase over-approximates soundly each variable definition and program statement; while doing so, the _memcad meta-commands respectively lets the analysis make a memory assumption (which would be satisfied if replaced by any code fragment that onstructs a valid structure, that can be described by elt), and requires it to verify a memory property.

Note that an automatically generated inductive definition is named after the name of the struct it is derived from. In our example, the struct is named elt, so elt must be used as the inductive name in both the add_inductive and in the check_inductive MemCAD commands.

The analyzer output will end like this:

[...]
Final status report (0.1901 sec spent)
- assertions proved:     1 (context)
- assertions not proved: 0 (context)

If the count on the line 'assertions not proved' is non zero, it means the program could not be verified (i.e., at least one memory assertion could not be proved).

The automatically generated inductive definition can be found in the file named 'auto.ind'.

Advance use in the documentation

The documentation in pdf format provides detailed information about more use cases, options, and the parameterization of the analysis.

Dependencies (13)

  1. setr >= "0.1.1"
  2. mlbdd >= "0.5"
  3. qtest
  4. parmap
  5. ounit
  6. dune >= "1.3"
  7. conf-graphviz
  8. clangml-transforms >= "0.26"
  9. clangml >= "4.4.0"
  10. bdd
  11. base-unix
  12. apron >= "v0.9.12"
  13. ocaml >= "4.08.0"

Dev Dependencies

None

Used by

None

Conflicts

None