package bare

  1. Overview
  2. Docs
Legend:
Library
Module
Module type
Parameter
Class
Class type

Binary Analysis Rule Engine (BARE).

The Binary Analysis Rule Engine is a production system (a forward chaining rule system) that is used in the Binary Analysis Platform, and features non-linear pattern matching and a stream based approach to the state of the world.

BARE defines the syntax and semantics of a rule specification language, and provides a simple pattern matching engine. The semantics of facts, as well as the semantics of the evaluation loop are left to concrete implementations of interpreters.

A BARE specification consists of a sequence of rules. Each rule is a pair. The first element, called the left hand side, is a list of tuples that represents patterns that should be matched with facts. The second element, called the right hand side, is a list of tuples that represents the list of facts that should be produced once all patterns on the left hand side match with the provided facts. A rule is applied to a stream of facts, if all patterns in a rule matches with some facts, then a rule is triggered and produces facts from the right hand side. A rule may have variables on the left hand side, that will bind to elements of a fact tuple that matched with a pattern. The same variable may occur in different patterns of the same rule, i.e., a pattern is not required to be linear. When the same variable occurs in two patterns of the same rule, then a constraint is imposed that this variable should be bound to the same term in both patterns. The order of the patterns in the left hand side doesn't matter, thus the semantics of rule doesn't change with the permutation of its patterns. The order of facts may matter, depending on a concrete interpretation of fact. If a rule is applied to different permutations of a list of facts it may produce different sequences of facts, though all derivable facts will be produced.

BARE represents tuples with S-expressions and variables with atoms that start with the question mark (?). A single question mark character represents a wild card variable that matches with everything but cannot be used on the right hand side of a rule. Since the right hand side of a rule may contain more than one pattern the engine will perform the join operation on a stream of events. Given that any join operation generates a number of matches that is in general a polynomial function of the number of patterns, a caution should be used by an expert who builds a system with complex rules that contains many patterns in the same rule. BARE can't apply any query optimization techniques on a rule, given the open and extensible notion of a fact. Thus BARE should be considered as a base representation for the pattern/action forward chanining systems, that shall perform all the necessary optimizations on their level.

Example

Suppose we want to process a stream of events that represent the behavior of some memory allocator. We expect the following kinds of events:

  • (acquire SITE PTR LEN) represents an allocation event, where SITE is an allocation site (i.e., an address of a program instruction that performs an allocation), PTR is a pointer to the allocated data, and LEN is the data size;
  • (release SITE PTR LEN) represents a memory deallocation event;
  • (violate SITE PTR KIND) represents a violation of a memory allocator invariants of the specified KIND.

Suppose we want to build a rule, that will report each violation as a tuple that contains a kind of the violation and all three sites, that are involved in the incident.

    (((acquire ?asite ?ptr ?)
      (release ?rsite ?ptr ?)
      (violate ?vsite ?ptr ?kind)

     ((?kind ?asite ?rsite ?vsite)))

This rule can be read as, for all acquire, release, and violate events that have the same PTR produce a fact of a violation of the specified kind that contains the program locations of these three events. For example, given the following sequence of events:

      (acquire L10 0xBAD 10)
      (release L20 0XBAD 10)
      (violate L30 0XBAD use-after-free)
      (violate L40 0xBAD double-free)
      (violate L50 0XBAD use-after-free)

The following facts will be produced if the rule is sequentially applied to the above sequence:

      (use-after-free L10 L20 L30)
      (double-free L10 L20 L40)
      (use-after-free L10 L20 L50)

Syntax

Concrete Syntax

A tuple is represented by an arbitrary S-expression. A variable is an atom that starts with the question mark. A special variable ? (one question mark symbol) may occur on the left hand side of a rule, and represents a freshly created variable. Every occurrence of the ? symbol represents a different variable. Both sides of a rule may be empty (represented by the 0-tuple ()).

    rule,r  ::= ((p1 .. pM) (f1 .. fN))
    sexp,s  ::= atom | (s1 .. sM)
    patt,p  ::= sexp
    fact,f  ::= sexp
    atom,a  ::= ?sequence-of-chars?

Example:

    (((duck ?name) (does ?action)) ((looks like ?name did ?action)))

This rule contains two patterns (duck ?name) and (does ?action) and one production fact (looks like ?name did ?action).

Abstract Syntax

A rule r is a tuple (P,F) where P is a sequence of patterns p1,...,pM, F is a sequence of facts f1,...,fN, and M,N are non-negative numbers. A pattern p and fact f are arbitrary n-tuples (terms) t, that contain atoms, other tuples, and free variables.

    rule,r ::= (B,P,F)
    P ::= p1,...
    F ::= f1,...
    B ::= v1->f,...
    V ::= v1,...
    T ::= t1,...
    term,p,f,s,t ::= v | a | (t,..,tM)
    atom,x,y,z ::= ?atom?
    v ::= ?variable?

Semantics

We use a regular sequent calculus (with all structural rules) to represent semantics. A rule with a sequence of patterns P matches with a sequence of facts with the given valuations, denoted as B,T |- P, if it can be proved using the rules below (plus regular rules of the sequent calculus). The (match-rule) rule, expands rules (eliminating tuples on the top level). The (match-tuple) rule further expands patterns until each pattern is either an atom, in that case it is proved by the axiom of identity, or to a variable, in that case a match can be proved with the (match-var) rule, that says that a variable matches a term only if it is bound to that term.

Note, that exchange and weakening rules of sequent calculus allows us to drop and rearrange facts. The latter makes the order of rules irrelevant. If for a sequence of facts, there are several valuations of variables that will lead to a match, then all these matches are provided. In other words, the engine will generate all derivable facts in an unspecified but consistent order.

     B,T |- p1 .. B,T |- pM
     ----------------------- (match-rule)
     B,T |- (p1,..,pM)


     B,t1,T |- s1 .. B,tM,T |- sM
     ---------------------------- (match-tuple)
     B,(t1,..,tM),T |- (s1,..,sM)


     B,T |- v->t
     ---------- (match-var)
     B,t,T |- v
type tuple = Core_kernel.Sexp.t

representation of a tuple

type fact = tuple

representation of a fact

module Rule : sig ... end

Matching rule specification.