package pacomb

Module to build and parse lists of words.

type ('a, 'b) t

Type of a word list, where 'a is the type of characters (typically char for ASCII or string for UTF-8) and 'b is the type of the value associated to each word.

exception Already_bound

Exception raised when a word is bound more than once in a table that does not allow multiple bindings.

val create : ?unique:bool -> ?map:('a -> 'a) -> ?cs:Charset.t -> ?final_test:(Input.buffer -> Input.idx -> bool) -> unit -> ('a, 'b) t

Create a new empty table. The optional parameter unique defaults to true. Setting it to false will allow multiple identical bindings, creating ambiguous grammars. If unique is true, adding multiple bindings raises the exception Already_bound.

map is a function transforming characters before addition (typically a case transformation or a Unicode normalisation). Defaults to the identity.

final_test is called after parsing a word. It is typically used to ensure that the next character is not alphanumeric. Defaults to an always-passing test.

cs can be given as an optimisation. All words added should start with characters in this set.
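The interplay of unique, map and Already_bound can be illustrated with a small stand-alone model. This is a sketch under assumptions, not pacomb's implementation: it fixes the character type to char, stores words in a Hashtbl rather than pacomb's own structure, omits cs and final_test, and the names below (table, create, add_ascii, mem_ascii, size) only mirror the documented ones.

```ocaml
(* Toy model of the documented semantics, NOT pacomb's implementation. *)
exception Already_bound

type 'b table = {
  unique : bool;                  (* reject a second binding of a word *)
  map : char -> char;             (* applied to every character *)
  tbl : (string, 'b) Hashtbl.t;   (* Hashtbl.add keeps older bindings *)
}

let create ?(unique = true) ?(map = fun c -> c) () =
  { unique; map; tbl = Hashtbl.create 16 }

let add_ascii t s v =
  let key = String.map t.map s in
  if t.unique && Hashtbl.mem t.tbl key then raise Already_bound;
  Hashtbl.add t.tbl key v

let mem_ascii t s = Hashtbl.mem t.tbl (String.map t.map s)

(* Hashtbl.length counts every binding, including duplicates. *)
let size t = Hashtbl.length t.tbl

let () =
  (* Case-insensitive table: "If" and "if" map to the same key. *)
  let kw = create ~map:Char.lowercase_ascii () in
  add_ascii kw "If" ();
  match add_ascii kw "if" () with
  | () -> print_endline "second binding accepted"
  | exception Already_bound -> print_endline "Already_bound"
```

With ~unique:false the second add_ascii would succeed instead, and size would count both bindings.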

val size : ('a, 'b) t -> int

Returns the number of bindings in the table.

val reset : ('a, 'b) t -> unit

Empties a table.

val add_ascii : (char, 'b) t -> string -> 'b -> unit

add_ascii tbl s v adds a binding from s to v in tbl, keeping all previous bindings.

val mem_ascii : (char, 'b) t -> string -> bool

mem_ascii tbl s tells whether s is present in tbl. Typically used to reject identifiers that are keywords.
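As an illustration of rejecting keywords, the check below accepts a string as an identifier only when it is lexically valid and not a reserved word. A Hashtbl stands in for a (char, unit) t table so the sketch runs with the standard library alone; valid_identifier, is_ident_char and the keyword list are hypothetical.

```ocaml
(* Stand-in for a (char, unit) Word_list.t holding reserved words;
   a Hashtbl is used so the sketch needs only the standard library. *)
let keywords : (string, unit) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter (fun k -> Hashtbl.replace keywords k ())
    [ "let"; "in"; "if"; "then"; "else" ]

(* Plays the role of mem_ascii in this model. *)
let mem_ascii tbl s = Hashtbl.mem tbl s

let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Hypothetical helper: lexically valid and not a keyword. *)
let valid_identifier s =
  String.length s > 0
  && (match s.[0] with 'a' .. 'z' | '_' -> true | _ -> false)
  && String.for_all is_ident_char s
  && not (mem_ascii keywords s)
```

With the real library, a table filled with add_ascii and queried with mem_ascii would replace the Hashtbl lookup.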

val add_utf8 : (string, 'b) t -> string -> 'b -> unit

Same as add_ascii, but for a Unicode (UTF-8) string, which is split into graphemes.
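To make the splitting step concrete, the sketch below cuts a UTF-8 string into per-code-point substrings using the standard library's String.get_utf_8_uchar (OCaml 4.14 or later). This is a simplification and not pacomb's splitting: a grapheme may consist of several code points (for instance a letter followed by a combining accent), which this hypothetical split_code_points function leaves separate.

```ocaml
(* Split a UTF-8 string into one substring per code point.
   Simplification: grapheme clusters spanning several code points
   are NOT grouped together here. Invalid bytes still advance by
   at least one byte, so the loop always terminates. *)
let split_code_points (s : string) : string list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      let n = Uchar.utf_decode_length d in
      go (i + n) (String.sub s i n :: acc)
  in
  go 0 []
```

For "e" followed by U+0301 (combining acute accent) this returns two pieces, whereas a grapheme splitter would return one.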

val mem_utf8 : (string, 'b) t -> string -> bool
val word : ?name:string -> (char, 'a) t -> 'a Grammar.t

Parses a word from the dictionary, returning as action all the associated values (the grammar is ambiguous if there is more than one value).
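The behaviour of word, returning every value bound to the matched word, can be modelled with a character trie. In this sketch, which is an assumption and not pacomb's actual algorithm, matches returns every (end position, value) pair for dictionary words that start at pos and pass final_test; more than one result corresponds to an ambiguous parse. new_node, add, matches and not_followed_by_alnum are hypothetical names, the last being a final_test of the kind described for create.

```ocaml
(* Toy trie: each node stores the values of the word ending there. *)
type 'b trie = { mutable values : 'b list; next : (char, 'b trie) Hashtbl.t }

let new_node () = { values = []; next = Hashtbl.create 4 }

let add root word v =
  let node =
    String.fold_left
      (fun n c ->
        match Hashtbl.find_opt n.next c with
        | Some m -> m
        | None -> let m = new_node () in Hashtbl.add n.next c m; m)
      root word
  in
  node.values <- v :: node.values

(* Every (end_pos, value) for words starting at [pos] in [input]
   whose end position satisfies [final_test]. *)
let matches ?(final_test = fun _ _ -> true) root input pos =
  let rec walk node i acc =
    let acc =
      if final_test input i then
        List.fold_left (fun a v -> (i, v) :: a) acc node.values
      else acc
    in
    if i >= String.length input then acc
    else
      match Hashtbl.find_opt node.next input.[i] with
      | None -> acc
      | Some m -> walk m (i + 1) acc
  in
  List.rev (walk root pos [])

(* Example final_test: the word must not be followed by an
   identifier character. *)
let not_followed_by_alnum s i =
  i >= String.length s
  || (match s.[i] with
      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> false
      | _ -> true)

let () =
  let root = new_node () in
  add root "let" 1;
  add root "letrec" 2;
  (* Only "letrec" passes the final test on "letrec x". *)
  matches ~final_test:not_followed_by_alnum root "letrec x" 0
  |> List.iter (fun (i, v) -> Printf.printf "end=%d value=%d\n" i v)
```

In this model the final test is what discards the match of let inside letrec; whether pacomb's word reports overlapping prefixes the same way is not stated here.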

val utf8_word : ?name:string -> (string, 'a) t -> 'a Grammar.t
type 'a data
val save : ('a, 'b) t -> 'b data
val save_and_reset : ('a, 'b) t -> 'b data
val restore : ('a, 'b) t -> 'b data -> unit