package pacomb

  1. Overview
  2. Docs

Main module of Pacomb

Type

type 'a grammar

type of a grammar with semantical action of type 'a .

type 'a t = 'a grammar

An abbreviation

type name_kind =
  1. | Created
  2. | Inherited
  3. | Given
type name = string * name_kind

Grammar contructors

All construœqctors can be given an optional name argument that is used when printing the grammar.

val print_grammar : ?no_other:bool -> ?def:bool -> Stdlib.out_channel -> 'a grammar -> unit

print_grammar ch g prints the grammar g of the given output channel. if def=false (the default is true) it will print the transformed grammar prior to compilation.

val fail : ?name:string -> unit -> 'a grammar

fail () is a grammar that parses nothing (always fails)

val error : ?name:string -> string list -> 'a grammar

fails reporting an error

val empty : ?name:string -> 'a -> 'a grammar

empty a accepts the empty input and returns a

val cond : ?name:string -> bool -> unit grammar

test b is if b then empty () else fail (). Very usefull in grammar family at the beginning of a rule. The test is done at grammar construction, not at parsing time (except if it is used in a dependant grammar).

val term : ?name:string -> 'a Lex.terminal -> 'a grammar

term t accepts the terminal t and returns its semantics. See module Lex

val appl : ?name:string -> 'a grammar -> ('a -> 'b) -> 'b grammar

appl g f parses with g and apply f to the resulting semantics

val unmerge : ?name:string -> 'a list grammar -> 'a grammar

unmerge g introduce multiple parse branch from a list of semantics. Allows to create ambiguous terminals and allows fr unmerge to continue a dependent parsing

val lazy_ : ?name:string -> 'a grammar -> 'a lazy_t grammar
val force : ?name:string -> 'a lazy_t grammar -> 'a grammar
val alt : ?name:string -> 'a grammar list -> 'a grammar

alt [g1;g2;...;gn] parses with g1 and if it fails then g2 and so on

val seq : ?name:string -> 'a grammar -> ('a -> 'b) grammar -> 'b grammar

seq g1 g2 parses with g1 and then with g2 for the rest of the input, combine both semantics by apply the semantics of g2 to g1

val iseq : ?name:string -> 'a grammar -> 'b grammar -> 'b grammar

variation of the abover when we do not use all semantics, it allows cost less right recursion with no semantics

val dseq : ?name:string -> ('a * 'b) grammar -> ('a -> ('b -> 'c) grammar) -> 'c grammar

dseq g1 g2) is a dependant sequence, the grammar g2 used after g1 may depend upon the semantics of g1. This is not very efficient as the grammar g2 must be compiled at parsing time. g2 is memoized by default to partially overcome this fact.

val diseq : ?name:string -> 'a grammar -> ('a -> 'b grammar) -> 'b grammar

variation of the abover when we do not use all semantics, it allows cost less right recursion with no semantics

val lpos : ?name:string -> (Pos.spos -> 'a) grammar -> 'a grammar

lpos g is identical to g but passes the position just before parsing with g to the semantical action of g

val rpos : ?name:string -> (Pos.spos -> 'a) grammar -> 'a grammar

rpos g is identical to g but passes the position just after parsing with g to the semantical action of g

val mk_pos : ?name:string -> (Pos.pos -> 'a) grammar -> 'a grammar
val seq_pos : ?name:string -> 'a grammar -> ((Pos.pos * 'a) -> 'b) grammar -> 'b grammar

variants of seq with the position of the first iterm

val dseq_pos : ?name:string -> ('a * 'b) grammar -> ('a -> ((Pos.pos * 'b) -> 'c) grammar) -> 'c grammar

variants of dseq with the position of the first iterm

val cache : ?name:string -> ?merge:('a -> 'a -> 'a) -> 'a grammar -> 'a grammar

cache g avoids to parse twice the same input with g by memoizing the result of the first parsing. The optional merge parameter is applied to group semantics corresponding to the same part of the input. Using cache with merge allows to recover a polynomial time complexity (cubic at worst) and a quadratic space (in the size of the input)

val test_before : ?name:string -> (Lex.buf -> Lex.idx -> Lex.buf -> Lex.idx -> bool) -> 'a grammar -> 'a grammar

allows to perform a test, the test function receive the position before and after the blanks

val test_after : ?name:string -> ('a -> Lex.buf -> Lex.idx -> Lex.buf -> Lex.idx -> bool) -> 'a grammar -> 'a grammar
val no_blank_before : ?name:string -> 'a grammar -> 'a grammar

particular cases of the above testing the absence of blanks.

val no_blank_after : ?name:string -> 'a grammar -> 'a grammar
val layout : ?name:string -> ?config:Blank.layout_config -> Blank.t -> 'a grammar -> 'a grammar

layout b g changes the blank function to parse the input with the grammar g. The optional parameters allow to control which blanks are used at the boundary. Both can be used in which case the new blanks are used second before parsing with g and first after.

val option : ?name:string -> 'a grammar -> 'a option grammar

usual option/star/plus combinator

val default_option : ?name:string -> 'a -> 'a grammar -> 'a grammar
val star : ?name:string -> 'a grammar -> 'a list grammar
val plus : ?name:string -> 'a grammar -> 'a list grammar
val star_sep : ?name:string -> 'b grammar -> 'a grammar -> 'a list grammar
val plus_sep : ?name:string -> 'b grammar -> 'a grammar -> 'a list grammar

Definition of recursive grammars

val declare_grammar : string -> 'a grammar

to define recursive grammars, one may declare the grammar first and then gives its value. declare_grammar name creates an undefined grammar with the given name

val set_grammar : 'a grammar -> 'a grammar -> unit

set_grammar g1 g2 set the value of g1 declared with declare_grammar. will raise Invalid_argument if g1 was not defined using declare_grammar or if it was already set.

val fixpoint : ?name:string -> ('a grammar -> 'a grammar) -> 'a grammar

fixpoint g compute the fixpoint of g, that is a grammar g0 such that g0 = g g0

val grammar_family : ?param_to_string:('a -> string) -> string -> ('a -> 'b grammar) * (('a -> 'b grammar) -> unit)

grammar_family to_str name returns a pair (gs, set_gs), where gs is a finite family of grammars parametrized by a value of type 'a. A name name is to be provided for the family, and an optional function to_str can be provided to print the parameter and display better error messages.

(* Declare the grammar family *)
let (gr, set_gr) = grammar_family to_str name in

... code using grammars of gr to define mutually recursive grammars ...

(* Define the grammar family *)
let _ = set_gr the_grammars

... now the new family can be used ...

Compilation of a grammar and various

val compile : 'a grammar -> 'a Comb.t

compile g produces a combinator that can be used to actually do the parsing see the Comb module

val grammar_name : 'a grammar -> string

gives the grammar name

val give_name : string -> 'a grammar -> 'a grammar

allows to rename a grammar

val parse_buffer : 'a grammar -> Blank.t -> Lex.buf -> Lex.idx -> 'a

Parse a whole input buffer. the eof combinator is added at the end of the given combinator

val partial_parse_buffer : 'a grammar -> Blank.t -> ?blank_after:bool -> Lex.buf -> Lex.idx -> 'a * Lex.buf * Lex.idx

Partial parsing. Beware, the returned position is not the maximum position that can be reached by the grammar it the grammar is ambiguous. In this case, a message is printed on stderr. The charset is the character accepted at the end of input. Mainly useful with 'eof' when blank_after is true.

val parse_all_buffer : 'a grammar -> Blank.t -> Lex.buf -> Lex.idx -> 'a list

Returns all possible parse trees. Usefull for natural languages but also to debug ambiguity in a supposed non ambiguous grammar.

val parse_string : ?utf8:Utf8.context -> 'a grammar -> Blank.t -> string -> 'a

Parse a whole string, reporting position according to utf8 if optional argument utf8 is given and Utf8.UTF8 or Utf8.CJK_UTF8

val parse_channel : ?utf8:Utf8.context -> ?filename:string -> 'a grammar -> Blank.t -> Stdlib.in_channel -> 'a

Parse a whole input channel, reporting postiion according to utf8. After closing the file position reporting by parsing cannot be transformed bash to line/column number.

val parse_fd : ?utf8:Utf8.context -> ?filename:string -> 'a grammar -> Blank.t -> Unix.file_descr -> 'a

Parse a whole Unix.file_desc, reporting postiion according to utf8. After closing the file position reporting by parsing cannot be transformed bash to line/column number.

val parse_file : ?utf8:Utf8.context -> 'a grammar -> Blank.t -> string -> 'a

Parse a whole file, reporting postiion according to utf8. File is reopen to read position. So the file should not change on disk