package regenerate

Regenerate is a library to generate test cases for regular expression engines.

Here is a typical use of the library: creating a test harness with QCheck.

let test =
  (* The alphabet is [abc] *)
  let alphabet = ['a'; 'b'; 'c'] in

  (* Words are made of regular strings. *)
  let module Word = Regenerate.Word.String in

  (* Streams are made of ThunkLists. *)
  let module Stream = Regenerate.Segments.ThunkList(Word) in

  let generator =
    Regenerate.arbitrary
      (module Word)
      (module Stream)
      ~compl:false (* Do not generate complement operators. *)
      ~pp:Fmt.char (* Printer for characters. *)
      ~samples:100 (* We want on average 100 samples for each regex. *)
      alphabet
  in

  (* Test the [check] function: a property you supply, of type
     char Regenerate.regex * string list * string list -> bool. *)
  QCheck.Test.make generator check

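The check passed to QCheck.Test.make receives one generated triple: a regex, its positive samples and its negative samples. Here is a minimal sketch of such a property. It assumes the engine under test can be wrapped as a string -> bool function for the generated regex. The names check_samples and matches are hypothetical and not part of Regenerate:

```ocaml
(* Hypothetical property: [matches] stands for the regex engine under test,
   already built from the generated regex. Only the shape of the triple
   (regex, positive samples, negative samples) comes from the docs. *)
let check_samples ~(matches : string -> bool) (positives : string list)
    (negatives : string list) : bool =
  (* Every positive sample must be accepted... *)
  List.for_all matches positives
  (* ...and no negative sample may be accepted. *)
  && not (List.exists matches negatives)

(* Usage with a fake engine that accepts exactly the words of even length. *)
let even_length s = String.length s mod 2 = 0
let ok = check_samples ~matches:even_length [ "ab"; "" ] [ "a"; "abc" ]
let buggy = check_samples ~matches:even_length [ "ab"; "a" ] [ "abc" ]
```

In the harness above, check could then be written as fun (re, pos, neg) -> check_samples ~matches:(compile re) pos neg. Here compile is a hypothetical function that turns a Regenerate regex into a matcher for your engine.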
type 'a regex = 'a Regex.t

The type of regular expressions on characters of type 'a.
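This page does not list the constructors of Regex.t. The following self-contained type is a hypothetical stand-in. It covers the operators that parse accepts, including complement and intersection, and comes with a matcher based on Brzozowski derivatives. Derivatives are a technique chosen for this sketch only; Regenerate does not necessarily use them:

```ocaml
(* Hypothetical stand-in for 'a Regex.t, with characters of type 'a. *)
type 'a re =
  | Empty                 (* matches nothing *)
  | Eps                   (* matches the empty word *)
  | Atom of 'a            (* matches one character *)
  | Cat of 'a re * 'a re  (* concatenation *)
  | Alt of 'a re * 'a re  (* union, written a|b *)
  | And of 'a re * 'a re  (* intersection, written a&b *)
  | Not of 'a re          (* complement, written ~a *)
  | Star of 'a re

(* Does [r] accept the empty word? *)
let rec nullable = function
  | Empty | Atom _ -> false
  | Eps | Star _ -> true
  | Cat (a, b) | And (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b
  | Not a -> not (nullable a)

(* Brzozowski derivative of [r] with respect to the character [c]. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Atom c' -> if c = c' then Eps else Empty
  | Cat (a, b) ->
      let d = Cat (deriv c a, b) in
      if nullable a then Alt (d, deriv c b) else d
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | And (a, b) -> And (deriv c a, deriv c b)
  | Not a -> Not (deriv c a)
  | Star a -> Cat (deriv c a, Star a)

(* [matches r w] holds when the word [w] (a list of characters)
   belongs to the language of [r]. *)
let matches r w = nullable (List.fold_left (fun r c -> deriv c r) r w)
```

Complement and intersection need no special machinery here, because the derivative commutes with both.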

val arbitrary : (module Word.S with type char = 'char and type t = 'word) -> (module Segments.S with type elt = 'word) -> ?skip:int -> compl:bool -> pp:'char Fmt.t -> samples:int -> 'char list -> ('char Regex.t * 'word list * 'word list) QCheck.arbitrary

Regenerate.arbitrary (module W) (module S) ~compl ~pp ~samples alpha creates a QCheck generator of triples. Each triple contains a regex, a list of positive samples (words matched by the regex) and a list of negative samples (words not matched by it).

  • parameter W

    is a module implementing operations on words. See Word for some predefined modules.

  • parameter S

    is a module implementing a data structure that enumerates words. See Segments for some predefined modules. We recommend Segments.ThunkList.

  • parameter skip

    specifies how many samples are skipped, on average, between two retained samples. Default is 8.

  • parameter compl

    specifies whether the generated regexes may contain the complement operator.

  • parameter pp

    specifies how to print individual characters.

  • parameter samples

    specifies how many samples should be generated for each regex, on average.

  • parameter alpha

    describes the alphabet as a list of characters.
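One plausible reading of skip is sketched below under that assumption. The function sample is hypothetical, and Regenerate does not necessarily draw samples this way. Before keeping each element of a lazy stream, the sketch drops a random number of elements drawn uniformly from 0 to 2 * skip, so on average skip elements are dropped:

```ocaml
(* Hypothetical sampler: keep up to [n] elements of the stream [s],
   dropping before each kept element a random number of elements drawn
   uniformly from 0 .. 2 * skip, i.e. [skip] elements on average.
   Assumes skip >= 0. *)
let sample ?(skip = 8) ~n (s : 'a Seq.t) : 'a list =
  (* Drop the first [k] elements of [s] (all of them if [s] is shorter). *)
  let rec drop k s =
    if k = 0 then s
    else
      match s () with
      | Seq.Nil -> Seq.empty
      | Seq.Cons (_, rest) -> drop (k - 1) rest
  in
  let rec go n s acc =
    if n <= 0 then List.rev acc
    else
      match (drop (Random.int ((2 * skip) + 1)) s) () with
      | Seq.Nil -> List.rev acc
      | Seq.Cons (x, rest) -> go (n - 1) rest (x :: acc)
  in
  go n s []
```

With ~skip:0 nothing is dropped. Applied to the stream 0, 1, 2, ..., sample ~skip:0 ~n:3 therefore returns [0; 1; 2].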

val parse : string -> (char regex, [> `Not_supported | `Parse_error ]) result

Regenerate.parse s parses the string s and returns Ok with the corresponding regex, or an Error otherwise. It recognizes the POSIX Extended Regular Expression syntax, plus complement (~a) and intersection (a&b). Character classes are not supported.
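The error cases are polymorphic variants, so a caller can match on them exhaustively. Here is a small sketch. describe_parse_result is a hypothetical helper, and the regex payload is left abstract so the code compiles without the library:

```ocaml
(* Map each documented outcome of Regenerate.parse to a message.
   The regex payload is left polymorphic, so this compiles on its own. *)
let describe_parse_result
    (r : ('re, [< `Not_supported | `Parse_error ]) result) : string =
  match r with
  | Ok _ -> "parsed"
  | Error `Not_supported -> "uses syntax that Regenerate does not support"
  | Error `Parse_error -> "is not a valid regular expression"
```

With the library available, it applies directly to the result of parse, as in describe_parse_result (Regenerate.parse s). The open variant type in the signature of parse unifies with the closed one expected here.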

module Regex : sig ... end

Definitions of regular expressions and associated utilities.

module Word : sig ... end

Generic definitions of words on which regular expressions can match.

module Segments = Segments

Streaming data structures that will contain the generated samples.
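This page does not show the Segments signatures. The sketch below illustrates the idea of a thunk list: a list whose tail is a function that computes the rest on demand. It uses OCaml's own Seq.t, which has exactly that shape, to enumerate all words over an alphabet grouped by length. The grouping by length and the names next_segment, segments and first_segments are assumptions of this sketch, not the library's API:

```ocaml
(* From the words of length n, build the words of length n + 1 by
   appending each letter of [alphabet]. *)
let next_segment (alphabet : char list) (seg : string list) : string list =
  List.concat_map
    (fun w -> List.map (fun c -> w ^ String.make 1 c) alphabet)
    seg

(* Sigma* as an infinite stream of segments: segment n holds the words of
   length n. A segment is only computed when its tail thunk is forced. *)
let segments (alphabet : char list) : string list Seq.t =
  let rec from seg () =
    Seq.Cons (seg, fun () -> from (next_segment alphabet seg) ())
  in
  from [ "" ]

(* Force the first [k] segments of a stream into a list. *)
let rec first_segments k (s : 'a Seq.t) : 'a list =
  if k <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: first_segments (k - 1) rest
```

For example, first_segments 3 (segments ['a'; 'b']) evaluates to [[""]; ["a"; "b"]; ["aa"; "ab"; "ba"; "bb"]]. The stream is infinite, but only the forced prefix is ever built.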

Functorial API

This API allows full access to generators and regular operators. For casual use of the library, consider using arbitrary instead.

module type SIGMA = sig ... end
module Make (Word : Word.S) (Segment : Segments.S with type elt = Word.t) (Sigma : SIGMA with type t = Segment.t) : sig ... end

Regenerate.Make(W)(S)(A) is a module that implements sample generation for words implemented by the module W over the alphabet A. S describes the data structure used internally to enumerate words.
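The functorial style can be illustrated with a much simplified pair of hypothetical signatures. They stand in for Word.S and SIGMA but are not the library's actual signatures. The enumeration is written once in a functor and instantiated for strings over {a, b}. As a brute-force stand-in for real sample generation, it splits all words of a given length into those a membership predicate accepts (positive samples) and the rest (negative samples). This only shows the partition, not how Regenerate computes it:

```ocaml
(* Hypothetical, simplified signatures mimicking the roles of Word.S
   (words over some character type) and SIGMA (the alphabet). *)
module type WORD = sig
  type char
  type t
  val empty : t
  val cons : char -> t -> t
end

module type ALPHABET = sig
  type char
  val sigma : char list
end

(* Written once, instantiated for any word representation and alphabet. *)
module Enumerate (W : WORD) (A : ALPHABET with type char = W.char) = struct
  (* All words of length [n] over A.sigma. *)
  let rec words_of_length n : W.t list =
    if n = 0 then [ W.empty ]
    else
      let shorter = words_of_length (n - 1) in
      List.concat_map (fun c -> List.map (W.cons c) shorter) A.sigma

  (* Split the words of length [n] into those accepted by [mem]
     (positive samples) and the rest (negative samples). *)
  let samples_of_length ~mem n = List.partition mem (words_of_length n)
end

module StringWord = struct
  type char = Char.t
  type t = string
  let empty = ""
  let cons c w = String.make 1 c ^ w
end

module AB = struct
  type char = Char.t
  let sigma = [ 'a'; 'b' ]
end

module E = Enumerate (StringWord) (AB)

(* Positive samples: words of length 2 starting with 'a'. *)
let positives, negatives = E.samples_of_length ~mem:(fun w -> w.[0] = 'a') 2
(* positives = ["aa"; "ab"], negatives = ["ba"; "bb"] *)
```

Because each length yields a finite set of words, the negative samples of a given length are just the complement of the positive ones within that set.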