package pcre2

  1. Overview
  2. Docs

Perl Compatibility Regular Expressions for OCaml

7.5.1 - homepage

Exceptions
type error =
  1. | Partial
    (*

    String only matched the pattern partially

    *)
  2. | BadPattern of string * int
    (*

    BadPattern (msg, pos) regular expression is malformed. The reason is in msg, the position of the error in the pattern in pos.

    *)
  3. | BadUTF
    (*

    UTF string being matched is invalid

    *)
  4. | BadUTFOffset
    (*

    Gets raised when a UTF string being matched with offset is invalid.

    *)
  5. | MatchLimit
    (*

    Maximum allowed number of match attempts with backtracking or recursion is reached during matching. ALL FUNCTIONS CALLING THE MATCHING ENGINE MAY RAISE IT!!!

    *)
  6. | DepthLimit
  7. | WorkspaceSize
    (*

    Raised by pcre2_dfa_match when the provided workspace array is too small. See documention on pcre2_dfa_match for details on workspace array sizing.

    *)
  8. | InternalError of string
    (*

    InternalError msg C-library exhibits unknown/undefined behaviour. The reason is in msg.

    *)
exception Error of error

Exception indicating PCRE errors.

exception Backtrack

Backtrack used in callout functions to force backtracking.

exception Regexp_or of string * error

Regexp_or (pat, error) gets raised for sub-pattern pat by regexp_or if it failed to compile.

Compilation and runtime flags and their conversion functions
type icflag

Internal representation of compilation flags

and irflag

Internal representation of runtime flags

and cflag = [
  1. | `ALLOW_EMPTY_CLASS
    (*

    Allow empty classes

    *)
  2. | `ALT_BSUX
    (*

    Alternative handling of \u, \U, and \x

    *)
  3. | `ALT_CIRCUMFLEX
    (*

    Alternative handling of ^ in multiline mode

    *)
  4. | `ALT_VERBNAMES
    (*

    Process backslashes in verb names

    *)
  5. | `ANCHORED
    (*

    Pattern matches only at start of string

    *)
  6. | `AUTO_CALLOUT
    (*

    Automatically inserts callouts with id 255 before each pattern item

    *)
  7. | `CASELESS
    (*

    Case insensitive matching

    *)
  8. | `DOLLAR_ENDONLY
    (*

    '$' in pattern matches only at end of string

    *)
  9. | `DOTALL
    (*

    '.' matches all characters (newlines, too)

    *)
  10. | `DUPNAMES
    (*

    Allow duplicate names for subpatterns

    *)
  11. | `ENDANCHORED
    (*

    Pattern can match only at end of subject

    *)
  12. | `EXTENDED
    (*

    Ignores whitespace and PERL-comments. Behaves like the '/x'-option in PERL

    *)
  13. | `EXTENDED_MORE
  14. | `FIRSTLINE
    (*

    Unanchored patterns must match before/at first NL

    *)
  15. | `LITERAL
    (*

    Pattern characters are all literal

    *)
  16. | `MATCH_INVALID_UTF
    (*

    Enable support for matching invalid UTF

    *)
  17. | `MATCH_UNSET_BACKREF
    (*

    Match unset backreferences

    *)
  18. | `MULTILINE
    (*

    '^' and '$' match before/after newlines, not just at the beginning/end of a string

    *)
  19. | `NEVER_BACKSLASH_C
    (*

    Lock out the use of \C in patterns

    *)
  20. | `NEVER_UCP
    (*

    Lock out UCP, e.g. via (\*UCP)

    *)
  21. | `NEVER_UTF
    (*

    Lock out UTF, e.g. via (\*UTF)

    *)
  22. | `NO_AUTO_CAPTURE
    (*

    Disables the use of numbered capturing parentheses

    *)
  23. | `NO_AUTO_POSSESS
    (*

    Disable auto-possessification

    *)
  24. | `NO_DOTSTAR_ANCHOR
    (*

    Disable automatic anchoring for .*

    *)
  25. | `NO_START_OPTIMIZE
    (*

    Disable match-time start optimizations

    *)
  26. | `NO_UTF_CHECK
    (*

    Do not check the pattern for UTF validity (only relevant if UTF is set) WARNING: with this flag enabled, invalid UTF strings may cause a crash, loop, or give incorrect results

    *)
  27. | `UCP
    (*

    Use Unicode properties for \d, \w, etc.

    *)
  28. | `UNGREEDY
    (*

    Quantifiers not greedy anymore, only if followed by '?'

    *)
  29. | `USE_OFFSET_LIMIT
    (*

    Enable offset limit for unanchored matching

    *)
  30. | `UTF
    (*

    Treat pattern and subjects as UTF strings

    *)
]

Compilation flags

val cflags : cflag list -> icflag

cflags cflag_list converts a list of compilation flags to their internal representation.

val cflag_list : icflag -> cflag list

cflag_list cflags converts internal representation of compilation flags to a list.

type rflag = [
  1. | `ANCHORED
    (*

    Match only at the first position

    *)
  2. | `COPY_MATCHED_SUBJECT
    (*

    On success, make a private subject copy

    *)
  3. | `DFA_RESTART
    (*

    Causes matching to proceed presuming the subject string is further to one partially matched previously using the same int-array working set. May only be used with pcre2_dfa_match or unsafe_pcre2_dfa_match, and should always be paired with `PARTIAL.

    *)
  4. | `DFA_SHORTEST
    (*

    Return only the shortest match

    *)
  5. | `ENDANCHORED
    (*

    Pattern can match only at end of subject

    *)
  6. | `NOTBOL
    (*

    Beginning of string is not treated as beginning of line

    *)
  7. | `NOTEOL
    (*

    End of string is not treated as end of line

    *)
  8. | `NOTEMPTY
    (*

    An empty string is not a valid match

    *)
  9. | `NOTEMPTY_ATSTART
    (*

    An empty string at the start of the subject is not a valid match

    *)
  10. | `NO_JIT
    (*

    Do not use JIT matching

    *)
  11. | `NO_UTF_CHECK
    (*

    Do not check the subject for UTF validity (only relevant if PCRE2_UTF was set at compile time)

    *)
  12. | `PARTIAL_HARD
    (*

    Throw Pcre2.Partial for a partial match even if there is a full match

    *)
  13. | `PARTIAL_SOFT
    (*

    Throw Pcre2.Partial for a partial match if no full matches are found

    *)
]

Runtime flags

val rflags : rflag list -> irflag

rflags rflag_list converts a list of runtime flags to their internal representation.

val rflag_list : irflag -> rflag list

rflag_list rflags converts internal representation of runtime flags to a list.

Information on the PCRE2-configuration (build-time options)
val version : string

Version information

Version of the PCRE2-C-library

val config_unicode : bool

Indicates whether unicode support is enabled

val config_newline : char

Character used as newline

Number of bytes used for internal linkage of regular expressions

val config_match_limit : int

Default limit for calls to internal matching function

val config_depth_limit : int

Default limit for depth of nested backtracking

val config_stackrecurse : bool

Indicates use of stack recursion in matching function

Information on patterns

type firstcodeunit_info = [
  1. | `Char of char
    (*

    Fixed first character

    *)
  2. | `Start_only
    (*

    Pattern matches at beginning and end of newlines

    *)
  3. | `ANCHORED
    (*

    Pattern is anchored

    *)
]

Information on matching of "first chars" in patterns

type regexp

Compiled regular expressions

val options : regexp -> icflag

options regexp

  • returns

    compilation flags of regexp.

val size : regexp -> int

size regexp

  • returns

    memory size of regexp.

val capturecount : regexp -> int

capturecount regexp

  • returns

    number of capturing subpatterns in regexp.

val backrefmax : regexp -> int

backrefmax regexp

  • returns

    number of highest backreference in regexp.

val namecount : regexp -> int

namecount regexp

  • returns

    number of named subpatterns in regexp.

val nameentrysize : regexp -> int

nameentrysize regexp

  • returns

    size of longest name of named subpatterns in regexp + 3.

val names : regexp -> string array

names regex

  • returns

    array of names of named substrings in regexp.

val firstcodeunit : regexp -> firstcodeunit_info

firstcodeunit regexp

  • returns

    firstcodeunit info on regexp.

val lastcodeunit : regexp -> char option

lastcodeunit regexp

  • returns

    some last matching character of regexp if available, None otherwise.

val get_stringnumber : regexp -> string -> int

get_stringnumber rex name

  • returns

    the index of the named substring name in regular expression rex. This index can then be used with get_substring.

  • raises Invalid_arg

    if there is no such named substring.

get_match_limit rex

  • returns

    some match limit of regular expression rex or None.

get_depth_limit rex

  • returns

    some depth limit of regular expression rex or None.

Compilation of patterns
type chtables

Alternative set of char tables for pattern matching

val maketables : unit -> chtables

Generates new set of char tables for the current locale.

val regexp : ?limit:int -> ?depth_limit:int -> ?iflags:icflag -> ?flags:cflag list -> ?chtables:chtables -> string -> regexp

regexp ?limit ?depth_limit ?iflags ?flags ?chtables pattern compiles pattern with flags when given, with iflags otherwise, and with char tables chtables. If limit is specified, this sets a limit to the amount of recursion and backtracking (only lower than the builtin default!). If this limit is exceeded, MatchLimit will be raised during matching.

  • parameter limit

    default = no extra limit other than default

  • parameter depth_limit

    default = no extra depth_limit other than default

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter chtables

    default = builtin char tables

  • returns

    the regular expression.

    For detailed documentation on how you can specify PERL-style regular expressions (= patterns), please consult the PCRE2-documentation ("man pcre2pattern") or PERL-manuals.

val regexp_or : ?limit:int -> ?depth_limit:int -> ?iflags:icflag -> ?flags:cflag list -> ?chtables:chtables -> string list -> regexp

regexp_or ?limit ?depth_limit ?iflags ?flags ?chtables patterns like regexp, but combines patterns as alternatives (or-patterns) into one regular expression.

val quote : string -> string

quote str

  • returns

    the quoted string of str.

Subpattern extraction
type substrings

Information on substrings after pattern matching

val get_subject : substrings -> string

get_subject substrings

  • returns

    the subject string of substrings.

val num_of_subs : substrings -> int

num_of_subs substrings

  • returns

    number of strings in substrings (whole match inclusive).

val get_substring : substrings -> int -> string

get_substring substrings n

  • returns

    the nth substring (0 is whole match) of substrings.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_substring_ofs : substrings -> int -> int * int

get_substring_ofs substrings n

  • returns

    the offset tuple of the nth substring of substrings (0 is whole match).

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_substrings : ?full_match:bool -> substrings -> string array

get_substrings ?full_match substrings

  • returns

    the array of substrings in substrings. It includes the full match at index 0 when full_match is true, the captured substrings only when it is false. If a subpattern did not capture a substring, the empty string is returned in the corresponding position instead.

  • parameter full_match

    default = true

val get_opt_substrings : ?full_match:bool -> substrings -> string option array

get_opt_substrings ?full_match substrings

  • returns

    the array of optional substrings in substrings. It includes Some full_match_str at index 0 when full_match is true, Some captured_substrings only when it is false. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter full_match

    default = true

val get_named_substring : regexp -> string -> substrings -> string

get_named_substring rex name substrings

  • returns

    the named substring name in regular expression rex and substrings.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

val get_named_substring_ofs : regexp -> string -> substrings -> int * int

get_named_substring_ofs rex name substrings

  • returns

    the offset tuple of the named substring name in regular expression rex and substrings.

  • raises Not_found

    if the corresponding subpattern did not capture a substring.

Callouts
type callout_data = {
  1. callout_number : int;
    (*

    Callout number

    *)
  2. substrings : substrings;
    (*

    Substrings matched so far

    *)
  3. start_match : int;
    (*

    Subject start offset of current match attempt

    *)
  4. current_position : int;
    (*

    Subject offset of current match pointer

    *)
  5. capture_top : int;
    (*

    Number of the highest captured substring so far

    *)
  6. capture_last : int;
    (*

    Number of the most recently captured substring

    *)
  7. pattern_position : int;
    (*

    Offset of next match item in pattern string

    *)
  8. next_item_length : int;
    (*

    Length of next match item in pattern string

    *)
}
type callout = callout_data -> unit

Type of callout functions

Callouts are referred to in patterns as "(?Cn)" where "n" is a callout_number ranging from 0 to 255. Substrings captured so far are accessible as usual via substrings. You will have to consider capture_top and capture_last to know about the current state of valid substrings.

By raising exception Backtrack within a callout function, the user can force the pattern matching engine to backtrack to other possible solutions. Other exceptions will terminate matching immediately and return control to OCaml.

Matching of patterns and subpattern extraction
val pcre2_match : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> int array

pcre2_match ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    an array of offsets that describe the position of matched subpatterns in the string subj starting at position pos with pattern pat when given, regular expression rex otherwise. The array also contains additional workspace needed by the match engine. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val pcre2_dfa_match : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> ?workspace:int array -> string -> int array

pcre2_dfa_match ?iflags ?flags ?rex ?pat ?pos ?callout ?workspace subj invokes the "alternative" DFA matching function.

  • returns

    an array of offsets that describe the position of matched subpatterns in the string subj starting at position pos with pattern pat when given, regular expression rex otherwise. The array also contains additional workspace needed by the match engine. Uses flags when given, the precompiled iflags otherwise. Requires a sufficiently-large workspace array. Callouts are handled by callout.

    Note that the returned array of offsets are quite different from those returned by pcre2_match et al. The motivating use case for the DFA match function is to be able to restart a partial match with N additional input segments. Because the match function/workspace does not store segments seen previously, the offsets returned when a match completes will refer only to the matching portion of the last subject string provided. Thus, returned offsets from this function should not be used to support extracting captured submatches. If you need to capture submatches from a series of inputs incrementally matched with this function, you'll need to concatenate those inputs that yield a successful match here and re-run the same pattern against that single subject string.

    Aside from an absolute minimum of 20, PCRE does not provide any guidance regarding the size of workspace array needed by any given pattern. Therefore, it is wise to appropriately handle the possible WorkspaceSize error. If raised, you can allocate a new, larger workspace array and begin the DFA matching process again.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • parameter workspace

    default = fresh array of length 20

  • raises Not_found

    if the pattern match has failed

  • raises Error

    Partial if the pattern has matched partially; a subsequent exec call with the same pattern and workspace (adding the DFA_RESTART flag) be made to either further advance or complete the partial match.

  • raises Error

    WorkspaceSize if the workspace array is too small to accommodate the DFA state required by the supplied pattern

val exec : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> substrings

exec ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    substring information on string subj starting at position pos with pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val exec_all : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> substrings array

exec_all ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    an array of substring information of all matching substrings in string subj starting at position pos with pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val next_match : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> substrings -> substrings

next_match ?iflags ?flags ?rex ?pat ?pos ?callout substrs

  • returns

    substring information on the match that follows on the last match denoted by substrs, jumping over pos characters (also backwards!), using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

  • raises Invalid_arg

    if pos let matching start outside of the subject string.

val extract : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string array

extract ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    the array of substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes the full match at index 0 when full_match is true, the captured substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, the empty string is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

val extract_opt : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string option array

extract_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    the array of optional substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes Some full_match_str at index 0 when full_match is true, Some captured-substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

val extract_all : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string array array

extract_all ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    an array of arrays of all matching substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes the full match at index 0 of the extracted string arrays when full_match is true, the captured substrings only when it is false. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

val extract_all_opt : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?full_match:bool -> ?callout:callout -> string -> string option array array

extract_all_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj

  • returns

    an array of arrays of all optional matching substrings that match subj starting at position pos, using pattern pat when given, regular expression rex otherwise. Uses flags when given, the precompiled iflags otherwise. It includes Some full_match_str at index 0 of the extracted string arrays when full_match is true, Some captured_substrings only when it is false. Callouts are handled by callout. If a subpattern did not capture a substring, None is returned in the corresponding position instead.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter full_match

    default = true

  • parameter callout

    default = ignore callouts

val pmatch : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> string -> bool

pmatch ?iflags ?flags ?rex ?pat ?pos ?callout subj

  • returns

    true if subj is matched by pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

String substitution
type substitution

Information on substitution patterns

val subst : string -> substitution

subst str converts the string str representing a substitution pattern to the internal representation

The contents of the substitution string str can be normal text mixed with any of the following (mostly as in PERL):

  • $[0-9]+ - a "$" immediately followed by an arbitrary number. "$0" stands for the name of the executable, any other number for the n-th backreference.
  • $& - the whole matched pattern
  • $` - the text before the match
  • $' - the text after the match
  • $+ - the last group that matched
  • $$ - a single "$"
  • $! - delimiter which does not appear in the substitution. Can be used to part "$0-9+" from an immediately following other number.
val replace : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?itempl:substitution -> ?templ:string -> ?callout:callout -> string -> string

replace ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter itempl

    default = empty string

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

  • raises Failure

    if there are backreferences to nonexistent subpatterns.

val qreplace : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?templ:string -> ?callout:callout -> string -> string

qreplace ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

val substitute_substrings : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(substrings -> string) -> string -> string

substitute_substrings ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val substitute : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(string -> string) -> string -> string

substitute ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val replace_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?itempl:substitution -> ?templ:string -> ?callout:callout -> string -> string

replace_first ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter itempl

    default = empty string

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

  • raises Failure

    if there are backreferences to nonexistent subpatterns.

val qreplace_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?templ:string -> ?callout:callout -> string -> string

qreplace_first ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter templ

    default = ignored

  • parameter callout

    default = ignore callouts

val substitute_substrings_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(substrings -> string) -> string -> string

substitute_substrings_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

val substitute_first : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?callout:callout -> subst:(string -> string) -> string -> string

substitute_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter callout

    default = ignore callouts

Splitting
val split : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> string list

split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max strings, using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. If neither pat nor rex are specified, leading whitespace will be stripped! Should behave exactly as in PERL. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter max

    default = 0

  • parameter callout

    default = ignore callouts

val asplit : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> string array

asplit ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj same as Pcre2.split but

  • returns

    an array instead of a list.

type split_result =
  1. | Text of string
    (*

    Text part of split string

    *)
  2. | Delim of string
    (*

    Delimiter part of split string

    *)
  3. | Group of int * string
    (*

    Subgroup of matched delimiter (subgroup_nr, subgroup_str)

    *)
  4. | NoGroup
    (*

    Unmatched subgroup

    *)

Result of a Pcre2.full_split

val full_split : ?iflags:irflag -> ?flags:rflag list -> ?rex:regexp -> ?pat:string -> ?pos:int -> ?max:int -> ?callout:callout -> string -> split_result list

full_split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max elements of type "split_result", using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. Should behave exactly as in PERL. Callouts are handled by callout.

  • parameter iflags

    default = no extra flags

  • parameter flags

    default = ignored

  • parameter rex

    default = matches whitespace

  • parameter pat

    default = ignored

  • parameter pos

    default = 0

  • parameter max

    default = 0

  • parameter callout

    default = ignore callouts

Additional convenience functions
val foreach_line : ?ic:in_channel -> (string -> unit) -> unit

foreach_line ?ic f applies f to each line in inchannel ic until the end-of-file is reached.

  • parameter ic

    default = stdin

val foreach_file : string list -> (string -> in_channel -> unit) -> unit

foreach_file filenames f opens each file in the list filenames for input and applies f to each filename and the corresponding channel. Channels are closed after each operation (even when exceptions occur - they get reraised afterwards!).

UNSAFE STUFF - USE WITH CAUTION!
val unsafe_pcre2_match : irflag -> regexp -> pos:int -> subj_start:int -> subj:string -> int array -> callout option -> unit

unsafe_pcre2_match flags rex ~pos ~subj_start ~subj offset_vector callout. You should read the C-source to know what happens. If you do not understand it - don't use this function!

val make_ovector : regexp -> int * int array

make_ovector regexp calculates the tuple (subgroups2, ovector) which is the number of subgroup offsets and the offset array.

val unsafe_pcre2_dfa_match : irflag -> regexp -> pos:int -> subj_start:int -> subj:string -> int array -> callout option -> workspace:int array -> unit

unsafe_pcre2_dfa_match flags rex ~pos ~subj_start ~subj offset_vector callout ~workpace. You should read the C-source to know what happens. If you do not understand it - don't use this function!

OCaml

Innovation. Community. Security.