package SZXX

Advanced parsing utilities: custom parser options and tools to stream huge documents

type node =
  | Prologue of DOM.attr_list
  | Element_open of {
      tag : Base.string;
      attrs : DOM.attr_list;
    }
  | Element_close of Base.string
  | Text of Base.string
  | Cdata of Base.string
  | Nothing
  | Many of node Base.list
val sexp_of_node : node -> Sexplib0.Sexp.t
val compare_node : node -> node -> Base.int
val equal_node : node -> node -> Base.bool
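
Because the parser may return batches wrapped in Many (see batch_size below) as well as Nothing placeholders, consumers usually want a flat sequence of leaf nodes. The following is a minimal sketch of one way to do that. It uses a local mirror of the node type so that it runs with the standard library alone; modelling DOM.attr_list as (string * string) list is an assumption, and flatten is a hypothetical helper, not part of this module.

```ocaml
(* Local mirror of the documented [node] type, for illustration only.
   Assumption: DOM.attr_list is modelled as (string * string) list. *)
type attr_list = (string * string) list

type node =
  | Prologue of attr_list
  | Element_open of { tag : string; attrs : attr_list }
  | Element_close of string
  | Text of string
  | Cdata of string
  | Nothing
  | Many of node list

(* Hypothetical helper: expand [Many] batches recursively and drop
   [Nothing], preserving document order. *)
let rec flatten (n : node) : node list =
  match n with
  | Nothing -> []
  | Many nodes -> List.concat_map flatten nodes
  | (Prologue _ | Element_open _ | Element_close _ | Text _ | Cdata _) as leaf ->
    [ leaf ]

let () =
  let batch =
    Many
      [ Element_open { tag = "a"; attrs = [] };
        Nothing;
        Many [ Text "hi"; Element_close "a" ] ]
  in
  (* The nested batch collapses to three leaf nodes. *)
  assert (List.length (flatten batch) = 3)
```

With the real library, the same pattern match would apply directly to the node values produced by the parser.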
type parser_options = {
  accept_html_boolean_attributes : Base.bool;
    (* Invalid XML but valid HTML: <div attr1="foo" attr2>. With accept_html_boolean_attributes set to true, attr2 will be "attr2". *)
  accept_unquoted_attributes : Base.bool;
    (* Invalid XML but valid HTML: <div attr1="foo" attr2=bar>. With accept_unquoted_attributes set to true, attr2 will be "bar". *)
  accept_single_quoted_attributes : Base.bool;
    (* Single-quoted attribute values: <div attr1="foo" attr2='bar'>. With accept_single_quoted_attributes set to true, attr2 will be "bar". *)
  batch_size : Base.int;
    (* (Default: 20) Performance optimization. When batch_size is greater than 1, the parser prefers to return Many list, where list has length batch_size. *)
}
val sexp_of_parser_options : parser_options -> Sexplib0.Sexp.t
val compare_parser_options : parser_options -> parser_options -> Base.int
val equal_parser_options : parser_options -> parser_options -> Base.bool
val default_parser_options : parser_options

Defaults: accept_html_boolean_attributes is true and the other boolean options are false; batch_size is 20.
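
Custom options are most easily built by overriding individual fields of the defaults with OCaml's record update syntax. The sketch below mirrors the documented record locally so that it runs on its own; the default values follow this page's description (HTML boolean attributes on, the other boolean options off, batch_size 20), and html_options and unbatched_options are hypothetical names chosen for illustration.

```ocaml
(* Local mirror of the documented record, for illustration only. *)
type parser_options = {
  accept_html_boolean_attributes : bool;
  accept_unquoted_attributes : bool;
  accept_single_quoted_attributes : bool;
  batch_size : int;
}

(* Mirrors the defaults as described on this page. *)
let default_parser_options = {
  accept_html_boolean_attributes = true;
  accept_unquoted_attributes = false;
  accept_single_quoted_attributes = false;
  batch_size = 20;
}

(* Hypothetical lenient options for HTML-like input:
   override only the fields that differ from the defaults. *)
let html_options = {
  default_parser_options with
  accept_unquoted_attributes = true;
  accept_single_quoted_attributes = true;
}

(* Hypothetical options without batching: per the field's description,
   the preference for [Many] applies only when batch_size > 1. *)
let unbatched_options = { default_parser_options with batch_size = 1 }
```

With the real library, passing such a record to make_parser yields a node Angstrom.t configured accordingly.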

val make_parser : parser_options -> node Angstrom.t
val parser : node Angstrom.t

IO-agnostic Angstrom.t XML parser.

It is not fully spec-compliant: it does not attempt to validate character encoding or to reject all incorrect documents. It does not process references, and it does not automatically unescape XML escape sequences, but SZXX.Xml.DOM.unescape is provided to do so.

See README.md for examples on how to use it.
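
Since the parser is an ordinary Angstrom.t value, any Angstrom entry point can drive it. As a rough sketch, the hypothetical helper below applies a parser repeatedly to an in-memory string using Angstrom.parse_string. It is written against an arbitrary 'a Angstrom.t so that it depends only on the angstrom library; passing this module's parser (or the result of make_parser) as p is the intended use, but that combination is not verified here.

```ocaml
(* Requires the angstrom library. *)

(* Hypothetical helper: run [p] as many times as it succeeds and require
   that the whole input is consumed. Returns the parsed values in order,
   or Angstrom's error message. *)
let parse_all (p : 'a Angstrom.t) (input : string) : ('a list, string) result =
  Angstrom.parse_string ~consume:Angstrom.Consume.All (Angstrom.many p) input

let () =
  (* Demonstration with a stock Angstrom parser. *)
  match parse_all (Angstrom.char 'a') "aaa" with
  | Ok chars -> assert (List.length chars = 3)
  | Error msg -> failwith msg
```

For documents too large to hold in memory, Angstrom's buffered or unbuffered interfaces would be the natural replacement for parse_string; the README covers the library's own streaming helpers.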

module Expert : sig ... end

For those who want finer-grained control and want to parse (using Angstrom) and fold (using this module) by hand.
