package SZXX
Advanced parsing utilities: custom parser options and tools to stream huge documents
type node =
| Prologue of DOM.attr_list
| Element_open of {
tag : Base.string;
attrs : DOM.attr_list;
}
| Element_close of Base.string
| Text of Base.string
| Cdata of Base.string
| Nothing
| Many of node Base.list
val sexp_of_node : node -> Sexplib0.Sexp.t
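A single parse step may yield `Many` (see `batch_size` below) or `Nothing`, so consumers usually flatten results before processing them. The minimal sketch below runs on its own. It uses a local copy of `node` with `Base` types replaced by their Stdlib equivalents, and it assumes that `DOM.attr_list` is `(string * string) list`. The `flatten` and `count_opens` helpers are hypothetical.

```ocaml
(* Local stand-in for the [node] type above.
   Assumption: attr_list is a list of (name, value) pairs. *)
type attr_list = (string * string) list

type node =
  | Prologue of attr_list
  | Element_open of { tag : string; attrs : attr_list }
  | Element_close of string
  | Text of string
  | Cdata of string
  | Nothing
  | Many of node list

(* Expand [Many] batches (recursively) and drop [Nothing], producing a
   flat list of events in document order. *)
let rec flatten (n : node) : node list =
  match n with
  | Nothing -> []
  | Many ns -> List.concat_map flatten ns
  | other -> [ other ]

(* Example consumer: count opening tags, regardless of batching. *)
let count_opens (n : node) : int =
  flatten n
  |> List.filter (function Element_open _ -> true | _ -> false)
  |> List.length
```

Because `flatten` treats nested `Many` and interleaved `Nothing` uniformly, the counting logic does not depend on the configured `batch_size`.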
type parser_options = {
  accept_html_boolean_attributes : Base.bool;
  (* Invalid XML but valid HTML: <div attr1="foo" attr2>
     With accept_html_boolean_attributes set to true, attr2 will be "attr2". *)
  accept_unquoted_attributes : Base.bool;
  (* Invalid XML but valid HTML: <div attr1="foo" attr2=bar>
     With accept_unquoted_attributes set to true, attr2 will be "bar". *)
  accept_single_quoted_attributes : Base.bool;
  (* Invalid XML but valid HTML: <div attr1="foo" attr2='bar'>
     With accept_single_quoted_attributes set to true, attr2 will be "bar". *)
  batch_size : Base.int;
  (* (Default: 20) Performance optimization. When batch_size is greater than 1,
     the parser prefers to return Many list, where list has batch_size elements. *)
}
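The three `accept_*` flags decide which attribute spellings are tolerated. The toy classifier below handles a single attribute token and makes those rules concrete. It is not SZXX's Angstrom-based parser. The `lenient` record and the `parse_attr` function are hypothetical and exist only for illustration. The classifier ignores whitespace and escape sequences.

```ocaml
(* Hypothetical mirror of the three boolean flags in [parser_options]. *)
type lenient = {
  html_boolean : bool;   (* accept_html_boolean_attributes *)
  unquoted : bool;       (* accept_unquoted_attributes *)
  single_quoted : bool;  (* accept_single_quoted_attributes *)
}

(* Toy classifier for one attribute token such as {|attr2="bar"|}.
   Returns [Some (name, value)] if the token is accepted under [opts],
   or [None] if it would be rejected. *)
let parse_attr (opts : lenient) (tok : string) : (string * string) option =
  match String.index_opt tok '=' with
  | None ->
    (* Boolean attribute: the value is the attribute name itself. *)
    if opts.html_boolean && tok <> "" then Some (tok, tok) else None
  | Some i ->
    let name = String.sub tok 0 i in
    let raw = String.sub tok (i + 1) (String.length tok - i - 1) in
    let len = String.length raw in
    let quoted_by q = len >= 2 && raw.[0] = q && raw.[len - 1] = q in
    if quoted_by '"' then Some (name, String.sub raw 1 (len - 2))
    else if quoted_by '\'' then
      if opts.single_quoted then Some (name, String.sub raw 1 (len - 2)) else None
    else if opts.unquoted && len > 0 then Some (name, raw)
    else None
```

Double-quoted values are always accepted, as in well-formed XML. The other three spellings are accepted only when their flag is set.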
val sexp_of_parser_options : parser_options -> Sexplib0.Sexp.t
val compare_parser_options : parser_options -> parser_options -> Base.int
val equal_parser_options : parser_options -> parser_options -> Base.bool
val default_parser_options : parser_options
HTML boolean attributes: true. All other boolean options: false. batch_size: 20.
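`default_parser_options` is an ordinary record. To customize it, use OCaml's functional record update and override only the fields you need. With the real library, that looks like `{ default_parser_options with accept_unquoted_attributes = true }`, module-qualified as needed, with the result passed to `make_parser`. The sketch below instead uses a local mirror of `parser_options`, with Stdlib types in place of `Base`, so that it runs on its own. The `defaults` and `html_friendly` values are hypothetical.

```ocaml
(* Local mirror of [parser_options]; Base.bool/Base.int replaced by Stdlib types. *)
type parser_options = {
  accept_html_boolean_attributes : bool;
  accept_unquoted_attributes : bool;
  accept_single_quoted_attributes : bool;
  batch_size : int;
}

(* Mirrors the documented defaults: HTML boolean attributes on,
   the other boolean flags off, batch_size = 20. *)
let defaults = {
  accept_html_boolean_attributes = true;
  accept_unquoted_attributes = false;
  accept_single_quoted_attributes = false;
  batch_size = 20;
}

(* Lenient options for loosely written HTML: copy [defaults] and
   override only the fields that differ. *)
let html_friendly = {
  defaults with
  accept_unquoted_attributes = true;
  accept_single_quoted_attributes = true;
}
```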
val make_parser : parser_options -> node Angstrom.t
val parser : node Angstrom.t
IO-agnostic Angstrom.t XML parser.
It is not fully spec-compliant: it does not attempt to validate character encoding or to reject all incorrect documents. It does not process references, and it does not automatically unescape XML escape sequences; SZXX.Xml.DOM.unescape is provided for that.
See README.md for usage examples.
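The parser leaves escape sequences untouched, so `Text` payloads may still contain sequences such as `&amp;`. In real code, use `SZXX.Xml.DOM.unescape`. To illustrate what that step involves, the hypothetical `unescape_basic` below decodes only the five predefined XML entities. It is not SZXX's implementation and does not handle numeric character references.

```ocaml
(* Hypothetical helper, not SZXX.Xml.DOM.unescape: decodes only the five
   predefined XML entities. Anything else (including numeric character
   references like "&#65;") is copied through unchanged. *)
let unescape_basic (s : string) : string =
  let entities =
    [ ("&lt;", '<'); ("&gt;", '>'); ("&amp;", '&'); ("&quot;", '"'); ("&apos;", '\'') ]
  in
  let n = String.length s in
  let buf = Buffer.create n in
  (* Does entity [e] occur in [s] starting at index [i]? *)
  let matches i (e, _) =
    let k = String.length e in
    i + k <= n && String.sub s i k = e
  in
  let rec go i =
    if i < n then
      if s.[i] <> '&' then begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end else begin
        match List.find_opt (matches i) entities with
        | Some (e, c) -> Buffer.add_char buf c; go (i + String.length e)
        | None -> Buffer.add_char buf '&'; go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf
```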
module Expert : sig ... end
For finer-grained control: parse (using Angstrom) and fold (using this module) by hand.
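The `Expert` signature is elided above, so the following sketch does not use it. It is a hypothetical illustration of the "fold by hand" idea: a fold over already-parsed `node` values that maintains a stack of open tags and collects the text found directly under a given element path. The local `node` mirror, the assumed `attr_list` representation, and the `step` and `texts_at` names are all assumptions.

```ocaml
type attr_list = (string * string) list  (* assumed representation *)

(* Local stand-in for the [node] type documented above. *)
type node =
  | Prologue of attr_list
  | Element_open of { tag : string; attrs : attr_list }
  | Element_close of string
  | Text of string
  | Cdata of string
  | Nothing
  | Many of node list

(* Fold state: currently open tags (innermost first) and collected text. *)
type state = { stack : string list; found : string list }

(* Process one node. [path] is the target element path, outermost first,
   e.g. ["root"; "item"]. Text/CDATA directly inside that path is collected. *)
let rec step (path : string list) (st : state) (n : node) : state =
  match n with
  | Element_open { tag; _ } -> { st with stack = tag :: st.stack }
  | Element_close _ ->
    (* Assumes well-formed input: just pop the innermost tag. *)
    { st with stack = (match st.stack with _ :: rest -> rest | [] -> []) }
  | (Text s | Cdata s) when List.rev st.stack = path ->
    { st with found = s :: st.found }
  | Many ns -> List.fold_left (step path) st ns
  | Prologue _ | Text _ | Cdata _ | Nothing -> st

(* Run the fold over a sequence of parser outputs, in document order. *)
let texts_at (path : string list) (events : node list) : string list =
  let final = List.fold_left (step path) { stack = []; found = [] } events in
  List.rev final.found
```

Keeping only a stack of tag names, rather than a tree, is what lets this style of fold process very large documents in bounded memory.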