Module type
Class type

Computing diffs on XML trees.

Algorithm adapted from Gregory Cobena's Phd. thesis: "Gestion des changements pour les données semi-structurées du Web (Change management of semi-structured data on the Web)"

type name = Xmlm.name

= string * string . Representation of tags and attribute names.

module Nmap : Map.S with type key = name

Mapping over names.

type 'a xmlt = [
  1. | `E of name * string Nmap.t * 'a list

    A node (tag, attributes, children)

  2. | `D of string

    CDATA leaf


XML tree. The type is parametrized because it is shared with an internal richer representation.

type xmltree = xmltree xmlt

Simple XML tree.

type label =
  1. | Node of string
  2. | Text of string
type patch_path =
  1. | Path_cdata of int

    Path_cdata n refers to the nth CData leaf.

  2. | Path_node of Xmlm.name * int * patch_path option

    Path_node (tag, n, more) refers to the nth element with tag tag. If more <> None then the path goes on into the children of the referenced node.


A path to a node in an XML tree where to perform an operation.

type position = [
  1. | `FirstChild
  2. | `After

When inserting or moving a node, this indicates whether to insert the tree as first node of the node corresponding to the given path, or as a right sibling.

type patch_operation =
  1. | PInsert of xmltree * position

    Insert the given XML tree to the given path and position.

  2. | PDelete

    Delete the referenced node.

  3. | PUpdateCData of string

    Change the referenced node to a CData with the given contents.

  4. | PUpdateNode of Xmlm.name * string Nmap.t

    Update the referenced node to be a tag with the given attributes.

  5. | PReplace of xmltree

    Replace the referenced node by the given tree.

  6. | PMove of patch_path * position

    Move the node to the given path and position. In this case, the position must be considered when the node has been removed from the tree (this is important when moving a node under the same parent).


The patch operations. Each operation is to be performed at a given node (position) in the tree, referenced by a patch_path.

type patch = (patch_path * patch_operation) list
val diff : ?cut:(name -> string Nmap.t -> xmltree list -> bool) -> xmltree -> xmltree -> patch

diff t1 t2 returns the patch p to change t1 into t2.

  • parameter cut

    is called on XML tree tag nodes; returning true means that that subnodes won't be compared separately, but the whole node will be compared as is. This is useful in case of big XML trees, otherwise a lot of comparisons are done between the two trees. For example, one can cut on "p", "pre", "ul" and "code" tags in a regular web page. Default behaviour is to cut nothing.

val diff_with_final_tree : ?cut:(name -> string Nmap.t -> xmltree list -> bool) -> xmltree -> xmltree -> patch * xmltree

Same as diff but returns also the patched tree, to be able to compare it with the given target tree (for testing purpose).


val xml_of_string : string -> xmltree
val xml_of_file : string -> xmltree
val atts_of_map : string Nmap.t -> (name * string) list
val atts_of_list : (name * string) list -> string Nmap.t
val string_of_xml : ?cut:bool -> xmltree -> string
val string_of_name : (string * string) -> string
val string_of_atts : string Nmap.t -> string
val string_of_path : patch_path -> string
val string_of_patch_operation : (patch_path * patch_operation) -> string
val string_of_patch : patch -> string