patience_diff v0.16.0 · OCaml Package

type elt = string

val get_matching_blocks : 
  transform:('a -> elt) ->
  ?big_enough:int ->
  ?max_slide:int ->
  ?score:([ `left | `right ] -> 'a -> 'a -> int) ->
  prev:'a array ->
  next:'a array ->
  unit ->
  Patience_diff_lib__.Matching_block.t list

get_matching_blocks not only aggregates the data from matches a b but also attempts to remove random, semantically meaningless matches ("semantic cleanup"). The value of big_enough governs how aggressively it does so. See get_hunks below for more details.

val matches : elt array -> elt array -> (int * int) list

matches a b returns a list of pairs (i,j) such that a.(i) = b.(j) and such that the list is strictly increasing in both its first and second coordinates. This is essentially an "unfolded" version of what get_matching_blocks returns: instead of grouping consecutive matches into a block with a length, this function returns every individual pair (prev_start * next_start).
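The relationship between blocks and pairs can be sketched with the standard library alone. The block record below is a local stand-in whose field names follow the description above; it is not the library's Matching_block.t, and unfold is a hypothetical helper, not part of this package.

```ocaml
(* Local stand-in for a matching block (an assumption for illustration). *)
type block = { prev_start : int; next_start : int; length : int }

(* Expand each block into its individual (prev_index, next_index) pairs. *)
let unfold (blocks : block list) : (int * int) list =
  List.concat_map
    (fun b ->
      List.init b.length (fun k -> (b.prev_start + k, b.next_start + k)))
    blocks

(* Two blocks: one of length 2 starting at (0, 0), one of length 1 at (3, 4). *)
let example =
  unfold
    [ { prev_start = 0; next_start = 0; length = 2 };
      { prev_start = 3; next_start = 4; length = 1 } ]
(* example = [(0, 0); (1, 1); (3, 4)] *)
```

Both coordinates increase strictly along the resulting list, which is the property matches guarantees.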

val match_ratio : elt array -> elt array -> float

match_ratio a b computes the ratio defined as:

2 * len (matches a b) / (len a + len b)

It is an indication of how much alike a and b are. A ratio close to 1.0 means the number of matches is close to the number of elements that could potentially match, which is a sign that a and b are very much alike. On the other hand, a low ratio means there are very few matches.
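As a worked instance of the formula, the sketch below computes the ratio from a match count and the two lengths. The function ratio and its labelled arguments are hypothetical names used only for illustration; the sketch does not call the library and does not decide what to return when both arrays are empty.

```ocaml
(* 2 * (number of matches) / (len a + len b), in floating point. *)
let ratio ~matches ~len_a ~len_b : float =
  2.0 *. float_of_int matches /. float_of_int (len_a + len_b)

(* Two 4-element arrays with 3 matching pairs: 2 * 3 / (4 + 4). *)
let r = ratio ~matches:3 ~len_a:4 ~len_b:4
(* r = 0.75 *)
```

When every element of both arrays matches, matches equals len a and len b, so the ratio reaches 1.0.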

val get_hunks : 
  transform:('a -> elt) ->
  context:int ->
  ?big_enough:int ->
  ?max_slide:int ->
  ?score:([ `left | `right ] -> 'a -> 'a -> int) ->
  prev:'a array ->
  next:'a array ->
  unit ->
  'a Patience_diff_lib__.Hunk.t list

get_hunks ~transform ~context ~prev ~next will compare the arrays prev and next and produce a list of hunks. (The hunks will contain Same ranges of at most context elements.) Negative context is equivalent to infinity (producing a singleton hunk list). The value of big_enough governs how aggressively we try to clean up spurious matches, by restricting our attention to only matches of length less than big_enough. Thus, setting big_enough to a higher value results in more aggressive cleanup, and the default value of 1 results in no cleanup at all. When this function is called by Patdiff_core, the value of big_enough is 3 at the line level, and 7 at the word level.
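To see why the default of 1 disables cleanup, here is a minimal sketch of the "length less than big_enough" test. The block record and cleanup_candidates are hypothetical stand-ins written for this illustration; the library's actual cleanup logic is not shown here and may do more than this filter.

```ocaml
(* Local stand-in for a matching block (an assumption for illustration). *)
type block = { prev_start : int; next_start : int; length : int }

(* Only blocks shorter than big_enough are considered for cleanup. *)
let cleanup_candidates ~big_enough (blocks : block list) : block list =
  List.filter (fun b -> b.length < big_enough) blocks

let blocks =
  [ { prev_start = 0; next_start = 0; length = 1 };
    { prev_start = 4; next_start = 5; length = 2 };
    { prev_start = 9; next_start = 12; length = 5 } ]

(* With big_enough = 1 no block has length < 1, so nothing is a candidate. *)
let none = cleanup_candidates ~big_enough:1 blocks

(* With big_enough = 3 the blocks of length 1 and 2 become candidates. *)
let short = cleanup_candidates ~big_enough:3 blocks
```

Raising big_enough widens the set of candidates, which is why a higher value means more aggressive cleanup.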

The value of max_slide controls how far we are willing to shift a diff (which is immediately preceded/followed by the same lines as it ends/starts with). We choose between equivalent positions by maximising the sum of the score function applied to the two boundaries of the diff. By default, max_slide is 0. The arguments passed to score are firstly whether the boundary is at the start or end of the diff and then the values on either side of the boundary (if a boundary is considered at the start or end of the input, it gets a score of 100).
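A score function of the required shape might look like the sketch below, specialised to string lines. The preference for blank lines is an invented heuristic, the names score and total are hypothetical, and reading `left as the start boundary and `right as the end boundary is an assumption, not something the text above confirms.

```ocaml
let is_blank (s : string) : bool = String.trim s = ""

(* before and after are the values on either side of the boundary. *)
let score (side : [ `left | `right ]) (before : string) (after : string) : int =
  match side with
  | `left -> if is_blank before then 10 else 0   (* diff starts after a blank line *)
  | `right -> if is_blank after then 10 else 0   (* diff ends before a blank line *)

(* Sum over the two boundaries of one candidate position of a diff. *)
let total ~before_start ~first ~last ~after_end : int =
  score `left before_start first + score `right last after_end

(* A position bracketed by blank lines outscores one that is not. *)
let bracketed = total ~before_start:"" ~first:"let x = 1" ~last:"x + 1" ~after_end:""
let unbracketed = total ~before_start:"foo" ~first:"" ~last:"x + 1" ~after_end:"bar"
(* bracketed = 20, unbracketed = 0 *)
```

Among equivalent positions within max_slide, the one with the highest total would be chosen under this scheme.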

type 'a segment =
  | Same of 'a array
  | Different of 'a array array

type 'a merged_array = 'a segment list

val merge : elt array array -> elt merged_array
