package prbnmcn-clustering


K-means functor.

module type Element = sig ... end
type init =
  | Forgy
    (* Forgy selects k elements at random (without replacement) as initial centroids. *)
  | RandomPartition
    (* RandomPartition assigns each point to a random cluster and computes the centroid of each cluster. Note that these centroids do not necessarily belong to the dataset, which might cause robustness issues. *)
  | KmeansPP
    (* KmeansPP selects initial centroids iteratively, choosing each new centroid with probability proportional to its squared distance to the previously selected centroids. Intuitively, this spreads the centroids well across the data. *)

K-means is rather sensitive to the initial choice of centroids. This implementation provides several initialization algorithms; the standard one is k-means++ (KmeansPP).
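To make two of the initialization strategies described above concrete, here is a minimal sketch for one-dimensional float points. It is not this library's code: the names forgy, kmeans_pp, sq_dist_to_nearest and sample_weighted are hypothetical, and the k-means++ part assumes the usual formulation in which each point is weighted by its squared distance to the nearest centroid chosen so far.

```ocaml
(* Sketch only: 1-D points with squared Euclidean distance.
   All names below are hypothetical and not part of the library. *)

(* Forgy: pick k distinct elements via a partial Fisher-Yates shuffle. *)
let forgy (rng : Random.State.t) (k : int) (points : float array) : float list =
  if k <= 0 || k > Array.length points then invalid_arg "forgy";
  let a = Array.copy points in
  let n = Array.length a in
  for i = 0 to k - 1 do
    let j = i + Random.State.int rng (n - i) in
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done;
  Array.to_list (Array.sub a 0 k)

(* Squared distance from [x] to the closest centroid in [centroids]. *)
let sq_dist_to_nearest (centroids : float list) (x : float) : float =
  List.fold_left
    (fun acc c -> Float.min acc ((x -. c) *. (x -. c)))
    Float.infinity centroids

(* Draw an index with probability proportional to [weights.(i)];
   falls back to a uniform draw when all weights are zero. *)
let sample_weighted (rng : Random.State.t) (weights : float array) : int =
  let total = Array.fold_left ( +. ) 0.0 weights in
  if total <= 0.0 then Random.State.int rng (Array.length weights)
  else begin
    let target = Random.State.float rng total in
    let rec go i acc =
      let acc = acc +. weights.(i) in
      if target < acc || i = Array.length weights - 1 then i
      else go (i + 1) acc
    in
    go 0 0.0
  end

(* k-means++ seeding: first centroid uniform, then squared-distance sampling
   (assumed: distance to the nearest already-chosen centroid). *)
let kmeans_pp (rng : Random.State.t) (k : int) (points : float array) : float list =
  if k <= 0 || k > Array.length points then invalid_arg "kmeans_pp";
  let first = points.(Random.State.int rng (Array.length points)) in
  let rec loop centroids remaining =
    if remaining = 0 then List.rev centroids
    else
      let weights = Array.map (sq_dist_to_nearest centroids) points in
      let i = sample_weighted rng weights in
      loop (points.(i) :: centroids) (remaining - 1)
  in
  loop [ first ] (k - 1)
```

For example, kmeans_pp (Random.State.make [| 42 |]) 2 [| 0.0; 1.0; 10.0; 11.0 |] returns two points of the dataset; because already-chosen points get weight zero, the second centroid is very likely to come from the cluster the first one did not land in.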

type termination =
  | Num_iter of int
  | Threshold of float
  | Min of constraints

Termination of the algorithm can be specified in one of three ways: 1) Num_iter, an exact number of iterations; 2) Threshold, the cost decrease below which we stop iterating; or 3) Min, the minimum of the two above, i.e. stop iterating when the cost decrease falls below threshold or when max_iter iterations have been performed, whichever happens first.

and constraints = {
  max_iter : int;
  threshold : float;
}
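
The three termination modes can be read as a single stopping predicate. The sketch below redeclares the documented types locally so that it compiles on its own; should_stop, iter and delta are hypothetical names, and the strict comparison against the threshold is an assumption, since the documentation does not say whether the bound is inclusive.

```ocaml
(* Local mirror of the documented types, so the sketch is self-contained. *)
type termination =
  | Num_iter of int
  | Threshold of float
  | Min of constraints

and constraints = { max_iter : int; threshold : float }

(* [should_stop t ~iter ~delta] decides whether to stop after [iter]
   completed iterations, where [delta] is the cost decrease achieved by
   the last iteration. Hypothetical helper, not part of the library. *)
let should_stop (t : termination) ~(iter : int) ~(delta : float) : bool =
  match t with
  | Num_iter n -> iter >= n
  | Threshold eps -> delta < eps (* assumed strict; could be <= *)
  | Min { max_iter; threshold } -> iter >= max_iter || delta < threshold
```

With Min { max_iter = 5; threshold = 1e-3 }, the predicate holds either after five iterations or as soon as one iteration improves the cost by less than 1e-3.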
exception KmeansError of string

Exception raised by k_means when something goes wrong.
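
A caller may prefer to turn this exception into a result value. The sketch below shows one way to do so. It redeclares the exception locally and uses a stand-in function, cluster_or_fail, in place of the real clustering call; the stand-in, its failure condition, its error message and safe_cluster are all hypothetical and say nothing about when the library actually raises.

```ocaml
(* Local mirror of the documented exception, so the sketch is self-contained. *)
exception KmeansError of string

(* Hypothetical stand-in for a clustering call: it fails on an invalid k
   and otherwise returns the first k points. NOT the library's behavior. *)
let cluster_or_fail ~k points =
  if k <= 0 || k > Array.length points then
    raise (KmeansError "invalid number of clusters")
  else Array.sub points 0 k

(* Wrap the call so that failures become [Error msg] instead of escaping. *)
let safe_cluster ~k points =
  match cluster_or_fail ~k points with
  | centroids -> Ok centroids
  | exception KmeansError msg -> Error msg
```

Matching on exception KmeansError msg catches only this exception, so unrelated failures still propagate to the caller.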

module Make (E : Element) : sig ... end