package scipy

type tag = [
  | `Gaussian_kde
]
type t = [ `Gaussian_kde | `Object ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val create : ?bw_method: [ `Bool of bool | `F of float | `S of string | `I of int | `Callable of Py.Object.t ] -> ?weights:[> `Ndarray ] Np.Obj.t -> dataset:[> `Ndarray ] Np.Obj.t -> unit -> t

Representation of a kernel-density estimate using Gaussian kernels.

Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. `gaussian_kde` works for both univariate and multivariate data and includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multimodal distributions tend to be oversmoothed.
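To make the idea concrete, here is a minimal pure-Python sketch of a 1-D Gaussian KDE with equal weights and Scott's factor. It illustrates the technique only and is not the `gaussian_kde` implementation; the helper name `kde_1d` is hypothetical, and the kernel standard deviation is assumed to be the bandwidth factor times the sample standard deviation (computed with an ``n - 1`` denominator).

```python
import math
from statistics import stdev


def kde_1d(data, x):
    """Evaluate a 1-D Gaussian KDE at x (hypothetical helper, equal weights)."""
    n = len(data)
    factor = n ** (-1.0 / 5.0)   # Scott's rule n**(-1/(d+4)) with d = 1
    h = factor * stdev(data)     # kernel standard deviation
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    # Average of Gaussian bumps centred on each datapoint.
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


data = [-1.0, 0.0, 1.0]
grid = [-10.0 + 0.01 * i for i in range(2001)]
area = sum(kde_1d(data, t) for t in grid) * 0.01
print(area)  # a Riemann sum of the estimate over [-10, 10]; close to 1
```

Because every kernel is a normalized Gaussian and the kernels are averaged, the estimate is itself a valid density.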

Parameters
----------
dataset : array_like
    Datapoints to estimate from. In case of univariate data this is a 1-D array, otherwise a 2-D array with shape (# of dims, # of data).
bw_method : str, scalar or callable, optional
    The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, this will be used directly as `kde.factor`. If a callable, it should take a `gaussian_kde` instance as its only parameter and return a scalar. If None (default), 'scott' is used. See Notes for more details.
weights : array_like, optional
    Weights of datapoints. This must be a 1-D array with one weight per datapoint, i.e. of length ``n`` (for univariate data, the same shape as `dataset`). If None (default), the samples are assumed to be equally weighted.

Attributes
----------
dataset : ndarray
    The dataset with which `gaussian_kde` was initialized.
d : int
    Number of dimensions.
n : int
    Number of datapoints.
neff : int
    Effective number of datapoints.

    .. versionadded:: 1.2.0
factor : float
    The bandwidth factor, obtained from `kde.covariance_factor`. Its square multiplies the data covariance matrix to give the kernel covariance matrix.
covariance : ndarray
    The covariance matrix of `dataset`, scaled by the square of the calculated bandwidth factor (`kde.factor**2`).
inv_cov : ndarray
    The inverse of `covariance`.

Methods
-------
evaluate
__call__
integrate_gaussian
integrate_box_1d
integrate_box
integrate_kde
pdf
logpdf
resample
set_bandwidth
covariance_factor

Notes
-----
Bandwidth selection strongly influences the estimate obtained from the KDE (much more so than the actual shape of the kernel). Bandwidth selection can be done by a 'rule of thumb', by cross-validation, by 'plug-in methods' or by other means; see [3]_, [4]_ for reviews. `gaussian_kde` uses a rule of thumb; the default is Scott's Rule.

Scott's Rule [1]_, implemented as `scotts_factor`, is::

    n**(-1./(d+4)),

with ``n`` the number of data points and ``d`` the number of dimensions. In the case of unequally weighted points, `scotts_factor` becomes::

    neff**(-1./(d+4)),

with ``neff`` the effective number of datapoints. Silverman's Rule [2]_, implemented as `silverman_factor`, is::

    (n * (d + 2) / 4.)**(-1. / (d + 4)).

or in the case of unequally weighted points::

    (neff * (d + 2) / 4.)**(-1. / (d + 4)).
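The two rules of thumb translate directly into code. The sketch below uses hypothetical free functions that take ``neff`` and ``d`` explicitly (for equal weights, pass ``n``); it shows how the rules compare: for ``d = 1`` Silverman's factor is larger than Scott's, for ``d = 2`` they coincide because ``(d + 2) / 4 == 1``, and for ``d > 2`` Silverman's is smaller.

```python
def scotts_factor(neff, d):
    """Scott's rule: neff**(-1/(d+4))."""
    return neff ** (-1.0 / (d + 4))


def silverman_factor(neff, d):
    """Silverman's rule: (neff*(d+2)/4)**(-1/(d+4))."""
    return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))


# d = 1: (d + 2) / 4 < 1, so Silverman's factor is the larger one.
print(scotts_factor(1000, 1), silverman_factor(1000, 1))
# d = 2: (d + 2) / 4 == 1, so both rules give the same factor.
print(scotts_factor(64, 2), silverman_factor(64, 2))
```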

Good general descriptions of kernel density estimation can be found in [1]_ and [2]_; the mathematics for this multi-dimensional implementation can be found in [1]_.

With a set of weighted samples, the effective number of datapoints ``neff`` is defined by::

    neff = sum(weights)^2 / sum(weights^2)

as detailed in [5]_.
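A direct transcription of the ``neff`` formula (the function name `effective_n` is hypothetical). Equal weights give ``neff == n``, a few dominant weights pull ``neff`` well below ``n``, and rescaling all weights by a constant leaves ``neff`` unchanged.

```python
def effective_n(weights):
    """neff = sum(weights)**2 / sum(weights**2)."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


print(effective_n([1.0, 1.0, 1.0, 1.0]))  # equal weights: neff equals n (4.0)
print(effective_n([0.7, 0.1, 0.1, 0.1]))  # one dominant point: neff well below 4
```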

References
----------
.. [1] D.W. Scott, 'Multivariate Density Estimation: Theory, Practice, and Visualization', John Wiley & Sons, New York, Chichester, 1992.
.. [2] B.W. Silverman, 'Density Estimation for Statistics and Data Analysis', Vol. 26, Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1986.
.. [3] B.A. Turlach, 'Bandwidth Selection in Kernel Density Estimation: A Review', CORE and Institut de Statistique, Vol. 19, pp. 1-33, 1993.
.. [4] D.M. Bashtannyk and R.J. Hyndman, 'Bandwidth selection for kernel conditional density estimation', Computational Statistics & Data Analysis, Vol. 36, pp. 279-298, 2001.
.. [5] P.G. Gray, Journal of the Royal Statistical Society. Series A (General), Vol. 132, p. 272, 1969.

Examples
--------
Generate some random two-dimensional data:

>>> import numpy as np
>>> from scipy import stats
>>> def measure(n):
...     'Measurement model, return two coupled measurements.'
...     m1 = np.random.normal(size=n)
...     m2 = np.random.normal(scale=0.5, size=n)
...     return m1+m2, m1-m2

>>> m1, m2 = measure(2000)
>>> xmin = m1.min()
>>> xmax = m1.max()
>>> ymin = m2.min()
>>> ymax = m2.max()

Perform a kernel density estimate on the data:

>>> X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
>>> positions = np.vstack([X.ravel(), Y.ravel()])
>>> values = np.vstack([m1, m2])
>>> kernel = stats.gaussian_kde(values)
>>> Z = np.reshape(kernel(positions).T, X.shape)

Plot the results:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,
...           extent=[xmin, xmax, ymin, ymax])
>>> ax.plot(m1, m2, 'k.', markersize=2)
>>> ax.set_xlim(xmin, xmax)
>>> ax.set_ylim(ymin, ymax)
>>> plt.show()

val covariance_factor : [> tag ] Obj.t -> Py.Object.t

Computes the coefficient (`kde.factor`) whose square multiplies the data covariance matrix to obtain the kernel covariance matrix. The default is `scotts_factor`. A subclass can override this method to provide a different rule, or the rule can be set through a call to `kde.set_bandwidth`.

val evaluate : points:Py.Object.t -> [> tag ] Obj.t -> Py.Object.t

Evaluate the estimated pdf on a set of points.

Parameters
----------
points : (# of dimensions, # of points)-array
    Alternatively, a (# of dimensions,) vector can be passed in and treated as a single point.

Returns
-------
values : (# of points,)-array
    The values at each point.

Raises
------
ValueError : if the dimensionality of the input points is different from the dimensionality of the KDE.
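The evaluation rule can be sketched in pure Python for the two-dimensional, equally weighted case, following the (# of dimensions, # of points) layout described above. This is an illustration under stated assumptions, not the library code: `kde_evaluate_2d` is hypothetical, the bandwidth factor is passed in rather than computed, and the kernel covariance is taken to be the sample covariance (``n - 1`` denominator) times ``factor**2``. It also shows the role of the `covariance` and `inv_cov` attributes.

```python
import math


def kde_evaluate_2d(dataset, points, factor):
    """Evaluate a 2-D equal-weight Gaussian KDE (hypothetical helper).

    dataset: [xs, ys], shape (# of dims, # of data) with # of dims == 2.
    points:  [px, py], shape (# of dims, # of points).
    """
    if len(points) != len(dataset):
        raise ValueError("points have dimension %d, dataset has dimension %d"
                         % (len(points), len(dataset)))
    xs, ys = dataset
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    f2 = factor ** 2
    # Kernel covariance = sample covariance (n - 1 denominator) * factor**2.
    sxx = f2 * sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = f2 * sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = f2 * sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Closed-form inverse of the 2x2 kernel covariance (the analogue of inv_cov).
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    norm = 1.0 / (n * 2.0 * math.pi * math.sqrt(det))
    values = []
    for px, py in zip(*points):
        total = 0.0
        for x, y in zip(xs, ys):
            dx, dy = px - x, py - y
            # Squared Mahalanobis distance under the kernel covariance.
            q = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy
            total += math.exp(-0.5 * q)
        values.append(norm * total)
    return values


dataset = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0]]
print(kde_evaluate_2d(dataset, [[1.5, 10.0], [1.5, 10.0]], factor=0.8))
```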

val integrate_box : ?maxpts:int -> low_bounds:[> `Ndarray ] Np.Obj.t -> high_bounds:[> `Ndarray ] Np.Obj.t -> [> tag ] Obj.t -> Py.Object.t

Computes the integral of a pdf over a rectangular interval.

Parameters
----------
low_bounds : array_like
    A 1-D array containing the lower bounds of integration.
high_bounds : array_like
    A 1-D array containing the upper bounds of integration.
maxpts : int, optional
    The maximum number of points to use for integration.

Returns
-------
value : scalar
    The result of the integral.

val integrate_box_1d : low:[ `F of float | `I of int | `Bool of bool | `S of string ] -> high:[ `F of float | `I of int | `Bool of bool | `S of string ] -> [> tag ] Obj.t -> Py.Object.t

Computes the integral of a 1D pdf between two bounds.

Parameters
----------
low : scalar
    Lower bound of integration.
high : scalar
    Upper bound of integration.

Returns
-------
value : scalar
    The result of the integral.

Raises
------
ValueError
    If the KDE is over more than one dimension.
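In one dimension the box integral has a closed form: each Gaussian kernel contributes the normal probability mass between the two bounds, so the result is the average of CDF differences. The sketch below assumes equal weights and a kernel standard deviation of ``factor * stdev(data)``; the helper names are hypothetical.

```python
import math
from statistics import stdev


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def integrate_box_1d(data, low, high, factor):
    """Integral of a 1-D equal-weight Gaussian KDE over [low, high]."""
    h = factor * stdev(data)  # kernel standard deviation
    # Each kernel contributes its Gaussian mass between the bounds.
    return sum(normal_cdf((high - x) / h) - normal_cdf((low - x) / h)
               for x in data) / len(data)


data = [-1.0, 0.0, 1.0]
print(integrate_box_1d(data, -50.0, 50.0, factor=0.5))  # essentially the whole line
print(integrate_box_1d(data, -50.0, 0.0, factor=0.5))   # half the mass for symmetric data
```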

val integrate_gaussian : mean:Py.Object.t -> cov:[> `Ndarray ] Np.Obj.t -> [> tag ] Obj.t -> Py.Object.t

Multiply estimated density by a multivariate Gaussian and integrate over the whole space.

Parameters
----------
mean : array_like
    A 1-D array, specifying the mean of the Gaussian.
cov : array_like
    A 2-D array, specifying the covariance matrix of the Gaussian.

Returns
-------
result : scalar
    The value of the integral.

Raises
------
ValueError
    If the mean or covariance of the input Gaussian differs from the KDE's dimensionality.
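This integral can be done analytically because the product of two Gaussians integrates to a Gaussian density: for a kernel at ``x_i`` with variance ``h**2`` and an input Gaussian ``N(mean, var)``, the contribution is the density of ``N(mean, h**2 + var)`` at ``x_i``. A 1-D sketch, assuming equal weights and a kernel standard deviation of ``factor * stdev(data)`` (helper names are hypothetical):

```python
import math
from statistics import stdev


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def integrate_gaussian_1d(data, factor, mean, var):
    """Integral over the real line of KDE(x) * N(x; mean, var), 1-D, equal weights."""
    h2 = (factor * stdev(data)) ** 2  # kernel variance
    # Each kernel/Gaussian product integrates to N(x_i; mean, h2 + var).
    return sum(normal_pdf(x, mean, h2 + var) for x in data) / len(data)


print(integrate_gaussian_1d([-1.0, 0.0, 1.0], 0.5, mean=0.3, var=0.4))
```

The closed form avoids any numerical quadrature, which is why a method dedicated to Gaussian weights is worth having.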

val integrate_kde : other:Py.Object.t -> [> tag ] Obj.t -> Py.Object.t

Computes the integral of the product of this kernel density estimate with another.

Parameters
----------
other : gaussian_kde instance
    The other KDE.

Returns
-------
value : scalar
    The result of the integral.

Raises
------
ValueError
    If the KDEs have different dimensionality.
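The same Gaussian-product identity applies between two estimates: every pair of kernels contributes the density of ``N(0, h1**2 + h2**2)`` at the distance between their centres. A 1-D sketch with equal weights, where the kernel standard deviations ``h1`` and ``h2`` are passed in directly (the helper name is hypothetical):

```python
import math


def integrate_kde_1d(data1, h1, data2, h2):
    """Integral of the product of two 1-D equal-weight Gaussian KDEs.

    h1, h2 are the kernel standard deviations of the two estimates.
    """
    var = h1 ** 2 + h2 ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    # Sum the pairwise kernel overlaps, then average over all pairs.
    total = sum(math.exp(-0.5 * (x - y) ** 2 / var) for x in data1 for y in data2)
    return norm * total / (len(data1) * len(data2))


print(integrate_kde_1d([0.0], 1.0, [0.0], 1.0))  # equals 1 / sqrt(4 * pi)
```

The cost grows with the product of the two dataset sizes, since every pair of kernels is visited.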

val logpdf : x:Py.Object.t -> [> tag ] Obj.t -> Py.Object.t

Evaluate the log of the estimated pdf on a provided set of points.
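The docstring does not say how the logarithm is computed. One numerically safe approach, sketched here for the 1-D equal-weight case, is log-sum-exp: factor out the largest kernel exponent so that far in the tails the result stays finite even when the density itself underflows to 0.0. The helper name and the kernel width ``factor * stdev(data)`` are assumptions.

```python
import math
from statistics import stdev


def kde_logpdf_1d(data, x, factor):
    """log of a 1-D equal-weight Gaussian KDE at x, using log-sum-exp."""
    h = factor * stdev(data)
    exponents = [-0.5 * ((x - xi) / h) ** 2 for xi in data]
    m = max(exponents)
    # Factor out the largest term so exp() of the others cannot all underflow.
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_sum - math.log(len(data) * h * math.sqrt(2.0 * math.pi))


data = [-1.0, 0.0, 1.0]
print(kde_logpdf_1d(data, 0.0, factor=0.5))
# Far in the tail the plain density underflows to 0.0, but the log is finite.
print(kde_logpdf_1d(data, 100.0, factor=0.5))
```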

val pdf : x:Py.Object.t -> [> tag ] Obj.t -> Py.Object.t

Evaluate the estimated pdf on a provided set of points.

Notes
-----
This is an alias for `gaussian_kde.evaluate`. See the ``evaluate`` docstring for more details.

val resample : ?size:int -> ?seed:[ `I of int | `PyObject of Py.Object.t ] -> [> tag ] Obj.t -> Py.Object.t

Randomly sample a dataset from the estimated pdf.

Parameters
----------
size : int, optional
    The number of samples to draw. If not provided, then the size is the same as the effective number of samples in the underlying dataset.
seed : None, int, `~np.random.RandomState`, `~np.random.Generator`, optional
    This parameter defines the object to use for drawing random variates. If `seed` is `None` the `~np.random.RandomState` singleton is used. If `seed` is an int, a new ``RandomState`` instance is used, seeded with `seed`. If `seed` is already a ``RandomState`` or ``Generator`` instance, then that object is used. Default is None. Specify `seed` for reproducible drawing of random variates.

Returns
-------
resample : (self.d, `size`) ndarray
    The sampled dataset.
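Sampling from a Gaussian KDE is a two-step process: pick a datapoint with probability proportional to its weight, then add Gaussian noise with the kernel's spread. The 1-D sketch below assumes the kernel standard deviation ``h`` is given, defaults ``size`` to the integer part of ``neff``, and uses Python's `random.Random` for the integer-seed case instead of the NumPy generators listed above; `resample_1d` is hypothetical.

```python
import random


def resample_1d(data, h, size=None, seed=None, weights=None):
    """Draw samples from a 1-D Gaussian KDE with kernel standard deviation h."""
    rng = random.Random(seed)  # an int seed makes the draw reproducible
    w = list(weights) if weights is not None else [1.0] * len(data)
    if size is None:
        # Default to the effective number of datapoints, neff.
        size = int(sum(w) ** 2 / sum(v * v for v in w))
    # Pick kernel centres in proportion to their weights, then add kernel noise.
    centres = rng.choices(data, weights=w, k=size)
    return [c + rng.gauss(0.0, h) for c in centres]


print(resample_1d([0.0, 1.0, 2.0, 3.0], h=0.5, seed=1))
```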

val set_bandwidth : ?bw_method: [ `Bool of bool | `F of float | `S of string | `I of int | `Callable of Py.Object.t ] -> [> tag ] Obj.t -> Py.Object.t

Compute the estimator bandwidth with given method.

The new bandwidth calculated after a call to `set_bandwidth` is used for subsequent evaluations of the estimated density.

Parameters
----------
bw_method : str, scalar or callable, optional
    The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, this will be used directly as `kde.factor`. If a callable, it should take a `gaussian_kde` instance as its only parameter and return a scalar. If None (default), nothing happens; the current `kde.covariance_factor` method is kept.

Notes
-----
.. versionadded:: 0.11

Examples -------- >>> import scipy.stats as stats >>> x1 = np.array(-7, -5, 1, 4, 5.) >>> kde = stats.gaussian_kde(x1) >>> xs = np.linspace(-10, 10, num=50) >>> y1 = kde(xs) >>> kde.set_bandwidth(bw_method='silverman') >>> y2 = kde(xs) >>> kde.set_bandwidth(bw_method=kde.factor / 3.) >>> y3 = kde(xs)

>>> import matplotlib.pyplot as plt >>> fig, ax = plt.subplots() >>> ax.plot(x1, np.full(x1.shape, 1 / (4. * x1.size)), 'bo', ... label='Data points (rescaled)') >>> ax.plot(xs, y1, label='Scott (default)') >>> ax.plot(xs, y2, label='Silverman') >>> ax.plot(xs, y3, label='Const (1/3 * Silverman)') >>> ax.legend() >>> plt.show()
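The four accepted forms of `bw_method` can be modelled as a small dispatch. The sketch below is a hypothetical stand-in (`BandwidthDemo` is not part of SciPy) that keeps only ``neff``, ``d`` and ``factor``: it mirrors the documented behaviour that None keeps the current factor, a string selects a rule, a scalar is used directly, and a callable receives the instance.

```python
from dataclasses import dataclass, field


@dataclass
class BandwidthDemo:
    """Hypothetical stand-in holding only what bandwidth selection needs."""
    neff: float
    d: int
    factor: float = field(init=False)

    def __post_init__(self):
        self.factor = self.scotts_factor()  # constructor default: Scott's rule

    def scotts_factor(self):
        return self.neff ** (-1.0 / (self.d + 4))

    def silverman_factor(self):
        return (self.neff * (self.d + 2) / 4.0) ** (-1.0 / (self.d + 4))

    def set_bandwidth(self, bw_method=None):
        if bw_method is None:
            return  # nothing happens; the current factor is kept
        if isinstance(bw_method, str):
            if bw_method == "scott":
                self.factor = self.scotts_factor()
            elif bw_method == "silverman":
                self.factor = self.silverman_factor()
            else:
                raise ValueError("unknown bw_method string: %r" % bw_method)
        elif isinstance(bw_method, (int, float)):
            self.factor = float(bw_method)  # used directly as the factor
        elif callable(bw_method):
            self.factor = float(bw_method(self))  # callable receives the instance
        else:
            raise ValueError("bw_method must be a str, a scalar or a callable")


kde = BandwidthDemo(neff=32, d=1)
kde.set_bandwidth("silverman")
kde.set_bandwidth(kde.factor / 3.0)                 # scalar, as in the doctest
kde.set_bandwidth(lambda k: k.scotts_factor() / 2)  # a callable rule
print(kde.factor)
```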

val silverman_factor : [> tag ] Obj.t -> float

Compute the Silverman factor.

Returns
-------
s : float
    The Silverman factor.

val dataset : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute dataset: get value or raise Not_found if None.

val dataset_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute dataset: get value as an option.

val d : t -> int

Attribute d: get value or raise Not_found if None.

val d_opt : t -> int option

Attribute d: get value as an option.

val n : t -> int

Attribute n: get value or raise Not_found if None.

val n_opt : t -> int option

Attribute n: get value as an option.

val neff : t -> int

Attribute neff: get value or raise Not_found if None.

val neff_opt : t -> int option

Attribute neff: get value as an option.

val factor : t -> float

Attribute factor: get value or raise Not_found if None.

val factor_opt : t -> float option

Attribute factor: get value as an option.

val covariance : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute covariance: get value or raise Not_found if None.

val covariance_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute covariance: get value as an option.

val inv_cov : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Attribute inv_cov: get value or raise Not_found if None.

val inv_cov_opt : t -> [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t option

Attribute inv_cov: get value as an option.

val to_string : t -> string

Return a human-readable representation of the object as a string.

val show : t -> string

Return a human-readable representation of the object as a string.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
