function
scipy.stats._entropy:differential_entropy
source: /scipy/stats/_entropy.py :179
Signature
def differential_entropy(values: ArrayLike, *, window_length: int | None = None, base: float | None = None, axis: int = 0, method: str = 'auto', nan_policy='propagate', keepdims=False) → numpy.number | numpy.ndarray
Summary
Given a sample of a distribution, estimate the differential entropy.
Extended Summary
Several estimation methods are available through the method parameter. By default, a method is selected based on the size of the sample.
Parameters
values: sequence
    Sample from a continuous distribution.
window_length: int, optional
    Window length for computing the Vasicek estimate. Must be an integer between 1 and half of the sample size. If None (the default), it uses the heuristic value $\left\lfloor \sqrt{n} + 0.5 \right\rfloor$, where $n$ is the sample size. This heuristic was originally proposed in [2] and has become common in the literature.
base: float, optional
    The logarithmic base to use; defaults to e (natural logarithm).
axis: int or None, default: 0
    If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.
method: {'vasicek', 'van es', 'ebrahimi', 'correa', 'auto'}, optional
    The method used to estimate the differential entropy from the sample. Default is 'auto'. See Notes for more information.
nan_policy: {'propagate', 'omit', 'raise'}
    Defines how to handle input NaNs.
    - propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
    - omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
    - raise: if a NaN is present, a ValueError will be raised.
keepdims: bool, default: False
    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
Returns
entropy: float
    The calculated differential entropy.
Notes
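This function will converge to the true differential entropy in the limit $n \to \infty, \quad m \to \infty, \quad \frac{m}{n} \to 0$, where $n$ is the sample size and $m$ is the window length.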
The optimal choice of window_length for a given sample size depends on the (unknown) distribution. Typically, the smoother the density of the distribution, the larger the optimal value of window_length [1].
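As a rough illustration of how window_length affects the estimate, the sketch below computes the default heuristic by hand and then tries a few other window lengths on one sample. The seed, sample size, and the particular window lengths are arbitrary assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import differential_entropy

rng = np.random.default_rng(2024)  # arbitrary seed for reproducibility
values = rng.standard_normal(400)
n = values.size

# Default heuristic for the window length: floor(sqrt(n) + 0.5).
m_default = int(np.floor(np.sqrt(n) + 0.5))

# Passing the heuristic value explicitly should match leaving it as None.
h_default = differential_entropy(values, method='vasicek')
h_explicit = differential_entropy(values, method='vasicek',
                                  window_length=m_default)

# Try several window lengths; each must stay below half the sample size.
for m in (2, 5, m_default, 50, 100):
    h = differential_entropy(values, method='vasicek', window_length=m)
    print(f"window_length={m:3d}  estimate={h:.4f}")
```

Because the best window length depends on the unknown density, a sweep like this is only a diagnostic and does not identify the optimal value.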
The following options are available for the method parameter.
- 'vasicek' uses the estimator presented in [1]. This is one of the first and most influential estimators of differential entropy.
- 'van es' uses the bias-corrected estimator presented in [3], which is not only consistent but, under some conditions, asymptotically normal.
- 'ebrahimi' uses an estimator presented in [4], which was shown in simulation to have smaller bias and mean squared error than the Vasicek estimator.
- 'correa' uses the estimator presented in [5] based on local linear regression. In a simulation study, it had consistently smaller mean squared error than the Vasicek estimator, but it is more expensive to compute.
- 'auto' selects the method automatically (default). Currently, this selects 'van es' for very small samples (<10), 'ebrahimi' for moderate sample sizes (11-1000), and 'vasicek' for larger samples, but this behavior is subject to change in future versions.
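One way to see which estimator 'auto' picks is to compare its result with each named method on samples of different sizes. The sketch below assumes the selection rule described above still holds in the installed version; the sample sizes and seed are arbitrary and deliberately avoid the boundaries.

```python
import numpy as np
from scipy.stats import differential_entropy

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility

# (sample size, method that 'auto' is described as selecting for that size)
cases = [(8, 'van es'), (200, 'ebrahimi'), (5000, 'vasicek')]

matches = {}
for n, expected_method in cases:
    sample = rng.standard_normal(n)
    h_auto = differential_entropy(sample)  # method='auto' is the default
    h_named = differential_entropy(sample, method=expected_method)
    matches[n] = bool(np.isclose(h_auto, h_named))
    print(f"n={n:5d}  auto={h_auto:.4f}  {expected_method}={h_named:.4f}")

print(matches)
```

Since the selection rule is subject to change, code that needs a specific estimator is better off naming it explicitly rather than relying on 'auto'.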
All estimators are implemented as described in [6].
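To make the role of the window length concrete, here is a minimal hand-written sketch of the Vasicek spacing estimator, $\frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{2m}\,(X_{(i+m)} - X_{(i-m)})\right)$, with order statistics beyond the ends of the sample replaced by the minimum and maximum. The helper name vasicek_sketch is hypothetical, and the edge convention is an assumption about how the library handles the boundaries; this is not the library's source code.

```python
import numpy as np
from scipy.stats import differential_entropy, norm


def vasicek_sketch(sample, m):
    """Vasicek spacing estimate of differential entropy, in nats.

    Hypothetical helper for illustration. Assumes order statistics beyond
    the ends of the sample equal the smallest and largest observations.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Pad with m copies of the minimum and maximum so that X_(i-m) and
    # X_(i+m) are defined for every i = 1..n.
    padded = np.concatenate([np.full(m, x[0]), x, np.full(m, x[-1])])
    spacings = padded[2 * m:] - padded[:-2 * m]  # X_(i+m) - X_(i-m)
    return np.mean(np.log(n / (2 * m) * spacings))


rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
sample = rng.standard_normal(1000)
m = int(np.floor(np.sqrt(sample.size) + 0.5))  # default heuristic window

sketch = vasicek_sketch(sample, m)
library = differential_entropy(sample, method='vasicek', window_length=m)
truth = float(norm.entropy())  # exact entropy of the standard normal
print(sketch, library, truth)
```

If the edge convention assumed here matches the library's, the first two printed values should agree closely; both are estimates and will differ somewhat from the exact entropy.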
Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.
Array API Standard Support
differential_entropy has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
==================== ==================== ====================
Library              CPU                  GPU
==================== ==================== ====================
NumPy                ✅                   n/a
CuPy                 n/a                  ✅
PyTorch              ✅                   ✅
JAX                  ✅                   ✅
Dask                 ✅                   n/a
==================== ==================== ====================
See dev-arrayapi for more information.
Examples
import numpy as np
from scipy.stats import differential_entropy, norm

# Estimate the entropy of a standard normal distribution from a sample.
rng = np.random.default_rng()
values = rng.standard_normal(100)
differential_entropy(values)

# Compare with the true entropy.
float(norm.entropy())

# For sample sizes between 5 and 1000, compare the root mean squared error
# of the 'vasicek', 'van es', and 'ebrahimi' methods over 1000 trials.
from scipy import stats
import matplotlib.pyplot as plt

def rmse(res, expected):
    '''Root mean squared error'''
    return np.sqrt(np.mean((res - expected)**2))

a, b = np.log10(5), np.log10(1000)
ns = np.round(np.logspace(a, b, 10)).astype(int)
reps = 1000  # number of repetitions for each sample size
expected = stats.expon.entropy()

method_errors = {'vasicek': [], 'van es': [], 'ebrahimi': []}
for method in method_errors:
    for n in ns:
        rvs = stats.expon.rvs(size=(reps, n), random_state=rng)
        res = stats.differential_entropy(rvs, method=method, axis=-1)
        error = rmse(res, expected)
        method_errors[method].append(error)

for method, errors in method_errors.items():
    plt.loglog(ns, errors, label=method)

plt.legend()
plt.xlabel('sample size')
plt.ylabel('RMSE (1000 trials)')
plt.title('Entropy Estimator Error (Exponential Distribution)')
plt.show()
Aliases
- scipy.stats.differential_entropy