function
scipy.stats._quantile:quantile
source: /scipy/stats/_quantile.py :138
Signature
def quantile(x, p, *, method='linear', axis=0, nan_policy='propagate', keepdims=None, weights=None)
Summary
Compute the p-th quantile of the data along the specified axis.
Parameters
x: array_like of real numbers
Data array.
p: array_like of float
Probability or sequence of probabilities of the quantiles to compute. Values must be between 0 and 1 (inclusive). While numpy.quantile can only compute quantiles according to the Cartesian product of the first two arguments, this function enables calculation of quantiles at different probabilities for each axis slice by following broadcasting rules like those of scipy.stats reducing functions. See axis, keepdims, and the examples.
method: str, default: 'linear'
The method to use for estimating the quantile. The available options, numbered as they appear in [1], are:
'inverted_cdf'
'averaged_inverted_cdf'
'closest_observation'
'interpolated_inverted_cdf'
'hazen'
'weibull'
'linear' (default)
'median_unbiased'
'normal_unbiased'
'harrell-davis' is also available to compute the quantile estimate according to [2].
'round_outward', 'round_inward', and 'round_nearest' are available for use in trimming and winsorizing data.
See Notes for details.
axis: int or None, default: 0
Axis along which the quantiles are computed. None ravels both x and p before performing the calculation, without checking whether the original shapes were compatible. As in other scipy.stats functions, a positive integer axis is resolved after prepending 1s to the shape of x or p as needed until the two arrays have the same dimensionality. When providing x and p with different dimensionality, consider using negative axis integers for clarity.
nan_policy: str, default: 'propagate'
Defines how to handle NaNs in the input data x.
- propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding slice of the output will contain NaN(s).
- omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding slice of the output will contain NaN(s).
- raise: if a NaN is present, a ValueError will be raised.
If NaNs are present in p, a ValueError will be raised.
keepdims: bool, optional
Consider the case in which x is 1-D and p is a scalar: the quantile is a reducing statistic, and the default behavior is to return a scalar. If keepdims is set to True, the axis will not be reduced away, and the result will be a 1-D array with one element.
The general case is more subtle, since multiple quantiles may be requested for each axis slice of x. For instance, if both x and p are 1-D and p.size > 1, no axis can be reduced away; there must be an axis to contain the number of quantiles given by p.size. Therefore:
- By default, the axis will be reduced away if possible (i.e. if there is exactly one element of p per axis slice of x).
- If keepdims is set to True, the axis will not be reduced away.
- If keepdims is set to False, the axis will be reduced away if possible, and an error will be raised otherwise.
weights: array_like of finite, non-negative real numbers
Frequency weights; e.g., for counting-number weights, quantile(x, p, weights=weights) is equivalent to quantile(np.repeat(x, weights), p). Values other than finite counting numbers are accepted, but may not have valid statistical interpretations. Not compatible with method='harrell-davis' or with the methods whose names begin with 'round_'.
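The frequency-weight equivalence stated for weights can be checked for the discontinuous inverted_cdf estimator with a short NumPy sketch. The helper weighted_inverted_cdf is hypothetical (it is not SciPy's implementation) and assumes a scalar p and positive weights: it sorts the data, accumulates the weights, and returns the smallest observation whose cumulative weight reaches p times the total weight. The reference value is numpy.quantile applied to np.repeat(x, w), assuming NumPy 1.22 or later for the method keyword.

```python
import numpy as np


def weighted_inverted_cdf(x, p, weights):
    """Hypothetical sketch: inverted-CDF quantile with frequency weights (scalar p, positive weights)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    y, w = x[order], w[order]
    cumw = np.cumsum(w)
    # First sorted observation whose cumulative weight is >= p * (total weight)
    k = np.searchsorted(cumw, p * cumw[-1], side="left")
    return y[min(k, y.size - 1)]


x = np.array([3.0, 1.0, 2.0])
w = np.array([1, 2, 3])  # counting-number weights
for p in [0.25, 0.5, 0.9]:
    weighted = weighted_inverted_cdf(x, p, w)
    repeated = np.quantile(np.repeat(x, w), p, method="inverted_cdf")
    print(p, weighted, repeated)  # the two estimates agree
```

Because only cumulative weights enter the search, multiplying every weight by the same constant leaves this estimate unchanged.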
Returns
quantile: scalar or ndarray
The resulting quantile(s). The dtype is the result dtype of x and p.
Notes
Given a sample x from an underlying distribution, quantile provides a nonparametric estimate of the inverse cumulative distribution function.
By default, this is done by interpolating between adjacent elements in y, a sorted copy of x:
(1-g)*y[j] + g*y[j+1]
where the index j and coefficient g are the integral and fractional components of p * (n-1), and n is the number of elements in the sample.
This is a special case of Equation 1 of H&F [1]. More generally,
j = (p*n + m - 1) // 1 and g = (p*n + m - 1) % 1,
where m may be defined according to several different conventions. The preferred convention may be selected using the method parameter:
=============================== =============== ===============
``method``                      number in H&F   ``m``
=============================== =============== ===============
``interpolated_inverted_cdf``   4               ``0``
``hazen``                       5               ``1/2``
``weibull``                     6               ``p``
``linear`` (default)            7               ``1 - p``
``median_unbiased``             8               ``p/3 + 1/3``
``normal_unbiased``             9               ``p/4 + 3/8``
=============================== =============== ===============
Note that indices j and j + 1 are clipped to the range 0 to n - 1 when the results of the formula would be outside the allowed range of non-negative indices. When j is clipped to zero, g is set to zero as well. The -1 in the formulas for j and g accounts for Python's 0-based indexing.
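The general formula, the table of m values, and the clipping rule can be combined into a small pure-NumPy sketch for 1-D data and a scalar p. This is an illustration of the formulas above, not SciPy's code; hf_quantile and M_OF_P are hypothetical names. As a cross-check, it is compared against numpy.quantile, whose method names for these six estimators coincide (assuming NumPy 1.22 or later).

```python
import numpy as np

# m as a function of p for the continuous H&F estimators 4-9 (table above)
M_OF_P = {
    "interpolated_inverted_cdf": lambda p: 0.0,
    "hazen": lambda p: 0.5,
    "weibull": lambda p: p,
    "linear": lambda p: 1.0 - p,
    "median_unbiased": lambda p: p / 3 + 1 / 3,
    "normal_unbiased": lambda p: p / 4 + 3 / 8,
}


def hf_quantile(x, p, method="linear"):
    """Hypothetical sketch of H&F estimators 4-9 for 1-D x and scalar p."""
    y = np.sort(np.asarray(x, dtype=float))
    n = y.size
    index = p * n + M_OF_P[method](p) - 1
    j = int(np.floor(index))  # integral part: (p*n + m - 1) // 1
    g = index - j             # fractional part: (p*n + m - 1) % 1
    if j < 0:
        g = 0.0               # when j is clipped to zero, g is set to zero too
    # Clip j and j + 1 into the valid index range 0 .. n - 1
    lo = min(max(j, 0), n - 1)
    hi = min(max(j + 1, 0), n - 1)
    return (1 - g) * y[lo] + g * y[hi]


x = [3, 1, 4, 1, 5, 9, 2, 6]
print(hf_quantile(x, 0.5))  # 3.5: p*(n-1) = 3.5, halfway between sorted y[3] = 3 and y[4] = 4
```

For method="linear", m = 1 - p makes the index p*n + (1 - p) - 1 = p*(n-1), which recovers the default formula given first.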
The table above includes only the estimators from [1] that are continuous functions of probability p (estimators 4-9). SciPy also provides the three discontinuous estimators from [1] (estimators 1-3), where j is defined as above, m is defined as follows, and g is 0 when index = p*n + m - 1 is less than 0 and otherwise is defined below.
inverted_cdf: m = 0 and g = int(index - j > 0)
averaged_inverted_cdf: m = 0 and g = (1 + int(index - j > 0)) / 2
closest_observation: m = -1/2 and g = 1 - int((index == j) & (j%2 == 1))
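The three discontinuous rules can be sketched in the same style, again for 1-D data and a scalar p. This hypothetical discontinuous_quantile helper only transcribes the definitions above (including g = 0 for negative index and the clipping of j and j + 1); it is not SciPy's implementation.

```python
import numpy as np


def discontinuous_quantile(x, p, method="inverted_cdf"):
    """Hypothetical sketch of H&F estimators 1-3 for 1-D x and scalar p."""
    y = np.sort(np.asarray(x, dtype=float))
    n = y.size
    m = -0.5 if method == "closest_observation" else 0.0
    index = p * n + m - 1
    j = int(np.floor(index))
    frac = index - j > 0  # True when index is not an integer
    if method == "inverted_cdf":
        g = float(frac)
    elif method == "averaged_inverted_cdf":
        g = (1 + float(frac)) / 2
    elif method == "closest_observation":
        g = 1.0 - float(index == j and j % 2 == 1)
    else:
        raise ValueError(f"unknown method: {method}")
    if index < 0:
        g = 0.0  # g is 0 whenever index = p*n + m - 1 is negative
    # Clip j and j + 1 into the valid index range 0 .. n - 1
    lo = min(max(j, 0), n - 1)
    hi = min(max(j + 1, 0), n - 1)
    return (1 - g) * y[lo] + g * y[hi]


data = [4, 1, 3, 2]
print(discontinuous_quantile(data, 0.5, "inverted_cdf"))           # 2.0
print(discontinuous_quantile(data, 0.5, "averaged_inverted_cdf"))  # 2.5
```

With p*n an integer, inverted_cdf returns the lower of the two middle observations, while averaged_inverted_cdf averages them, which is why the latter gives the usual median here.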
Note that for methods inverted_cdf and averaged_inverted_cdf, only the relative proportions of tied observations (and relative weights) affect the results; for all other methods, the total number of observations (and absolute weights) matter.
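To make the preceding note concrete, the sketch below duplicates every observation, which keeps the relative proportions fixed but doubles the total count, and compares numpy.quantile before and after. It assumes that NumPy's method names (NumPy 1.22 or later) denote the same H&F estimators as here.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.repeat(x, 2)  # every observation twice: same proportions, twice the count
p = 0.3

# inverted_cdf depends only on relative proportions, so the estimate is unchanged
print(np.quantile(x, p, method="inverted_cdf"), np.quantile(x2, p, method="inverted_cdf"))  # 2.0 2.0

# linear uses the index p*(n-1), so the total count changes the estimate
print(np.quantile(x, p, method="linear"), np.quantile(x2, p, method="linear"))  # about 1.9, then 2.0
```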
A different strategy for computing quantiles from [2], method='harrell-davis', uses a weighted combination of all elements. The weights are computed as:
$$w_{n,i} = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)$$
where $n$ is the number of elements in the sample, $i = 1, \dots, n$ are the indices of the sorted elements, $a = p(n+1)$, $b = (1-p)(n+1)$, $p$ is the probability of the quantile, and $I_x$ is the regularized, lower incomplete beta function (scipy.special.betainc).
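A minimal sketch of the Harrell-Davis weights and the resulting weighted combination of the sorted sample, using scipy.special.betainc(a, b, x) for the regularized lower incomplete beta function. The helper names are hypothetical, and the sketch assumes 0 < p < 1 so that a and b are positive; it is not SciPy's implementation of method='harrell-davis'.

```python
import numpy as np
from scipy.special import betainc


def harrell_davis_weights(n, p):
    """Hypothetical helper: weights w_{n,i}, i = 1..n; assumes 0 < p < 1."""
    a = p * (n + 1)
    b = (1 - p) * (n + 1)
    i = np.arange(1, n + 1)
    # betainc(a, b, t) is the regularized lower incomplete beta function I_t(a, b)
    return betainc(a, b, i / n) - betainc(a, b, (i - 1) / n)


def harrell_davis(x, p):
    """Hypothetical helper: weighted combination of all sorted elements of 1-D x."""
    y = np.sort(np.asarray(x, dtype=float))
    return np.dot(harrell_davis_weights(y.size, p), y)


w = harrell_davis_weights(5, 0.5)
print(w.sum())                              # approximately 1
print(harrell_davis([5, 1, 4, 2, 3], 0.5))  # approximately 3
```

The weights sum to one because the differences telescope to I_1(a, b) - I_0(a, b) = 1. For p = 0.5, a = b, so the weights are symmetric and the estimate for data symmetric about 3 is 3.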
method='round_nearest' is equivalent to indexing y[j], where
j = int(np.round(p*n) if p < 0.5 else np.round(n*p - 1))
This is useful when winsorizing data: replacing p*n of the most extreme observations with the next most extreme observation. method='round_outward' adjusts the direction of rounding to winsorize fewer elements:
j = int(np.floor(p*n) if p < 0.5 else np.ceil(n*p - 1))
and method='round_inward' rounds to winsorize more elements:
j = int(np.ceil(p*n) if p < 0.5 else np.floor(n*p - 1))
These methods are also useful for trimming data: removing p*n of the most extreme observations. See outliers for example applications.
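The three index formulas can be transcribed directly, together with a hypothetical winsorize_sketch helper that clips the data to its p and 1 - p quantiles. This is an assumption-laden illustration of the winsorizing use case, not the API of scipy.stats or of the outliers guide. Note that np.round rounds halves to the nearest even integer.

```python
import numpy as np


def round_index(n, p, method="round_nearest"):
    """Index j into the sorted data for the round_* methods (scalar p)."""
    if method == "round_nearest":
        j = np.round(p * n) if p < 0.5 else np.round(n * p - 1)  # halves go to even
    elif method == "round_outward":
        j = np.floor(p * n) if p < 0.5 else np.ceil(n * p - 1)
    elif method == "round_inward":
        j = np.ceil(p * n) if p < 0.5 else np.floor(n * p - 1)
    else:
        raise ValueError(f"unknown method: {method}")
    return int(j)


def winsorize_sketch(x, p, method="round_outward"):
    """Hypothetical helper: clip x to its p and 1 - p round_* quantiles."""
    y = np.sort(np.asarray(x, dtype=float))
    lo = y[round_index(y.size, p, method)]
    hi = y[round_index(y.size, 1 - p, method)]
    return np.clip(x, lo, hi)


x = np.arange(10.0)
print(winsorize_sketch(x, 0.25, "round_outward"))  # 2 values replaced at each end
print(winsorize_sketch(x, 0.25, "round_inward"))   # 3 values replaced at each end
```

Here p*n = 2.5, so round_outward rounds toward replacing fewer observations (2 per tail) and round_inward toward replacing more (3 per tail).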
Array API Standard Support
quantile has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
==================== ==================== ====================
Library              CPU                  GPU
==================== ==================== ====================
NumPy                ✅                   n/a
CuPy                 n/a                  ✅
PyTorch              ✅                   ✅
JAX                  ⚠️ no JIT            ⚠️ no JIT
Dask                 ⛔                   n/a
==================== ==================== ====================
See dev-arrayapi for more information.
Examples
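import numpy as np
from scipy import stats
x = np.asarray([[10, 8, 7, 5, 4], [0, 1, 2, 3, 5]])
# Median of each row
stats.quantile(x, 0.5, axis=-1)
# One probability per row (0.25 for the first, 0.75 for the second); keepdims=True retains the reduced axis
stats.quantile(x, [[0.25], [0.75]], axis=-1, keepdims=True)
# The same two probabilities for every row
stats.quantile(x, [0.25, 0.75], axis=-1)
# A different pair of probabilities for each row
p = np.asarray([[0.25, 0.75], [0.5, 1.0]])
stats.quantile(x, p, axis=-1)
# The same quantiles, computed along the columns of the transposed arrays
stats.quantile(x.T, p.T, axis=0)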
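The last two calls pair each row (or column) of x with its own probabilities, which numpy.quantile does not do directly. A minimal pure-NumPy sketch of that per-slice pairing loops over rows; rowwise_quantile is a hypothetical helper, not part of SciPy, and the default linear method is assumed. Under the broadcasting rules described for p, its result should correspond to stats.quantile(x, p, axis=-1).

```python
import numpy as np


def rowwise_quantile(x, p):
    """Hypothetical helper: quantiles of row i of x at the probabilities in row i of p."""
    return np.stack([np.quantile(row, probs) for row, probs in zip(x, p)])


x = np.asarray([[10, 8, 7, 5, 4], [0, 1, 2, 3, 5]])
p = np.asarray([[0.25, 0.75], [0.5, 1.0]])
print(rowwise_quantile(x, p))  # rows: [5, 8] and [2, 5]

# numpy.quantile instead evaluates every probability on every row (Cartesian product)
print(np.quantile(x, [0.25, 0.75], axis=-1).shape)  # (2, 2): probabilities first, rows second
```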
See also
- numpy.quantile
- outliers
Aliases
- scipy.stats.quantile