bundles / scipy 1.17.1 / scipy / stats / _stats_py / quantile_test
function
scipy.stats._stats_py:quantile_test
source: /scipy/stats/_stats_py.py :9048
Signature
def quantile_test(x, *, q=0, p=0.5, alternative='two-sided')
Summary
Perform a quantile test and compute a confidence interval of the quantile.
Extended Summary
This function tests the null hypothesis that q is the value of the quantile associated with probability p of the population underlying sample x. For example, with default parameters, it tests that the median of the population underlying x is zero. The function returns an object including the test statistic, a p-value, and a method for computing the confidence interval around the quantile.
Parameters
x: array_like
A one-dimensional sample.
q: float, default: 0
The hypothesized value of the quantile.
p: float, default: 0.5
The probability associated with the quantile; i.e. the proportion of the population less than q is p. Must be strictly between 0 and 1.
alternative: {'two-sided', 'less', 'greater'}, optional
Defines the alternative hypothesis. The following options are available (default is 'two-sided'):
'two-sided': the quantile associated with the probability p is not q.
'less': the quantile associated with the probability p is less than q.
'greater': the quantile associated with the probability p is greater than q.
Returns
result: QuantileTestResult
An object with the following attributes:
statistic
statistic_type
pvalue
The object also has the following method:
confidence_interval(confidence_level=0.95)
Computes a confidence interval around the population quantile associated with the probability p. The confidence interval is returned in a namedtuple with fields low and high. Values are nan when there are not enough observations to compute the confidence interval at the desired confidence.
Notes
This test and its method for computing confidence intervals are non-parametric. They are valid if and only if the observations are i.i.d.
The implementation of the test follows Conover [1]. Two test statistics are considered.
T1: The number of observations in x less than or equal to q.
T1 = (x <= q).sum()
T2: The number of observations in x strictly less than q.
T2 = (x < q).sum()
The use of two test statistics is necessary to handle the possibility that x was generated from a discrete or mixed distribution.
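To see why the two statistics can differ, consider a sample with ties at q. The sketch below uses a small made-up integer sample (an assumption for illustration); the gap between T1 and T2 is exactly the number of observations equal to q, which is zero with probability one for a continuous distribution.

```python
import numpy as np

# Hypothetical discrete sample with three observations equal to q.
x = np.array([0, 1, 1, 2, 2, 2, 3, 5])
q = 2

T1 = int((x <= q).sum())    # observations less than or equal to q
T2 = int((x < q).sum())     # observations strictly less than q
ties = int((x == q).sum())  # observations exactly equal to q

print(T1, T2, ties)  # → 6 3 3
```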
The null hypothesis for the test is:
H0: The population quantile is q.
and the null distribution for each test statistic is binom(n, p), where n is the number of observations in x. When alternative='less', the alternative hypothesis is:
H1: The population quantile is less than q.
and the p-value is the probability that the binomial random variable Y ~ binom(n, p) is greater than or equal to the observed value T2.
When alternative='greater', the alternative hypothesis is:
H1: The population quantile is greater than q.
and the p-value is the probability that the binomial random variable Y is less than or equal to the observed value T1.
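The one-sided p-values described above are binomial tail probabilities, with Y following a binomial distribution with n trials (the sample size) and success probability p. The following is a minimal standard-library sketch of that arithmetic, not the library's implementation; the helper names binom_pmf, pvalue_less and pvalue_greater are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pvalue_less(T2, n, p):
    """P(Y >= T2): the p-value when alternative='less'."""
    return sum(binom_pmf(k, n, p) for k in range(T2, n + 1))


def pvalue_greater(T1, n, p):
    """P(Y <= T1): the p-value when alternative='greater'."""
    return sum(binom_pmf(k, n, p) for k in range(0, T1 + 1))


# Hypothetical case: 10 observations, 2 of them below q, no ties, p = 0.5.
n, p = 10, 0.5
p_less = pvalue_less(2, n, p)        # 1013/1024
p_greater = pvalue_greater(2, n, p)  # 56/1024
print(p_less, p_greater)
```

Here only 2 of 10 observations fall below q, far fewer than the 5 expected under the null hypothesis, so the 'greater' p-value is small while the 'less' p-value is close to 1.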
When alternative='two-sided', the alternative hypothesis is
H1: q is not the population quantile.
and the p-value is twice the smaller of the p-values for the 'less' and 'greater' cases. Both of these p-values can exceed 0.5 for the same data, so the value is clipped into the interval [0, 1].
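The need for clipping is easiest to see with ties at q. The sketch below is a standard-library illustration under assumed counts (the helper names binom_cdf and two_sided_pvalue are hypothetical, and this is not the library's code): with 10 observations, 4 strictly below q and 2 equal to q, both one-sided p-values are about 0.83, so doubling the smaller one would exceed 1.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ binom(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))


def two_sided_pvalue(T1, T2, n, p):
    """Twice the smaller one-sided p-value, clipped into [0, 1]."""
    p_greater = binom_cdf(T1, n, p)         # P(Y <= T1)
    p_less = 1.0 - binom_cdf(T2 - 1, n, p)  # P(Y >= T2)
    return min(1.0, max(0.0, 2.0 * min(p_greater, p_less)))


# Ties at q: T1 = 6 counts the two tied observations, T2 = 4 does not.
clipped = two_sided_pvalue(T1=6, T2=4, n=10, p=0.5)

# No ties: T1 = T2 = 2, so the result is simply 2 * P(Y <= 2).
unclipped = two_sided_pvalue(T1=2, T2=2, n=10, p=0.5)
print(clipped, unclipped)
```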
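The approach for confidence intervals is attributed to Thompson [2] and was later proven to be applicable to any set of i.i.d. samples [3]. The computation is based on the observation that the probability that the population quantile \theta_p associated with probability p is at least as large as the m-th smallest of the n observations, x_{(m)}, can be computed as
$$P\left(x_{(m)} \le \theta_p\right) = 1 - \sum_{k=0}^{m-1} \binom{n}{k} p^k (1 - p)^{n-k}$$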
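A minimal sketch of this order-statistic probability, assuming a continuous population so that ties have probability zero: the m-th smallest of n observations lies at or below the population p-quantile exactly when at least m observations do, which is a binomial tail probability. The function name prob_order_stat_below_quantile is hypothetical, and this is an illustration rather than the library's implementation.

```python
from math import comb


def prob_order_stat_below_quantile(m, n, p):
    """P(m-th smallest of n i.i.d. observations <= population p-quantile).

    Assumes a continuous population: each observation falls at or below
    the p-quantile with probability exactly p, independently.
    """
    fewer_than_m = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m))
    return 1.0 - fewer_than_m


n, p = 10, 0.5
probs = [prob_order_stat_below_quantile(m, n, p) for m in range(1, n + 1)]
for m, prob in enumerate(probs, start=1):
    print(m, round(prob, 4))
```

With n = 10 and p = 0.5, the second-smallest observation lies at or below the median with probability about 0.989, while the third-smallest does so with probability about 0.945, so only the former could serve as a one-sided lower bound at the 95% level.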
By default, confidence intervals are computed for a 95% confidence level. A common interpretation of a 95% confidence interval is that if i.i.d. samples are drawn repeatedly from the same population and confidence intervals are formed each time, the confidence interval will contain the true value of the specified quantile in approximately 95% of trials.
A similar function is available in the QuantileNPCI R package [4]. The foundation is the same, but it computes the confidence interval bounds by doing interpolations between the sample values, whereas this function uses only sample values as bounds. Thus, quantile_test.confidence_interval returns more conservative intervals (i.e., larger).
The same computation of confidence intervals for quantiles is included in the confintr package [5].
Two-sided confidence intervals are not guaranteed to be optimal; i.e., there may exist a tighter interval that may contain the quantile of interest with probability larger than the confidence level. Without further assumption on the samples (e.g., the nature of the underlying distribution), the one-sided intervals are optimally tight.
Array API Standard Support
quantile_test has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                    n/a
CuPy                  n/a                   ⛔
PyTorch               ⛔                    ⛔
JAX                   ⛔                    ⛔
Dask                  ⛔                    n/a
====================  ====================  ====================
See dev-arrayapi for more information.
Examples
Suppose we wish to test the null hypothesis that the median of a population is equal to 0.5. We choose a confidence level of 99%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.01. When testing random variates from the standard uniform distribution, which has a median of 0.5, we expect the data to be consistent with the null hypothesis most of the time.
import numpy as np
from scipy import stats
rng = np.random.default_rng(6981396440634228121)
rvs = stats.uniform.rvs(size=100, random_state=rng)
stats.quantile_test(rvs, q=0.5, p=0.5)
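# Standard normal variates have a population median of 0, not 0.5.
rvs = stats.norm.rvs(size=100, random_state=rng)
stats.quantile_test(rvs, q=0.5, p=0.5)
# One-sided alternative: the population median is greater than 0.5.
stats.quantile_test(rvs, q=0.5, p=0.5, alternative='greater')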
# Test whether the third quartile (p=0.75) of the population is greater than 0.6.
rvs = stats.uniform.rvs(size=100, random_state=rng)
stats.quantile_test(rvs, q=0.6, p=0.75, alternative='greater')
rvs = stats.norm.rvs(size=100, random_state=rng)
res = stats.quantile_test(rvs, q=0.6, p=0.75)
ci = res.confidence_interval(confidence_level=0.95)
ci
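rvs.sort()
# Note: alpha holds the confidence level here, so 1-alpha is the significance level.
q, p, alpha = 0.6, 0.75, 0.95
res = stats.quantile_test(rvs, q=q, p=p, alternative='less')
ci = res.confidence_interval(confidence_level=alpha)
# Observations at or below the upper bound are not rejected as values of q.
for x in rvs[rvs <= ci.high]:
    res = stats.quantile_test(rvs, q=x, p=p, alternative='less')
    assert res.pvalue > 1-alpha
# Observations above the upper bound are rejected as values of q.
for x in rvs[rvs > ci.high]:
    res = stats.quantile_test(rvs, q=x, p=p, alternative='less')
    assert res.pvalue < 1-alpha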
dist = stats.rayleigh()  # our "unknown" distribution
p = 0.2
true_stat = dist.ppf(p)  # the true value of the statistic
n_trials = 1000
quantile_ci_contains_true_stat = 0
for i in range(n_trials):
    data = dist.rvs(size=100, random_state=rng)
    res = stats.quantile_test(data, p=p)
    ci = res.confidence_interval(0.95)
    if ci[0] < true_stat < ci[1]:
        quantile_ci_contains_true_stat += 1
quantile_ci_contains_true_stat >= 950
Aliases
- scipy.stats.quantile_test