function

`scipy.stats._stats_py:kstest`

source: /scipy/stats/_stats_py.py:8060

Signature

def kstest(rvs, cdf, args=(), N=20, alternative='two-sided', method='auto', *, axis=0, nan_policy='propagate', keepdims=False)

Summary

Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.

Extended Summary

The one-sample test compares the underlying distribution F(x) of a sample against a given distribution G(x). The two-sample test compares the underlying distributions of two independent samples. Both tests are valid only for continuous distributions.
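
In the one-sample case, the statistic is the largest vertical distance between the empirical CDF of the sample and G(x). The sketch below computes it by hand from the sorted sample, using D+ = max(i/n - G(x_(i))) and D- = max(G(x_(i)) - (i-1)/n), and compares the result with kstest. The helper `ks_statistic_1samp` is a hypothetical name used only for illustration and is not part of SciPy; the seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats


def ks_statistic_1samp(sample, cdf):
    """Hypothetical helper: one-sample KS statistic D = max(D+, D-)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    cdf_vals = cdf(x)                         # hypothesized CDF G at each sorted point
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - cdf_vals)         # ECDF above G (just after each jump)
    d_minus = np.max(cdf_vals - (i - 1) / n)  # ECDF below G (just before each jump)
    return max(d_plus, d_minus)


rng = np.random.default_rng(12345)
sample = rng.normal(size=50)

d_manual = ks_statistic_1samp(sample, stats.norm.cdf)
d_scipy = stats.kstest(sample, stats.norm.cdf).statistic
print(d_manual, d_scipy)  # the two values should agree
```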

Parameters

rvs : str, array_like, or callable

If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument `size`. If a string, it should be the name of a distribution in scipy.stats, which will be used to generate random variables.

cdf : str, array_like or callable

If array_like, it should be a 1-D array of observations of random variables, and the two-sample test is performed (and rvs must be array_like). If a callable, that callable is used to calculate the cdf. If a string, it should be the name of a distribution in scipy.stats, which will be used as the cdf function.
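
When cdf is array_like, the two-sample statistic is the largest absolute difference between the two empirical CDFs, evaluated at the pooled observations. A minimal sketch that checks this against kstest and compares the p-value with `ks_2samp`; `ks_statistic_2samp` is a hypothetical illustrative helper, and the seed and sample sizes are arbitrary.

```python
import numpy as np
from scipy import stats


def ks_statistic_2samp(a, b):
    """Hypothetical helper: max |ECDF_a - ECDF_b| over the pooled data."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # ECDF value at each pooled point: fraction of observations <= that point
    ecdf_a = np.searchsorted(a, pooled, side='right') / a.size
    ecdf_b = np.searchsorted(b, pooled, side='right') / b.size
    return np.max(np.abs(ecdf_a - ecdf_b))


rng = np.random.default_rng(0)
a = rng.normal(size=60)
b = rng.normal(loc=0.3, size=40)

res = stats.kstest(a, b)          # array_like cdf -> two-sample test
res_2samp = stats.ks_2samp(a, b)  # the dedicated two-sample function
print(res.statistic, ks_statistic_2samp(a, b))
print(res.pvalue, res_2samp.pvalue)
```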

args : tuple, sequence, optional

Distribution parameters, used if rvs or cdf are strings or callables.
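
With a string cdf, args is passed to that distribution's cdf after the data, so for scipy.stats.norm it supplies (loc, scale). The sketch below, with arbitrary parameter values and seed, checks that this matches passing the cdf of a frozen distribution directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=2.0, scale=3.0, size=80)

# Parameters forwarded through `args`: equivalent to stats.norm.cdf(x, 2.0, 3.0)
res_args = stats.kstest(x, 'norm', args=(2.0, 3.0))
# The same hypothesized distribution, expressed as a frozen distribution's cdf
res_frozen = stats.kstest(x, stats.norm(loc=2.0, scale=3.0).cdf)
print(res_args.statistic, res_frozen.statistic)
print(res_args.pvalue, res_frozen.pvalue)
```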

N : int, optional

Sample size if rvs is string or callable. Default is 20.

alternative : {'two-sided', 'less', 'greater'}, optional

Defines the null and alternative hypotheses. Default is 'two-sided'. Please see explanations in the Notes below.

method : {'auto', 'exact', 'approx', 'asymp'}, optional

Defines the distribution used for calculating the p-value. The following options are available (default is 'auto'):

'auto': selects one of the other options.
'exact': uses the exact distribution of the test statistic.
'approx': approximates the two-sided probability with twice the one-sided probability.
'asymp': uses the asymptotic distribution of the test statistic.
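
In a one-sample two-sided test, method changes only how the p-value is obtained, not the statistic. The sketch below assumes, as in current SciPy releases, that 'exact' evaluates the kstwo distribution and 'asymp' evaluates the limiting Kolmogorov distribution kstwobign at D*sqrt(n); both are public distributions in scipy.stats. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=100)
n = x.size

exact = stats.kstest(x, 'norm', method='exact')
asymp = stats.kstest(x, 'norm', method='asymp')
D = exact.statistic  # the same for both methods

# Recompute each p-value from the corresponding null distribution of D
p_exact = stats.kstwo.sf(D, n)                # exact finite-n distribution
p_asymp = stats.kstwobign.sf(D * np.sqrt(n))  # large-n limiting distribution
print(exact.pvalue, p_exact)
print(asymp.pvalue, p_asymp)
```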

axis : int or None, default: 0

If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.

nan_policy : {'propagate', 'omit', 'raise'}

Defines how to handle input NaNs.

propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
raise: if a NaN is present, a ValueError will be raised.

keepdims : bool, default: False

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Returns

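res : KstestResult

An object containing attributes:

statistic: float. The KS test statistic: D+, D-, or D (the maximum of the two), depending on alternative.
pvalue: float. The one-tailed or two-tailed p-value.
statistic_location: float. In a one-sample test, the value of rvs at which the distance between the empirical distribution function and the hypothesized CDF is measured. In a two-sample test, the value from rvs or cdf at which the distance between the two empirical distribution functions is measured.
statistic_sign: int. In a one-sample test, +1 if the statistic is the maximum positive difference between the empirical distribution function and the hypothesized CDF (D+), and -1 if it is the maximum negative difference (D-). In a two-sample test, +1 if the empirical distribution function of rvs exceeds that of cdf at statistic_location, otherwise -1.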
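
In a one-sample test, statistic_location is the observation at which the statistic is measured, and statistic_sign is +1 when the two-sided statistic is D+ and -1 when it is D-. A short sketch of reading both attributes (the seed is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(size=40)

res = stats.kstest(x, 'norm')
print(res.statistic_location in x)  # True: the statistic is measured at an observation
print(res.statistic_sign)           # +1 (D+) or -1 (D-)

# The sign tells which one-sided statistic the two-sided D came from
if res.statistic_sign == 1:
    one_sided = stats.kstest(x, 'norm', alternative='greater')
else:
    one_sided = stats.kstest(x, 'norm', alternative='less')
print(np.isclose(res.statistic, one_sided.statistic))  # True
```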

Notes

There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.

two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.
less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.
greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.
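
In the one-sample case, 'greater' uses D+ (the largest amount by which the empirical CDF exceeds G), 'less' uses D- (the largest amount by which it falls below G), and 'two-sided' uses the larger of the two. A minimal check with an arbitrary seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
x = rng.normal(size=60)

two_sided = stats.kstest(x, 'norm')                       # D
greater = stats.kstest(x, 'norm', alternative='greater')  # D+
less = stats.kstest(x, 'norm', alternative='less')        # D-

# The two-sided statistic is the larger of the two one-sided statistics
print(np.isclose(two_sided.statistic, max(greater.statistic, less.statistic)))  # True
```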

Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
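
The same point with two samples: if x1 is shifted toward smaller values than x2, then F(x) > G(x), which is the 'greater' alternative. The shift, sample sizes, and seed below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x1 = rng.normal(loc=-1.0, size=200)  # x1 ~ F, shifted toward smaller values
x2 = rng.normal(loc=0.0, size=200)   # x2 ~ G

# Smaller values in x1 mean F(x) > G(x): evidence for the 'greater' alternative
res_greater = stats.kstest(x1, x2, alternative='greater')
res_less = stats.kstest(x1, x2, alternative='less')
print(res_greater.pvalue)  # very small
print(res_less.pvalue)     # much larger: no evidence that F(x) < G(x)
```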

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.
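
A short sketch of the masked-array behavior: masked elements are ignored, and the result is not a masked array. The data and mask are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(size=40)
mask = np.zeros(x.size, dtype=bool)
mask[:5] = True  # mask out the first five observations

res_masked = stats.kstest(np.ma.masked_array(x, mask=mask), 'norm')
res_plain = stats.kstest(x[~mask], 'norm')  # the unmasked values only
print(np.isclose(res_masked.statistic, res_plain.statistic))  # True
print(isinstance(res_masked.statistic, np.ma.MaskedArray))   # False
```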

Array API Standard Support

kstest is not in-scope for support of Python Array API Standard compatible backends other than NumPy.

See dev-arrayapi for more information.

Examples

Suppose we wish to test the null hypothesis that a sample is distributed according to the standard normal. We choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05. When testing uniformly distributed data, we would expect the null hypothesis to be rejected.

import numpy as np
from scipy import stats
rng = np.random.default_rng()

stats.kstest(stats.uniform.rvs(size=100, random_state=rng),
             stats.norm.cdf)

Indeed, the p-value is lower than our threshold of 0.05, so we reject the null hypothesis in favor of the default "two-sided" alternative: the data are *not* distributed according to the standard normal. When testing random variates from the standard normal distribution, we expect the data to be consistent with the null hypothesis most of the time.

x = stats.norm.rvs(size=100, random_state=rng)

stats.kstest(x, stats.norm.cdf)

As expected, the p-value of 0.92 is not below our threshold of 0.05, so we cannot reject the null hypothesis. Suppose, however, that the random variates are distributed according to a normal distribution that is shifted toward greater values. In this case, the cumulative distribution function (CDF) of the underlying distribution tends to be *less* than the CDF of the standard normal. Therefore, we would expect the null hypothesis to be rejected with `alternative='less'`:

x = stats.norm.rvs(size=100, loc=0.5, random_state=rng)

stats.kstest(x, stats.norm.cdf, alternative='less')

and indeed, with a p-value smaller than our threshold, we reject the null hypothesis in favor of the alternative. For convenience, the previous test can be performed using the name of the distribution as the second argument.

stats.kstest(x, "norm", alternative='less')
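
The reasoning behind `alternative='less'` in the shifted example can be checked without sampling: on a grid of points (an arbitrary choice here), the CDF of a normal distribution with loc=0.5 lies strictly below the standard normal CDF.

```python
import numpy as np
from scipy import stats

t = np.linspace(-5, 5, 201)               # evaluation grid (arbitrary)
cdf_shifted = stats.norm.cdf(t, loc=0.5)  # CDF of N(0.5, 1)
cdf_standard = stats.norm.cdf(t)          # CDF of N(0, 1)
print(np.all(cdf_shifted < cdf_standard))  # True
```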

The examples above have all been one-sample tests identical to those performed by `ks_1samp`. Note that `kstest` can also perform two-sample tests identical to those performed by `ks_2samp`. For example, when two samples are drawn from the same distribution, we expect the data to be consistent with the null hypothesis most of the time.

sample1 = stats.laplace.rvs(size=105, random_state=rng)
sample2 = stats.laplace.rvs(size=95, random_state=rng)

stats.kstest(sample1, sample2)

As expected, the p-value of 0.45 is not below our threshold of 0.05, so we cannot reject the null hypothesis.
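
"Most of the time" can be made concrete with a small simulation: when both samples come from the same continuous distribution, a test at the 0.05 threshold should reject in roughly 5% of repetitions (possibly a little less, since the two-sample statistic takes discrete values). The sketch reuses the sample sizes above; the seed and the number of repetitions are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n_trials = 1000
alpha = 0.05

rejections = 0
for _ in range(n_trials):
    # Both samples come from the same Laplace distribution, so the null is true
    s1 = stats.laplace.rvs(size=105, random_state=rng)
    s2 = stats.laplace.rvs(size=95, random_state=rng)
    if stats.kstest(s1, s2).pvalue < alpha:
        rejections += 1

rate = rejections / n_trials
print(rate)  # close to 0.05
```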

Aliases

scipy.stats.kstest