bundles / scipy 1.17.1 / scipy / stats / _qmc / LatinHypercube

ABCMeta

`scipy.stats._qmc:LatinHypercube`

source: /scipy/stats/_qmc.py:1286

Signature

   def LatinHypercube(d: int | numpy.integer, *, scramble: bool = True, strength: int = 1, optimization: Literal['random-cd', 'lloyd'] | None = None, rng: int | numpy.integer | numpy.random._generator.Generator | numpy.random.mtrand.RandomState | None = None, seed=None) → None

Members

Summary

Latin hypercube sampling (LHS).

Extended Summary

A Latin hypercube sample ^[1] generates $n$ points in $[0, 1)^{d}$. Each univariate marginal distribution is stratified, placing exactly one point in $[j / n, (j + 1) / n)$ for $j = 0, 1, ..., n - 1$. Latin hypercube samples are still applicable when $n \ll d$.
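The stratification property can be checked numerically. The following is a minimal sketch that assumes the public `scipy.stats.qmc` interface; the helper name `strata_per_column` is hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy.stats import qmc


def strata_per_column(sample):
    """Hypothetical helper: sorted bin indices j of [j/n, (j+1)/n) hit in each column."""
    n = sample.shape[0]
    return np.sort(np.floor(sample * n).astype(int), axis=0)


n, d = 5, 20  # fewer points than dimensions is still a valid request
sample = qmc.LatinHypercube(d=d).random(n=n)
strata = strata_per_column(sample)

# Each column should read 0, 1, ..., n - 1: exactly one point per bin.
expected = np.tile(np.arange(n)[:, None], (1, d))
print(np.array_equal(strata, expected))
```

The check works on every marginal independently, which is why it also applies when the number of points is much smaller than the dimension.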

Parameters

d : int

Dimension of the parameter space.

scramble : bool, optional

When False, center samples within cells of a multi-dimensional grid. Otherwise, samples are randomly placed within cells of the grid.

Default is True.
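To see the effect of `scramble`, one can look at where each point sits inside its grid cell. This is an illustrative sketch only; the variable names are assumptions made for the example.

```python
import numpy as np
from scipy.stats import qmc

n = 4
centered = qmc.LatinHypercube(d=2, scramble=False).random(n=n)
scrambled = qmc.LatinHypercube(d=2, scramble=True).random(n=n)

# Position of each point inside its cell, as a fraction of the cell width 1/n.
offset_centered = (centered * n) % 1
offset_scrambled = (scrambled * n) % 1

print(offset_centered)   # cell centers correspond to an offset of 0.5
print(offset_scrambled)  # offsets are spread over the cell
```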

optimization : {None, "random-cd", "lloyd"}, optional

Whether to use an optimization scheme to improve the quality after sampling. Note that this is a post-processing step that does not guarantee that all properties of the sample will be conserved. Default is None.

random-cd: random permutations of coordinates to lower the centered discrepancy. The best sample based on the centered discrepancy is continually updated. Centered discrepancy-based sampling shows better space-filling robustness toward 2D and 3D subprojections compared to using other discrepancy measures.
lloyd: perturb samples using a modified Lloyd-Max algorithm. The process converges to equally spaced samples.
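Because optimization is a post-processing step, it can be useful to test which properties survive it. The sketch below uses a hypothetical helper, `is_latin`, to test the one-point-per-bin property after each scheme. Since `random-cd` only permutes coordinates, it is expected to keep the stratification, whereas `lloyd` moves points and may not.

```python
import numpy as np
from scipy.stats import qmc


def is_latin(sample):
    """Hypothetical helper: True if every column has exactly one point per bin of width 1/n."""
    n = sample.shape[0]
    bins = np.sort(np.floor(sample * n).astype(int), axis=0)
    return bool(np.all(bins == np.arange(n)[:, None]))


n = 10
results = {}
for method in (None, "random-cd", "lloyd"):
    sample = qmc.LatinHypercube(d=2, optimization=method).random(n=n)
    results[method] = (is_latin(sample), qmc.discrepancy(sample))
    print(method, results[method])
```

The printed discrepancies vary from run to run because no `rng` is fixed here.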

strength : {1, 2}, optional

Strength of the LHS. strength=1 produces a plain LHS while strength=2 produces an orthogonal array based LHS of strength 2 ^[7], ^[8]. In that case, only n=p**2 points can be sampled, with p a prime number. It also constrains d <= p + 1. Default is 1.
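The constraints on `n` and `d` for `strength=2` can be illustrated as follows. This is a sketch, not a specification of the exact error messages; it only relies on an invalid `n` being rejected with a `ValueError`.

```python
import numpy as np
from scipy.stats import qmc

p = 3  # prime
sampler = qmc.LatinHypercube(d=p + 1, strength=2)  # largest allowed d for this p
sample = sampler.random(n=p**2)                    # n must equal p**2

# Coarse symbols in {0, ..., p - 1}. In every pair of columns, each of the
# p**2 symbol combinations should appear exactly once.
symbols = np.floor(sample * p).astype(int)
pairs_ok = all(
    len(set(zip(symbols[:, i].tolist(), symbols[:, j].tolist()))) == p**2
    for i in range(p + 1)
    for j in range(i + 1, p + 1)
)
print(pairs_ok)

rejected = False
try:
    qmc.LatinHypercube(d=2, strength=2).random(n=10)  # 10 is not a prime squared
except ValueError as err:
    rejected = True
    print("rejected:", err)
```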

rng : `numpy.random.Generator`, optional

Pseudorandom number generator state. When rng is None, a new numpy.random.Generator is created using entropy from the operating system. Types other than numpy.random.Generator are passed to numpy.random.default_rng to instantiate a Generator.

Notes

When LHS is used to integrate a function $f$ with $n$ points, it is extremely effective on integrands that are nearly additive ^[2]. With a LHS of $n$ points, the variance of the integral is always lower than plain MC on $n - 1$ points ^[3]. There is a central limit theorem for LHS on the mean and variance of the integral ^[4], but not necessarily for optimized LHS due to the randomization.
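The benefit on additive integrands can be observed empirically. The sketch below compares the variance of the integral estimate under plain MC and under LHS for a purely additive test function; the function `additive` and the values of `d`, `n`, and `repeats` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc


def additive(x):
    # Purely additive integrand on [0, 1)^d; its exact integral is d / 2.
    return x.sum(axis=1)


d, n, repeats = 4, 64, 200
rng = np.random.default_rng(0)
sampler = qmc.LatinHypercube(d=d)

# Repeat the estimation many times to measure the spread of each estimator.
mc_estimates = [additive(rng.random((n, d))).mean() for _ in range(repeats)]
lhs_estimates = [additive(sampler.random(n=n)).mean() for _ in range(repeats)]

mc_var = np.var(mc_estimates)
lhs_var = np.var(lhs_estimates)
print(f"MC variance:  {mc_var:.2e}")
print(f"LHS variance: {lhs_var:.2e}")
```

For integrands with strong interactions between variables, the gap would be expected to shrink.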

$A$ is called an orthogonal array of strength $t$ if in each $n$-row-by-$t$-column submatrix of $A$, all $p^{t}$ possible distinct rows occur the same number of times. The elements of $A$ are in the set $\{0, 1, ..., p - 1\}$, also called symbols. The constraint that $p$ must be a prime number is to allow modular arithmetic. Increasing strength adds some symmetry to the sub-projections of a sample. With strength 2, samples are symmetric along the diagonals of 2D sub-projections. This may be undesirable, but on the other hand, the sample dispersion is improved.
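The definition can be made concrete with a small construction. The sketch below builds a $p^{2}$-row array from modular arithmetic and verifies the strength-2 property directly. The helpers `bose_oa` and `has_strength_two` are hypothetical names, and the construction is one classical choice, not necessarily the one used internally.

```python
import itertools
import numpy as np


def bose_oa(p):
    """Hypothetical helper: p**2-by-(p + 1) array with symbols in {0, ..., p - 1}, p prime."""
    i, j = np.divmod(np.arange(p * p), p)
    # Columns i, j, and (i + m*j) mod p for m = 1, ..., p - 1.
    columns = [i, j] + [(i + m * j) % p for m in range(1, p)]
    return np.stack(columns, axis=1)


def has_strength_two(A, p):
    """True if, in every pair of columns, all p**2 symbol pairs occur equally often."""
    for a, b in itertools.combinations(range(A.shape[1]), 2):
        counts = np.bincount(A[:, a] * p + A[:, b], minlength=p * p)
        if counts.min() != counts.max():
            return False
    return True


A = bose_oa(3)
print(A.shape)
print(has_strength_two(A, 3))
```

The primality of $p$ matters here because each multiplier `m` must be invertible modulo $p$ for the pairs of columns to cover all symbol combinations.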

Strength 1 (plain LHS) brings an advantage over strength 0 (MC), and strength 2 is a useful increment over strength 1. Going to strength 3 is a smaller increment, and scrambled QMC methods such as Sobol' and Halton are more performant ^[7].

To create a LHS of strength 2, the orthogonal array $A$ is randomized by applying a random, bijective map of the set of symbols onto itself. For example, in column 0, all 0s might become 2; in column 1, all 0s might become 1, etc. Then, for each column $i$ and symbol $j$ , we add a plain, one-dimensional LHS of size $p$ to the subarray where $A^{i} = j$ . The resulting matrix is finally divided by $p$ .
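The steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function name `oa_lhs_sketch` is hypothetical, and the orthogonal array comes from a simple modular construction chosen for brevity.

```python
import numpy as np


def oa_lhs_sketch(p, d, rng):
    """Hypothetical sketch of an OA-based LHS with p**2 points in d <= p + 1 dimensions."""
    # Orthogonal array of strength 2 with symbols in {0, ..., p - 1} (p prime).
    i, j = np.divmod(np.arange(p * p), p)
    A = np.stack([i, j] + [(i + m * j) % p for m in range(1, p)], axis=1)[:, :d]

    # Randomize: apply a random bijective relabeling of the symbols in each column.
    for col in range(d):
        A[:, col] = rng.permutation(p)[A[:, col]]

    sample = A.astype(float)
    for col in range(d):
        for symbol in range(p):
            rows = A[:, col] == symbol  # exactly p rows carry each symbol
            # Plain one-dimensional LHS of size p: one point per stratum, shuffled.
            lhs = (rng.permutation(p) + rng.uniform(size=p)) / p
            sample[rows, col] += lhs

    # Divide by p to map the result into the unit hypercube.
    return sample / p


sample = oa_lhs_sketch(p=3, d=3, rng=np.random.default_rng(0))
print(sample.shape)
```

By construction, each column of the result should have one point in each of the $p^{2}$ bins of width $1 / p^{2}$, and each pair of columns should cover every cell of the coarse $p \times p$ grid once.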

Examples

Generate samples from a Latin hypercube generator.

from scipy.stats import qmc
sampler = qmc.LatinHypercube(d=2)
sample = sampler.random(n=5)

sample

Compute the quality of the sample using the discrepancy criterion.

qmc.discrepancy(sample)

Samples can be scaled to bounds.

l_bounds = [0, 2]
u_bounds = [10, 5]

qmc.scale(sample, l_bounds, u_bounds)

The examples below show alternative ways to construct an LHS with even better coverage of the space. A plain LHS serves as the baseline.

sampler = qmc.LatinHypercube(d=2)
sample = sampler.random(n=5)

qmc.discrepancy(sample)

Use the `optimization` keyword argument to produce a LHS with lower discrepancy at higher computational cost.

sampler = qmc.LatinHypercube(d=2, optimization="random-cd")
sample = sampler.random(n=5)

qmc.discrepancy(sample)

Use the `strength` keyword argument to produce an orthogonal array based LHS of strength 2. In this case, the number of sample points must be the square of a prime number.

sampler = qmc.LatinHypercube(d=2, strength=2)
sample = sampler.random(n=9)

qmc.discrepancy(sample)

Options could be combined to produce an optimized centered orthogonal array based LHS. After optimization, the result would not be guaranteed to be of strength 2.

**Real-world example**

In ^[9], a Latin hypercube sampling (LHS) strategy was used to sample a parameter space to study the importance of each parameter of an epidemic model. Such analysis is also called a sensitivity analysis.

Since the dimensionality of the problem is high (6), it is computationally expensive to cover the space. When numerical experiments are costly, QMC enables analysis that may not be possible if using a grid.

The six parameters of the model represented the probability of illness, the probability of withdrawal, and four contact probabilities. The authors assumed uniform distributions for all parameters and generated 50 samples.

Using `scipy.stats.qmc.LatinHypercube` to replicate the protocol, the first step is to create a sample in the unit hypercube:

from scipy.stats import qmc
sampler = qmc.LatinHypercube(d=6)
sample = sampler.random(n=50)

Then the sample can be scaled to the appropriate bounds:

l_bounds = [0.000125, 0.01, 0.0025, 0.05, 0.47, 0.7]
u_bounds = [0.000375, 0.03, 0.0075, 0.15, 0.87, 0.9]
sample_scaled = qmc.scale(sample, l_bounds, u_bounds)

Such a sample was used to run the model 50 times, and a polynomial response surface was constructed. This allowed the authors to study the relative importance of each parameter across the range of possibilities of every other parameter. In this computer experiment, they showed a 14-fold reduction in the number of samples required to maintain an error below 2% on their response surface when compared to a grid sampling.

Aliases

scipy.stats._qmc.LatinHypercube
