`scipy.stats._continuous_distns:rv_histogram`

source: /scipy/stats/_continuous_distns.py :12008

Signature

class rv_histogram(histogram, *args, density=None, **kwargs)

Summary

Generates a distribution given by a histogram. This is useful for building a template distribution from a binned data sample.

Extended Summary

As a subclass of the rv_continuous class, rv_histogram inherits a collection of generic methods from it (see rv_continuous for the full list) and implements them based on the properties of the provided binned data sample.
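
As a rough illustration of the inherited methods, the sketch below builds a distribution from a small hand-made histogram and calls `mean`, `ppf` and `rvs` on it. The two-bin histogram and the names `counts`, `edges` and `toy_dist` are made up for this example; the `density` keyword assumes SciPy 1.10 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical toy histogram: 1 count in [0, 1) and 3 counts in [1, 2].
counts = np.array([1, 3])
edges = np.array([0.0, 1.0, 2.0])
toy_dist = stats.rv_histogram((counts, edges), density=False)

# Generic rv_continuous methods, evaluated on the histogram-based distribution.
mean = toy_dist.mean()          # first moment of the stepwise pdf
median = toy_dist.ppf(0.5)      # inverse of the cdf at probability 0.5
samples = toy_dist.rvs(size=1000, random_state=0)

print(mean, median)
# All samples fall inside the range covered by the bins.
print(samples.min() >= edges[0], samples.max() <= edges[-1])
```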

Parameters

histogram : tuple of array_like: Tuple containing two array_like objects. The first contains the content of n bins, and the second contains the (n+1) bin boundaries. In particular, the return value of numpy.histogram is accepted.
density : bool, optional: If False, assumes the histogram is proportional to counts per bin; otherwise, assumes it is proportional to a density. For constant bin widths, these are equivalent, but the distinction is important when bin widths vary (see Notes). If None (default), sets density=True for backwards compatibility, but warns if the bin widths are variable. Set density explicitly to silence the warning.
Added in version 1.10.0.

Notes

When a histogram has unequal bin widths, there is a distinction between histograms that are proportional to counts per bin and histograms that are proportional to probability density over a bin. If numpy.histogram is called with its default density=False, the resulting histogram is the number of counts per bin, so density=False should be passed to rv_histogram. If numpy.histogram is called with density=True, the resulting histogram is in terms of probability density, so density=True should be passed to rv_histogram. To avoid warnings, always pass density explicitly when the input histogram has unequal bin widths.
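
The distinction can be seen on a small hand-made histogram with unequal bin widths. The sketch below is only illustrative: the arrays `values` and `edges` are invented, and the same numbers are interpreted once as counts and once as densities.

```python
import numpy as np
from scipy import stats

# Hypothetical histogram with unequal bin widths (1, 1 and 2).
values = np.array([4.0, 4.0, 8.0])
edges = np.array([0.0, 1.0, 2.0, 4.0])

# Interpret the values as counts per bin: each count is divided by its
# bin width before normalization, so the wide bin is not over-weighted.
as_counts = stats.rv_histogram((values, edges), density=False)

# Interpret the same values as a (possibly unnormalized) density:
# they are only rescaled so that the pdf integrates to one.
as_density = stats.rv_histogram((values, edges), density=True)

x = np.array([0.5, 1.5, 3.0])   # one point inside each bin
print(as_counts.pdf(x))
print(as_density.pdf(x))
```

Under the counts interpretation the pdf is flat, because the last bin holds twice the counts but is also twice as wide. Under the density interpretation the last bin is twice as high as the others.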

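There are no additional shape parameters except for loc and scale. The pdf is a piecewise-constant (step) function defined by the provided histogram. The cdf is the integral of that pdf, so it is piecewise linear: it interpolates linearly between the cumulative probabilities at the bin boundaries.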
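
A minimal check of this behaviour, assuming a made-up two-bin histogram (`counts`, `edges` and `step_dist` are names chosen for the example): the cdf evaluated anywhere in the support should agree with linear interpolation between its values at the bin boundaries.

```python
import numpy as np
from scipy import stats

# Hypothetical histogram with two unit-width bins.
counts = np.array([1, 3])
edges = np.array([0.0, 1.0, 2.0])
step_dist = stats.rv_histogram((counts, edges), density=False)

# The pdf is constant inside each bin.
print(step_dist.pdf([0.5, 1.5]))

# The cdf at the bin boundaries is the cumulative bin probability ...
cdf_at_edges = step_dist.cdf(edges)

# ... and between boundaries it matches plain linear interpolation.
x = np.linspace(0.0, 2.0, 9)
cdf_interp = np.interp(x, edges, cdf_at_edges)
print(np.allclose(step_dist.cdf(x), cdf_interp))
```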

Examples

Create a scipy.stats distribution from a numpy histogram

import scipy.stats
import numpy as np
data = scipy.stats.norm.rvs(size=100000, loc=0, scale=1.5,
                            random_state=123)
hist = np.histogram(data, bins=100)
hist_dist = scipy.stats.rv_histogram(hist, density=False)

Behaves like an ordinary scipy rv_continuous distribution

hist_dist.pdf(1.0)
hist_dist.cdf(2.0)

The PDF is zero above the highest bin and below the lowest bin of the histogram, whose outer edges are the maximum and minimum of the original dataset

hist_dist.pdf(np.max(data))
hist_dist.cdf(np.max(data))
hist_dist.pdf(np.min(data))
hist_dist.cdf(np.min(data))

PDF and CDF follow the histogram

import matplotlib.pyplot as plt
X = np.linspace(-5.0, 5.0, 100)
fig, ax = plt.subplots()

ax.set_title("PDF from Template")
ax.hist(data, density=True, bins=100)
ax.plot(X, hist_dist.pdf(X), label='PDF')
ax.plot(X, hist_dist.cdf(X), label='CDF')
ax.legend()

fig.show()

Aliases

scipy.stats.rv_histogram

Referenced by

This package

release:0.19.0-notes