class

scipy.stats._continuous_distns:rv_histogram

source: /scipy/stats/_continuous_distns.py :12008

Signature

class rv_histogram(histogram, *args, density=None, **kwargs)

Summary

Generates a distribution defined by a histogram. This is useful for building a template distribution from a binned data sample.

Extended Summary

As a subclass of the rv_continuous class, rv_histogram inherits from it a collection of generic methods (see rv_continuous for the full list), and implements them based on the properties of the provided binned data sample.
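As a rough illustration of the inherited generic methods, the sketch below builds a distribution from a small hand-made histogram and calls a few of them. The toy counts and bin edges are hypothetical values chosen for easy arithmetic, and the explicit density argument assumes a SciPy version that supports it.

```python
import numpy as np
from scipy import stats

# Hypothetical toy histogram: 4 equal-width bins on [0, 4].
counts = np.array([1.0, 3.0, 4.0, 2.0])
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dist = stats.rv_histogram((counts, edges), density=False)

# Generic rv_continuous methods, evaluated from the binned data.
mean = dist.mean()        # approximately 2.2 for this toy histogram
median = dist.ppf(0.5)    # approximately 2.25
support = dist.support()  # lowest and highest bin edges
sample = dist.rvs(size=1000, random_state=0)  # draws stay inside the support
```

The same object also exposes the other generic methods of rv_continuous, such as sf, var, and interval.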

Parameters

histogram : tuple of array_like

Tuple containing two array_like objects: the first holds the contents of the n bins, and the second holds the n+1 bin boundaries. In particular, the return value of numpy.histogram is accepted.

density : bool, optional

If False, assumes the histogram is proportional to counts per bin; otherwise, assumes it is proportional to a density. For constant bin widths, these are equivalent, but the distinction is important when bin widths vary (see Notes). If None (default), sets density=True for backwards compatibility, but warns if the bin widths are variable. Set density explicitly to silence the warning.
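The following is a minimal sketch of the accepted input shapes, assuming a SciPy version that supports the density argument. It passes both the return value of numpy.histogram and a hand-built tuple, and shows that a tuple whose edge array does not have n+1 entries is expected to be rejected with a ValueError. The sample data and the small hand-built histograms are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# numpy.histogram returns (counts, edges) with len(edges) == len(counts) + 1.
counts, edges = np.histogram(data, bins=20)
dist_from_numpy = stats.rv_histogram((counts, edges), density=False)

# A hand-built tuple works the same way: 3 bin contents, 4 bin boundaries.
dist_manual = stats.rv_histogram(([2, 5, 3], [0.0, 1.0, 2.0, 3.0]),
                                 density=False)

# Mismatched lengths (3 contents, only 3 boundaries) should be rejected.
try:
    stats.rv_histogram(([2, 5, 3], [0.0, 1.0, 2.0]), density=False)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Passing density explicitly, as done here, keeps the behavior unambiguous even when the bins happen to be equally wide.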

Notes

When a histogram has unequal bin widths, there is a distinction between histograms that are proportional to counts per bin and histograms that are proportional to probability density over a bin. If numpy.histogram is called with its default density=False, the resulting histogram is the number of counts per bin, so density=False should be passed to rv_histogram. If numpy.histogram is called with density=True, the resulting histogram is in terms of probability density, so density=True should be passed to rv_histogram. To avoid warnings, always pass density explicitly when the input histogram has unequal bin widths.
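To make the distinction concrete, the sketch below interprets the same two-bin histogram with unequal widths both ways, then checks that the two numpy.histogram conventions lead to the same distribution when paired with the matching density flag. The bin edges and the exponential sample are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats

# Two bins of width 1 and 2 with equal contents.
values = np.array([10.0, 10.0])
edges = np.array([0.0, 1.0, 3.0])

# Read as counts: each bin carries half the probability mass,
# so the wider bin has half the density.
as_counts = stats.rv_histogram((values, edges), density=False)

# Read as densities: the density is flat, so the wider bin
# carries two thirds of the probability mass.
as_density = stats.rv_histogram((values, edges), density=True)

mass_first_bin_counts = as_counts.cdf(1.0)    # approximately 0.5
mass_first_bin_density = as_density.cdf(1.0)  # approximately 1/3

# Matching the flag to numpy.histogram yields the same distribution.
rng = np.random.default_rng(1)
data = rng.exponential(size=5000)
uneven = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])
d_counts = stats.rv_histogram(np.histogram(data, bins=uneven),
                              density=False)
d_density = stats.rv_histogram(np.histogram(data, bins=uneven, density=True),
                               density=True)
x = np.array([0.25, 0.75, 1.5, 3.0, 6.0])
same_pdf = np.allclose(d_counts.pdf(x), d_density.pdf(x))
```

Mixing the conventions, for example passing counts with density=True, silently reweights the wide bins, which is why the flag should follow how the histogram was produced.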

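The distribution has no shape parameters; only loc and scale are available. The pdf is defined as a stepwise function from the provided histogram. The cdf is the integral of that stepwise pdf, so it is piecewise linear, interpolating linearly between the cumulative probabilities at the bin boundaries.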
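A small numerical check of these properties, using a hypothetical four-bin histogram: the pdf is constant inside a bin, the cdf at the bin boundaries equals the running sum of the bin probabilities, and the cdf at a bin midpoint lies halfway between its values at the two neighboring boundaries. The last lines show loc and scale, which shift and stretch the distribution as for any rv_continuous.

```python
import numpy as np
from scipy import stats

# Hypothetical toy histogram with unit-width bins on [0, 4].
counts = np.array([1.0, 3.0, 4.0, 2.0])
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dist = stats.rv_histogram((counts, edges), density=False)

# Stepwise pdf: the same value anywhere inside one bin.
pdf_in_first_bin = dist.pdf([0.25, 0.75])

# The cdf at the boundaries is the cumulative sum of bin probabilities.
cdf_at_edges = dist.cdf(edges)
expected_cdf = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])

# Between boundaries the cdf is linear: the midpoint value is the average.
cdf_mid = dist.cdf(1.5)
cdf_avg = 0.5 * (dist.cdf(1.0) + dist.cdf(2.0))

# loc and scale: pdf(x, loc, scale) == pdf((x - loc) / scale) / scale.
shifted_pdf = dist.pdf(11.0, loc=10.0, scale=2.0)
shifted_cdf = dist.cdf(12.0, loc=10.0, scale=2.0)
```

Evaluation points are kept away from the exact bin boundaries for the pdf, since the value at a boundary depends on which side the step is assigned to.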

Examples

Create a scipy.stats distribution from a numpy histogram:
import scipy.stats
import numpy as np
data = scipy.stats.norm.rvs(size=100000, loc=0, scale=1.5,
                            random_state=123)
hist = np.histogram(data, bins=100)
hist_dist = scipy.stats.rv_histogram(hist, density=False)
The result behaves like an ordinary scipy rv_continuous distribution:
hist_dist.pdf(1.0)
hist_dist.cdf(2.0)
The PDF is zero above the highest bin and below the lowest bin of the histogram, whose outer edges are the maximum and minimum of the original dataset:
hist_dist.pdf(np.max(data))
hist_dist.cdf(np.max(data))
hist_dist.pdf(np.min(data))
hist_dist.cdf(np.min(data))
The PDF and CDF follow the histogram:
import matplotlib.pyplot as plt
X = np.linspace(-5.0, 5.0, 100)
fig, ax = plt.subplots()
ax.set_title("PDF from Template")
ax.hist(data, density=True, bins=100)
ax.plot(X, hist_dist.pdf(X), label='PDF')
ax.plot(X, hist_dist.cdf(X), label='CDF')
ax.legend()
plt.show()

Aliases

  • scipy.stats.rv_histogram
