function
scipy.stats._mstats_basic:winsorize
Signature
def winsorize(a, limits=None, inclusive=(True, True), inplace=False, axis=None, nan_policy='propagate')
Summary
Returns a Winsorized version of the input array.
Extended Summary
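The lowest fraction limits[0] of the values is set to the value at the limits[0] quantile (the smallest value left unchanged), and the highest fraction limits[1] of the values is set to the value at the (1 - limits[1]) quantile (the largest value left unchanged). Masked values are skipped.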
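The replacement rule described above can be sketched in plain NumPy for the simplest case. This is a minimal illustration only, assuming a 1-D input with no masked values and no NaNs, and truncated counts on both sides; `winsorize_1d_sketch` is a hypothetical helper written for this page, not part of SciPy.

```python
import numpy as np
from scipy.stats.mstats import winsorize


def winsorize_1d_sketch(values, low, high):
    """Hypothetical helper: winsorize a 1-D array with no masked values or NaNs."""
    values = np.asarray(values)
    n = values.size
    order = np.argsort(values)        # indices that sort the data
    n_low = int(low * n)              # number of values replaced at the low end
    n_high = n - int(high * n)        # position just past the largest value kept
    out = values.copy()
    out[order[:n_low]] = values[order[n_low]]        # smallest value left unchanged
    out[order[n_high:]] = values[order[n_high - 1]]  # largest value left unchanged
    return out


values = np.array([12, 7, 3, 25, 8, 1, 15, 9, 30, 5])
sketch = winsorize_1d_sketch(values, 0.2, 0.1)
reference = winsorize(values, limits=(0.2, 0.1))

print(sketch)                 # the two smallest values become 5, the largest becomes 25
print(np.asarray(reference))  # same values from the library function
```

With ten values, limits of 0.2 and 0.1 replace the two smallest entries and the single largest entry, so eight entries stay as they were.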
Parameters
a: sequence
    Input array.
limits: {None, tuple of float}, optional
    Tuple of the fractions to cut on each side of the array, relative to the number of unmasked data, as floats between 0. and 1. Denoting by n the number of unmasked data before winsorizing, the (n*limits[0]) smallest data and the (n*limits[1]) largest data are replaced, leaving n*(1.-sum(limits)) values unchanged. The value of one limit can be set to None to indicate an open interval.
inclusive: {(True, True) tuple}, optional
    Tuple indicating whether the number of data being replaced on each side should be truncated (True) or rounded (False).
inplace: {False, True}, optional
    Whether to winsorize in place (True) or to use a copy (False).
axis: {None, int}, optional
    Axis along which to winsorize. If None, the whole array is winsorized, but its shape is maintained.
nan_policy: {'propagate', 'raise', 'omit'}, optional
    Defines how to handle input that contains nan. The following options are available (default is 'propagate'):
    - 'propagate': allows nan values and may overwrite or propagate them
    - 'raise': throws an error
    - 'omit': performs the calculations ignoring nan values
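As an illustration of two of the parameters above, the following sketch compares truncated and rounded counts (`inclusive`) and per-row versus whole-array winsorizing (`axis`). The arrays and limit values are made up for this example; 0.17 is chosen so that n*limits is 1.7 for ten values, which truncates to 1 and rounds to 2.

```python
import numpy as np
from scipy.stats.mstats import winsorize

a = np.arange(1, 11)  # 1, 2, ..., 10

# inclusive=(True, True): 10 * 0.17 = 1.7 is truncated, so 1 value per side is replaced.
truncated = winsorize(a, limits=(0.17, 0.17), inclusive=(True, True))

# inclusive=(False, False): 1.7 is rounded, so 2 values per side are replaced.
rounded = winsorize(a, limits=(0.17, 0.17), inclusive=(False, False))

b = np.array([[1, 2, 3, 4, 100],
              [10, 20, 30, 40, 50]])

# axis=1: each row is winsorized on its own (top 20% of 5 values = 1 value per row).
per_row = winsorize(b, limits=(0, 0.2), axis=1)

# axis=None: all 10 values are ranked together, and the 2-D shape is kept.
flattened = winsorize(b, limits=(0, 0.2), axis=None)

print(np.asarray(truncated))
print(np.asarray(rounded))
print(np.asarray(per_row))
print(np.asarray(flattened))
```

Note that with `axis=1` the value 50 in the second row is replaced because it is the largest in its row, while with `axis=None` it is replaced because it is among the two largest values overall.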
Notes
This function is applied to reduce the effect of possibly spurious outliers by limiting the extreme values.
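To make the effect on a summary statistic concrete, here is a small sketch with made-up data containing one extreme value. The numbers are illustrative assumptions, not taken from any real data set.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Nine ordinary readings and one suspicious value (illustrative data only).
data = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 500.0])

raw_mean = data.mean()

# Replace only the top 10% (one value out of ten); leave the low end untouched.
clipped = winsorize(data, limits=(0.0, 0.1))
winsorized_mean = clipped.mean()

print(raw_mean)         # dominated by the single value 500.0
print(winsorized_mean)  # computed after 500.0 is replaced by 6.0
```

Unlike trimming, the extreme observation is kept in the sample but pulled in to the nearest retained value, so the sample size does not change.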
Examples
import numpy as np
from scipy.stats.mstats import winsorize
a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])
winsorize(a, limits=[0.1, 0.2])
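The call above has no output shown on this page. The following self-contained version of the same example prints the result; the variable name `result` is introduced here for illustration. The lowest 10% of the values (the single value 1) and the highest 20% (the values 9 and 10) are replaced.

```python
import numpy as np
from scipy.stats.mstats import winsorize

a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])
result = winsorize(a, limits=[0.1, 0.2])

print(type(result).__name__)  # the result is a NumPy masked array
print(result.data)            # 1 is replaced by 2; 9 and 10 are replaced by 8
print(a)                      # the input is unchanged because inplace=False
```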
Aliases
- scipy.stats._mstats_basic.winsorize