function
scipy.optimize._differentialevolution:differential_evolution
Signature
def differential_evolution(func, bounds, args=(), strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, rng=None, callback=None, disp=False, polish=True, init='latinhypercube', atol=0, updating='immediate', workers=1, constraints=(), x0=None, *, integrality=None, vectorized=False, seed=None)
Summary
Finds the global minimum of a multivariate function.
Extended Summary
The differential evolution method [1] is stochastic in nature. It does not use gradient methods to find the minimum, and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient-based techniques.
The algorithm is due to Storn and Price [2].
Parameters
func: callable
The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function. The number of parameters, N, is equal to len(x).
bounds: sequence or `Bounds`
Bounds for variables. There are two ways to specify the bounds:
Instance of Bounds class.
(min, max) pairs for each element in x, defining the finite lower and upper bounds for the optimizing argument of func.
The total number of bounds is used to determine the number of parameters, N. If there are parameters whose bounds are equal, the total number of free parameters is N - N_equal.
args: tuple, optional
Any additional fixed parameters needed to completely specify the objective function.
strategy: {str, callable}, optional
The differential evolution strategy to use. Should be one of:
'best1bin'
'best1exp'
'rand1bin'
'rand1exp'
'rand2bin'
'rand2exp'
'randtobest1bin'
'randtobest1exp'
'currenttobest1bin'
'currenttobest1exp'
'best2exp'
'best2bin'
The default is 'best1bin'. Strategies that may be implemented are outlined in 'Notes'. Alternatively, the differential evolution strategy can be customized by providing a callable that constructs a trial vector. The callable must have the form strategy(candidate: int, population: np.ndarray, rng=None), where candidate is an integer specifying which entry of the population is being evolved, population is an array of shape (S, N) containing all the population members (where S is the total population size), and rng is the random number generator being used within the solver. candidate will be in the range [0, S). strategy must return a trial vector with shape (N,). The fitness of this trial vector is compared against the fitness of population[candidate].
maxiter: int, optional
The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is: (maxiter + 1) * popsize * (N - N_equal)
popsize: int, optional
A multiplier for setting the total population size. The population has popsize * (N - N_equal) individuals. This keyword is overridden if an initial population is supplied via the init keyword. When using init='sobol' the population size is calculated as the next power of 2 after popsize * (N - N_equal).
tol: float, optional
Relative tolerance for convergence. Solving stops when np.std(population_energies) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
mutation: float or tuple(float, float), optional
The mutation constant. In the literature this is also known as differential weight, denoted by F. If specified as a float it should be in the range [0, 2). If specified as a tuple (min, max), dithering is employed. Dithering randomly changes the mutation constant on a generation-by-generation basis. The mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence.
recombination: float, optional
The recombination constant, which should be in the range [0, 1]. In the literature this is also known as the crossover probability, denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability.
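As a rough illustration of the expressions quoted for maxiter, popsize, tol and mutation, the sketch below works through the arithmetic for a small case. The helper names are hypothetical and are not part of SciPy; the code only restates the formulas given in this section, so any further adjustments the solver makes internally are not reflected.

```python
import numpy as np


def population_size(popsize, n_params, n_equal=0):
    """Population size popsize * (N - N_equal), as stated for `popsize`."""
    return popsize * (n_params - n_equal)


def max_function_evaluations(maxiter, popsize, n_params, n_equal=0):
    """Upper bound (maxiter + 1) * popsize * (N - N_equal), no polishing."""
    return (maxiter + 1) * population_size(popsize, n_params, n_equal)


def has_converged(population_energies, tol=0.01, atol=0.0):
    """Stopping test: std(energies) <= atol + tol * |mean(energies)|."""
    energies = np.asarray(population_energies, dtype=float)
    return bool(np.std(energies) <= atol + tol * np.abs(np.mean(energies)))


def dithered_mutation(mutation_bounds, rng):
    """Draw one mutation constant from U[min, max) for a generation."""
    low, high = mutation_bounds
    return rng.uniform(low, high)


# Five free parameters with the default popsize=15 and maxiter=1000.
print(population_size(15, 5))                  # 75
print(max_function_evaluations(1000, 15, 5))   # 75075
print(has_converged([1.00, 1.01, 0.99]))       # True
print(has_converged([1.0, 2.0, 3.0]))          # False
```

Note that the test is relative to the mean energy, so an objective whose minimum value is close to zero may need a non-zero atol to stop early.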
rng: {None, int, `numpy.random.Generator`}, optional
If rng is passed by keyword, types other than numpy.random.Generator are passed to numpy.random.default_rng to instantiate a Generator. If rng is already a Generator instance, then the provided instance is used. Specify rng for repeatable function behavior.
If this argument is passed by position or seed is passed by keyword, legacy behavior for the argument seed applies:
If seed is None (or numpy.random), the numpy.random.RandomState singleton is used.
If seed is an int, a new RandomState instance is used, seeded with seed.
If seed is already a Generator or RandomState instance then that instance is used.
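The rng handling described above relies on numpy.random.default_rng. The following minimal sketch uses NumPy only and shows the two behaviours the text mentions: an int is turned into a new Generator, and an existing Generator is used as-is. How the solver consumes the generator internally is not shown here.

```python
import numpy as np

# An int is turned into a fresh Generator by default_rng.
rng_from_int = np.random.default_rng(1)

# An existing Generator is returned unchanged, so its state is shared.
same_rng = np.random.default_rng(rng_from_int)
print(same_rng is rng_from_int)        # True

# Two Generators built from the same int produce the same stream,
# which is what makes a seeded run repeatable.
first = np.random.default_rng(1).uniform(size=3)
second = np.random.default_rng(1).uniform(size=3)
print(np.array_equal(first, second))   # True
```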
disp: bool, optional
Prints the evaluated func at every iteration.
callback: callable, optional
A callable called after each iteration. Has the signature callback(intermediate_result: OptimizeResult), where intermediate_result is a keyword parameter containing an OptimizeResult with attributes x and fun, the best solution found so far and the corresponding objective function value. Note that the name of the parameter must be intermediate_result for the callback to be passed an OptimizeResult.
The callback also supports a signature like callback(x, convergence: float=val), where val represents the fractional value of the population convergence. When val is greater than 1.0, the function halts.
Introspection is used to determine which of the signatures is invoked.
Global minimization will halt if the callback raises StopIteration or returns True; any polishing is still carried out.
polish: {bool, callable}, optional
If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly. If a constrained problem is being studied then the trust-constr method is used instead. For large problems with many constraints, polishing can take a long time due to the Jacobian computations. Alternatively, supply a callable that has a minimize-like signature, polish_func(func, x0, **kwds), and returns an OptimizeResult. This allows the user to have fine control over how the polishing occurs. bounds and constraints will be present in kwds. Extra keywords could be supplied to polish_func using functools.partial. It is the user's responsibility to ensure that the polishing function obeys bounds, any constraints (including integrality constraints), and that appropriate attributes are set in the OptimizeResult, such as fun, x, nfev, jac.
init: str or array-like, optional
Specify which type of population initialization is performed. Should be one of:
'latinhypercube'
'sobol'
'halton'
'random'
array specifying the initial population. The array should have shape (S, N), where S is the total population size and N is the number of parameters.
init is clipped to bounds before use.
The default is 'latinhypercube'. Latin Hypercube sampling tries to maximize coverage of the available parameter space.
'sobol' and 'halton' are superior alternatives that cover the parameter space even more thoroughly. 'sobol' will enforce an initial population size which is calculated as the next power of 2 after popsize * (N - N_equal). 'halton' has no requirements but is a bit less efficient. See scipy.stats.qmc for more details.
'random' initializes the population randomly. This has the drawback that clustering can occur, preventing the whole of parameter space being covered. An array can be used to specify a population, for example, to create a tight bunch of initial guesses in a location where the solution is known to exist, thereby reducing time for convergence.
atol: float, optional
Absolute tolerance for convergence. Solving stops when np.std(population_energies) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
updating: {'immediate', 'deferred'}, optional
If 'immediate', the best solution vector is continuously updated within a single generation [4]. This can lead to faster convergence as trial vectors can take advantage of continuous improvements in the best solution. With 'deferred', the best solution vector is updated once per generation. Only 'deferred' is compatible with parallelization or vectorization, and the workers and vectorized keywords can override this option.
workers: int or map-like callable, optional
If workers is an int the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all available CPU cores. Alternatively supply a map-like callable, such as multiprocessing.Pool.map, for evaluating the population in parallel. This evaluation is carried out as workers(func, iterable). This option will override the updating keyword to updating='deferred' if workers != 1. This option overrides the vectorized keyword if workers != 1. Requires that func be pickleable.
constraints: {NonlinearConstraint, LinearConstraint, Bounds}
Constraints on the solver, over and above those applied by the bounds keyword. Uses the approach by Lampinen [5].
x0: None or array-like, optional
Provides an initial guess to the minimization. Once the population has been initialized this vector replaces the first (best) member. This replacement is done even if init is given an initial population. x0.shape == (N,).
integrality: 1-D array, optional
For each decision variable, a boolean value indicating whether the decision variable is constrained to integer values. The array is broadcast to (N,). If any decision variables are constrained to be integral, they will not be changed during polishing. Only integer values lying between the lower and upper bounds are used. If there are no integer values lying between the bounds then a ValueError is raised.
vectorized: bool, optional
If vectorized is True, func is sent an x array with x.shape == (N, S), and is expected to return an array of shape (S,), where S is the number of solution vectors to be calculated. If constraints are applied, each of the functions used to construct a Constraint object should accept an x array with x.shape == (N, S), and return an array of shape (M, S), where M is the number of constraint components. This option is an alternative to the parallelization offered by workers, and may help in optimization speed by reducing interpreter overhead from multiple function calls. This keyword is ignored if workers != 1. This option will override the updating keyword to updating='deferred'. See the notes section for further discussion on when to use 'vectorized', and when to use 'workers'.
Returns
res: OptimizeResult
The optimization result represented as an OptimizeResult object. Important attributes are: x the solution array, success a Boolean flag indicating if the optimizer exited successfully, message which describes the cause of the termination, population the solution vectors present in the population, and population_energies the value of the objective function for each entry in population. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute. If the eventual solution does not satisfy the applied constraints, success will be False.
Notes
Differential evolution is a stochastic population based method that is useful for global optimization problems. At each pass through the population the algorithm mutates each candidate solution by mixing with other candidate solutions to create a trial candidate. There are several strategies [3] for creating trial candidates, which suit some problems more than others. The 'best1bin' strategy is a good starting point for many systems. In this strategy two members of the population are randomly chosen. Their difference is used to mutate the best member (the 'best' in 'best1bin'), $x_0$, so far:
$$b' = x_0 + F \cdot (x_{r_0} - x_{r_1})$$
where $F$ is the mutation parameter. A trial vector is then constructed. Starting with a randomly chosen ith parameter the trial is sequentially filled (in modulo) with parameters from b' or the original candidate. The choice of whether to use b' or the original candidate is made with a binomial distribution (the 'bin' in 'best1bin'): a random number in [0, 1) is generated. If this number is less than the recombination constant then the parameter is loaded from b', otherwise it is loaded from the original candidate. A randomly selected parameter is always loaded from b'. For binomial crossover, this is a single random parameter. For exponential crossover, this is the starting point of a consecutive sequence of parameters from b'. Once the trial candidate is built its fitness is assessed. If the trial is better than the original candidate then it takes its place. If it is also better than the best overall candidate it also replaces that.
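The difference between binomial and exponential crossover described above can be sketched with boolean masks that mark which parameters are taken from b'. This is an illustrative reading of the paragraph, not SciPy's internal implementation, and the function names are hypothetical.

```python
import numpy as np


def binomial_mask(n_params, recombination, rng):
    # Each parameter is taken from b' with probability `recombination`;
    # one randomly selected parameter is always taken from b'.
    mask = rng.uniform(size=n_params) < recombination
    mask[rng.integers(n_params)] = True
    return mask


def exponential_mask(n_params, recombination, rng):
    # A consecutive run (wrapping around, i.e. in modulo) starting at a
    # random parameter; the run continues while draws stay below
    # `recombination`.
    mask = np.zeros(n_params, dtype=bool)
    position = rng.integers(n_params)
    mask[position] = True  # the starting point always comes from b'
    for _ in range(n_params - 1):
        if rng.uniform() >= recombination:
            break
        position = (position + 1) % n_params
        mask[position] = True
    return mask


def crossover(original, bprime, mask):
    # Build the trial: b' where the mask is True, else the original.
    return np.where(mask, bprime, original)


rng = np.random.default_rng(0)
original = np.zeros(6)
bprime = np.ones(6)
print(crossover(original, bprime, binomial_mask(6, 0.7, rng)))
print(crossover(original, bprime, exponential_mask(6, 0.7, rng)))
```

With binomial crossover the parameters taken from b' can be scattered anywhere in the vector, whereas exponential crossover always yields one contiguous block.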
The other strategies available are outlined in Qiang and Mitchell (2014) [3].
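rand1: $b' = x_{r_0} + F \cdot (x_{r_1} - x_{r_2})$
rand2: $b' = x_{r_0} + F \cdot (x_{r_1} + x_{r_2} - x_{r_3} - x_{r_4})$
best1: $b' = x_0 + F \cdot (x_{r_0} - x_{r_1})$
best2: $b' = x_0 + F \cdot (x_{r_0} + x_{r_1} - x_{r_2} - x_{r_3})$
currenttobest1: $b' = x_i + F \cdot (x_0 - x_i + x_{r_0} - x_{r_1})$
randtobest1: $b' = x_{r_0} + F \cdot (x_0 - x_{r_0} + x_{r_1} - x_{r_2})$
where $x_0$ is the best member so far, $F$ is the mutation constant, and the integers $r_0, r_1, r_2, r_3, r_4$ are chosen randomly from the interval [0, NP) with NP being the total population size and the original candidate having index $i$. The user can fully customize the generation of the trial candidates by supplying a callable to strategy.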
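A few of these mutation rules can be written out in NumPy as a rough sketch. It assumes the usual forms of the rules (for example rand1: b' = x_r0 + F (x_r1 - x_r2)) and that row 0 of population holds the best member, as in the custom strategy example in the Examples section. The function names and the pick_indices helper are hypothetical.

```python
import numpy as np


def rand1(population, r, F):
    r0, r1, r2 = r
    return population[r0] + F * (population[r1] - population[r2])


def best2(population, r, F):
    r0, r1, r2, r3 = r
    return population[0] + F * (population[r0] + population[r1]
                                - population[r2] - population[r3])


def currenttobest1(population, i, r, F):
    r0, r1 = r
    return population[i] + F * (population[0] - population[i]
                                + population[r0] - population[r1])


def pick_indices(n_members, candidate, count, rng):
    """Distinct random indices in [0, NP) that differ from the candidate."""
    others = np.delete(np.arange(n_members), candidate)
    return rng.choice(others, size=count, replace=False)


population = np.arange(12.0).reshape(6, 2)   # NP = 6 members, N = 2
print(rand1(population, (1, 2, 3), F=0.5))           # [1. 2.]
print(best2(population, (1, 2, 3, 4), F=0.5))        # [-4. -3.]
print(currenttobest1(population, 5, (1, 2), F=0.5))  # [4. 5.]
```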
To improve your chances of finding a global minimum use higher popsize values, with higher mutation (and dithering), but lower recombination values. This has the effect of widening the search radius, but slowing convergence.
By default the best solution vector is updated continuously within a single iteration (updating='immediate'). This is a modification [4] of the original differential evolution algorithm which can lead to faster convergence as trial vectors can immediately benefit from improved solutions. To use the original Storn and Price behaviour, updating the best solution once per iteration, set updating='deferred'. The 'deferred' approach is compatible with both parallelization and vectorization ('workers' and 'vectorized' keywords). These may improve minimization speed by using computer resources more efficiently. The 'workers' distribute calculations over multiple processors. By default the Python multiprocessing module is used, but other approaches are also possible, such as the Message Passing Interface (MPI) used on clusters [6] [7]. The overhead from these approaches (creating new Processes, etc) may be significant, meaning that computational speed doesn't necessarily scale with the number of processors used. Parallelization is best suited to computationally expensive objective functions. If the objective function is less expensive, then 'vectorized' may aid by only calling the objective function once per iteration, rather than multiple times for all the population members; the interpreter overhead is reduced.
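The "once per iteration" evaluation mentioned above depends on the objective accepting a whole batch of solution vectors. The sketch below, which uses NumPy only, compares a per-member loop with a single call on an array of shape (N, S). The objective shifted_sphere is a hypothetical example, and no timing claims are made.

```python
import numpy as np


def shifted_sphere(x):
    # Summing over axis 0 handles both a single vector of shape (N,)
    # and a batch of shape (N, S), returning a scalar or shape (S,).
    return np.sum((x - 1.0) ** 2, axis=0)


rng = np.random.default_rng(0)
population = rng.uniform(-5, 5, size=(30, 2))   # (S, N) = (30, 2)

# One Python-level call per population member.
looped = np.array([shifted_sphere(member) for member in population])

# A single call on the transposed (N, S) array.
batched = shifted_sphere(population.T)

print(batched.shape)                 # (30,)
print(np.allclose(looped, batched))  # True
```

An objective written this way can be passed with vectorized=True, whereas an objective that is expensive per evaluation and cannot be batched is the case where workers is more likely to help.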
Examples
Let us consider the problem of minimizing the Rosenbrock function. This function is implemented in `rosen` in `scipy.optimize`.
    import numpy as np
    from scipy.optimize import rosen, differential_evolution
    bounds = [(0, 2), (0, 2), (0, 2), (0, 2), (0, 2)]
    result = differential_evolution(rosen, bounds)
    result.x, result.fun
The same minimization with the population evaluated in parallel:
    result = differential_evolution(rosen, bounds, updating='deferred',
                                    workers=2)
    result.x, result.fun
A constrained minimization, in which the sum of the two variables must not exceed 1.9:
    from scipy.optimize import LinearConstraint, Bounds
    lc = LinearConstraint([[1, 1]], -np.inf, 1.9)
    bounds = Bounds([0., 0.], [2., 2.])
    result = differential_evolution(rosen, bounds, constraints=lc, rng=1)
    result.x, result.fun
Next, the Ackley function, which has many local minima:
    def ackley(x):
        arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
        arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
        return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e
    bounds = [(-5, 5), (-5, 5)]
    result = differential_evolution(ackley, bounds, rng=1)
    result.x, result.fun
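Because ackley indexes x along its first axis, it also accepts an array of shape (N, S), so the objective can be evaluated in a vectorized manner:
    result = differential_evolution(
        ackley, bounds, vectorized=True, updating='deferred', rng=1
    )
    result.x, result.fun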
A custom polishing function can be supplied, here minimize with the SLSQP method:
    from functools import partial
    from scipy.optimize import minimize
    polish_func = partial(minimize, method="SLSQP")
    result = differential_evolution(
        ackley, bounds, vectorized=True, updating='deferred', rng=1,
        polish=polish_func
    )
    result.x, result.fun
A custom strategy function that mimics 'best1bin':
    def custom_strategy_fn(candidate, population, rng=None):
        parameter_count = population.shape[-1]
        mutation, recombination = 0.7, 0.9
        trial = np.copy(population[candidate])
        fill_point = rng.choice(parameter_count)

        pool = np.arange(len(population))
        rng.shuffle(pool)

        # two unique random numbers that aren't the same, and
        # aren't equal to candidate.
        idxs = []
        while len(idxs) < 2 and len(pool) > 0:
            idx = pool[0]
            pool = pool[1:]
            if idx != candidate:
                idxs.append(idx)

        r0, r1 = idxs[:2]

        bprime = (population[0] + mutation *
                  (population[r0] - population[r1]))

        crossovers = rng.uniform(size=parameter_count)
        crossovers = crossovers < recombination
        crossovers[fill_point] = True
        trial = np.where(crossovers, bprime, trial)
        return trial
Aliases
- scipy.optimize.differential_evolution