ensmallen documentation
ensmallen is a library for function optimization. It is devoted to efficiently solving the problem
\[\operatorname{argmin} f(x)\]for many different types of $f(x)$. The documentation is split into four parts:
-
Function type documentation: how to implement the different types of functions $f(x)$ that ensmallen can handle
-
Optimizer documentation: how to use each of ensmallen’s optimizers to find $\operatorname{argmin} f(x)$
-
Callback documentation: documentation for ensmallen callbacks, which can be used to specify custom behavior during optimization
-
History/changelog: changes for each version of ensmallen
-
Example program: simple example program demonstrating ensmallen usage
Everything in ensmallen is in the ens
namespace so it is often useful to put a
using
directive in your code:
using namespace ens;
Citation
Please cite the following paper if you use ensmallen in your research and/or software. Citations are useful for the continued development and maintenance of the library.
- Ryan R. Curtin, Marcus Edel, Rahul Ganesh Prabhu, Suryoday Basak, Zhihao Lou, Conrad Sanderson.
The ensmallen library for flexible numerical optimization.
Journal of Machine Learning Research, Vol. 22, No. 166, 2021. (bibtex)
See also the following paper detailing the internal design of ensmallen:
- R.R. Curtin, M. Edel, R. Prabhu, S. Basak, Z. Lou, C. Sanderson. Flexible numerical optimization with ensmallen. arXiv preprint 2003.01403. (bibtex)
Contents
- Function type documentation
- Optimizer documentation
- ActiveCMAES
- AdaBelief
- AdaBound
- AdaDelta
- Adagrad
- AdaSqrt
- Adam
- AdaMax
- AMSBound
- AMSGrad
- Augmented Lagrangian
- Big Batch SGD
- Coordinate Descent (CD)
- CMAES
- CNE
- DE
- DemonAdam
- DemonSGD
- Eve
- Frank-Wolfe
- FTML (Follow the Moving Leader)
- Gradient Descent
- Grid Search
- Hogwild! (Parallel SGD)
- IQN
- Katyusha
- L-BFGS
- Lookahead
- LRSDP (low-rank SDP solver)
- Momentum SGD
- Nadam
- NadaMax
- Nesterov Momentum SGD
- MOEA/D-DE
- NSGA2
- OptimisticAdam
- Padam
- PSO
- Primal-dual SDP Solver
- Quasi-Hyperbolic Momentum Update SGD (QHSGD)
- QHAdam
- RMSProp
- Simulated Annealing (SA)
- Simultaneous Perturbation Stochastic Approximation (SPSA)
- Stochastic Recursive Gradient Algorithm (SARAH/SARAH+)
- Standard SGD
- Stochastic Gradient Descent with Restarts (SGDR)
- Snapshot Stochastic Gradient Descent with Restarts (SnapshotSGDR)
- SMORMS3
- Standard stochastic variance reduced gradient (SVRG)
- SPALeRA Stochastic Gradient Descent (SPALeRASGD)
- SWATS
- WNGrad
- Yogi
- Callback documentation
- History/changelog
- ensmallen 2.21.1: “Bent Antenna”
- ensmallen 2.21.0: “Bent Antenna”
- ensmallen 2.20.0: “Stripped Bolt Head”
- ensmallen 2.19.1: “Eight Ball Deluxe”
- ensmallen 2.19.0: “Eight Ball Deluxe”
- ensmallen 2.18.2: “Fairmount Bagel”
- ensmallen 2.18.1: “Fairmount Bagel”
- ensmallen 2.18.0: “Fairmount Bagel”
- ensmallen 2.17.0: “Pachis Din Me Pesa Double”
- ensmallen 2.16.2: “Severely Dented Can Of Polyurethane”
- ensmallen 2.16.1: “Severely Dented Can Of Polyurethane”
- ensmallen 2.16.0: “Severely Dented Can Of Polyurethane”
- ensmallen 2.15.1: “Why Can’t I Manage To Grow Any Plants?”
- ensmallen 2.15.0: “Why Can’t I Manage To Grow Any Plants?”
- ensmallen 2.14.2: “No Direction Home”
- ensmallen 2.14.1: “No Direction Home”
- ensmallen 2.14.0: “No Direction Home”
- ensmallen 2.13.0: “Automatically Automated Automation”
- ensmallen 2.12.1: “Stir Crazy”
- ensmallen 2.12.0: “Stir Crazy”
- ensmallen 2.11.5: “The Poster Session Is Full”
- ensmallen 2.11.4: “The Poster Session Is Full”
- ensmallen 2.11.3: “The Poster Session Is Full”
- ensmallen 2.11.2: “The Poster Session Is Full”
- ensmallen 2.11.1: “The Poster Session Is Full”
- ensmallen 2.11.0: “The Poster Session Is Full”
- ensmallen 2.10.5: “Fried Chicken”
- ensmallen 2.10.4: “Fried Chicken”
- ensmallen 2.10.3: “Fried Chicken”
- ensmallen 2.10.2: “Fried Chicken”
- ensmallen 2.10.1: “Fried Chicken”
- ensmallen 2.10.0: “Fried Chicken”
- ensmallen 1.16.2: “Loud Alarm Clock”
- ensmallen 1.16.1: “Loud Alarm Clock”
- ensmallen 1.16.0: “Loud Alarm Clock”
- ensmallen 1.15.1: “Wrong Side Of The Road”
- ensmallen 1.15.0: “Wrong Side Of The Road”
- ensmallen 1.14.4: “Difficult Crimp”
- ensmallen 1.14.3: “Difficult Crimp”
- ensmallen 1.14.2: “Difficult Crimp”
- ensmallen 1.14.1: “Difficult Crimp”
- ensmallen 1.14.0: “Difficult Crimp”
- ensmallen 1.13.2: “Coronavirus Invasion”
- ensmallen 1.13.1: “Coronavirus Invasion”
- ensmallen 1.13.0: “Coronavirus Invasion”
- ensmallen 1.12.2: “New Year’s Party”
- ensmallen 1.12.1: “New Year’s Party”
- ensmallen 1.12.0: “New Year’s Party”
- ensmallen 1.11.1: “Jet Lag”
- ensmallen 1.11.0: “Jet Lag”
- ensmallen 1.10.1: “Corporate Catabolism”
- ensmallen 1.10.0: “Corporate Catabolism”
Function type documentation
Arbitrary functions
The least restrictive type of function that can be implemented in ensmallen is a function for which only the objective can be evaluated. For this, a class with the following API must be implemented:
Click to collapse/expand example code.
class ArbitraryFunctionType
{
public:
// This should return f(x).
double Evaluate(const arma::mat& x);
};
For this type of function, we assume that the gradient f'(x)
is not
computable. If it is, see differentiable functions.
The Evaluate()
method is allowed to have additional cv-modifiers (static
,
const
, etc.).
The following optimizers can be used to optimize an arbitrary function:
Each of these optimizers has an Optimize()
function that is called as
Optimize(f, x)
where f
is the function to be optimized (which implements
Evaluate()
) and x
holds the initial point of the optimization. After
Optimize()
is called, x
will hold the final result of the optimization
(that is, the best x
found that minimizes f(x)
).
Example: squared function optimization
An example program that implements the objective function f(x) = 2 |x|^2 is shown below, using the simulated annealing optimizer.
Click to collapse/expand example code.
#include <ensmallen.hpp>
class SquaredFunction
{
public:
// This returns f(x) = 2 |x|^2.
double Evaluate(const arma::mat& x)
{
return 2 * std::pow(arma::norm(x), 2.0);
}
};
int main()
{
// The minimum is at x = [0 0 0]. Our initial point is chosen to be
// [1.0, -1.0, 1.0].
arma::mat x("1.0 -1.0 1.0");
// Create simulated annealing optimizer with default options.
// The ens::SA<> type can be replaced with any suitable ensmallen optimizer
// that is able to handle arbitrary functions.
ens::SA<> optimizer;
SquaredFunction f; // Create function to be optimized.
optimizer.Optimize(f, x);
std::cout << "Minimum of squared function found with simulated annealing is "
<< x;
}
Differentiable functions
Probably the most common type of function that can be optimized with ensmallen is a differentiable function, where both f(x) and f’(x) can be calculated. To optimize a differentiable function with ensmallen, a class must be implemented that follows the API below:
Click to collapse/expand example code.
class DifferentiableFunctionType
{
public:
// Given parameters x, return the value of f(x).
double Evaluate(const arma::mat& x);
// Given parameters x and a matrix g, store f'(x) in the provided matrix g.
// g should have the same size (rows, columns) as x.
void Gradient(const arma::mat& x, arma::mat& gradient);
// OPTIONAL: this may be implemented in addition to---or instead
// of---Evaluate() and Gradient(). If this is the only function implemented,
// implementations of Evaluate() and Gradient() will be automatically
// generated using template metaprogramming. Often, implementing
// EvaluateWithGradient() can result in more efficient optimizations.
//
// Given parameters x and a matrix g, return the value of f(x) and store
// f'(x) in the provided matrix g. g should have the same size (rows,
// columns) as x.
double EvaluateWithGradient(const arma::mat& x, arma::mat& g);
};
Note that you may implement either Evaluate()
and Gradient()
or
EvaluateWithGradient()
, but it is not mandatory to implement both. (Of
course, supplying both is okay too.) It often results in faster code when
EvaluateWithGradient()
is implemented, especially for functions where f(x)
and f’(x) compute some of the same intermediate quantities.
Each of the implemented methods is allowed to have additional cv-modifiers
(static
, const
, etc.).
The following optimizers can be used with differentiable functions:
- L-BFGS (
ens::L_BFGS
) - FrankWolfe (
ens::FrankWolfe
) - GradientDescent (
ens::GradientDescent
) - Any optimizer for arbitrary functions
Each of these optimizers has an Optimize()
function that is called as
Optimize(f, x)
where f
is the function to be optimized and x
holds the
initial point of the optimization. After Optimize()
is called, x
will hold
the final result of the optimization (that is, the best x
found that
minimizes f(x)
).
An example program is shown below. In this program, we optimize a linear
regression model. In this setting, we have some matrix data
of data points
that we’ve observed, and some vector responses
of the observed responses to
this data. In our model, we assume that each response is the result of a
weighted linear combination of the data:
where $x$ is a vector of parameters. This gives the objective function $f(x) = (\operatorname{responses} - x’\operatorname{data})^2$.
In the example program, we optimize this objective function and compare the
runtime of an implementation that uses Evaluate()
and Gradient()
, and the
runtime of an implementation that uses EvaluateWithGradient()
.
Click to collapse/expand example code.
#include <ensmallen.hpp>
// Define a differentiable objective function by implementing both Evaluate()
// and Gradient() separately.
class LinearRegressionFunction
{
public:
// Construct the object with the given data matrix and responses.
LinearRegressionFunction(const arma::mat& dataIn,
const arma::rowvec& responsesIn) :
data(dataIn), responses(responsesIn) { }
// Return the objective function for model parameters x.
double Evaluate(const arma::mat& x)
{
return std::pow(arma::norm(responses - x.t() * data), 2.0);
}
// Compute the gradient for model parameters x.
void Gradient(const arma::mat& x, arma::mat& g)
{
g = -2 * data * (responses - x.t() * data);
}
private:
// The data.
const arma::mat& data;
// The responses to each data point.
const arma::rowvec& responses;
};
// Define the same function, but only implement EvaluateWithGradient().
class LinearRegressionEWGFunction
{
public:
// Construct the object with the given data matrix and responses.
LinearRegressionEWGFunction(const arma::mat& dataIn,
const arma::rowvec& responsesIn) :
data(dataIn), responses(responsesIn) { }
// Simultaneously compute both the objective function and gradient for model
// parameters x. Note that this is faster than implementing Evaluate() and
// Gradient() individually because it caches the computation of
// (responses - x.t() * data)!
double EvaluateWithGradient(const arma::mat& x, arma::mat& g)
{
const arma::rowvec v = (responses - x.t() * data);
g = -2 * data * v;
return arma::accu(v % v); // equivalent to \| v \|^2
}
};
int main()
{
// We'll run a simple speed comparison between both objective functions.
// First, generate some random data, with 10000 points and 10 dimensions.
// This data has no pattern and as such will make a model that's not very
// useful---but the purpose here is just demonstration. :)
//
// For a more "real world" situation, load a dataset from file using X.load()
// and y.load() (but make sure the matrix is column-major, so that each
// observation/data point corresponds to a *column*, *not* a row.
arma::mat data(10, 10000, arma::fill::randn);
arma::rowvec responses(10000, arma::fill::randn);
// Create a starting point for our optimization randomly. The model has 10
// parameters, so the shape is 10x1.
arma::mat startingPoint(10, 1, arma::fill::randn);
// We'll use Armadillo's wall_clock class to do a timing comparison.
arma::wall_clock clock;
// Construct the first objective function.
LinearRegressionFunction lrf1(data, responses);
arma::mat lrf1Params(startingPoint);
// Create the L_BFGS optimizer with default parameters.
// The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
// handle differentiable functions.
ens::L_BFGS lbfgs;
// Time how long L-BFGS takes for the first Evaluate() and Gradient()
// objective function.
clock.tic();
lbfgs.Optimize(lrf1, lrf1Params);
const double time1 = clock.toc();
std::cout << "LinearRegressionFunction with Evaluate() and Gradient() took "
<< time1 << " seconds to converge to the model: " << std::endl;
std::cout << lrf1Params.t();
// Create the second objective function, which uses EvaluateWithGradient().
LinearRegressionEWGFunction lrf2(data, responses);
arma::mat lrf2Params(startingPoint);
// Time how long L-BFGS takes for the EvaluateWithGradient() objective
// function.
clock.tic();
lbfgs.Optimize(lrf2, lrf2Params);
const double time2 = clock.toc();
std::cout << "LinearRegressionEWGFunction with EvaluateWithGradient() took "
<< time2 << " seconds to converge to the model: " << std::endl;
std::cout << lrf2Params.t();
// When this runs, the output parameters will be exactly on the same, but the
// LinearRegressionEWGFunction will run more quickly!
}
Partially differentiable functions
Some differentiable functions have the additional property that the gradient
f'(x)
can be decomposed along a different axis j
such that the gradient is
sparse. This makes the most sense in machine learning applications where the
function f(x)
is being optimized for some dataset data
, which has d
dimensions. A partially differentiable separable function is partially
differentiable with respect to each dimension j
of data
. This property is
useful for coordinate descent type algorithms.
To use ensmallen optimizers to minimize these types of functions, only two functions needs to be added to the differentiable function type:
Click to collapse/expand example code.
// Compute the partial gradient f'_j(x) with respect to data coordinate j and
// store it in the sparse matrix g.
void Gradient(const arma::mat& x, const size_t j, arma::sp_mat& g);
// Get the number of features that f(x) can be partially differentiated with.
size_t NumFeatures();
Note: many partially differentiable function optimizers do not require a
regular implementation of the Gradient()
, so that function may be omitted.
If these functions are implemented, the following partially differentiable function optimizers can be used:
Arbitrary separable functions
Often, an objective function f(x)
may be represented as the sum of many
functions:
f(x) = f_0(x) + f_1(x) + ... + f_N(x).
In this function type, we assume the gradient f'(x)
is not computable. If it
is, see differentiable separable functions.
For machine learning tasks, the objective function may be, e.g., the sum of a function taken across many data points. Implementing an arbitrary separable function type in ensmallen is similar to implementing an arbitrary objective function, but with a few extra utility methods:
Click to collapse/expand example code.
class ArbitrarySeparableFunctionType
{
public:
// Given parameters x, return the sum of the individual functions
// f_i(x) + ... + f_{i + batchSize - 1}(x). i will always be greater than 0,
// and i + batchSize will be less than or equal to the value of NumFunctions().
double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize);
// Shuffle the ordering of the functions f_i(x).
// (For machine learning problems, this would be equivalent to shuffling the
// data points, e.g., before an epoch of learning.)
void Shuffle();
// Get the number of functions f_i(x).
// (For machine learning problems, this is often just the number of points in
// the dataset.)
size_t NumFunctions();
};
Each of the implemented methods is allowed to have additional cv-modifiers
(static
, const
, etc.).
The following optimizers can be used with arbitrary separable functions:
Each of these optimizers has an Optimize()
function that is called as
Optimize(f, x)
where f
is the function to be optimized and x
holds the
initial point of the optimization. After Optimize()
is called, x
will hold
the final result of the optimization (that is, the best x
found that
minimizes f(x)
).
Note: using an arbitrary non-separable function optimizer will call
Evaluate(x, 0, NumFunctions() - 1)
; if this is a very computationally
intensive operation for your objective function, it may be best to avoid using
a non-separable arbitrary function optimizer.
Note: if possible, it’s often better to try and use a gradient-based approach. See differentiable separable functions for separable f(x) where the gradient f’(x) can be computed.
The example program below demonstrates the implementation and use of an
arbitrary separable function. The function used is the linear regression
objective function, described in differentiable functions.
Given some dataset data
and responses responses
, the linear regression
objective function is separable as
where $\operatorname{data}(i)$ represents the data point indexed by $i$ and $\operatorname{responses}(i)$ represents the observed response indexed by $i$.
Click to collapse/expand example code.
#include <ensmallen.hpp>
// This class implements the linear regression objective function as an
// arbitrary separable function type.
class LinearRegressionFunction
{
public:
// Create the linear regression function with the given data and the given
// responses.
LinearRegressionFunction(const arma::mat& dataIn,
const arma::rowvec& responsesIn) :
data(data), responses(responses) { }
// Given parameters x, compute the sum of the separable objective
// functions starting with f_i(x) and ending with
// f_{i + batchSize - 1}(x).
double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize)
{
// A more complex implementation could avoid the for loop and use
// submatrices, but it is easier to understand when implemented this way.
double objective = 0.0;
for (size_t j = i; j < i + batchSize; ++j)
{
objective += std::pow(responses[j] - x.t() * data.col(j), 2.0);
}
}
// Shuffle the ordering of the functions f_i(x). We do this by simply
// shuffling the data and responses.
void Shuffle()
{
// Generate a random ordering of data points.
arma::uvec ordering = arma::shuffle(
arma::linspace<arma::uvec>(0, data.n_cols - 1, data.n_cols));
// This reorders the data and responses with our randomly-generated
// ordering above.
data = data.cols(ordering);
responses = responses.cols(ordering);
}
// Return the number of functions f_i(x). In our case this is simply the
// number of data points.
size_t NumFunctions() { return data.n_cols; }
};
int main()
{
// First, generate some random data, with 10000 points and 10 dimensions.
// This data has no pattern and as such will make a model that's not very
// useful---but the purpose here is just demonstration. :)
//
// For a more "real world" situation, load a dataset from file using X.load()
// and y.load() (but make sure the matrix is column-major, so that each
// observation/data point corresponds to a *column*, *not* a row.
arma::mat data(10, 10000, arma::fill::randn);
arma::rowvec responses(10000, arma::fill::randn);
// Create a starting point for our optimization randomly. The model has 10
// parameters, so the shape is 10x1.
arma::mat params(10, 1, arma::fill::randn);
// Use the CMAES optimizer with default parameters to minimize the
// LinearRegressionFunction.
// The ens::CMAES type can be replaced with any suitable ensmallen optimizer
// that can handle arbitrary separable functions.
ens::CMAES cmaes;
LinearRegressionFunction lrf(data, responses);
cmaes.Optimize(lrf, params);
std::cout << "The optimized linear regression model found by CMAES has the "
<< "parameters " << params.t();
}
Differentiable separable functions
Likely the most important type of function to be optimized in machine learning and some other fields is the differentiable separable function. A differentiable separable function f(x) can be represented as the sum of many functions:
f(x) = f_0(x) + f_1(x) + ... + f_N(x).
And it also has a computable separable gradient f'(x)
:
f'(x) = f'_0(x) + f'_1(x) + ... + f'_N(x).
For machine learning tasks, the objective function may be, e.g., the sum of a function taken across many data points. Implementing a differentiable separable function type in ensmallen is similar to implementing an ordinary differentiable function, but with a few extra utility methods:
Click to collapse/expand example code.
class ArbitrarySeparableFunctionType
{
public:
// Given parameters x, return the sum of the individual functions
// f_i(x) + ... + f_{i + batchSize - 1}(x). i will always be greater than 0,
// and i + batchSize will be less than or equal to the value of NumFunctions().
double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize);
// Given parameters x and a matrix g, store the sum of the gradient of
// individual functions f'_i(x) + ... + f'_{i + batchSize - 1}(x) into g. i
// will always be greater than 0, and i + batchSize will be less than or
// equal to the value of NumFunctions().
void Gradient(const arma::mat& x,
const size_t i,
arma::mat& g,
const size_t batchSize);
// Shuffle the ordering of the functions f_i(x).
// (For machine learning problems, this would be equivalent to shuffling the
// data points, e.g., before an epoch of learning.)
void Shuffle();
// Get the number of functions f_i(x).
// (For machine learning problems, this is often just the number of points in
// the dataset.)
size_t NumFunctions();
// OPTIONAL: this may be implemented in addition to---or instead
// of---Evaluate() and Gradient(). If this is the only function implemented,
// implementations of Evaluate() and Gradient() will be automatically
// generated using template metaprogramming. Often, implementing
// EvaluateWithGradient() can result in more efficient optimizations.
//
// Given parameters x and a matrix g, return the sum of the individual
// functions f_i(x) + ... + f_{i + batchSize - 1}(x), and store the sum of
// the gradient of individual functions f'_i(x) + ... +
// f'_{i + batchSize - 1}(x) into the provided matrix g. g should have the
// same size (rows, columns) as x. i will always be greater than 0, and i +
// batchSize will be less than or equal to the value of NumFunctions().
double EvaluateWithGradient(const arma::mat& x,
const size_t i,
arma::mat& g,
const size_t batchSize);
};
Note that you may implement either Evaluate()
and Gradient()
or
EvaluateWithGradient()
, but it is not mandatory to implement both. (Of
course, supplying both is okay too.) It often results in faster code when
EvaluateWithGradient()
is implemented, especially for functions where f(x)
and f’(x) compute some of the same intermediate quantities.
Each of the implemented methods is allowed to have additional cv-modifiers
(static
, const
, etc.).
The following optimizers can be used with differentiable separable functions:
- AdaBelief
- AdaBound
- AdaDelta
- AdaGrad
- AdaSqrt
- Adam
- AdaMax
- AMSBound
- AMSGrad
- Big Batch SGD
- Eve
- FTML
- IQN
- Katyusha
- Lookahead
- Momentum SGD
- Nadam
- NadaMax
- NesterovMomentumSGD
- OptimisticAdam
- QHAdam
- QHSGD
- RMSProp
- SARAH/SARAH+
- SGD
- Stochastic Gradient Descent with Restarts (SGDR)
- Snapshot SGDR
- SMORMS3
- SPALeRA
- SWATS
- SVRG
- WNGrad
The example program below demonstrates the implementation and use of an
arbitrary separable function. The function used is the linear regression
objective function, described in differentiable functions.
Given some dataset data
and responses responses
, the linear regression
objective function is separable as
f_i(x) = (responses(i) - x' * data(i))^2
where data(i)
represents the data point indexed by i
and responses(i)
represents the observed response indexed by i
. This example implementation
only implements EvaluateWithGradient()
in order to avoid redundant
calculations.
Click to collapse/expand example code.
#include <ensmallen.hpp>
// This class implements the linear regression objective function as an
// arbitrary separable function type.
class LinearRegressionFunction
{
public:
// Create the linear regression function with the given data and the given
// responses.
LinearRegressionFunction(const arma::mat& dataIn,
const arma::rowvec& responsesIn) :
data(data), responses(responses) { }
// Given parameters x, compute the sum of the separable objective
// functions starting with f_i(x) and ending with
// f_{i + batchSize - 1}(x), and also compute the gradient of those functions
// and store them in g.
double EvaluateWithGradient(const arma::mat& x,
const size_t i,
arma::mat& g,
const size_t batchSize)
{
// This slightly complex implementation uses Armadillo submatrices to
// compute the objective functions and gradients simultaneously for
// multiple points.
//
// The shared computation between the objective and gradient is the term
// (response - x * data) so we compute that first, only for points in the
// batch.
const arma::rowvec v = (responses.cols(i, i + batchSize - 1) - x.t() *
data.cols(i, i + batchSize - 1));
g = -2 * data.cols(i, i + batchSize - 1) * v;
return arma::accu(v % v); // equivalent to |v|^2
}
// Shuffle the ordering of the functions f_i(x). We do this by simply
// shuffling the data and responses.
void Shuffle()
{
// Generate a random ordering of data points.
arma::uvec ordering = arma::shuffle(
arma::linspace<arma::uvec>(0, data.n_cols - 1, data.n_cols));
// This reorders the data and responses with our randomly-generated
// ordering above.
data = data.cols(ordering);
responses = responses.cols(ordering);
}
// Return the number of functions f_i(x). In our case this is simply the
// number of data points.
size_t NumFunctions() { return data.n_cols; }
};
int main()
{
// First, generate some random data, with 10000 points and 10 dimensions.
// This data has no pattern and as such will make a model that's not very
// useful---but the purpose here is just demonstration. :)
//
// For a more "real world" situation, load a dataset from file using X.load()
// and y.load() (but make sure the matrix is column-major, so that each
// observation/data point corresponds to a *column*, *not* a row.
arma::mat data(10, 10000, arma::fill::randn);
arma::rowvec responses(10000, arma::fill::randn);
// Create a starting point for our optimization randomly. The model has 10
// parameters, so the shape is 10x1.
arma::mat params(10, 1, arma::fill::randn);
// Use RMSprop to find the best parameters for the linear regression model.
// The type 'ens::RMSprop' can be changed for any ensmallen optimizer able to
// handle differentiable separable functions.
ens::RMSProp rmsprop;
LinearRegressionFunction lrf(data, responses);
rmsprop.Optimize(lrf, params);
std::cout << "The optimized linear regression model found by RMSprop has the"
<< " parameters " << params.t();
}
Sparse differentiable separable functions
Some differentiable separable functions have the additional property that
the gradient f'_i(x)
is sparse. When this is true, one additional method can
be implemented as part of the class to be optimized:
Click to collapse/expand example code.
// Add this definition to use sparse differentiable separable function
// optimizers. Given x, store the sum of the sparse gradient f'_i(x) + ... +
// f'_{i + batchSize - 1}(x) into the provided matrix g.
void Gradient(const arma::mat& x,
const size_t i,
arma::sp_mat& g,
const size_t batchSize);
It’s also possible to instead use templates to provide only one Gradient()
function for both sparse and non-sparse optimizers:
Click to collapse/expand example code.
// This provides Gradient() for both sparse and non-sparse optimizers.
template<typename GradType>
void Gradient(const arma::mat& x,
const size_t i,
GradType& g,
const size_t batchSize);
If either of these methods are available, then any ensmallen optimizer that optimizes sparse separable differentiable functions may be used. This includes:
- Hogwild! (Parallel SGD)
Categorical functions
A categorical function is a function f(x) where some of the values of x are
categorical variables (i.e. they take integer values [0, c - 1] and each value
0, 1, …, c - 1 is unrelated). In this situation, the function is not
differentiable, and so only the objective f(x) can be implemented. Therefore
the class requirements for a categorical function are exactly the same as for
an ArbitraryFunctionType
—but for any categorical dimension x_i
in x
, the
value will be in the range [0, c_i - 1] where c_i
is the number of categories
in dimension x_i
.
Click to collapse/expand example code.
class CategoricalFunction
{
public:
// Return the objective function for the given parameters x.
double Evaluate(const arma::mat& x);
};
However, when an optimizer’s Optimize() method is called, two additional parameters must be specified, in addition to the function to optimize and the matrix holding the parameters:
-
const std::vector<bool>& categoricalDimensions
: a vector of length equal to the number of elements inx
(the number of dimensions). If an element is true, then the dimension is categorical. -
const arma::Row<size_t>& numCategories
: a vector of length equal to the number of elements inx
(the number of dimensions). If a dimension is categorical, then the corresponding value innumCategories
should hold the number of categories in that dimension.
The following optimizers can be used in this way to optimize a categorical function:
- Grid Search (all parameters must be categorical)
An example program showing usage of categorical optimization is shown below.
Click to collapse/expand example code.
#include <ensmallen.hpp>
// An implementation of a simple categorical function. The parameters can be
// understood as x = [c1 c2 c3]. When c1 = 0, c2 = 2, and c3 = 1, the value of
// f(x) is 0. In any other case, the value of f(x) is 10. Therefore, the
// optimum is found at [0, 2, 1].
class SimpleCategoricalFunction
{
public:
// Return the objective function f(x) as described above.
double Evaluate(const arma::mat& x)
{
if (size_t(x[0]) == 0 &&
size_t(x[1]) == 2 &&
size_t(x[2]) == 1)
return 0.0;
else
return 10.0;
}
};
int main()
{
// Create and optimize the categorical function with the GridSearch
// optimizer. We must also create a std::vector<bool> that holds the types
// of each dimension, and an arma::Row<size_t> that holds the number of
// categories in each dimension.
SimpleCategoricalFunction c;
// We have three categorical dimensions only.
std::vector<bool> categoricalDimensions;
categoricalDimensions.push_back(true);
categoricalDimensions.push_back(true);
categoricalDimensions.push_back(true);
// The first category can take 5 values; the second can take 3; the third can
// take 12.
arma::Row<size_t> numCategories("5 3 12");
// The initial point for our optimization will be to set all categories to 0.
arma::mat params("0 0 0");
// Now create the GridSearch optimizer with default parameters, and run the
// optimization.
// The ens::GridSearch type can be replaced with any ensmallen optimizer that
// is able to handle categorical functions.
ens::GridSearch gs;
gs.Optimize(c, params, categoricalDimensions, numCategories);
std::cout << "The ens::GridSearch optimizer found the optimal parameters to "
<< "be " << params;
}
Multi-objective functions
A multi-objective optimizer does not return just one set of coordinates at the minimum of all objective functions, but instead finds a front or frontier of possible coordinates that are Pareto-optimal (that is, no individual objective function’s value can be reduced without increasing at least one other objective function).
In order to optimize a multi-objective function with ensmallen, a std::tuple<>
containing multiple ArbitraryFunctionType
s (see here)
should be passed to a multi-objective optimizer’s Optimize()
function.
An example below simultaneously optimizes the generalized Rosenbrock function in 6 dimensions and the Wood function using NSGA2.
Click to collapse/expand example code.
GeneralizedRosenbrockFunction rf(6);
WoodFunction wf;
std::tuple<GeneralizedRosenbrockFunction, WoodFunction> objectives(rf, wf);
// Create an initial point (a random point in 6 dimensions).
arma::mat coordinates(6, 1, arma::fill::randu);
// `coordinates` will be set to the coordinates on the best front that minimize the
// sum of objective functions, and `bestFrontSum` will be the sum of all objectives
// at that coordinate set.
NSGA2 nsga;
double bestFrontSum = nsga.Optimize(objectives, coordinates);
// Set `bestFront` to contain all of the coordinates on the best front.
arma::cube bestFront = optimizer.ParetoFront();
}
Note: all multi-objective function optimizers have both the function Optimize()
to find the
best front, and also the function ParetoFront()
to return all sets of solutions that are on the
front.
The following optimizers can be used with multi-objective functions:
Constrained functions
A constrained function is an objective function f(x)
that is also subject to
some constraints on x
. (For instance, perhaps a constraint could be that x
is a positive semidefinite matrix.) ensmallen is able to handle differentiable
objective functions of this type—so, f'(x)
must also be computable. Given
some set of constraints c_0(x)
, …, c_M(x)
, we can re-express our constrained
objective function as
f_C(x) = f(x) + c_0(x) + ... + c_M(x)
where the (soft) constraint c_i(x)
is a positive value if it is not satisfied, and
0
if it is satisfied. The soft constraint c_i(x)
should take some value
representing how far from a feasible solution x
is. It should be
differentiable, since ensmallen’s constrained optimizers will use the gradient
of the constraint to find a feasible solution.
In order to optimize a constrained function with ensmallen, a class implementing the API below is required.
Click to collapse/expand example code.
class ConstrainedFunctionType
{
public:
// Return the objective function f(x) for the given x.
double Evaluate(const arma::mat& x);
// Compute the gradient of f(x) for the given x and store the result in g.
void Gradient(const arma::mat& x, arma::mat& g);
// Get the number of constraints on the objective function.
size_t NumConstraints();
// Evaluate constraint i at the parameters x. If the constraint is
// unsatisfied, a value greater than 0 should be returned. If the constraint
// is satisfied, 0 should be returned. The optimizer will add this value to
// its overall objective that it is trying to minimize.
double EvaluateConstraint(const size_t i, const arma::mat& x);
// Evaluate the gradient of constraint i at the parameters x, storing the
// result in the given matrix g. If the constraint is not satisfied, the
// gradient should be set in such a way that the gradient points in the
// direction where the constraint would be satisfied.
void GradientConstraint(const size_t i, const arma::mat& x, arma::mat& g);
};
A constrained function can be optimized with the following optimizers:
Semidefinite programs
A special class of constrained function is a semidefinite program. ensmallen
has support for creating and optimizing semidefinite programs via the
ens::SDP<>
class. For this, the SDP must be expressed in the primal form:
min_x dot(C, x) such that
dot(A_i, x) = b_i, for i = 1, ..., M;
x >= 0
In this case, A_i
and C
are square matrices (sparse or dense), and b_i
is
a scalar.
Once the problem is expressed in this form, a class of type ens::SDP<>
can be
created. SDP<arma::mat>
indicates an SDP with a dense C, and
SDP<arma::sp_mat>
indicates an SDP with a sparse C. The class has the
following useful methods:
SDP(cMatrixSize, numDenseConstraints, numSparseConstraints)
: create a newSDP
size_t NumSparseConstraints()
: get number of sparse constraint matrices A_isize_t NumDenseConstraints()
: get number of dense constraint matrices A_istd::vector<arma::mat>& DenseA()
: get vector of dense A_i matricesstd::vector<arma::sp_mat>& SparseA()
: get vector of sparse A_i matricesarma::vec& DenseB()
: get vector of b_i values for dense A_i constraintsarma::vec& SparseB()
: get vector of b_i values for sparse A_i constraints
Once these methods are used to set each A_i matrix and corresponding b_i value, and C objective matrix, the SDP object can be used with any ensmallen SDP solver. The list of SDP solvers is below:
Example code showing how to solve an SDP is given below.
Click to collapse/expand example code.
int main()
{
// We will build a toy semidefinite program and then use the PrimalDualSolver to find a solution
// The semi-definite constraint looks like:
//
// [ 1 x_12 x_13 0 0 0 0 ]
// [ 1 x_23 0 0 0 0 ]
// [ 1 0 0 0 0 ]
// [ s1 0 0 0 ] >= 0
// [ s2 0 0 ]
// [ s3 0 ]
// [ s4 ]
// x_11 == 0
arma::sp_mat A0(7, 7); A0.zeros();
A0(0, 0) = 1.;
// x_22 == 0
arma::sp_mat A1(7, 7); A1.zeros();
A1(1, 1) = 1.;
// x_33 == 0
arma::sp_mat A2(7, 7); A2.zeros();
A2(2, 2) = 1.;
// x_12 <= -0.1 <==> x_12 + s1 == -0.1, s1 >= 0
arma::sp_mat A3(7, 7); A3.zeros();
A3(1, 0) = A3(0, 1) = 1.; A3(3, 3) = 2.;
// -0.2 <= x_12 <==> x_12 - s2 == -0.2, s2 >= 0
arma::sp_mat A4(7, 7); A4.zeros();
A4(1, 0) = A4(0, 1) = 1.; A4(4, 4) = -2.;
// x_23 <= 0.5 <==> x_23 + s3 == 0.5, s3 >= 0
arma::sp_mat A5(7, 7); A5.zeros();
A5(2, 1) = A5(1, 2) = 1.; A5(5, 5) = 2.;
// 0.4 <= x_23 <==> x_23 - s4 == 0.4, s4 >= 0
arma::sp_mat A6(7, 7); A6.zeros();
A6(2, 1) = A6(1, 2) = 1.; A6(6, 6) = -2.;
std::vector<arma::sp_mat> ais({A0, A1, A2, A3, A4, A5, A6});
SDP<arma::sp_mat> sdp(7, 7 + 4 + 4 + 4 + 3 + 2 + 1, 0);
for (size_t j = 0; j < 3; j++)
{
// x_j4 == x_j5 == x_j6 == x_j7 == 0
for (size_t i = 0; i < 4; i++)
{
arma::sp_mat A(7, 7); A.zeros();
A(i + 3, j) = A(j, i + 3) = 1;
ais.emplace_back(A);
}
}
// x_45 == x_46 == x_47 == 0
for (size_t i = 0; i < 3; i++)
{
arma::sp_mat A(7, 7); A.zeros();
A(i + 4, 3) = A(3, i + 4) = 1;
ais.emplace_back(A);
}
// x_56 == x_57 == 0
for (size_t i = 0; i < 2; i++)
{
arma::sp_mat A(7, 7); A.zeros();
A(i + 5, 4) = A(4, i + 5) = 1;
ais.emplace_back(A);
}
// x_67 == 0
arma::sp_mat A(7, 7); A.zeros();
A(6, 5) = A(5, 6) = 1;
ais.emplace_back(A);
std::swap(sdp.SparseA(), ais);
sdp.SparseB().zeros();
sdp.SparseB()[0] = sdp.SparseB()[1] = sdp.SparseB()[2] = 1.;
sdp.SparseB()[3] = -0.2; sdp.SparseB()[4] = -0.4;
sdp.SparseB()[5] = 1.; sdp.SparseB()[6] = 0.8;
sdp.C().zeros();
sdp.C()(0, 2) = sdp.C()(2, 0) = 1.;
// That took a long time but we finally set up the problem right! Now we can
// use the PrimalDualSolver to solve it.
// ens::PrimalDualSolver could be replaced with ens::LRSDP or other ensmallen
// SDP solvers.
PrimalDualSolver solver;
arma::mat X, Z;
arma::vec ysparse, ydense;
// ysparse, ydense, and Z hold the primal and dual variables found during the
// optimization.
const double obj = solver.Optimize(sdp, X, ysparse, ydense, Z);
std::cout << "SDP optimized with objective " << obj << "." << std::endl;
}
Alternate matrix types
All of the examples above (and throughout the rest of the documentation)
generally assume that the matrix being optimized has type arma::mat
. But
ensmallen’s optimizers are capable of optimizing more types than just dense
Armadillo matrices. In fact, the full signature of each optimizer’s
Optimize()
method is this:
template<typename FunctionType, typename MatType>
typename MatType::elem_type Optimize(FunctionType& function,
MatType& coordinates);
The return type, typename MatType::elem_type
, is just the numeric type held by
the given matrix type. So, for arma::mat
, the return type is just double
.
In addition, optimizers for differentiable functions have a third template
parameter, GradType
, which specifies the type of the gradient. GradType
can
be manually specified in the situation where, e.g., a sparse gradient is
desired.
It is easy to write a function to optimize, e.g., an arma::fmat
. Here is an
example, adapted from the SquaredFunction
example from the
arbitrary function documentation.
Click to collapse/expand example code.
#include <ensmallen.hpp>
class SquaredFunction
{
public:
// This returns f(x) = 2 |x|^2.
float Evaluate(const arma::fmat& x)
{
return 2 * std::pow(arma::norm(x), 2.0);
}
void Gradient(const arma::fmat& x, arma::fmat& gradient)
{
gradient = 4 * x;
}
};
int main()
{
// The minimum is at x = [0 0 0]. Our initial point is chosen to be
// [1.0, -1.0, 1.0].
arma::fmat x("1.0 -1.0 1.0");
// Create simulated annealing optimizer with default options.
// The ens::SA<> type can be replaced with any suitable ensmallen optimizer
// that is able to handle arbitrary functions.
ens::L_BFGS optimizer;
SquaredFunction f; // Create function to be optimized.
optimizer.Optimize(f, x); // The optimizer will infer arma::fmat!
std::cout << "Minimum of squared function found with simulated annealing is "
<< x;
}
Note that we have simply changed the SquaredFunction
to accept arma::fmat
instead of arma::mat
as parameters to Evaluate()
, and the return type has
accordingly been changed to float
from double
. It would even be possible to
optimize functions with sparse coordinates by having Evaluate()
take a sparse
matrix (i.e. arma::sp_mat
).
If it were desired to represent the gradient as a sparse type, the Gradient()
function would need to be modified to take a sparse matrix (i.e. arma::sp_mat
or similar), and then you could call optimizer.Optimize<SquaredFunction,
arma::mat, arma::sp_mat>(f, x);
to perform the optimization while using sparse
matrix types to represent the gradient. Using sparse MatType
or GradType
should only be done when it is known that the objective matrix and/or
gradients will be sparse; otherwise the code may run very slow!
ensmallen will automatically infer MatType
from the call to Optimize()
, and
check that the given FunctionType
has all of the necessary functions for the
given MatType
, throwing a static_assert
error if not. If you would like to
disable these checks, define the macro ENS_DISABLE_TYPE_CHECKS
before
including ensmallen:
#define ENS_DISABLE_TYPE_CHECKS
#include <ensmallen.hpp>
This can be useful for situations where you know that the checks should be ignored. However, be aware that the code may fail to compile and give more confusing and difficult error messages!
Optimizer documentation
ActiveCMAES
An optimizer for separable functions.
Active CMA-ES is a variant of the stochastic search algorithm CMA-ES - Covariance Matrix Adaptation Evolution Strategy. Active CMA-ES actively reduces the uncertainty in unfavourable directions by exploiting the information about bad mutations in the covariance matrix update step. This isn’t for the purpose of accelerating progress, but instead for speeding up the adaptation of the covariance matrix (which, in turn, will lead to faster progress).
Constructors
ActiveCMAES<
SelectionPolicyType, TransformationPolicyType
>()
ActiveCMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy
)
ActiveCMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize
)
ActiveCMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy
)
ActiveCMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy, stepSize
)
The SelectionPolicyType
template parameter refers to the strategy used to
compute the (approximate) objective function. The FullSelection
and
RandomSelection
classes are available for use; custom behavior can be achieved
by implementing a class with the same method signatures.
The TransformationPolicyType
template parameter refers to transformation
strategy used to map decision variables to the desired domain during fitness
evaluation and optimization termination. The EmptyTransformation
and
BoundaryBoxConstraint
classes are available for use; custom behavior can be
achieved by implementing a class with the same method signatures.
For convenience the following types can be used:
ActiveCMAES<>
(equivalent toActiveCMAES<FullSelection, EmptyTransformation<>>
): uses all separable functions to compute objectiveApproxActiveCMAES<>
(equivalent toActiveCMAES<RandomSelection, EmptyTransformation<>>
): uses a small amount of separable functions to compute approximate objective
Attributes
type | name | description | default |
---|---|---|---|
size_t |
lambda |
The population size (0 uses a default size). | 0 |
TransformationPolicyType |
transformationPolicy |
Instantiated transformation policy used to map the coordinates to the desired domain. | TransformationPolicyType() |
size_t |
batchSize |
Batch size to use for the objective calculation. | 32 |
size_t |
maxIterations |
Maximum number of iterations. | 1000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
SelectionPolicyType |
selectionPolicy |
Instantiated selection policy used to calculate the objective. | SelectionPolicyType() |
size_t |
stepSize |
Initial step size | 0 |
Attributes of the optimizer may also be changed via the member methods
Lambda()
, TransformationPolicy()
, BatchSize()
, MaxIterations()
,
Tolerance()
, and SelectionPolicy()
.
The selectionPolicy
attribute allows an instantiated SelectionPolicyType
to
be given. The FullSelection
policy has no need to be instantiated and thus
the option is not relevant when the ActiveCMAES<>
optimizer type is being used; the
RandomSelection
policy has the constructor RandomSelection(
fraction
)
where fraction
specifies the percentage of separable functions to use to
estimate the objective function.
The transformationPolicy
attribute allows an instantiated
TransformationPolicyType
to be given. The EmptyTransformation<
MatType
>
has no need to be instantiated. BoundaryBoxConstraint<
MatType
>
policy has
the constructor BoundaryBoxConstraint(
lowerBound, upperBound
)
where lowerBound
and lowerBound
are the lower bound and upper bound of
the coordinates respectively.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// ActiveCMAES with the FullSelection and BoundaryBoxConstraint policies.
BoundaryBoxConstraint b(-1, 1);
ActiveCMAES optimizer(0, b, 32, 200, 1e-4);
optimizer.Optimize(f, coordinates);
// ActiveCMAES with the RandomSelection and BoundaryBoxConstraint policies.
ApproxActiveCMAES<BoundaryBoxConstraint<>> cmaes(0, b, 32, 200, 1e-4);
approxOptimizer.Optimize(f, coordinates);
See also:
- CMAES
- Improving Evolution Strategies through Active Covariance Matrix Adaptation
- Evolution strategy in Wikipedia
AdaBelief
An optimizer for differentiable separable functions.
AdaBelief uses a different denominator from Adam, and is orthogonal to other techniques such as recification, decoupled weight decay. The intuition for AdaBelief is to adapt the stepsize according to the “belief” in the current gradient direction.
Constructors
AdaBelief()
AdaBelief(
stepSize, batchSize
)
AdaBelief(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle
)
AdaBelief(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
The exponential decay rate for the 1st moment estimates. | 0.9 |
double |
beta2 |
The exponential decay rate for the 2nd moment estimates. | 0.999 |
double |
epsilon |
A small constant for numerical stability. | 1e-8 |
size_t |
max_iterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Epsilon()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
AdaBelief optimizer(0.001, 32, 0.9, 0.999, 1e-12, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
- Differentiable separable functions
AdaBound
An optimizer for differentiable separable functions.
AdaBound is a variant of Adam which employs dynamic bounds on learning rates.
Constructors
AdaBound()
AdaBound(
stepSize, batchSize
)
AdaBound(
stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle
)
AdaBound(
stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Note that the AdaBound
class is based on the AdaBoundType<
UpdateRule
>
class with UpdateRule
= AdaBoundUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
finalLr |
The final (SGD) learning rate. | 0.1 |
double |
gamma |
The convergence speed of the bound functions. | 0.001 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
FinalLr()
, Gamma()
, StepSize()
, BatchSize()
, Beta1()
, Beta2()
,
Eps()
, MaxIterations()
, Tolerance()
, Shuffle()
, ResetPolicy()
, and
ExactObjective()
.
Examples
Click to collapse/expand example code.
SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();
AdaBound optimizer(0.001, 2, 0.1, 1e-3, 0.9, 0.999, 1e-8, 500000, 1e-3);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Adaptive Gradient Methods with Dynamic Bound of Learning Rate
- Adam: A Method for Stochastic Optimization
- Differentiable separable functions
AdaDelta
An optimizer for differentiable separable functions.
AdaDelta is an extension of AdaGrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. Instead of accumulating all past squared gradients, the sum of gradients is recursively defined as a decaying average of all past squared gradients.
Constructors
AdaDelta()
AdaDelta(
stepSize
)
AdaDelta(
stepSize, batchSize
)
AdaDelta(
stepSize, batchSize, rho, epsilon, maxIterations, tolerance, shuffle
)
AdaDelta(
stepSize, batchSize, rho, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 1.0 |
size_t |
batchSize |
Number of points to process in one step. | 32 |
double |
rho |
Smoothing constant. Corresponding to fraction of gradient to keep at each time step. | 0.95 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-6 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, BatchSize()
, Rho()
, Epsilon()
, MaxIterations()
,
Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);
See also:
Adagrad
An optimizer for differentiable separable functions.
AdaGrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. Larger updates for more sparse parameters and smaller updates for less sparse parameters.
Constructors
AdaGrad()
AdaGrad(
stepSize
)
AdaGrad(
stepSize, batchSize
)
AdaGrad(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle
)
AdaGrad(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Number of points to process in one step. | 32 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | tolerance |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, BatchSize()
, Epsilon()
, MaxIterations()
, Tolerance()
,
Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
AdaGrad optimizer(1.0, 1, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);
See also:
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- AdaGrad in Wikipedia
- AdaDelta
- Differentiable separable functions
AdaSqrt
An optimizer for differentiable separable functions.
AdaSqrt is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. Larger updates for more sparse parameters and smaller updates for less sparse parameters. AdaSqrt, removes the square root in the denominator and scales the learning rate by sqrt(T).
Constructors
AdaSqrt()
AdaSqrt(
stepSize
)
AdaSqrt(
stepSize, batchSize
)
AdaSqrt(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle
)
AdaSqrt(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Number of points to process in one step. | 32 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | tolerance |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, BatchSize()
, Epsilon()
, MaxIterations()
, Tolerance()
,
Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
AdaSqrt optimizer(1.0, 1, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);
See also:
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- AdaGrad in Wikipedia
- AdaDelta
- Differentiable separable functions
Adam
An optimizer for differentiable separable functions.
Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Constructors
Adam()
Adam(
stepSize, batchSize
)
Adam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
Adam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Note that the Adam
class is based on the AdamType<
UpdateRule
>
class
with UpdateRule
= AdamUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Epsilon()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Adam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
AdaMax
An optimizer for differentiable separable functions.
AdaMax is simply a variant of Adam based on the infinity norm.
Constructors
AdaMax()
AdaMax(
stepSize, batchSize
)
AdaMax(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
AdaMax(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, exactObjective, resetPolicy
)
Note that the AdaMax
class is based on the AdamType<
UpdateRule
>
class
with UpdateRule
= AdaMaxUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ExactObjective()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
AdaMax optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Adam: A Method for Stochastic Optimization (see section 7)
- Differentiable separable functions
AMSBound
An optimizer for differentiable separable functions.
AMSBound is a variant of Adam which employs dynamic bounds on learning rates.
Constructors
AMSBound()
AMSBound(
stepSize, batchSize
)
AMSBound(
stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle
)
AMSBound(
stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Note that the AMSBound
class is based on the AdaBoundType<
UpdateRule
>
class with UpdateRule
= AdaBoundUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
finalLr |
The final (SGD) learning rate. | 0.1 |
double |
gamma |
The convergence speed of the bound functions. | 0.001 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
FinalLr()
, Gamma()
, StepSize()
, BatchSize()
, Beta1()
, Beta2()
,
Eps()
, MaxIterations()
, Tolerance()
, Shuffle()
, ResetPolicy()
, and
ExactObjective()
.
Examples
Click to collapse/expand example code.
SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();
AMSBound optimizer(0.001, 2, 0.1, 1e-3, 0.9, 0.999, 1e-8, 500000, 1e-3);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Adaptive Gradient Methods with Dynamic Bound of Learning Rate
- Adam: A Method for Stochastic Optimization
- Differentiable separable functions
AMSGrad
An optimizer for differentiable separable functions.
AMSGrad is a variant of Adam with guaranteed convergence.
Constructors
AMSGrad()
AMSGrad(
stepSize, batchSize
)
AMSGrad(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
AMSGrad(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, exactObjective, resetPolicy
)
Note that the AMSGrad
class is based on the AdamType<
UpdateRule
>
class
with UpdateRule
= AMSGradUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ExactObjective()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
AMSGrad optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
Augmented Lagrangian
An optimizer for differentiable constrained functions.
The AugLagrangian
class implements the Augmented Lagrangian method of
optimization. In this scheme, a penalty term is added to the Lagrangian.
This method is also called the “method of multipliers”. Internally, the
optimizer uses L-BFGS.
Constructors
AugLagrangian(
maxIterations, penaltyThresholdFactor, sigmaUpdateFactor
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 1000 |
double |
penaltyThresholdFactor |
When penalty threshold is updated, set it to this multiplied by the penalty. | 0.25 |
double |
sigmaUpdateFactor |
When sigma is updated, multiply it by this. | 10.0 |
L_BFGS& |
lbfgs |
Internal l-bfgs optimizer. | L_BFGS() |
The attributes of the optimizer may also be modified via the member methods
MaxIterations()
, PenaltyThresholdFactor()
, SigmaUpdateFactor()
and LBFGS()
.
Click to collapse/expand example code.
/**
* Optimize the function. The value '1' is used for the initial value of each
* Lagrange multiplier. To set the Lagrange multipliers yourself, use the
* other overload of Optimize().
*
* @tparam LagrangianFunctionType Function which can be optimized by this
* class.
* @param function The function to optimize.
* @param coordinates Output matrix to store the optimized coordinates in.
*/
template<typename LagrangianFunctionType>
bool Optimize(LagrangianFunctionType& function,
arma::mat& coordinates);
/**
* Optimize the function, giving initial estimates for the Lagrange
* multipliers. The vector of Lagrange multipliers will be modified to
* contain the Lagrange multipliers of the final solution (if one is found).
*
* @tparam LagrangianFunctionType Function which can be optimized by this
* class.
* @param function The function to optimize.
* @param coordinates Output matrix to store the optimized coordinates in.
* @param initLambda Vector of initial Lagrange multipliers. Should have
* length equal to the number of constraints.
* @param initSigma Initial penalty parameter.
*/
template<typename LagrangianFunctionType>
bool Optimize(LagrangianFunctionType& function,
arma::mat& coordinates,
const arma::vec& initLambda,
const double initSigma);
Examples
Click to collapse/expand example code.
GockenbachFunction f;
arma::mat coordinates = f.GetInitialPoint();
AugLagrangian optimizer;
optimizer.Optimize(f, coords);
See also:
Big Batch SGD
An optimizer for differentiable separable functions.
Big-batch stochastic gradient descent adaptively grows the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation, so the Big Batch SGD optimizer is able to adaptively adjust batch sizes without user oversight.
Constructors
BigBatchSGD<
UpdatePolicy
>()
BigBatchSGD<
UpdatePolicy
>(
stepSize
)
BigBatchSGD<
UpdatePolicy
>(
stepSize, batchSize
)
BigBatchSGD<
UpdatePolicy
>(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, exactObjective
)
The UpdatePolicy
template parameter refers to the way that a new step size
is computed. The AdaptiveStepsize
and BacktrackingLineSearch
classes are
available for use; custom behavior can be achieved by implementing a class
with the same method signatures.
For convenience the following typedefs have been defined:
BBS_Armijo = BigBatchSGD<BacktrackingLineSearch>
BBS_BB = BigBatchSGD<AdaptiveStepsize>
Attributes
type | name | description | default |
---|---|---|---|
size_t |
batchSize |
Initial batch size. | 1000 |
double |
stepSize |
Step size for each iteration. | 0.01 |
double |
batchDelta |
Factor for the batch update step. | 0.1 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the batch order is shuffled; otherwise, each batch is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
BatchSize()
, StepSize()
, BatchDelta()
, MaxIterations()
, Tolerance()
,
Shuffle()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// Big-Batch SGD with the adaptive stepsize policy.
BBS_BB optimizer(10, 0.01, 0.1, 8000, 1e-4);
optimizer.Optimize(f, coordinates);
// Big-Batch SGD with backtracking line search.
BBS_Armijo optimizer2(10, 0.01, 0.1, 8000, 1e-4);
optimizer2.Optimize(f, coordinates);
See also:
Coordinate Descent (CD)
An optimizer for partially differentiable functions.
Coordinate descent is a technique for minimizing a function by doing a line search along a single direction at the current point in the iteration. The direction (or “coordinate”) can be chosen cyclically, randomly or in a greedy fashion.
Constructors
CD<
DescentPolicyType
>()
CD<
DescentPolicyType
>(
stepSize, maxIterations
)
CD<
DescentPolicyType
>(
stepSize, maxIterations, tolerance, updateInterval
)
CD<
DescentPolicyType
>(
stepSize, maxIterations, tolerance, updateInterval, descentPolicy
)
The DescentPolicyType
template parameter specifies the behavior of CD when
selecting the next coordinate to descend with. The RandomDescent
,
GreedyDescent
, and CyclicDescent
classes are available for use. Custom
behavior can be achieved by implementing a class with the same method
signatures.
For convenience, the following typedefs have been defined:
RandomCD
(equivalent toCD<RandomDescent>
): selects coordinates randomlyGreedyCD
(equivalent toCD<GreedyDescent>
): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell ruleCyclicCD
(equivalent toCD<CyclicDescent>
): selects coordinates sequentially
Note: CD
used to be called SCD
. Use of the name SCD
is deprecated,
and will be removed in ensmallen 3 and later.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate the algorithm. | 1e-5 |
size_t |
updateInterval |
The interval at which the objective is to be reported and checked for convergence. | 1e3 |
DescentPolicyType |
descentPolicy |
The policy to use for selecting the coordinate to descend on. | DescentPolicyType() |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, MaxIterations()
, Tolerance()
, UpdateInterval()
, and
DescentPolicy()
.
Note that the default value for descentPolicy
is the default constructor for
DescentPolicyType
.
Examples
Click to collapse/expand example code.
SparseTestFunction f;
arma::mat coordinates = f.GetInitialPoint();
RandomCD randomscd(0.01, 100000, 1e-5, 1e3);
randomscd.Optimize(f, coordinates);
GreedyCD greedyscd(0.01, 100000, 1e-5, 1e3);
greedyscd.Optimize(f, coordinates);
CyclicCD cyclicscd(0.01, 100000, 1e-5, 1e3);
cyclicscd.Optimize(f, coordinates);
See also:
- Coordinate descent on Wikipedia
- Stochastic Methods for L1-Regularized Loss Minimization
- Partially differentiable functions
CMAES
An optimizer for separable functions.
CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a stochastic search algorithm. CMA-ES is a second order approach estimating a positive definite matrix within an iterative procedure using the covariance matrix.
Constructors
CMAES<
SelectionPolicyType, TransformationPolicyType
>()
CMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy
)
CMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize
)
CMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy
)
CMAES<
SelectionPolicyType, TransformationPolicyType
>(
lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy, stepSize
)
The SelectionPolicyType
template parameter refers to the strategy used to
compute the (approximate) objective function. The FullSelection
and
RandomSelection
classes are available for use; custom behavior can be achieved
by implementing a class with the same method signatures.
The TransformationPolicyType
template parameter refers to transformation
strategy used to map decision variables to the desired domain during fitness
evaluation and optimization termination. The EmptyTransformation
and
BoundaryBoxConstraint
classes are available for use; custom behavior can be
achieved by implementing a class with the same method signatures.
For convenience the following types can be used:
CMAES<>
(equivalent toCMAES<FullSelection, EmptyTransformation<>>
): uses all separable functions to compute objectiveApproxCMAES<>
(equivalent toCMAES<RandomSelection, EmptyTransformation<>>
): uses a small amount of separable functions to compute approximate objective
Attributes
type | name | description | default |
---|---|---|---|
size_t |
lambda |
The population size (0 uses a default size). | 0 |
TransformationPolicyType |
transformationPolicy |
Instantiated transformation policy used to map the coordinates to the desired domain. | TransformationPolicyType() |
size_t |
batchSize |
Batch size to use for the objective calculation. | 32 |
size_t |
maxIterations |
Maximum number of iterations. | 1000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
SelectionPolicyType |
selectionPolicy |
Instantiated selection policy used to calculate the objective. | SelectionPolicyType() |
size_t |
stepSize |
Initial step size | 0 |
Attributes of the optimizer may also be changed via the member methods
Lambda()
, TransformationPolicy()
, BatchSize()
, MaxIterations()
,
Tolerance()
, and SelectionPolicy()
.
The selectionPolicy
attribute allows an instantiated SelectionPolicyType
to
be given. The FullSelection
policy has no need to be instantiated and thus
the option is not relevant when the CMAES<>
optimizer type is being used; the
RandomSelection
policy has the constructor RandomSelection(
fraction
)
where fraction
specifies the percentage of separable functions to use to
estimate the objective function.
The transformationPolicy
attribute allows an instantiated
TransformationPolicyType
to be given. The EmptyTransformation<
MatType
>
has no need to be instantiated. BoundaryBoxConstraint<
MatType
>
policy has
the constructor BoundaryBoxConstraint(
lowerBound, upperBound
)
where lowerBound
and lowerBound
are the lower bound and upper bound of
the coordinates respectively.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// CMAES with the FullSelection and BoundaryBoxConstraint policies.
BoundaryBoxConstraint b(-1, 1);
CMAES optimizer(0, b, 32, 200, 1e-4);
optimizer.Optimize(f, coordinates);
// CMAES with the RandomSelection and BoundaryBoxConstraint policies.
ApproxCMAES<BoundaryBoxConstraint<>> cmaes(0, b, 32, 200, 1e-4);
approxOptimizer.Optimize(f, coordinates);
See also:
- Completely Derandomized Self-Adaptation in Evolution Strategies
- CMA-ES in Wikipedia
- Evolution strategy in Wikipedia
CNE
An optimizer for arbitrary functions.
Conventional Neural Evolution is an optimizer that works like biological evolution which selects best candidates based on their fitness scores and creates new generation by mutation and crossover of population. The initial population is generated based on a random normal distribution centered at the given starting point.
Constructors
CNE()
CNE(
populationSize, maxGenerations
)
CNE(
populationSize, maxGenerations, mutationProb, mutationSize
)
CNE(
populationSize, maxGenerations, mutationProb, mutationSize, selectPercent, tolerance
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
populationSize |
The number of candidates in the population. This should be at least 4 in size. | 500 |
size_t |
maxGenerations |
The maximum number of generations allowed for CNE. | 5000 |
double |
mutationProb |
Probability that a weight will get mutated. | 0.1 |
double |
mutationSize |
The range of mutation noise to be added. This range is between 0 and mutationSize. | 0.02 |
double |
selectPercent |
The percentage of candidates to select to become the the next generation. | 0.2 |
double |
tolerance |
The final value of the objective function for termination. If set to negative value, tolerance is not considered. | 1e-5 |
Attributes of the optimizer may also be changed via the member methods
PopulationSize()
, MaxGenerations()
, MutationProb()
, SelectPercent()
and Tolerance()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
CNE optimizer(200, 10000, 0.2, 0.2, 0.3, 1e-5);
optimizer.Optimize(f, coordinates);
See also:
DE
An optimizer for arbitrary functions.
Differential Evolution is an evolutionary optimization algorithm which selects best candidates based on their fitness scores and creates new generation by mutation and crossover of population.
Constructors
DE()
DE(
populationSize, maxGenerations
)
DE(
populationSize, maxGenerations, crossoverRate
)
DE(
populationSize, maxGenerations, crossoverRate, differentialWeight
)
DE(
populationSize, maxGenerations, crossoverRate, differentialWeight, tolerance
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
populationSize |
The number of candidates in the population. This should be at least 3 in size. | 100 |
size_t |
maxGenerations |
The maximum number of generations allowed for DE. | 2000 |
double |
crossoverRate |
Probability that a candidate will undergo crossover. | 0.6 |
double |
differentialWeight |
Amplification factor for differentiation. | 0.8 |
double |
tolerance |
The final value of the objective function for termination. If set to negative value, tolerance is not considered. | 1e-5 |
Attributes of the optimizer may also be changed via the member methods
PopulationSize()
, MaxGenerations()
, CrossoverRate()
, DifferentialWeight()
and Tolerance()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
DE optimizer(200, 1000, 0.6, 0.8, 1e-5);
optimizer.Optimize(f, coordinates);
See also:
- Differential Evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces
- Differential Evolution in Wikipedia
- Arbitrary functions
DemonAdam
An optimizer for differentiable separable functions.
DemonAdam is an Adam based optimizer. DemonAdam is motivated by decaying the total contribution of a gradient to all future updates.
Constructors
DemonAdam()
DemonAdam(
stepSize, batchSize
)
DemonAdam(
stepSize, batchSize, momentum, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
DemonAdam(
stepSize, batchSize, momentum, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy
)
Note that the DemonAdam
class is based on
the DemonAdamType<
UpdateRule
>
class with UpdateRule
= AdamUpdate
.
For convenience the following typedefs have been defined:
DemonAdaMax
(equivalent toDemonAdamType<AdaMaxUpdate>
): DemonAdam that uses the AdaMax update rule.DemonAMSGrad
(equivalent toDemonAdamType<AMSGradUpdate>
): DemonAdam that uses the AMSGrad update rule.DemonNadam
(equivalent toDemonAdamType<NadamUpdate>
): DemonAdam that uses the Nadam update rule.NadamUpdate
(equivalent toDemonAdamType<NadaMaxUpdate>
): DemonAdam that uses the NadaMax update rule.DemonOptimisticAdam
(equivalent toDemonAdamType<OptimisticAdamUpdate>
): DemonAdam that uses the OptimisticAdam update rule.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
momentum |
The initial momentum coefficient. | 0.9 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Momentum()
, MomentumIterations()
, Beta1()
,
Beta2()
, Eps()
, MaxIterations()
, Tolerance()
, Shuffle()
, and
ResetPolicy()
.
Examples
Click to collapse/expand example code.
MatyasFunction f;
arma::mat coordinates = f.GetInitialPoint();
DemonAdam optimizer(0.5, 1, 0.9);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Decaying momentum helps neural network training
- Differentiable separable functions
DemonSGD
An optimizer for differentiable separable functions.
DemonSGD is an SGD based optimizer. DemonSGD is motivated by decaying the total contribution of a gradient to all future updates.
For convenience ensmallen implements various Adam based versions of the Demon optimizer:
DemonAdam
(equivalent toDemonAdamType<AdamUpdate>
): DemonAdam that uses the Adam update rule.DemonAdaMax
(equivalent toDemonAdamType<AdaMaxUpdate>
): DemonAdam that uses the AdaMax update rule.DemonAMSGrad
(equivalent toDemonAdamType<AMSGradUpdate>
): DemonAdam that uses the AMSGrad update rule.DemonNadam
(equivalent toDemonAdamType<NadamUpdate>
): DemonAdam that uses the Nadam update rule.NadamUpdate
(equivalent toDemonAdamType<NadaMaxUpdate>
): DemonAdam that uses the NadaMax update rule.DemonOptimisticAdam
(equivalent toDemonAdamType<OptimisticAdamUpdate>
): DemonAdam that uses the OptimisticAdam update rule.
Constructors
DemonSGD()
DemonSGD(
stepSize, batchSize
)
DemonSGD(
stepSize, batchSize, momentum, maxIterations, tolerance, shuffle
)
DemonSGD(
stepSize, batchSize, momentum, maxIterations, tolerance, shuffle, resetPolicy
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
momentum |
The initial momentum coefficient. | 0.9 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Momentum()
, MomentumIterations()
,
MaxIterations()
, Tolerance()
, Shuffle()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
MatyasFunction f;
arma::mat coordinates = f.GetInitialPoint();
DemonSGD optimizer(0.5, 1, 0.9);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Decaying momentum helps neural network training
- Differentiable separable functions
Eve
An optimizer for differentiable separable functions.
Eve is a stochastic gradient based optimization method with locally and globally adaptive learning rates.
Constructors
Eve()
Eve(
stepSize, batchSize
)
Eve(
stepSize, batchSize, beta1, beta2, beta3, epsilon, clip, maxIterations, tolerance, shuffle
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
beta3 |
Exponential decay rate for relative change. | 0.999 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
double |
clip |
Clipping range to avoid extreme valus. | 10 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Beta3()
, Epsilon()
, Clip()
, MaxIterations()
,
Tolerance()
, Shuffle()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Eve optimizer(0.001, 32, 0.9, 0.999, 0.999, 10, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Differentiable separable functions
Frank-Wolfe
An optimizer for differentiable functions that may also be constrained.
Frank-Wolfe is a technique to minimize a continuously differentiable convex function f over a compact convex subset D of a vector space. It is also known as conditional gradient method.
Constructors
FrankWolfe<
LinearConstrSolverType, UpdateRuleType
>(
linearConstrSolver, updateRule
)
FrankWolfe<
LinearConstrSolverType, UpdateRuleType
>(
linearConstrSolver, updateRule, maxIterations, tolerance
)
The LinearConstrSolverType
template parameter specifies the constraint
domain D for the problem. The ConstrLpBallSolver
and
ConstrStructGroupSolver<GroupLpBall>
classes are available for use; the former
restricts D to the unit ball of the specified l-p norm. Other constraint types
may be implemented as a class with the same method signatures as either of the
existing classes.
The UpdateRuleType
template parameter specifies the update rule used by the
optimizer. The UpdateClassic
and UpdateLineSearch
classes are available for
use and represent a simple update step rule and a line search based update rule,
respectively. The UpdateSpan
and UpdateFulLCorrection
classes are also
available and may be used with the FuncSq
function class (which is a squared
matrix loss).
For convenience the following typedefs have been defined:
OMP
(equivalent toFrankWolfe<ConstrLpBallSolver, UpdateSpan>
): a solver for the orthogonal matching pursuit problemStandardFrankWolfe
(equivalent toFrankWolfe<ConstrLpBallSolver, ClassicUpdate>
): the standard Frank-Wolfe algorithm with the solution restricted to lie within the unit ball
Attributes
type | name | description | default |
---|---|---|---|
LinearConstrSolverType |
linearConstrSolver |
Solver for linear constrained problem. | n/a |
UpdateRuleType |
updateRule |
Rule for updating solution in each iteration. | n/a |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
size_t |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-10 |
Attributes of the optimizer may also be changed via the member methods
LinearConstrSolver()
, UpdateRule()
, MaxIterations()
, and Tolerance()
.
Examples:
TODO
See also:
FTML (Follow the Moving Leader)
An optimizer for differentiable separable functions.
Follow the Moving Leader (FTML) is an optimizer where recent samples are weighted more heavily in each iteration, so FTML can adapt more quickly to changes.
Constructors
FTML()
FTML(
stepSize, batchSize
)
FTML(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle
)
FTML(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Epsilon()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
FTML optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
Gradient Descent
An optimizer for differentiable functions.
Gradient Descent is a technique to minimize a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
Constructors
GradientDescent()
GradientDescent(
stepSize
)
GradientDescent(
stepSize, maxIterations, tolerance
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
size_t |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, MaxIterations()
, and Tolerance()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
GradientDescent optimizer(0.001, 0, 1e-15);
optimizer.Optimize(f, coordinates);
See also:
Grid Search
An optimizer for categorical functions.
An optimizer that finds the minimum of a given function by iterating through points on a multidimensional grid.
Constructors
GridSearch()
Attributes
The GridSearch
class has no configurable attributes.
Note: the GridSearch
class can only optimize categorical functions where
every parameter is categorical.
See also:
- Categorical functions (includes an example for
GridSearch
) - Grid search on Wikipedia
Hogwild! (Parallel SGD)
An optimizer for sparse differentiable separable functions.
An implementation of parallel stochastic gradient descent using the lock-free
HOGWILD! approach. This implementation requires OpenMP to be enabled during
compilation (i.e., -fopenmp
specified as a compiler flag).
Note that the requirements for Hogwild! are slightly different than for most
differentiable separable functions but it
is often possible to use Hogwild! by implementing Gradient()
with a template
parameter. See the sparse differentiable separable
functions documentation for more
details.
Constructors
ParallelSGD<
DecayPolicyType
>(
maxIterations, threadShareSize
)
ParallelSGD<
DecayPolicyType
>(
maxIterations, threadShareSize, tolerance, shuffle, decayPolicy
)
The DecayPolicyType
template parameter specifies the policy used to update
the step size after each iteration. The ConstantStep
class is available for
use. Custom behavior can be achieved by implementing a class with the same
method signatures.
The default type for DecayPolicyType
is ConstantStep
, so the shorter type
ParallelSGD<>
can be used instead of the equivalent
ParallelSGD<ConstantStep>
.
Attributes
type | name | description | default |
---|---|---|---|
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | n/a |
size_t |
threadShareSize |
Number of datapoints to be processed in one iteration by each thread. | n/a |
double |
tolerance |
Maximum absolute tolerance to terminate the algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
DecayPolicyType |
decayPolicy |
An instantiated step size update policy to use. | DecayPolicyType() |
Attributes of the optimizer may also be modified via the member methods
MaxIterations()
, ThreadShareSize()
, Tolerance()
, Shuffle()
, and
DecayPolicy()
.
Note that the default value for decayPolicy
is the default constructor for the
DecayPolicyType
.
Examples
Click to collapse/expand example code.
GeneralizedRosenbrockFunction f(50); // 50-dimensional Rosenbrock function.
arma::mat coordinates = f.GetInitialPoint();
ParallelSGD<> optimizer(100000, f.NumFunctions(), 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Sparse differentiable separable functions
IQN
An optimizer for differentiable separable functions.
The Incremental Quasi-Newton belongs to the family of stochastic and incremental methods that have a cost per iteration independent of n. IQN iterations are a stochastic version of BFGS iterations that use memory to reduce the variance of stochastic approximations.
Constructors
IQN()
IQN(
stepSize
)
IQN(
stepSize, batchSize, maxIterations, tolerance
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Size of each batch. | 10 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
size_t |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, and Tolerance()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
IQN optimizer(0.01, 1, 5000, 1e-5);
optimizer.Optimize(f, coordinates);
See also:
- IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate
- A Stochastic Quasi-Newton Method for Large-Scale Optimization
- Differentiable functions
Katyusha
An optimizer for differentiable separable functions.
Katyusha is a direct, primal-only stochastic gradient method which uses a “negative momentum” on top of Nesterov’s momentum. Two types are available—one that uses a proximal update step, and one that uses the standard update step.
Constructors
KatyushaType<
proximal
>()
KatyushaType<
proximal
>(
convexity, lipschitz
)
KatyushaType<
proximal
>(
convexity, lipschitz, batchSize
)
KatyushaType<
proximal
>(
convexity, lipschitz, batchSize, maxIterations, innerIterations, tolerance, shuffle, exactObjective
)
The proximal
template parameter is a boolean value (true
or false
) that
specifies whether or not the proximal update should be used.
For convenience the following typedefs have been defined:
Katyusha
(equivalent toKatyushaType<false>
): Katyusha with the standard update stepKatyushaProximal
(equivalent toKatyushaType<true>
): Katyusha with the proximal update step
Attributes
type | name | description | default |
---|---|---|---|
double |
convexity |
The regularization parameter. | 1.0 |
double |
lipschitz |
The Lipschitz constant. | 10.0 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 1000 |
size_t |
innerIterations |
The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration. | 0 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
Convexity()
, Lipschitz()
, BatchSize()
, MaxIterations()
,
InnerIterations()
, Tolerance()
, Shuffle()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// Without proximal update.
Katyusha optimizer(1.0, 10.0, 1, 100, 0, 1e-10, true);
optimizer.Optimize(f, coordinates);
// With proximal update.
KatyushaProximal proximalOptimizer(1.0, 10.0, 1, 100, 0, 1e-10, true);
proximalOptimizer.Optimize(f, coordinates);
See also:
- Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
L-BFGS
An optimizer for differentiable functions
L-BFGS is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm using a limited amount of computer memory.
Constructors
L_BFGS()
L_BFGS(
numBasis, maxIterations
)
L_BFGS(
numBasis, maxIterations, armijoConstant, wolfe, minGradientNorm, factr, maxLineSearchTrials
)
L_BFGS(
numBasis, maxIterations, armijoConstant, wolfe, minGradientNorm, factr, maxLineSearchTrials, minStep, maxStep
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
numBasis |
Number of memory points to be stored (default 10). | 10 |
size_t |
maxIterations |
Maximum number of iterations for the optimization (0 means no limit and may run indefinitely). | 10000 |
double |
armijoConstant |
Controls the accuracy of the line search routine for determining the Armijo condition. | 1e-4 |
double |
wolfe |
Parameter for detecting the Wolfe condition. | 0.9 |
double |
minGradientNorm |
Minimum gradient norm required to continue the optimization. | 1e-6 |
double |
factr |
Minimum relative function value decrease to continue the optimization. | 1e-15 |
size_t |
maxLineSearchTrials |
The maximum number of trials for the line search (before giving up). | 50 |
double |
minStep |
The minimum step of the line search. | 1e-20 |
double |
maxStep |
The maximum step of the line search. | 1e20 |
Attributes of the optimizer may also be changed via the member methods
NumBasis()
, MaxIterations()
, ArmijoConstant()
, Wolfe()
,
MinGradientNorm()
, Factr()
, MaxLineSearchTrials()
, MinStep()
, and
MaxStep()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
L_BFGS optimizer(20);
optimizer.Optimize(f, coordinates);
See also:
- The solution of non linear finite element equations
- Updating Quasi-Newton Matrices with Limited Storage
- Limited-memory BFGS in Wikipedia
- Differentiable functions
Lookahead
An optimizer for differentiable separable functions.
Lookahead is a stochastic gradient based optimization method which chooses a search direction by looking ahead at the sequence of “fast weights” generated by another optimizer.
Constructors
Lookahead<>()
Lookahead<>(
stepSize
)
Lookahead<>(
stepSize, k
)
Lookahead<>(
stepSize, k, maxIterations, tolerance, decayPolicy, exactObjective
)
Lookahead<>(
baseOptimizer, stepSize, k, maxIterations, tolerance, decayPolicy, exactObjective
)
Note that Lookahead<>
is based on the templated type
LookaheadType<
BaseOptimizerType, DecayPolicyType
>
with BaseOptimizerType
= Adam
and DecayPolicyType
= NoDecay
.
Any optimizer that implements the differentiable separable functions interface
can be paired with the Lookahead
optimizer.
Attributes
type | name | description | default |
---|---|---|---|
BaseOptimizerType |
baseOptimizer |
Optimizer for the forward step. | Adam |
double |
stepSize |
Step size for each iteration. | 0.5 |
size_t |
k |
The synchronization period. | 5 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
BaseOptimizer()
, StepSize()
, K()
, MaxIterations()
,
Tolerance()
, DecayPolicy()
and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Lookahead<> optimizer(0.5, 5, 100000, 1e-5);
optimizer.Optimize(f, coordinates);
See also:
- SGD in Wikipedia
- SGD
- Lookahead Optimizer: k steps forward, 1 step back
- Differentiable separable functions
LRSDP (low-rank SDP solver)
An optimizer for semidefinite programs.
LRSDP is the implementation of Monteiro and Burer’s formulation of low-rank semidefinite programs (LR-SDP). This solver uses the augmented Lagrangian optimizer to solve low-rank semidefinite programs.
The assumption here is that the solution matrix for the SDP is low-rank. If this assumption is not true, the algorithm should not be expected to converge.
Constructors
LRSDP<
SDPType
>()
The SDPType
template parameter specifies the type of SDP to solve. The
SDP<arma::mat>
and SDP<arma::sp_mat>
classes are available for use; these
represent SDPs with dense and sparse C
matrices, respectively. The SDP<>
class is detailed in the semidefinite program
documentation.
Once the LRSDP<>
object is constructed, the SDP may be specified by calling
the SDP()
member method, which returns a reference to the SDPType
.
Attributes
The attributes of the LRSDP optimizer may only be accessed via member methods.
type | method name | description | default |
---|---|---|---|
size_t |
MaxIterations() |
Maximum number of iterations before termination. | 1000 |
AugLagrangian |
AugLag() |
The internally-held Augmented Lagrangian optimizer. | n/a |
See also:
- A Nonlinear Programming Algorithm for Solving Semidefinite Programs via Low-rank Factorization
- Semidefinite programming on Wikipedia
- Semidefinite programs (includes example usage of
PrimalDualSolver
)
Momentum SGD
An optimizer for differentiable separable functions.
Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. This is an SGD variant that uses momentum for its updates. Using momentum updates for parameter learning can accelerate the rate of convergence, specifically in the cases where the surface curves much more steeply (a steep hilly terrain with high curvature).
Constructors
MomentumSGD()
MomentumSGD(
stepSize, batchSize
)
MomentumSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle
)
MomentumSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle, momentumPolicy, decayPolicy, resetPolicy, exactObjective
)
Note that MomentumSGD
is based on the templated type
SGD<
UpdatePolicyType, DecayPolicyType
>
with UpdatePolicyType
=
MomentumUpdate
and DecayPolicyType
= NoDecay
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
MomentumUpdate |
updatePolicy |
An instantiated MomentumUpdate . |
MomentumUpdate() |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
resetPolicy |
Flag that determines whether update policy parameters are reset before every Optimize call. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Shuffle()
, UpdatePolicy()
, DecayPolicy()
, ResetPolicy()
, and
ExactObjective()
.
Note that the MomentumUpdate
class has the constructor
MomentumUpdate(
momentum
)
with a default value of 0.5
for the momentum.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
optimizer.Optimize(f, coordinates);
See also:
Nadam
An optimizer for differentiable separable functions.
Nadam is a variant of Adam based on NAG (Nesterov accelerated gradient). It uses Nesterov momentum for faster convergence.
Constructors
Nadam()
Nadam(
stepSize, batchSize
)
Nadam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
Nadam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy
)
Note that the Nadam
class is based on the AdamType<
UpdateRule
>
class
with UpdateRule
= NadamUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Nadam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
NadaMax
An optimizer for differentiable separable functions.
NadaMax is a variant of AdaMax based on NAG (Nesterov accelerated gradient). It uses Nesterov momentum for faster convergence.
Constructors
NadaMax()
NadaMax(
stepSize, batchSize
)
NadaMax(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
NadaMax(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy
)
Note that the NadaMax
class is based on the AdamType<
UpdateRule
>
class
with UpdateRule
= NadaMaxUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
NadaMax optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
Nesterov Momentum SGD
An optimizer for differentiable separable functions.
Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. This is an SGD variant that uses Nesterov momentum for its updates. Nesterov Momentum application can accelerate the rate of convergence to O(1/k^2).
Constructors
NesterovMomentumSGD()
NesterovMomentumSGD(
stepSize, batchSize
)
NesterovMomentumSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle
)
NesterovMomentumSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle, momentumPolicy, decayPolicy, resetPolicy, exactObjective
)
Note that MomentumSGD
is based on the templated type
SGD<
UpdatePolicyType, DecayPolicyType
>
with UpdatePolicyType
=
NesterovMomentumUpdate
and DecayPolicyType
= NoDecay
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
NesterovMomentumUpdate |
updatePolicy |
An instantiated MomentumUpdate . |
NesterovMomentumUpdate() |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
resetPolicy |
Flag that determines whether update policy parameters are reset before every Optimize call. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Shuffle()
, UpdatePolicy()
, DecayPolicy()
, ResetPolicy()
, and
ExactObjective()
.
Note that the NesterovMomentumUpdate
class has the constructor
MomentumUpdate(
momentum
)
with a default value of 0.5
for the momentum.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
NesterovMomentumSGD optimizer(0.01, 32, 100000, 1e-5, true,
MomentumUpdate(0.5));
optimizer.Optimize(f, coordinates);
See also:
MOEA/D-DE
An optimizer for arbitrary multi-objective functions. MOEA/D-DE (Multi Objective Evolutionary Algorithm based on Decomposition - Differential Evolution) is a multi objective optimization algorithm. It works by decomposing the problem into a number of scalar optimization subproblems which are solved simultaneously per generation. MOEA/D in itself is a framework, this particular algorithm uses Differential Crossover followed by Polynomial Mutation to create offsprings which are then decomposed to form a Single Objective Problem. A diversity preserving mechanism is also employed which encourages a varied set of solution.
Constructors
MOEAD<
InitPolicyType, DecompPolicyType
>()
MOEAD<
InitPolicyType, DecompPolicyType
>(
populationSize, maxGenerations, crossoverProb, neighborProb, neighborSize, distributionIndex, differentialWeight, maxReplace, epsilon, lowerBound, upperBound
)
The InitPolicyType
template parameter refers to the strategy used to
initialize the reference directions.
The following types are available:
Uniform
BayesianBootstrap
Dirichlet
The DecompPolicyType
template parameter refers to the strategy used to
decompose the weight vectors to form a scalar objective function.
The following types are available:
Tchebycheff
WeightedAverage
PenaltyBoundaryIntersection
For convenience the following types can be used:
-
DefaultMOEAD
(equivalent toMOEAD<Uniform, Tchebycheff>
): utilizes Uniform method for weight initialization and Tchebycheff for weight decomposition. -
BBSMOEAD
(equivalent toMOEAD<BayesianBootstrap, Tchebycheff>
): utilizes Bayesian Bootstrap method for weight initialization and Tchebycheff for weight decomposition. -
DirichletMOEAD
(equivalent toMOEAD<Dirichlet, Tchebycheff>
): utilizes Dirichlet sampling for weight init and Tchebycheff for weight decomposition.
Attributes
type | name | description | default |
---|---|---|---|
size_t |
populationSize |
The number of candidates in the population. | 150 |
size_t |
maxGenerations |
The maximum number of generations allowed. | 300 |
double |
crossoverProb |
Probability that a crossover will occur. | 1.0 |
double |
neighborProb |
The probability of sampling from neighbor. | 0.9 |
size_t |
neighborSize |
The number of nearest-neighbours to consider per weight vector. | 20 |
double |
distributionIndex |
The crowding degree of the mutation. | 20 |
double |
differentialWeight |
Amplification factor of the differentiation. | 0.5 |
size_t |
maxReplace |
The limit of solutions allowed to be replaced by a child. | 2 |
double |
epsilon |
Handles numerical stability after weight initialization. | 1E-10 |
double , arma::vec |
lowerBound |
Lower bound of the coordinates on the coordinates of the whole population during the search process. | 0 |
double , arma::vec |
upperBound |
Lower bound of the coordinates on the coordinates of the whole population during the search process. | 1 |
InitPolicyType |
initPolicy |
Instantiated init policy used to initialize weights. | InitPolicyType() |
DecompPolicyType |
decompPolicy |
Instantiated decomposition policy used to create scalar objective problem. | DecompPolicyType() |
Attributes of the optimizer may also be changed via the member methods
PopulationSize()
, MaxGenerations()
, CrossoverRate()
, NeighborProb()
, NeighborSize()
, DistributionIndex()
,
DifferentialWeight()
, MaxReplace()
, Epsilon()
, LowerBound()
, UpperBound()
, InitPolicy()
and DecompPolicy()
.
Examples:
Click to collapse/expand example code.
SchafferFunctionN1<arma::mat> SCH;
arma::vec lowerBound("-10");
arma::vec upperBound("10");
DefaultMOEAD opt(300, 300, 1.0, 0.9, 20, 20, 0.5, 2, 1E-10, lowerBound, upperBound);
typedef decltype(SCH.objectiveA) ObjectiveTypeA;
typedef decltype(SCH.objectiveB) ObjectiveTypeB;
arma::mat coords = SCH.GetInitialPoint();
std::tuple<ObjectiveTypeA, ObjectiveTypeB> objectives = SCH.GetObjectives();
// obj will contain the minimum sum of objectiveA and objectiveB found on the best front.
double obj = opt.Optimize(objectives, coords);
// Now obtain the best front.
arma::cube bestFront = opt.ParetoFront();
See also
NSGA2
An optimizer for arbitrary multi-objective functions.
NSGA2 (Non-dominated Sorting Genetic Algorithm - II) is a multi-objective optimization algorithm. The algorithm works by generating a candidate population from a fixed starting point. At each stage of optimization, a new population of children is generated. This new population along with its predecessor is sorted using non-domination as the metric. Following this, the population is further segregated into fronts. A new population is generated from these fronts having size equal to that of the starting population.
Constructors
NSGA2()
NSGA2(
populationSize, maxGenerations, crossoverProb, mutationProb, mutationStrength, epsilon, lowerBound, upperBound
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
populationSize |
The number of candidates in the population. This should be at least 4 in size and a multiple of 4. | 100 |
size_t |
maxGenerations |
The maximum number of generations allowed for NSGA2. | 2000 |
double |
crossoverProb |
Probability that a crossover will occur. | 0.6 |
double |
mutationProb |
Probability that a weight will get mutated. | 0.3 |
double |
mutationStrength |
The range of mutation noise to be added. This range is between 0 and mutationStrength. | 0.001 |
double |
epsilon |
The value used internally to evaluate approximate equality in crowding distance based sorting. | 1e-6 |
double , arma::vec |
lowerBound |
Lower bound of the coordinates on the coordinates of the whole population during the search process. | 0 |
double , arma::vec |
upperBound |
Lower bound of the coordinates on the coordinates of the whole population during the search process. | 1 |
Note that the parameters lowerBound
and upperBound
are overloaded. Data types of double
or arma::mat
may be used. If they are initialized as single values of double
, then the same value of the bound applies to all the axes, resulting in an initialization following a uniform distribution in a hypercube. If they are initialized as matrices of arma::mat
, then the value of lowerBound[i]
applies to axis [i]
; similarly, for values in upperBound
. This results in an initialization following a uniform distribution in a hyperrectangle within the specified bounds.
Attributes of the optimizer may also be changed via the member methods
PopulationSize()
, MaxGenerations()
, CrossoverRate()
, MutationProbability()
, MutationStrength()
, Epsilon()
, LowerBound()
and UpperBound()
.
Examples:
Click to collapse/expand example code.
SchafferFunctionN1<arma::mat> SCH;
arma::vec lowerBound("-1000");
arma::vec upperBound("1000");
NSGA2 opt(20, 5000, 0.5, 0.5, 1e-3, 1e-6, lowerBound, upperBound);
typedef decltype(SCH.objectiveA) ObjectiveTypeA;
typedef decltype(SCH.objectiveB) ObjectiveTypeB;
arma::mat coords = SCH.GetInitialPoint();
std::tuple<ObjectiveTypeA, ObjectiveTypeB> objectives = SCH.GetObjectives();
// obj will contain the minimum sum of objectiveA and objectiveB found on the best front.
double obj = opt.Optimize(objectives, coords);
// Now obtain the best front.
arma::cube bestFront = opt.ParetoFront();
See also:
OptimisticAdam
An optimizer for differentiable separable functions.
OptimisticAdam is an optimizer which implements the Optimistic Adam algorithm which uses Optmistic Mirror Descent with the Adam Optimizer. It addresses the problem of limit cycling while training GANs (generative adversarial networks). It uses OMD to achieve faster regret rates in solving the zero sum game of training a GAN. It consistently achieves a smaller KL divergence with~ respect to the true underlying data distribution. The implementation here can be used with any differentiable separable function, not just GAN training.
Constructors
OptimisticAdam()
OptimisticAdam(
stepSize, batchSize
)
OptimisticAdam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle
)
OptimisticAdam(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy
)
Note that the OptimisticAdam
class is based on the
AdamType<
UpdateRule
>
class with UpdateRule
= OptimisticAdamUpdate
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, and ResetPolicy()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
OptimisticAdam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
Padam
An optimizer for differentiable separable functions.
Padam is a variant of Adam with a partially adaptive momentum estimation method.
Constructors
Padam()
Padam(
stepSize, batchSize
)
Padam(
stepSize, batchSize, beta1, beta2, partial, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
partial |
Partially adaptive parameter. | 0.25 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Partial()
, Epsilon()
,
MaxIterations()
, Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Padam optimizer(0.001, 32, 0.9, 0.999, 0.25, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
- SGD in Wikipedia
- SGD
- Adam: A Method for Stochastic Optimization
- Differentiable separable functions
PSO
An optimizer for arbitrary functions.
PSO is an evolutionary approach to optimization that is inspired by flocks or birds or fishes. The fundamental analogy is that every creature (particle in a swarm) is at a measurable position of goodness or fitness, and this information can be shared amongst the creatures in the flock, so that iteratively, the entire flock can get close to the global optimum.
Constructors
PSOType<
VelocityUpdatePolicy, InitPolicy
>()
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles
)
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles, lowerBound, upperBound
)
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles, lowerBound, upperBound, maxIterations
)
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles, lowerBound, upperBound, maxIterations, horizonSize
)
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles, lowerBound, upperBound, maxIterations, horizonSize, impTolerance
)
PSOType<
VelocityUpdatePolicy, InitPolicy
>(
numParticles, lowerBound, upperBound, maxIterations, horizonSize, impTolerance, exploitationFactor, explorationFactor
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
numParticles |
numParticles Number of particles in the swarm. | 64 |
double , arma::mat |
lowerBound |
Lower bound of the coordinates of the initial population. | 1 |
double , arma::mat |
upperBound |
Upper bound of the coordinates of the initial population. | 1 |
size_t |
maxIterations |
Maximum number of iterations allowed. | 3000 |
size_t |
horizonSize |
Size of the lookback-horizon for computing improvement. | 350 |
double |
impTolerance |
The final value of the objective function for termination. If set to negative value, tolerance is not considered. | 1e-5 |
double |
exploitationFactor |
Influence of the personal best of the particle. | 2.05 |
double |
explorationFactor |
Influence of the neighbours of the particle. | 2.05 |
Note that the parameters lowerBound
and upperBound
are overloaded. Data types of double
or arma::mat
may be used. If they are initialized as single values of double
, then the same value of the bound applies to all the axes, resulting in an initialization following a uniform distribution in a hypercube. If they are initialized as matrices of arma::mat
, then the value of lowerBound[i]
applies to axis [i]
; similarly, for values in upperBound
. This results in an initialization following a uniform distribution in a hyperrectangle within the specified bounds.
Attributes of the optimizer may also be changed via the member methods
NumParticles()
, LowerBound()
, UpperBound()
, MaxIterations()
,
HorizonSize()
, ImpTolerance()
,ExploitationFactor()
, and
ExplorationFactor()
.
At present, only the local-best variant of PSO is present in ensmallen. The optimizer may be initialized using the class type LBestPSO
, which is an alias for PSOType<LBestUpdate, DefaultInit>
.
Examples:
Click to collapse/expand example code.
SphereFunction f(4);
arma::vec coordinates = f.GetInitialPoint();
LBestPSO s;
const double result = s.Optimize(f, coordinates)
Click to collapse/expand example code.
RosenbrockFunction f;
arma::vec coordinates = f.GetInitialPoint();
// Setting bounds for the initial swarm population of size 2.
arma::vec lowerBound("50 50");
arma::vec upperBound("60 60");
LBestPSO s(200, lowerBound, upperBound, 3000, 600, 1e-30, 2.05, 2.05);
const double result = s.Optimize(f, coordinates)
Click to collapse/expand example code.
RosenbrockFunction f;
arma::vec coordinates = f.GetInitialPoint();
// Setting bounds for the initial swarm population as type double.
double lowerBound = 50;
double upperBound = 60;
LBestPSO s(64, lowerBound, upperBound, 3000, 400, 1e-30, 2.05, 2.05);
const double result = s.Optimize(f, coordinates)
See also:
Primal-dual SDP Solver
An optimizer for semidefinite programs.
A primal-dual interior point method solver. This can solve semidefinite programs.
Constructors
PrimalDualSolver<>(
maxIterations
)
PrimalDualSolver<>(
maxIterations, tau, normXzTol, primalInfeasTol, dualInfeasTol
)
Attributes
The PrimalDualSolver<>
class has several attributes that are only modifiable
as member methods.
type | method name | description | default |
---|---|---|---|
double |
Tau() |
Value of tau used to compute alpha_hat. | 0.99 |
double |
NormXZTol() |
Tolerance for the norm of X*Z. | 1e-7 |
double |
PrimalInfeasTol() |
Tolerance for primal infeasibility. | 1e-7 |
double |
DualInfeasTol() |
Tolerance for dual infeasibility. | 1e-7 |
size_t |
MaxIterations() |
Maximum number of iterations before convergence. | 1000 |
Optimization
The PrimalDualSolver<>
class offers two overloads of Optimize()
that
optionally return the converged values for the dual variables.
Click to collapse/expand example code.
/**
* Invoke the optimization procedure, returning the converged values for the
* primal and dual variables.
*/
template<typename SDPType>
double Optimize(SDPType& s,
arma::mat& X,
arma::vec& ySparse,
arma::vec& yDense,
arma::mat& Z);
/**
* Invoke the optimization procedure, and only return the primal variable.
*/
template<typename SDPType>
double Optimize(SDPType& s, arma::mat& X);
The SDPType
template parameter specifies the type of SDP to solve. The
SDP<arma::mat>
and SDP<arma::sp_mat>
classes are available for use; these
represent SDPs with dense and sparse C
matrices, respectively. The SDP<>
class is detailed in the semidefinite program
documentation. SDPType
is automatically inferred
when Optimize()
is called with an SDP.
See also:
- Primal-dual interior-point methods for semidefinite programming
- Semidefinite programming on Wikipedia
- Semidefinite programs (includes example usage of
PrimalDualSolver
)
Quasi-Hyperbolic Momentum Update SGD (QHSGD)
An optimizer for differentiable separable functions.
Quasi-hyperbolic momentum update SGD (QHSGD) is an SGD-like optimizer with momentum where quasi-hyperbolic terms are added to the parametrization. The update rule for this optimizer is a weighted average of momentum SGD and vanilla SGD.
Constructors
QHSGD()
QHSGD(
stepSize, batchSize
)
QHSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle, exactObjective
)
Note that QHSGD
is based on the templated type
SGD<
UpdatePolicyType, DecayPolicyType
>
with UpdatePolicyType
=
QHUpdate
and DecayPolicyType
= NoDecay
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Shuffle()
, and ExactObjective()
.
Note that the QHUpdate
class has the constructor QHUpdate(
v,
momentum
)
with a default value of 0.7
for the quasi-hyperbolic term v
and 0.999
for the momentum term.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
QHSGD optimizer(0.01, 32, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- Quasi-Hyperbolic Momentum and Adam For Deep Learning
- SGD
- Momentum SGD
- Nesterov Momentum SGD
- SGD in Wikipedia
- Differentiable separable functions
QHAdam
An optimizer for differentiable separable functions.
QHAdam is an optimizer that uses quasi-hyperbolic descent with the Adam
optimizer. This replaces the moment estimators of Adam with quasi-hyperbolic
terms, and various values of the v1
and v2
parameters are equivalent to
the following other optimizers:
-
When
v1 = v2 = 1
,QHAdam
is equivalent toAdam
. -
When
v1 = 0
andv2 = 1
,QHAdam
is equivalent toRMSProp
. -
When
v1 = beta1
andv2 = 1
,QHAdam
is equivalent toNadam
.
Constructors
QHAdam()
QHAdam(
stepSize, batchSize
)
QHAdam(
stepSize, batchSize, v1, v2, beta1, beta2, eps, maxIterations
)
QHAdam(
stepSize, batchSize, v1, v2, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
v1 |
The First Quasi Hyperbolic Term. | 0.7 |
double |
v2 |
The Second Quasi Hyperbolic Term. | 1.00 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, V1()
, V2()
, ResetPolicy()
, and ExactObjective()
.
Examples
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
QHAdam optimizer(0.001, 32, 0.7, 0.9, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- Quasi-Hyperbolic Momentum and Adam For Deep Learning
- SGD in Wikipedia
- SGD
- Adam
- RMSprop
- Nadam
- Incorporating Nesterov Momentum into Adam
- Differentiable separable functions
RMSProp
An optimizer for differentiable separable functions.
RMSProp utilizes the magnitude of recent gradients to normalize the gradients.
Constructors
RMSProp()
RMSProp(
stepSize, batchSize
)
RMSProp(
stepSize, batchSize, alpha, epsilon, maxIterations, tolerance, shuffle
)
RMSProp(
stepSize, batchSize, alpha, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Number of points to process in each step. | 32 |
double |
alpha |
Smoothing constant, similar to that used in AdaDelta and momentum methods. | 0.99 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer can also be modified via the member methods
StepSize()
, BatchSize()
, Alpha()
, Epsilon()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
RMSProp optimizer(1e-3, 1, 0.99, 1e-8, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);
See also:
- Divide the gradient by a running average of its recent magnitude
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
Simulated Annealing (SA)
An optimizer for arbitrary functions.
Simulated Annealing is an stochastic optimization algorithm which is able to deliver near-optimal results quickly without knowing the gradient of the function being optimized. It has a unique hill climbing capability that makes it less vulnerable to local minima. This implementation uses exponential cooling schedule and feedback move control by default, but the cooling schedule can be changed via a template parameter.
Constructors
SA<
CoolingScheduleType
>(
coolingSchedule
)
SA<
CoolingScheduleType
>(
coolingSchedule, maxIterations
)
SA<
CoolingScheduleType
>(
coolingSchedule, maxIterations, initT, initMoves, moveCtrlSweep, tolerance, maxToleranceSweep, maxMoveCoef, initMoveCoef, gain
)
The CoolingScheduleType
template parameter implements a policy to update the
temperature. The ExponentialSchedule
class is available for use; it has a
constructor ExponentialSchedule(
lambda
)
where lambda
is the cooling
speed (default 0.001
). Custom schedules may be created by implementing a
class with at least the single member method below:
Click to collapse/expand example code.
// Return the next temperature given the current system status.
double NextTemperature(const double currentTemperature,
const double currentEnergy);
For convenience, the default cooling schedule is ExponentialSchedule
, so the
shorter type SA<>
may be used instead of the equivalent
SA<ExponentialSchedule>
.
Attributes
type | name | description | default |
---|---|---|---|
CoolingScheduleType |
coolingSchedule |
Instantiated cooling schedule (default ExponentialSchedule). | CoolingScheduleType() |
size_t |
maxIterations |
Maximum number of iterations allowed (0 indicates no limit). | 1000000 |
double |
initT |
Initial temperature. | 10000.0 |
size_t |
initMoves |
Number of initial iterations without changing temperature. | 1000 |
size_t |
moveCtrlSweep |
Sweeps per feedback move control. | 100 |
double |
tolerance |
Tolerance to consider system frozen. | 1e-5 |
size_t |
maxToleranceSweep |
Maximum sweeps below tolerance to consider system frozen. | 3 |
double |
maxMoveCoef |
Maximum move size. | 20 |
double |
initMoveCoef |
Initial move size. | 0.3 |
double |
gain |
Proportional control in feedback move control. | 0.3 |
Attributes of the optimizer may also be changed via the member methods
CoolingSchedule()
, MaxIterations()
, InitT()
, InitMoves()
,
MoveCtrlSweep()
, Tolerance()
, MaxToleranceSweep()
, MaxMoveCoef()
,
InitMoveCoef()
, and Gain()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SA<> optimizer(ExponentialSchedule(), 1000000, 1000., 1000, 100, 1e-10, 3, 1.5,
0.5, 0.3);
optimizer.Optimize(f, coordinates);
See also:
Simultaneous Perturbation Stochastic Approximation (SPSA)
An optimizer for arbitrary functions.
The SPSA algorithm approximates the gradient of the function by finite differences along stochastic directions.
Constructors
SPSA(
alpha, gamma, stepSize, evaluationStepSize, maxIterations, tolerance
)
Attributes
type | name | description | default |
---|---|---|---|
double |
alpha |
Scaling exponent for the step size. | 0.602 |
double |
gamma |
Scaling exponent for evaluation step size. | 0.101 |
double |
stepSize |
Scaling parameter for step size (named as ‘a’ in the paper). | 0.16 |
double |
evaluationStepSize |
Scaling parameter for evaluation step size (named as ‘c’ in the paper). | 0.3 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
Attributes of the optimizer may also be changed via the member methods
Alpha()
, Gamma()
, StepSize()
, EvaluationStepSize()
, and MaxIterations()
.
Examples:
Click to collapse/expand example code.
SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();
SPSA optimizer(0.1, 0.102, 0.16, 0.3, 100000, 1e-5);
optimizer.Optimize(f, coordinates);
See also:
- An Overview of the Simultaneous Perturbation Method for Efficient Optimization
- SPSA on Wikipedia
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
Stochastic Recursive Gradient Algorithm (SARAH/SARAH+)
An optimizer for differentiable separable functions.
StochAstic Recusive gRadient algoritHm (SARAH), is a variance reducing stochastic recursive gradient algorithm which employs the stochastic recursive gradient, for solving empirical loss minimization for the case of nonconvex losses.
Constructors
SARAHType<
UpdatePolicyType
>()
SARAHType<
UpdatePolicyType
>(
stepSize, batchSize
)
SARAHType<
UpdatePolicyType
>(
stepSize, batchSize, maxIterations, innerIterations, tolerance, shuffle, updatePolicy, exactObjective
)
The UpdatePolicyType
template parameter specifies the update step used for
the optimizer. The SARAHUpdate
and SARAHPlusUpdate
classes are available
for use, and implement the standard SARAH update and SARAH+ update,
respectively. A custom update rule can be used by implementing a class with the
same method signatures.
For convenience the following typedefs have been defined:
SARAH
(equivalent toSARAHType<SARAHUpdate>
): the standard SARAH optimizerSARAH_Plus
(equivalent toSARAHType<SARAHPlusUpdate>
): the SARAH+ optimizer
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 1000 |
size_t |
innerIterations |
The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration. | 0 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
UpdatePolicyType |
updatePolicy |
Instantiated update policy used to adjust the given parameters. | UpdatePolicyType() |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be changed via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, InnerIterations()
,
Tolerance()
, Shuffle()
, UpdatePolicy()
, and ExactObjective()
.
Note that the default value for updatePolicy
is the default constructor for
the UpdatePolicyType
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// Standard stochastic variance reduced gradient.
SARAH optimizer(0.01, 1, 5000, 0, 1e-5, true);
optimizer.Optimize(f, coordinates);
// Stochastic variance reduced gradient with Barzilai-Borwein.
SARAH_Plus optimizerPlus(0.01, 1, 5000, 0, 1e-5, true);
optimizerPlus.Optimize(f, coordinates);
See also:
- Stochastic Recursive Gradient Algorithm for Nonconvex Optimization
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
Standard SGD
An optimizer for differentiable separable functions.
Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. It’s likely better to use any of the other variants of SGD than this class; however, this standard SGD implementation may still be useful in some situations.
Constructors
StandardSGD()
StandardSGD(
stepSize, batchSize
)
StandardSGD(
stepSize, batchSize, maxIterations, tolerance, shuffle, updatePolicy, decayPolicy, resetPolicy, exactObjective
)
Note that StandardSGD
is based on the templated type
SGD<
UpdatePolicyType, DecayPolicyType
>
with UpdatePolicyType
=
VanillaUpdate
and DecayPolicyType
= NoDecay
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Batch size to use for each step. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
UpdatePolicyType |
updatePolicy |
Instantiated update policy used to adjust the given parameters. | UpdatePolicyType() |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
resetPolicy |
Flag that determines whether update policy parameters are reset before every Optimize call. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Shuffle()
, UpdatePolicy()
, DecayPolicy()
, ResetPolicy()
, and
ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
StandardSGD optimizer(0.01, 32, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
Stochastic Gradient Descent with Restarts (SGDR)
An optimizer for differentiable separable functions.
SGDR is based on Mini-batch Stochastic Gradient Descent class and simulates a new warm-started run/restart once a number of epochs are performed.
Constructors
SGDR<
UpdatePolicyType
>()
SGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize
)
SGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, updatePolicy
)
SGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, updatePolicy, resetPolicy, exactObjective
)
The UpdatePolicyType
template parameter controls the update policy used
during the iterative update process. The MomentumUpdate
class is available
for use, and custom behavior can be achieved by implementing a class with the
same method signatures as MomentumUpdate
.
For convenience, the default type of UpdatePolicyType
is MomentumUpdate
,
so the shorter type SGDR<>
can be used instead of the equivalent
SGDR<MomentumUpdate>
.
Attributes
type | name | description | default |
---|---|---|---|
size_t |
epochRestart |
Initial epoch where decay is applied. | 50 |
double |
multFactor |
Batch size multiplication factor. | 2.0 |
size_t |
batchSize |
Size of each mini-batch. | 1000 |
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order. | true |
UpdatePolicyType |
updatePolicy |
Instantiated update policy used to adjust the given parameters. | UpdatePolicyType() |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer can also be modified via the member methods
EpochRestart()
, MultFactor()
, BatchSize()
, StepSize()
,
MaxIterations()
, Tolerance()
, Shuffle()
, UpdatePolicy()
, ResetPolicy()
, and
ExactObjective()
.
Note that the default value for updatePolicy
is the default constructor for
the UpdatePolicyType
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SGDR<> optimizer(50, 2.0, 1, 0.01, 10000, 1e-3);
optimizer.Optimize(f, coordinates);
See also:
- SGDR: Stochastic Gradient Descent with Warm Restarts
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
Snapshot Stochastic Gradient Descent with Restarts (SnapshotSGDR)
An optimizer for differentiable separable functions.
SnapshotSGDR simulates a new warm-started run/restart once a number of epochs are performed using the Snapshot Ensembles technique.
Constructors
SnapshotSGDR<
UpdatePolicyType
>()
SnapshotSGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize
)
SnapshotSGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, snapshots, accumulate, updatePolicy
)
SnapshotSGDR<
UpdatePolicyType
>(
epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, snapshots, accumulate, updatePolicy, resetPolicy, exactObjective
)
The UpdatePolicyType
template parameter controls the update policy used
during the iterative update process. The MomentumUpdate
class is available
for use, and custom behavior can be achieved by implementing a class with the
same method signatures as MomentumUpdate
.
For convenience, the default type of UpdatePolicyType
is MomentumUpdate
,
so the shorter type SnapshotSGDR<>
can be used instead of the equivalent
SnapshotSGDR<MomentumUpdate>
.
Attributes
type | name | description | default |
---|---|---|---|
size_t |
epochRestart |
Initial epoch where decay is applied. | 50 |
double |
multFactor |
Batch size multiplication factor. | 2.0 |
size_t |
batchSize |
Size of each mini-batch. | 1000 |
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order. | true |
size_t |
snapshots |
Maximum number of snapshots. | 5 |
bool |
accumulate |
Accumulate the snapshot parameter. | true |
UpdatePolicyType |
updatePolicy |
Instantiated update policy used to adjust the given parameters. | UpdatePolicyType() |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer can also be modified via the member methods
EpochRestart()
, MultFactor()
, BatchSize()
, StepSize()
,
MaxIterations()
, Tolerance()
, Shuffle()
, Snapshots()
, Accumulate()
,
UpdatePolicy()
, ResetPolicy()
, and ExactObjective()
.
The Snapshots()
function returns a std::vector<arma::mat>&
(a vector of
snapshots of the parameters), not a size_t
representing the maximum number of
snapshots.
Note that the default value for updatePolicy
is the default constructor for
the UpdatePolicyType
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SnapshotSGDR<> optimizer(50, 2.0, 1, 0.01, 10000, 1e-3);
optimizer.Optimize(f, coordinates);
See also:
- Snapshot ensembles: Train 1, get m for free
- SGDR: Stochastic Gradient Descent with Warm Restarts
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
SMORMS3
An optimizer for differentiable separable functions.
SMORMS3 is a hybrid of RMSprop, which is trying to estimate a safe and optimal distance based on curvature or perhaps just normalizing the step-size in the parameter space.
Constructors
SMORMS3()
SMORMS3(
stepSize, batchSize
)
SMORMS3(
stepSize, batchSize, epsilon, maxIterations, tolerance
)
SMORMS3(
stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process at each step. | 32 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-16 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer can also be modified via the member methods
StepSize()
, BatchSize()
, Epsilon()
, MaxIterations()
, Tolerance()
,
Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SMORMS3 optimizer(0.001, 1, 1e-16, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);
See also:
- RMSprop loses to SMORMS3 - Beware the Epsilon!
- RMSProp
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
Standard stochastic variance reduced gradient (SVRG)
An optimizer for differentiable separable functions.
Stochastic Variance Reduced Gradient is a technique for minimizing smooth and strongly convex problems.
Constructors
SVRGType<
UpdatePolicyType, DecayPolicyType
>()
SVRGType<
UpdatePolicyType, DecayPolicyType
>(
stepSize
)
SVRGType<
UpdatePolicyType, DecayPolicyType
>(
stepSize, batchSize, maxIterations, innerIterations
)
SVRGType<
UpdatePolicyType, DecayPolicyType
>(
stepSize, batchSize, maxIterations, innerIterations, tolerance, shuffle, updatePolicy, decayPolicy, resetPolicy, exactObjective
)
The UpdatePolicyType
template parameter controls the update step used by
SVRG during the optimization. The SVRGUpdate
class is available for use and
custom update behavior can be achieved by implementing a class with the same
method signatures as SVRGUpdate
.
The DecayPolicyType
template parameter controls the decay policy used to
adjust the step size during the optimization. The BarzilaiBorweinDecay
and
NoDecay
classes are available for use. Custom decay functionality can be
achieved by implementing a class with the same method signatures.
For convenience the following typedefs have been defined:
SVRG
(equivalent toSVRGType<SVRGUpdate, NoDecay>
): the standard SVRG techniqueSVRG_BB
(equivalent toSVRGType<SVRGUpdate, BarzilaiBorweinDecay>
): SVRG with the Barzilai-Borwein decay policy
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Initial batch size. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 1000 |
size_t |
innerIterations |
The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration. | 0 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the batch order is shuffled; otherwise, each batch is visited in linear order. | true |
UpdatePolicyType |
updatePolicy |
Instantiated update policy used to adjust the given parameters. | UpdatePolicyType() |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
resetPolicy |
Flag that determines whether update policy parameters are reset before every Optimize call. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, InnerIterations()
,
Tolerance()
, Shuffle()
, UpdatePolicy()
, DecayPolicy()
, ResetPolicy()
, and
ExactObjective()
.
Note that the default values for the updatePolicy
and decayPolicy
parameters
are simply the default constructors of the UpdatePolicyType
and
DecayPolicyType
classes.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
// Standard stochastic variance reduced gradient.
SVRG optimizer(0.005, 1, 300, 0, 1e-10, true);
optimizer.Optimize(f, coordinates);
// Stochastic variance reduced gradient with Barzilai-Borwein.
SVRG_BB bbOptimizer(0.005, batchSize, 300, 0, 1e-10, true, SVRGUpdate(),
BarzilaiBorweinDecay(0.1));
bbOptimizer.Optimize(f, coordinates);
See also:
- Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
- SGD
- SGD in Wikipedia
- Differentiable separable functions
SPALeRA Stochastic Gradient Descent (SPALeRASGD)
An optimizer for differentiable separable functions.
SPALeRA involves two components: a learning rate adaptation scheme, which ensures that the learning system goes as fast as it can; and a catastrophic event manager, which is in charge of detecting undesirable behaviors and getting the system back on track.
Constructors
SPALeRASGD<
DecayPolicyType
>()
SPALeRASGD<
DecayPolicyType
>(
stepSize, batchSize
)
SPALeRASGD<
DecayPolicyType
>(
stepSize, batchSize, maxIterations, tolerance
)
SPALeRASGD<
DecayPolicyType
>(
stepSize, batchSize, maxIterations, tolerance, lambda, alpha, epsilon, adaptRate, shuffle, decayPolicy, resetPolicy, exactObjective
)
The DecayPolicyType
template parameter controls the decay in the step size
during the course of the optimization. The NoDecay
class is available for
use; custom behavior can be achieved by implementing a class with the same
method signatures.
By default, DecayPolicyType
is set to NoDecay
, so the shorter type
SPALeRASGD<>
can be used instead of the equivalent SPALeRASGD<NoDecay>
.
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.01 |
size_t |
batchSize |
Initial batch size. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
double |
lambda |
Page-Hinkley update parameter. | 0.01 |
double |
alpha |
Memory parameter of the Agnostic Learning Rate adaptation. | 0.001 |
double |
epsilon |
Numerical stability parameter. | 1e-6 |
double |
adaptRate |
Agnostic learning rate update rate. | 3.10e-8 |
bool |
shuffle |
If true, the batch order is shuffled; otherwise, each batch is visited in linear order. | true |
DecayPolicyType |
decayPolicy |
Instantiated decay policy used to adjust the step size. | DecayPolicyType() |
bool |
resetPolicy |
Flag that determines whether update policy parameters are reset before every Optimize call. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Lambda()
,
Alpha()
, Epsilon()
, AdaptRate()
, Shuffle()
, DecayPolicy()
, ResetPolicy()
, and ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SPALeRASGD<> optimizer(0.05, 1, 10000, 1e-4);
optimizer.Optimize(f, coordinates);
See also:
- Stochastic Gradient Descent: Going As Fast As Possible But Not Faster
- SGD
- SGD in Wikipedia
- Differentiable separable functions
SWATS
An optimizer for differentiable separable functions.
SWATS is an optimizer that uses a simple strategy to switch from Adam to standard SGD when a triggering condition is satisfied. The condition relates to the projection of Adam steps on the gradient subspace.
Constructors
SWATS()
SWATS(
stepSize, batchSize
)
SWATS(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance
)
SWATS(
stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process at each step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
epsilon |
Value used to initialize the mean squared gradient parameter. | 1e-16 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer can also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Epsilon()
,
MaxIterations()
, Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
SWATS optimizer(0.001, 1, 0.9, 0.999, 1e-16, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);
See also:
- Improving generalization performance by switching from Adam to SGD
- Adam
- Standard SGD
- Stochastic gradient descent in Wikipedia
- Differentiable separable functions
WNGrad
An optimizer for differentiable separable functions.
WNGrad is a general nonlinear update rule for the learning rate. WNGrad has near-optimal convergence rates in both the batch and stochastic settings.
Constructors
WNGrad()
WNGrad(
stepSize, batchSize
)
WNGrad(
stepSize, batchSize, maxIterations, tolerance, shuffle
)
WNGrad(
stepSize, batchSize, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.562 |
size_t |
batchSize |
Initial batch size. | 32 |
size_t |
maxIterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the batch order is shuffled; otherwise, each batch is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
Attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, MaxIterations()
, Tolerance()
, Shuffle()
, ResetPolicy()
, and
ExactObjective()
.
Examples
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
WNGrad<> optimizer(0.562, 1, 10000, 1e-4);
optimizer.Optimize(f, coordinates);
See also:
- WNGrad: Learn the Learning Rate in Gradient Descent
- SGD
- SGD in Wikipedia
- Differentiable separable functions
Yogi
An optimizer for differentiable separable functions.
Yogi is an optimization algorithm based on Adam with more fine-grained effective learning rate control, which uses additive updates instead of multiplicative updates for the moving average of the squared gradient. In addition, Yogi has similar theoretical guarantees on convergence as Adam.
Constructors
Yogi()
Yogi(
stepSize, batchSize
)
Yogi(
stepSize, batchSize, beta1, beta2, eps, maxIterations
)
Yogi(
stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective
)
Attributes
type | name | description | default |
---|---|---|---|
double |
stepSize |
Step size for each iteration. | 0.001 |
size_t |
batchSize |
Number of points to process in a single step. | 32 |
double |
beta1 |
Exponential decay rate for the first moment estimates. | 0.9 |
double |
beta2 |
Exponential decay rate for the weighted infinity norm estimates. | 0.999 |
double |
eps |
Value used to initialize the mean squared gradient parameter. | 1e-8 |
size_t |
max_iterations |
Maximum number of iterations allowed (0 means no limit). | 100000 |
double |
tolerance |
Maximum absolute tolerance to terminate algorithm. | 1e-5 |
bool |
shuffle |
If true, the function order is shuffled; otherwise, each function is visited in linear order. | true |
bool |
resetPolicy |
If true, parameters are reset before every Optimize call; otherwise, their values are retained. | true |
bool |
exactObjective |
Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data). | false |
The attributes of the optimizer may also be modified via the member methods
StepSize()
, BatchSize()
, Beta1()
, Beta2()
, Eps()
, MaxIterations()
,
Tolerance()
, Shuffle()
, ResetPolicy()
, and ExactObjective()
.
Examples
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
Yogi optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);
See also:
- Adaptive Methods for Nonconvex Optimization
- SGD in Wikipedia
- SGD
- Adam
- Differentiable separable functions
Callback documentation
Callbacks in ensmallen are methods that are called at various states during the optimization process, which can be used to implement and control behaviors such as:
- Changing the learning rate.
- Printing of the current objective.
- Sending a message when the optimization hits a specific state such us a minimal objective.
Callbacks can be passed as an argument to the Optimize()
function:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
// Pass the built-in *PrintLoss* callback as the last argument to the
// *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss());
Passing multiple callbacks is just the same as passing a single callback:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
// Pass the built-in *PrintLoss* and *EarlyStopAtMinLoss* callback as the last
// argument to the *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss(), EarlyStopAtMinLoss());
It is also possible to pass a callback instantiation that allows accessing of internal callback parameters at a later state:
Click to collapse/expand example code.
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
// Create an instantiation of the built-in *StoreBestCoordinates* callback,
// which will store the best objective and the corresponding model parameter
// that can be accessed later.
StoreBestCoordinates<> callback;
// Pass an instantiation of the built-in *StoreBestCoordinates* callback as the
// last argument to the *Optimize()* function.
optimizer.Optimize(f, coordinates, callback);
// Print the minimum objective that is stored inside the *StoreBestCoordinates*
// callback that was passed to the *Optimize()* call.
std::cout << callback.BestObjective() << std::endl;
Built-in Callbacks
EarlyStopAtMinLoss
Stops the optimization process if the loss stops decreasing or no improvement has been made.
Constructors
EarlyStopAtMinLoss()
EarlyStopAtMinLoss(
patience
)
EarlyStopAtMinLoss(
func
)
EarlyStopAtMinLoss(
func
,
patience
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
patience |
The number of epochs to wait after the minimum loss has been reached. | 10 |
std::function<double(const arma::mat&)> |
func |
A callback to return immediate loss evaluated by the function. |
Note that for the func
argument above, if a
different matrix type is desired, instead of using
the class EarlyStopAtMinLoss
, the class EarlyStopAtMinLossType<MatType>
should be used.
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, EarlyStopAtMinLoss());
Another example of using lambda in the constructor.
// Generate random training data and labels.
arma::mat trainingData(5, 100, arma::fill::randu);
arma::Row<size_t> trainingLabels =
arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));
// Generate a validation set.
arma::mat validationData(5, 100, arma::fill::randu);
arma::Row<size_t> validationLabels =
arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));
// Create a LogisticRegressionFunction for both the training and validation data.
LogisticRegressionFunction lrfTrain(trainingData, trainingLabels);
LogisticRegressionFunction lrfValidation(validationData, validationLabels);
// Create a callback that will terminate when the validation loss starts to
// increase.
EarlyStopAtMinLoss cb(
[&](const arma::mat& coordinates)
{
// You could also, e.g., print the validation loss here to watch it converge.
return lrfValidation.Evaluate(coordinates);
});
arma::mat coordinates = lrfTrain.GetInitialPoint();
SMORMS3 smorms3;
smorms3.Optimize(lrfTrain, coordinates, cb);
GradClipByNorm
One difficulty with optimization is that large parameter gradients can lead an optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work done to get to the current solution. Such large updates during the optimization can cause a numerical overflow or underflow, often referred to as “exploding gradients”. The exploding gradient problem can be caused by: Choosing the wrong learning rate which leads to huge updates in the gradients. Failing to scale a data set leading to very large differences between data points. Applying a loss function that computes very large error values.
A common answer to the exploding gradients problem is to change the derivative
of the error before applying the update step. One option is to clip the norm
||g||
of the gradient g
before a parameter update. So given the gradient,
and a maximum norm value, the callback normalizes the gradient so that its
L2-norm is less than or equal to the given maximum norm value.
Constructors
GradClipByNorm(
maxNorm
)
Attributes
type | name | description | default |
---|---|---|---|
double |
maxNorm |
The maximum clipping value. |
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, GradClipByNorm(0.3));
GradClipByValue
One difficulty with optimization is that large parameter gradients can lead an optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work done to get to the current solution. Such large updates during the optimization can cause a numerical overflow or underflow, often referred to as “exploding gradients”. The exploding gradient problem can be caused by: Choosing the wrong learning rate which leads to huge updates in the gradients. Failing to scale a data set leading to very large differences between data points. Applying a loss function that computes very large error values.
A common answer to the exploding gradients problem is to change the derivative of the error before applying the update step. One option is to clip the parameter gradient element-wise before a parameter update.
Constructors
GradClipByValue(
min, max
)
Attributes
type | name | description | default |
---|---|---|---|
double |
min |
The minimum value to clip to. | |
double |
max |
The maximum value to clip to. |
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, GradClipByValue(0, 1.3));
PrintLoss
Callback that prints loss to stdout or a specified output stream.
Constructors
PrintLoss()
PrintLoss(
output
)
Attributes
type | name | description | default |
---|---|---|---|
std::ostream |
output |
Ostream which receives output from this object. | stdout |
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, PrintLoss());
ProgressBar
Callback that prints a progress bar to stdout or a specified output stream.
Constructors
ProgressBar()
ProgressBar(
width
)
ProgressBar(
width, output
)
Attributes
type | name | description | default |
---|---|---|---|
size_t |
width |
Width of the bar. | 70 |
std::ostream |
output |
Ostream which receives output from this object. | stdout |
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, ProgressBar());
Report
Callback that prints a optimizer report to stdout or a specified output stream.
Constructors
Report()
Report(
iterationsPercentage
)
Report(
iterationsPercentage, output
)
Report(
iterationsPercentage, output, outputMatrixSize
)
Attributes
type | name | description | default |
---|---|---|---|
double |
iterationsPercentage |
The number of iterations to report in percent, between [0, 1]. | 0.1 |
std::ostream |
output |
Ostream which receives output from this object. | stdout |
size_t |
outputMatrixSize |
The number of values to output for the function coordinates. | 4 |
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, Report(0.1));
Click to collapse/expand example output.
Optimization Report
--------------------------------------------------------------------------------
Initial Coordinates:
-1.2000 1.0000
Final coordinates:
-1.0490 1.1070
iter loss loss change |gradient| step size total time
0 24.2 0 233 1 4.27e-05
100 8.6 15.6 104 1 0.000215
200 5.26 3.35 48.7 1 0.000373
300 4.49 0.767 23.4 1 0.000533
400 4.31 0.181 11.3 1 0.000689
500 4.27 0.0431 5.4 1 0.000846
600 4.26 0.012 2.86 1 0.00101
700 4.25 0.00734 2.09 1 0.00117
800 4.24 0.00971 1.95 1 0.00132
900 4.22 0.0146 1.91 1 0.00148
--------------------------------------------------------------------------------
Version:
ensmallen: 2.13.0 (Automatically Automated Automation)
armadillo: 9.900.1 (Nocturnal Misbehaviour)
Function:
Number of functions: 1
Coordinates rows: 2
Coordinates columns: 1
Loss:
Initial 24.2
Final 4.2
Change 20
Optimizer:
Maximum iterations: 1000
Reached maximum iterations: true
Batchsize: 1
Iterations: 1000
Number of epochs: 1001
Initial step size: 1
Final step size: 1
Coordinates max. norm: 233
Evaluate calls: 1000
Gradient calls: 1000
Time (in seconds): 0.00163
StoreBestCoordinates
Callback that stores the model parameter after every epoch if the objective decreased.
Constructors
StoreBestCoordinates<
ModelMatType
>()
The ModelMatType
template parameter refers to the matrix type of the model
parameter.
Attributes
The stored model parameter can be accessed via the member method
BestCoordinates()
and the best objective via BestObjective()
.
Examples:
Click to collapse/expand example code.
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
StoreBestCoordinates<arma::mat> cb;
optimizer.Optimize(f, coordinates, cb);
std::cout << "The optimized model found by AdaDelta has the "
<< "parameters " << cb.BestCoordinates();
Callback States
Callbacks are called at several states during the optimization process:
- At the beginning and end of the optimization process.
- After any call to
Evaluate()
andEvaluateConstraint()
. - After any call to
Gradient()
andGradientConstraint()
. - At the start and end of an epoch.
Each callback provides optimization-relevant information that can be accessed or modified.
BeginOptimization
Called at the beginning of the optimization process.
void BeginOptimization(
optimizer, function, coordinates
)
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
EndOptimization
Called at the end of the optimization process.
void EndOptimization(
optimizer, function, coordinates
)
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
Evaluate
Called after any call to Evaluate()
.
bool Evaluate(
optimizer, function, coordinates, objective
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
double |
objective |
Objective value of the current point. |
EvaluateConstraint
Called after any call to EvaluateConstraint()
.
bool EvaluateConstraint(
optimizer, function, coordinates, constraint, constraintValue
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
size_t |
constraint |
The index of the constraint. |
double |
constraintValue |
Constraint value of the current point. |
Gradient
Called after any call to Gradient()
.
bool Gradient(
optimizer, function, coordinates, gradient
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
GradType |
gradient |
Matrix that holds the gradient. |
GradientConstraint
Called after any call to GradientConstraint()
.
bool GradientConstraint(
optimizer, function, coordinates, constraint, gradient
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
size_t |
constraint |
The index of the constraint. |
GradType |
gradient |
Matrix that holds the gradient. |
BeginEpoch
Called at the beginning of a pass over the data. The objective may be exact or
an estimate depending on exactObjective
value.
bool BeginEpoch(
optimizer, function, coordinates, epoch, objective
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
size_t |
epoch |
The index of the current epoch. |
double |
objective |
Objective value of the current point. |
EndEpoch
Called at the end of a pass over the data. The objective may be exact or
an estimate depending on exactObjective
value.
bool EndEpoch(
optimizer, function, coordinates, epoch, objective
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
size_t |
epoch |
The index of the current epoch. |
double |
objective |
Objective value of the current point. |
GenerationalStepTaken
Called after the evolution of a single generation. Intended specifically for MultiObjective Optimizers.
bool GenerationalStepTaken(
optimizer, function, coordinates, objectives, frontIndices
)
If the callback returns true
, the optimization will be terminated.
Attributes
type | name | description |
---|---|---|
OptimizerType |
optimizer |
The optimizer used to update the function. |
FunctionType |
function |
The function to be optimized. |
MatType |
coordinates |
The current function parameter. |
ObjectivesVecType |
objectives |
The set of calculated objectives so far. |
IndicesType |
frontIndices |
The indices of the members belonging to Pareto Front. |
Custom Callbacks
Learning rate scheduling
Setting the learning rate is crucially important when training because it
controls both the speed of convergence and the ultimate performance of the
model. One of the simplest learning rate strategies is to have a fixed learning
rate throughout the training process. Choosing a small learning rate allows the
optimizer to find good solutions, but this comes at the expense of limiting the
initial speed of convergence. To overcome this tradeoff, changing the learning
rate as more epochs have passed is commonly done in model training. The
Evaluate
method in combination with the StepSize
method of the optimizer
can be used to update the variables.
Example code showing how to implement a custom callback to change the learning rate is given below.
Click to collapse/expand example code.
class ExponentialDecay
{
// Set up the exponential decay learning rate scheduler with the user
// specified decay value.
ExponentialDecay(const double decay) : decay(decay), learningRate(0) { }
// Callback function called at the start of the optimization process.
// In this example we will use this to save the initial learning rate.
template<typename OptimizerType, typename FunctionType, typename MatType>
void BeginOptimization(OptimizerType& /* optimizer */,
FunctionType& /* function */,
MatType& /* coordinates */)
{
// Save the initial learning rate.
learningRate = optimizer.StepSize();
}
// Callback function called at the end of a pass over the data. We are only
// interested in the current epoch and the optimizer, we ignore the rest.
template<typename OptimizerType, typename FunctionType, typename MatType>
bool EndEpoch(OptimizerType& optimizer,
FunctionType& /* function */,
const MatType& /* coordinates */,
const size_t epoch,
const double objective)
{
// Update the learning rate.
optimizer.StepSize() = learningRate * (1.0 - std::pow(decay,
(double) epoch));
// Do not terminate the optimization.
return false;
}
double learningRate;
};
int main()
{
// First, generate some random data, with 10000 points and 10 dimensions.
// This data has no pattern and as such will make a model that's not very
// useful---but the purpose here is just demonstration. :)
//
// For a more "real world" situation, load a dataset from file using X.load()
// and y.load() (but make sure the matrix is column-major, so that each
// observation/data point corresponds to a *column*, *not* a row.
arma::mat data(10, 10000, arma::fill::randn);
arma::rowvec responses(10000, arma::fill::randn);
// Create a starting point for our optimization randomly. The model has 10
// parameters, so the shape is 10x1.
arma::mat startingPoint(10, 1, arma::fill::randn);
// Construct the objective function.
LinearRegressionFunction lrf(data, responses);
arma::mat lrfParams(startingPoint);
// Create the StandardSGD optimizer with specified parameters.
// The ens::StandardSGD type can be replaced with any ensmallen optimizer
//that can handle differentiable functions.
StandardSGD optimizer(0.001, 1, 0, 1e-15, true);
// Use the StandardSGD optimizer with specified parameters to minimize the
// LinearRegressionFunction and pass the *exponential decay*
// callback function from above.
optimizer.Optimize(lrf, lrfParams, ExponentialDecay(0.01));
// Print the trained model parameter.
std::cout << lrfParams.t();
}
Early stopping at minimum loss
Early stopping is a technique for controlling overfitting in machine learning models, especially neural networks, by stopping the optimization process before the model has trained for the maximum number of iterations.
Example code showing how to implement a custom callback to stop the optimization when the minimum of loss has been reached is given below.
Click to collapse/expand example code.
#include <ensmallen.hpp>
// This class implements early stopping at minimum loss callback function to
// terminate the optimization process early if the loss stops decreasing.
class EarlyStop
{
public:
// Set up the early stop at min loss class, which keeps track of the minimum
// loss.
EarlyStop() : bestObjective(std::numeric_limits<double>::max()) { }
// Callback function called at the end of a pass over the data, which provides
// the current objective. We are only interested in the objective and ignore
// the rest.
template<typename OptimizerType, typename FunctionType, typename MatType>
bool EndEpoch(OptimizerType& /* optimizer */,
FunctionType& /* function */,
const MatType& /* coordinates */,
const size_t /* epoch */,
const double objective)
{
// Check if the given objective is lower as the previous objective.
if (objective < bestObjective)
{
// Update the local objective.
bestObjective = objective;
}
else
{
// Stop the optimization process.
return true;
}
// Do not stop the optimization process.
return false;
}
// Locally-stored best objective.
double bestObjective;
};
int main()
{
// First, generate some random data, with 10000 points and 10 dimensions.
// This data has no pattern and as such will make a model that's not very
// useful---but the purpose here is just demonstration. :)
//
// For a more "real world" situation, load a dataset from file using X.load()
// and y.load() (but make sure the matrix is column-major, so that each
// observation/data point corresponds to a *column*, *not* a row.
arma::mat data(10, 10000, arma::fill::randn);
arma::rowvec responses(10000, arma::fill::randn);
// Create a starting point for our optimization randomly. The model has 10
// parameters, so the shape is 10x1.
arma::mat startingPoint(10, 1, arma::fill::randn);
// Construct the objective function.
LinearRegressionFunction lrf(data, responses);
arma::mat lrfParams(startingPoint);
// Create the L_BFGS optimizer with default parameters.
// The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
// handle differentiable functions.
ens::L_BFGS lbfgs;
// Use the L_BFGS optimizer with default parameters to minimize the
// LinearRegressionFunction and pass the *early stopping at minimum loss*
// callback function from above.
lbfgs.Optimize(lrf, lrfParams, EarlyStop());
// Print the trained model parameter.
std::cout << lrfParams.t();
}
Note that we have simply passed an instantiation of EarlyStop
the
rest is handled inside the optimizer.
ensmallen provides a more complete and general implementation of a early stopping at minimum loss callback function.
History/changelog
ensmallen 2.21.1: “Bent Antenna”
2024-02-15
-
Fix numerical precision issues for small-gradient L-BFGS scaling factor computations (#392).
-
Ensure the tests are built with optimisation enabled (#394).
ensmallen 2.21.0: “Bent Antenna”
2023-11-27
-
Clarify return values for different callback types (#383).
-
Fix return types of callbacks (#382).
-
Minor cleanup for printing optimization reports via
Report()
(#385).
ensmallen 2.20.0: “Stripped Bolt Head”
2023-10-02
-
Implementation of Active CMAES (#367).
-
LBFGS: avoid generation of NaNs, and add checks for finite values (#368).
-
Fix CNE test tolerances (#360).
-
Rename
SCD
optimizer, toCD
(#379).
ensmallen 2.19.1: “Eight Ball Deluxe”
2023-01-30
- Avoid deprecation warnings in Armadillo 11.2+ (#347).
ensmallen 2.19.0: “Eight Ball Deluxe”
2022-04-06
-
Added DemonSGD and DemonAdam optimizers (#211).
-
Fix bug with Adam-like optimizers not resetting when
resetPolicy
istrue
. (#340). -
Add Yogi optimizer (#232).
-
Add AdaBelief optimizer (#233).
-
Add AdaSqrt optimizer (#234).
-
Bump check for minimum supported version of Armadillo (#342).
ensmallen 2.18.2: “Fairmount Bagel”
2022-02-13
ensmallen 2.18.1: “Fairmount Bagel”
2021-11-19
-
Accelerate SGD test time (#330).
-
Fix potential infinite loop in CMAES (#331).
-
Fix SCD partial gradient test (#332).
ensmallen 2.18.0: “Fairmount Bagel”
2021-10-20
-
Add gradient value clipping and gradient norm scaling callback (#315).
-
Remove superfluous CMake option to build the tests (#313).
-
Bump minimum Armadillo version to 9.800 (#318).
-
Update Catch2 to 2.13.7 (#322).
-
Remove redundant template argument for C++20 compatibility (#324).
-
Fix MOEAD test stability (#327).
ensmallen 2.17.0: “Pachis Din Me Pesa Double”
2021-07-06
-
CheckArbitraryFunctionTypeAPI extended for MOO support (#283).
-
Add Indicators for Multiobjective optimizers (#285).
-
Make Callback flexible for MultiObjective Optimizers (#289).
-
Add ZDT Test Suite (#273).
-
Add MOEA-D/DE Optimizer (#269).
-
Introduce Policy Methods for MOEA/D-DE (#293).
-
Add Das-Dennis weight initialization method (#295).
-
Add Dirichlet Weight Initialization (#296).
-
Improved installation and compilation instructions (#300).
-
Disable building the tests by default for faster installation (#303).
-
Modify matrix initialisation to take into account default element zeroing in Armadillo 10.5 (#305).
ensmallen 2.16.2: “Severely Dented Can Of Polyurethane”
2021-03-24
-
Fix CNE test trials (#267).
-
Update Catch2 to 2.13.4 (#268).
-
Add clarifying comments in problems/ implementations (#276).
ensmallen 2.16.1: “Severely Dented Can Of Polyurethane”
2021-03-02
-
Fix test compilation issue when
ENS_USE_OPENMP
is set (#255). -
Fix CNE initial population generation to use normal distribution (#258).
-
Fix compilation warnings (#259).
-
Remove
AdamSchafferFunctionN2Test
test from Adam test suite to prevent spurious issue on some aarch64 (#265).
ensmallen 2.16.0: “Severely Dented Can Of Polyurethane”
2021-02-11
-
Expand README with example installation and add simple example program showing usage of the L-BFGS optimizer (#248).
-
Refactor tests to increase stability and reduce random errors (#249).
ensmallen 2.15.1: “Why Can’t I Manage To Grow Any Plants?”
2020-11-05
- Fix include order to ensure traits is loaded before reports (#239).
ensmallen 2.15.0: “Why Can’t I Manage To Grow Any Plants?”
2020-11-01
-
Make a few tests more robust (#228).
-
Add release date to version information. (#226)
-
Fix typo in release script (#236).
ensmallen 2.14.2: “No Direction Home”
2020-08-31
- Fix implementation of fonesca fleming problem function f1 and f2 type usage and negative signs. (#223)
ensmallen 2.14.1: “No Direction Home”
2020-08-19
-
Fix release script (remove hardcoded information, trim leading whitespaces introduced by
wc -l
in MacOS) (#216, #220). -
Adjust tolerance for AugLagrangian convergence based on element type (#217).
ensmallen 2.14.0: “No Direction Home”
2020-08-10
-
Add NSGA2 optimizer for multi-objective functions (#149).
-
Update automatic website update release script (#207).
-
Clarify and fix documentation for constrained optimizers (#201).
-
Fix L-BFGS convergence when starting from a minimum (#201).
-
Add optimizer summary report callback (#213).
ensmallen 2.13.0: “Automatically Automated Automation”
2020-07-15
ensmallen 2.12.1: “Stir Crazy”
2020-04-20
-
Fix total number of epochs and time estimation for ProgressBar callback (#181).
-
Handle SpSubview_col and SpSubview_row in Armadillo 9.870 (#194).
-
Minor documentation fixes (#197).
ensmallen 2.12.0: “Stir Crazy”
2020-03-28
-
Correction in the formulation of sigma in CMA-ES (#183).
-
Remove deprecated methods from PrimalDualSolver implementation (#185.
-
Update logo (#186).
ensmallen 2.11.5: “The Poster Session Is Full”
2020-03-11
- Change “mathematical optimization” term to “numerical optimization” in the documentation (#177).
ensmallen 2.11.4: “The Poster Session Is Full”
2020-03-03
-
Require new HISTORY.md entry for each PR. (#171, #172, #175).
-
Update/fix example documentation (#174).
ensmallen 2.11.3: “The Poster Session Is Full”
2020-02-19
-
Prevent spurious compiler warnings (#161).
-
Fix minor memory leaks (#167).
-
Revamp CMake configuration (#152).
ensmallen 2.11.2: “The Poster Session Is Full”
2020-01-16
-
Allow callback instantiation for SGD based optimizer (#138).
-
Minor test stability fixes on i386 (#156).
-
Fix Lookahead MaxIterations() check. (#159).
ensmallen 2.11.1: “The Poster Session Is Full”
2019-12-28
- Fix Lookahead Synchronization period type (#153).
ensmallen 2.11.0: “The Poster Session Is Full”
2019-12-24
ensmallen 2.10.5: “Fried Chicken”
2019-12-13
-
SGD callback test 32-bit safety (big number) (#143).
-
Use “arbitrary” and “separable” terms in static function type checks (#145).
-
Remove ‘using namespace std’ from
problems/
files (#147).
ensmallen 2.10.4: “Fried Chicken”
2019-11-18
ensmallen 2.10.3: “Fried Chicken”
2019-09-26
ensmallen 2.10.2: “Fried Chicken”
2019-09-11
ensmallen 2.10.1: “Fried Chicken”
2019-09-10
ensmallen 2.10.0: “Fried Chicken”
2019-09-07
-
All
Optimize()
functions now take any matrix type; so, e.g.,arma::fmat
orarma::sp_mat
can be used for optimization. See the documentation for more details (#113, #119). -
Introduce callback support. Callbacks can be appended as the last arguments of an
Optimize()
call, and can perform custom behavior at different points during the optimization. See the documentation for more details (#119). -
Slight speedups for
FrankWolfe
optimizer (#127).
ensmallen 1.16.2: “Loud Alarm Clock”
2019-08-12
- Fix PSO return type bug (#126).
ensmallen 1.16.1: “Loud Alarm Clock”
2019-08-11
-
Update HISTORY.md to use Markdown links to the PR and add release names.
-
Fix PSO return type bug (#124).
ensmallen 1.16.0: “Loud Alarm Clock”
2019-08-09
-
Add option to avoid computing exact objective at the end of the optimization (#109).
-
Fix handling of curvature for BigBatchSGD (#118).
-
Reduce runtime of tests (#118).
-
Introduce local-best particle swarm optimization,
LBestPSO
, for unconstrained optimization problems (#86).
ensmallen 1.15.1: “Wrong Side Of The Road”
2019-05-22
-
Fix
-Wreorder
inqhadam
warning (#115). -
Fix
-Wunused-private-field
warning inspsa
(#115). -
Add more warning output for gcc/clang (#116).
ensmallen 1.15.0: “Wrong Side Of The Road”
2019-05-14
- Added QHAdam and QHSGD optimizers (#81).
ensmallen 1.14.4: “Difficult Crimp”
2019-05-12
- Fixes for BigBatchSGD (#91).
ensmallen 1.14.3: “Difficult Crimp”
2019-05-06
- Handle
eig_sym()
failures correctly (#100).
ensmallen 1.14.2: “Difficult Crimp”
2019-03-14
-
SPSA test tolerance fix (#97).
-
Minor documentation fixes (#95, #98).
-
Fix newlines at end of file (#92).
ensmallen 1.14.1: “Difficult Crimp”
2019-03-09
-
Fixes for SPSA (#87).
-
Optimized CNE and DE (#90). Changed initial population generation in CNE to be a normal distribution about the given starting point, which should accelerate convergence.
ensmallen 1.14.0: “Difficult Crimp”
2019-02-20
ensmallen 1.13.2: “Coronavirus Invasion”
2019-02-18
- Minor documentation fixes (#82).
ensmallen 1.13.1: “Coronavirus Invasion”
2019-01-24
- Fix -Wreorder warning (#75).
ensmallen 1.13.0: “Coronavirus Invasion”
2019-01-14
ensmallen 1.12.2: “New Year’s Party”
2019-01-05
- Fix list of contributors.
ensmallen 1.12.1: “New Year’s Party”
2019-01-03
- Make sure all files end with newlines.
ensmallen 1.12.0: “New Year’s Party”
2018-12-30
-
Add link to ensmallen PDF to README.md.
-
Minor documentation fixes. Remove too-verbose documentation from source for each optimizer (#61).
-
Add FTML optimizer (#48).
-
Add SWATS optimizer (#42).
-
Add Padam optimizer (#46).
-
Add Eve optimizer (#45).
-
Add ResetPolicy() to SGD-like optimizers (#60).
ensmallen 1.11.1: “Jet Lag”
2018-11-29
- Minor documentation fixes.
ensmallen 1.11.0: “Jet Lag”
2018-11-28
-
Add WNGrad optimizer.
-
Fix header name in documentation samples.
ensmallen 1.10.1: “Corporate Catabolism”
2018-11-16
-
Fixes for GridSearch optimizer.
-
Include documentation with release.
ensmallen 1.10.0: “Corporate Catabolism”
2018-10-20
-
Initial release.
-
Includes the ported optimization framework from mlpack (http://www.mlpack.org/).