ensmallen documentation

ensmallen is a library for function optimization. It is devoted to efficiently solving the problem

\[\operatorname{argmin} f(x)\]

for many different types of $f(x)$. The documentation is split into four parts:

Function type documentation: how to implement the different types of functions $f(x)$ that ensmallen can handle
Optimizer documentation: how to use each of ensmallen’s optimizers to find $\operatorname{argmin} f(x)$
Callback documentation: documentation for ensmallen callbacks, which can be used to specify custom behavior during optimization
History/changelog: changes for each version of ensmallen
Example program: simple example program demonstrating ensmallen usage

Everything in ensmallen is in the ens namespace so it is often useful to put a using directive in your code:

using namespace ens;

Citation

Please cite the following paper if you use ensmallen in your research and/or software. Citations are useful for the continued development and maintenance of the library.

Ryan R. Curtin, Marcus Edel, Rahul Ganesh Prabhu, Suryoday Basak, Zhihao Lou, Conrad Sanderson.
The ensmallen library for flexible numerical optimization.
Journal of Machine Learning Research, Vol. 22, No. 166, 2021. (bibtex)

See also the following paper detailing the internal design of ensmallen:

R.R. Curtin, M. Edel, R. Prabhu, S. Basak, Z. Lou, C. Sanderson. Flexible numerical optimization with ensmallen. arXiv preprint 2003.01403. (bibtex)

Function type documentation
Optimizer documentation
Callback documentation
History/changelog

Function type documentation

Arbitrary functions

The least restrictive type of function that can be implemented in ensmallen is a function for which only the objective can be evaluated. For this, a class with the following API must be implemented:

Click to collapse/expand example code.

class ArbitraryFunctionType
{
 public:
  // This should return f(x).
  double Evaluate(const arma::mat& x);
};

For this type of function, we assume that the gradient f'(x) is not computable. If it is, see differentiable functions.

The Evaluate() method is allowed to have additional cv-modifiers (static, const, etc.).

The following optimizers can be used to optimize an arbitrary function:

Simulated Annealing
CNE
DE
PSO
SPSA

Each of these optimizers has an Optimize() function that is called as Optimize(f, x) where f is the function to be optimized (which implements Evaluate()) and x holds the initial point of the optimization. After Optimize() is called, x will hold the final result of the optimization (that is, the best x found that minimizes f(x)).

Example: squared function optimization

An example program that implements the objective function f(x) = 2 |x|^2 is shown below, using the simulated annealing optimizer.

Click to collapse/expand example code.

#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // This returns f(x) = 2 |x|^2.
  double Evaluate(const arma::mat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }
};

int main()
{
  // The minimum is at x = [0 0 0].  Our initial point is chosen to be
  // [1.0, -1.0, 1.0].
  arma::mat x("1.0 -1.0 1.0");

  // Create simulated annealing optimizer with default options.
  // The ens::SA<> type can be replaced with any suitable ensmallen optimizer
  // that is able to handle arbitrary functions.
  ens::SA<> optimizer;
  SquaredFunction f; // Create function to be optimized.
  optimizer.Optimize(f, x);

  std::cout << "Minimum of squared function found with simulated annealing is "
      << x;
}

Differentiable functions

Probably the most common type of function that can be optimized with ensmallen is a differentiable function, where both f(x) and f’(x) can be calculated. To optimize a differentiable function with ensmallen, a class must be implemented that follows the API below:

Click to collapse/expand example code.

class DifferentiableFunctionType
{
 public:
  // Given parameters x, return the value of f(x).
  double Evaluate(const arma::mat& x);

  // Given parameters x and a matrix g, store f'(x) in the provided matrix g.
  // g should have the same size (rows, columns) as x.
  void Gradient(const arma::mat& x, arma::mat& gradient);

  // OPTIONAL: this may be implemented in addition to---or instead
  // of---Evaluate() and Gradient().  If this is the only function implemented,
  // implementations of Evaluate() and Gradient() will be automatically
  // generated using template metaprogramming.  Often, implementing
  // EvaluateWithGradient() can result in more efficient optimizations.
  //
  // Given parameters x and a matrix g, return the value of f(x) and store
  // f'(x) in the provided matrix g.  g should have the same size (rows,
  // columns) as x.
  double EvaluateWithGradient(const arma::mat& x, arma::mat& g);
};

Note that you may implement either Evaluate() and Gradient() or EvaluateWithGradient(), but it is not mandatory to implement both. (Of course, supplying both is okay too.) It often results in faster code when EvaluateWithGradient() is implemented, especially for functions where f(x) and f’(x) compute some of the same intermediate quantities.

Each of the implemented methods is allowed to have additional cv-modifiers (static, const, etc.).

The following optimizers can be used with differentiable functions:

L-BFGS (ens::L_BFGS)
FrankWolfe (ens::FrankWolfe)
GradientDescent (ens::GradientDescent)
Any optimizer for arbitrary functions

Each of these optimizers has an Optimize() function that is called as Optimize(f, x) where f is the function to be optimized and x holds the initial point of the optimization. After Optimize() is called, x will hold the final result of the optimization (that is, the best x found that minimizes f(x)).

An example program is shown below. In this program, we optimize a linear regression model. In this setting, we have some matrix data of data points that we’ve observed, and some vector responses of the observed responses to this data. In our model, we assume that each response is the result of a weighted linear combination of the data:

\[\operatorname{responses}_i = x * \operatorname{data}_i\]

where $x$ is a vector of parameters. This gives the objective function $f(x) = (\operatorname{responses} - x’\operatorname{data})^2$.

In the example program, we optimize this objective function and compare the runtime of an implementation that uses Evaluate() and Gradient(), and the runtime of an implementation that uses EvaluateWithGradient().

Click to collapse/expand example code.

#include <ensmallen.hpp>

// Define a differentiable objective function by implementing both Evaluate()
// and Gradient() separately.
class LinearRegressionFunction
{
 public:
  // Construct the object with the given data matrix and responses.
  LinearRegressionFunction(const arma::mat& dataIn,
                           const arma::rowvec& responsesIn) :
      data(dataIn), responses(responsesIn) { }

  // Return the objective function for model parameters x.
  double Evaluate(const arma::mat& x)
  {
    return std::pow(arma::norm(responses - x.t() * data), 2.0);
  }

  // Compute the gradient for model parameters x.
  void Gradient(const arma::mat& x, arma::mat& g)
  {
    g = -2 * data * (responses - x.t() * data);
  }

 private:
  // The data.
  const arma::mat& data;
  // The responses to each data point.
  const arma::rowvec& responses;
};

// Define the same function, but only implement EvaluateWithGradient().
class LinearRegressionEWGFunction
{
 public:
  // Construct the object with the given data matrix and responses.
  LinearRegressionEWGFunction(const arma::mat& dataIn,
                              const arma::rowvec& responsesIn) :
      data(dataIn), responses(responsesIn) { }

  // Simultaneously compute both the objective function and gradient for model
  // parameters x.  Note that this is faster than implementing Evaluate() and
  // Gradient() individually because it caches the computation of
  // (responses - x.t() * data)!
  double EvaluateWithGradient(const arma::mat& x, arma::mat& g)
  {
    const arma::rowvec v = (responses - x.t() * data);
    g = -2 * data * v;
    return arma::accu(v % v); // equivalent to \| v \|^2
  }
};

int main()
{
  // We'll run a simple speed comparison between both objective functions.

  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row.
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // We'll use Armadillo's wall_clock class to do a timing comparison.
  arma::wall_clock clock;

  // Construct the first objective function.
  LinearRegressionFunction lrf1(data, responses);
  arma::mat lrf1Params(startingPoint);

  // Create the L_BFGS optimizer with default parameters.
  // The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
  // handle differentiable functions.
  ens::L_BFGS lbfgs;

  // Time how long L-BFGS takes for the first Evaluate() and Gradient()
  // objective function.
  clock.tic();
  lbfgs.Optimize(lrf1, lrf1Params);
  const double time1 = clock.toc();

  std::cout << "LinearRegressionFunction with Evaluate() and Gradient() took "
    << time1 << " seconds to converge to the model: " << std::endl;
  std::cout << lrf1Params.t();

  // Create the second objective function, which uses EvaluateWithGradient().
  LinearRegressionEWGFunction lrf2(data, responses);
  arma::mat lrf2Params(startingPoint);

  // Time how long L-BFGS takes for the EvaluateWithGradient() objective
  // function.
  clock.tic();
  lbfgs.Optimize(lrf2, lrf2Params);
  const double time2 = clock.toc();

  std::cout << "LinearRegressionEWGFunction with EvaluateWithGradient() took "
    << time2 << " seconds to converge to the model: " << std::endl;
  std::cout << lrf2Params.t();

  // When this runs, the output parameters will be exactly on the same, but the
  // LinearRegressionEWGFunction will run more quickly!
}

Partially differentiable functions

Some differentiable functions have the additional property that the gradient f'(x) can be decomposed along a different axis j such that the gradient is sparse. This makes the most sense in machine learning applications where the function f(x) is being optimized for some dataset data, which has d dimensions. A partially differentiable separable function is partially differentiable with respect to each dimension j of data. This property is useful for coordinate descent type algorithms.

To use ensmallen optimizers to minimize these types of functions, only two functions needs to be added to the differentiable function type:

Click to collapse/expand example code.

// Compute the partial gradient f'_j(x) with respect to data coordinate j and
// store it in the sparse matrix g.
void Gradient(const arma::mat& x, const size_t j, arma::sp_mat& g);

// Get the number of features that f(x) can be partially differentiated with.
size_t NumFeatures();

Note: many partially differentiable function optimizers do not require a regular implementation of the Gradient(), so that function may be omitted.

If these functions are implemented, the following partially differentiable function optimizers can be used:

Coordinate Descent

Arbitrary separable functions

Often, an objective function f(x) may be represented as the sum of many functions:

f(x) = f_0(x) + f_1(x) + ... + f_N(x).

In this function type, we assume the gradient f'(x) is not computable. If it is, see differentiable separable functions.

For machine learning tasks, the objective function may be, e.g., the sum of a function taken across many data points. Implementing an arbitrary separable function type in ensmallen is similar to implementing an arbitrary objective function, but with a few extra utility methods:

Click to collapse/expand example code.

class ArbitrarySeparableFunctionType
{
 public:
  // Given parameters x, return the sum of the individual functions
  // f_i(x) + ... + f_{i + batchSize - 1}(x).  i will always be greater than 0,
  // and i + batchSize will be less than or equal to the value of NumFunctions().
  double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize);

  // Shuffle the ordering of the functions f_i(x).
  // (For machine learning problems, this would be equivalent to shuffling the
  // data points, e.g., before an epoch of learning.)
  void Shuffle();

  // Get the number of functions f_i(x).
  // (For machine learning problems, this is often just the number of points in
  // the dataset.)
  size_t NumFunctions();
};

Each of the implemented methods is allowed to have additional cv-modifiers (static, const, etc.).

The following optimizers can be used with arbitrary separable functions:

CMAES

Note: using an arbitrary non-separable function optimizer will call Evaluate(x, 0, NumFunctions() - 1); if this is a very computationally intensive operation for your objective function, it may be best to avoid using a non-separable arbitrary function optimizer.

Note: if possible, it’s often better to try and use a gradient-based approach. See differentiable separable functions for separable f(x) where the gradient f’(x) can be computed.

The example program below demonstrates the implementation and use of an arbitrary separable function. The function used is the linear regression objective function, described in differentiable functions. Given some dataset data and responses responses, the linear regression objective function is separable as

\[f_i(x) = (\operatorname{responses}(i) - x' * \operatorname{data}(i))^2\]

where $\operatorname{data}(i)$ represents the data point indexed by $i$ and $\operatorname{responses}(i)$ represents the observed response indexed by $i$.

Click to collapse/expand example code.

#include <ensmallen.hpp>

// This class implements the linear regression objective function as an
// arbitrary separable function type.
class LinearRegressionFunction
{
 public:
  // Create the linear regression function with the given data and the given
  // responses.
  LinearRegressionFunction(const arma::mat& dataIn,
                           const arma::rowvec& responsesIn) :
      data(data), responses(responses) { }

  // Given parameters x, compute the sum of the separable objective
  // functions starting with f_i(x) and ending with
  // f_{i + batchSize - 1}(x).
  double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize)
  {
    // A more complex implementation could avoid the for loop and use
    // submatrices, but it is easier to understand when implemented this way.
    double objective = 0.0;
    for (size_t j = i; j < i + batchSize; ++j)
    {
      objective += std::pow(responses[j] - x.t() * data.col(j), 2.0);
    }
  }

  // Shuffle the ordering of the functions f_i(x).  We do this by simply
  // shuffling the data and responses.
  void Shuffle()
  {
    // Generate a random ordering of data points.
    arma::uvec ordering = arma::shuffle(
        arma::linspace<arma::uvec>(0, data.n_cols - 1, data.n_cols));

    // This reorders the data and responses with our randomly-generated
    // ordering above.
    data = data.cols(ordering);
    responses = responses.cols(ordering);
  }

  // Return the number of functions f_i(x).  In our case this is simply the
  // number of data points.
  size_t NumFunctions() { return data.n_cols; }
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row.
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat params(10, 1, arma::fill::randn);

  // Use the CMAES optimizer with default parameters to minimize the
  // LinearRegressionFunction.
  // The ens::CMAES type can be replaced with any suitable ensmallen optimizer
  // that can handle arbitrary separable functions.
  ens::CMAES cmaes;
  LinearRegressionFunction lrf(data, responses);
  cmaes.Optimize(lrf, params);

  std::cout << "The optimized linear regression model found by CMAES has the "
      << "parameters " << params.t();
}

Differentiable separable functions

Likely the most important type of function to be optimized in machine learning and some other fields is the differentiable separable function. A differentiable separable function f(x) can be represented as the sum of many functions:

f(x) = f_0(x) + f_1(x) + ... + f_N(x).

And it also has a computable separable gradient f'(x):

f'(x) = f'_0(x) + f'_1(x) + ... + f'_N(x).

For machine learning tasks, the objective function may be, e.g., the sum of a function taken across many data points. Implementing a differentiable separable function type in ensmallen is similar to implementing an ordinary differentiable function, but with a few extra utility methods:

Click to collapse/expand example code.

class ArbitrarySeparableFunctionType
{
 public:
  // Given parameters x, return the sum of the individual functions
  // f_i(x) + ... + f_{i + batchSize - 1}(x).  i will always be greater than 0,
  // and i + batchSize will be less than or equal to the value of NumFunctions().
  double Evaluate(const arma::mat& x, const size_t i, const size_t batchSize);

  // Given parameters x and a matrix g, store the sum of the gradient of
  // individual functions f'_i(x) + ... + f'_{i + batchSize - 1}(x) into g.  i
  // will always be greater than 0, and i + batchSize will be less than or
  // equal to the value of NumFunctions().
  void Gradient(const arma::mat& x,
                const size_t i,
                arma::mat& g,
                const size_t batchSize);

  // Shuffle the ordering of the functions f_i(x).
  // (For machine learning problems, this would be equivalent to shuffling the
  // data points, e.g., before an epoch of learning.)
  void Shuffle();

  // Get the number of functions f_i(x).
  // (For machine learning problems, this is often just the number of points in
  // the dataset.)
  size_t NumFunctions();

  // OPTIONAL: this may be implemented in addition to---or instead
  // of---Evaluate() and Gradient().  If this is the only function implemented,
  // implementations of Evaluate() and Gradient() will be automatically
  // generated using template metaprogramming.  Often, implementing
  // EvaluateWithGradient() can result in more efficient optimizations.
  //
  // Given parameters x and a matrix g, return the sum of the individual
  // functions f_i(x) + ... + f_{i + batchSize - 1}(x), and store the sum of
  // the gradient of individual functions f'_i(x) + ... +
  // f'_{i + batchSize - 1}(x) into the provided matrix g.  g should have the
  // same size (rows, columns) as x.  i will always be greater than 0, and i +
  // batchSize will be less than or equal to the value of NumFunctions().
  double EvaluateWithGradient(const arma::mat& x,
                              const size_t i,
                              arma::mat& g,
                              const size_t batchSize);
};

Each of the implemented methods is allowed to have additional cv-modifiers (static, const, etc.).

The following optimizers can be used with differentiable separable functions:

AdaBelief
AdaBound
AdaDelta
AdaGrad
AdaSqrt
Adam
AdaMax
AMSBound
AMSGrad
Big Batch SGD
Eve
FTML
IQN
Katyusha
Lookahead
Momentum SGD
Nadam
NadaMax
NesterovMomentumSGD
OptimisticAdam
QHAdam
QHSGD
RMSProp
SARAH/SARAH+
SGD
Stochastic Gradient Descent with Restarts (SGDR)
Snapshot SGDR
SMORMS3
SPALeRA
SWATS
SVRG
WNGrad

f_i(x) = (responses(i) - x' * data(i))^2

where data(i) represents the data point indexed by i and responses(i) represents the observed response indexed by i. This example implementation only implements EvaluateWithGradient() in order to avoid redundant calculations.

Click to collapse/expand example code.

#include <ensmallen.hpp>

// This class implements the linear regression objective function as an
// arbitrary separable function type.
class LinearRegressionFunction
{
 public:
  // Create the linear regression function with the given data and the given
  // responses.
  LinearRegressionFunction(const arma::mat& dataIn,
                           const arma::rowvec& responsesIn) :
      data(data), responses(responses) { }

  // Given parameters x, compute the sum of the separable objective
  // functions starting with f_i(x) and ending with
  // f_{i + batchSize - 1}(x), and also compute the gradient of those functions
  // and store them in g.
  double EvaluateWithGradient(const arma::mat& x,
                              const size_t i,
                              arma::mat& g,
                              const size_t batchSize)
  {
    // This slightly complex implementation uses Armadillo submatrices to
    // compute the objective functions and gradients simultaneously for
    // multiple points.
    //
    // The shared computation between the objective and gradient is the term
    // (response - x * data) so we compute that first, only for points in the
    // batch.
    const arma::rowvec v = (responses.cols(i, i + batchSize - 1) - x.t() *
        data.cols(i, i + batchSize - 1));
    g = -2 * data.cols(i, i + batchSize - 1) * v;
    return arma::accu(v % v); // equivalent to |v|^2
  }

  // Shuffle the ordering of the functions f_i(x).  We do this by simply
  // shuffling the data and responses.
  void Shuffle()
  {
    // Generate a random ordering of data points.
    arma::uvec ordering = arma::shuffle(
        arma::linspace<arma::uvec>(0, data.n_cols - 1, data.n_cols));

    // This reorders the data and responses with our randomly-generated
    // ordering above.
    data = data.cols(ordering);
    responses = responses.cols(ordering);
  }

  // Return the number of functions f_i(x).  In our case this is simply the
  // number of data points.
  size_t NumFunctions() { return data.n_cols; }
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row.
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat params(10, 1, arma::fill::randn);

  // Use RMSprop to find the best parameters for the linear regression model.
  // The type 'ens::RMSprop' can be changed for any ensmallen optimizer able to
  // handle differentiable separable functions.
  ens::RMSProp rmsprop;
  LinearRegressionFunction lrf(data, responses);
  rmsprop.Optimize(lrf, params);

  std::cout << "The optimized linear regression model found by RMSprop has the"
      << " parameters " << params.t();
}

Sparse differentiable separable functions

Some differentiable separable functions have the additional property that the gradient f'_i(x) is sparse. When this is true, one additional method can be implemented as part of the class to be optimized:

Click to collapse/expand example code.

// Add this definition to use sparse differentiable separable function
// optimizers.  Given x, store the sum of the sparse gradient f'_i(x) + ... +
// f'_{i + batchSize - 1}(x) into the provided matrix g.
void Gradient(const arma::mat& x,
              const size_t i,
              arma::sp_mat& g,
              const size_t batchSize);

It’s also possible to instead use templates to provide only one Gradient() function for both sparse and non-sparse optimizers:

Click to collapse/expand example code.

// This provides Gradient() for both sparse and non-sparse optimizers.
template<typename GradType>
void Gradient(const arma::mat& x,
              const size_t i,
              GradType& g,
              const size_t batchSize);

If either of these methods are available, then any ensmallen optimizer that optimizes sparse separable differentiable functions may be used. This includes:

Hogwild! (Parallel SGD)

Categorical functions

A categorical function is a function f(x) where some of the values of x are categorical variables (i.e. they take integer values [0, c - 1] and each value 0, 1, …, c - 1 is unrelated). In this situation, the function is not differentiable, and so only the objective f(x) can be implemented. Therefore the class requirements for a categorical function are exactly the same as for an ArbitraryFunctionType—but for any categorical dimension x_i in x, the value will be in the range [0, c_i - 1] where c_i is the number of categories in dimension x_i.

Click to collapse/expand example code.

class CategoricalFunction
{
 public:
  // Return the objective function for the given parameters x.
  double Evaluate(const arma::mat& x);
};

However, when an optimizer’s Optimize() method is called, two additional parameters must be specified, in addition to the function to optimize and the matrix holding the parameters:

const std::vector<bool>& categoricalDimensions: a vector of length equal to the number of elements in x (the number of dimensions). If an element is true, then the dimension is categorical.
const arma::Row<size_t>& numCategories: a vector of length equal to the number of elements in x (the number of dimensions). If a dimension is categorical, then the corresponding value in numCategories should hold the number of categories in that dimension.

The following optimizers can be used in this way to optimize a categorical function:

Grid Search (all parameters must be categorical)

An example program showing usage of categorical optimization is shown below.

Click to collapse/expand example code.

#include <ensmallen.hpp>

// An implementation of a simple categorical function.  The parameters can be
// understood as x = [c1 c2 c3].  When c1 = 0, c2 = 2, and c3 = 1, the value of
// f(x) is 0.  In any other case, the value of f(x) is 10.  Therefore, the
// optimum is found at [0, 2, 1].
class SimpleCategoricalFunction
{
 public:
  // Return the objective function f(x) as described above.
  double Evaluate(const arma::mat& x)
  {
    if (size_t(x[0]) == 0 &&
        size_t(x[1]) == 2 &&
        size_t(x[2]) == 1)
      return 0.0;
    else
      return 10.0;
  }
};

int main()
{
  // Create and optimize the categorical function with the GridSearch
  // optimizer.  We must also create a std::vector<bool> that holds the types
  // of each dimension, and an arma::Row<size_t> that holds the number of
  // categories in each dimension.
  SimpleCategoricalFunction c;

  // We have three categorical dimensions only.
  std::vector<bool> categoricalDimensions;
  categoricalDimensions.push_back(true);
  categoricalDimensions.push_back(true);
  categoricalDimensions.push_back(true);

  // The first category can take 5 values; the second can take 3; the third can
  // take 12.
  arma::Row<size_t> numCategories("5 3 12");

  // The initial point for our optimization will be to set all categories to 0.
  arma::mat params("0 0 0");

  // Now create the GridSearch optimizer with default parameters, and run the
  // optimization.
  // The ens::GridSearch type can be replaced with any ensmallen optimizer that
  // is able to handle categorical functions.
  ens::GridSearch gs;
  gs.Optimize(c, params, categoricalDimensions, numCategories);

  std::cout << "The ens::GridSearch optimizer found the optimal parameters to "
      << "be " << params;
}

Multi-objective functions

A multi-objective optimizer does not return just one set of coordinates at the minimum of all objective functions, but instead finds a front or frontier of possible coordinates that are Pareto-optimal (that is, no individual objective function’s value can be reduced without increasing at least one other objective function).

In order to optimize a multi-objective function with ensmallen, a std::tuple<> containing multiple ArbitraryFunctionTypes (see here) should be passed to a multi-objective optimizer’s Optimize() function.

An example below simultaneously optimizes the generalized Rosenbrock function in 6 dimensions and the Wood function using NSGA2.

Click to collapse/expand example code.

GeneralizedRosenbrockFunction rf(6);
WoodFunction wf;
std::tuple<GeneralizedRosenbrockFunction, WoodFunction> objectives(rf, wf);

// Create an initial point (a random point in 6 dimensions).
arma::mat coordinates(6, 1, arma::fill::randu);

// `coordinates` will be set to the coordinates on the best front that minimize the
// sum of objective functions, and `bestFrontSum` will be the sum of all objectives
// at that coordinate set.
NSGA2 nsga;
double bestFrontSum = nsga.Optimize(objectives, coordinates);

// Set `bestFront` to contain all of the coordinates on the best front.
arma::cube bestFront = optimizer.ParetoFront();
}

Note: all multi-objective function optimizers have both the function Optimize() to find the best front, and also the function ParetoFront() to return all sets of solutions that are on the front.

The following optimizers can be used with multi-objective functions:

NSGA2
MOEA/D-DE

Constrained functions

A constrained function is an objective function f(x) that is also subject to some constraints on x. (For instance, perhaps a constraint could be that x is a positive semidefinite matrix.) ensmallen is able to handle differentiable objective functions of this type—so, f'(x) must also be computable. Given some set of constraints c_0(x), …, c_M(x), we can re-express our constrained objective function as

f_C(x) = f(x) + c_0(x) + ... + c_M(x)

where the (soft) constraint c_i(x) is a positive value if it is not satisfied, and 0 if it is satisfied. The soft constraint c_i(x) should take some value representing how far from a feasible solution x is. It should be differentiable, since ensmallen’s constrained optimizers will use the gradient of the constraint to find a feasible solution.

In order to optimize a constrained function with ensmallen, a class implementing the API below is required.

Click to collapse/expand example code.

class ConstrainedFunctionType
{
 public:
  // Return the objective function f(x) for the given x.
  double Evaluate(const arma::mat& x);

  // Compute the gradient of f(x) for the given x and store the result in g.
  void Gradient(const arma::mat& x, arma::mat& g);

  // Get the number of constraints on the objective function.
  size_t NumConstraints();

  // Evaluate constraint i at the parameters x.  If the constraint is
  // unsatisfied, a value greater than 0 should be returned.  If the constraint
  // is satisfied, 0 should be returned.  The optimizer will add this value to
  // its overall objective that it is trying to minimize.
  double EvaluateConstraint(const size_t i, const arma::mat& x);

  // Evaluate the gradient of constraint i at the parameters x, storing the
  // result in the given matrix g.  If the constraint is not satisfied, the
  // gradient should be set in such a way that the gradient points in the
  // direction where the constraint would be satisfied.
  void GradientConstraint(const size_t i, const arma::mat& x, arma::mat& g);
};

A constrained function can be optimized with the following optimizers:

Augmented Lagrangian

Semidefinite programs

A special class of constrained function is a semidefinite program. ensmallen has support for creating and optimizing semidefinite programs via the ens::SDP<> class. For this, the SDP must be expressed in the primal form:

min_x dot(C, x) such that

dot(A_i, x) = b_i, for i = 1, ..., M;
x >= 0

In this case, A_i and C are square matrices (sparse or dense), and b_i is a scalar.

Once the problem is expressed in this form, a class of type ens::SDP<> can be created. SDP<arma::mat> indicates an SDP with a dense C, and SDP<arma::sp_mat> indicates an SDP with a sparse C. The class has the following useful methods:

SDP(cMatrixSize, numDenseConstraints, numSparseConstraints): create a new SDP
size_t NumSparseConstraints(): get number of sparse constraint matrices A_i
size_t NumDenseConstraints(): get number of dense constraint matrices A_i
std::vector<arma::mat>& DenseA(): get vector of dense A_i matrices
std::vector<arma::sp_mat>& SparseA(): get vector of sparse A_i matrices
arma::vec& DenseB(): get vector of b_i values for dense A_i constraints
arma::vec& SparseB(): get vector of b_i values for sparse A_i constraints

Once these methods are used to set each A_i matrix and corresponding b_i value, and C objective matrix, the SDP object can be used with any ensmallen SDP solver. The list of SDP solvers is below:

Primal-dual SDP solver
Low-rank accelerated SDP solver (LRSDP)

Example code showing how to solve an SDP is given below.

Click to collapse/expand example code.

int main()
{
  // We will build a toy semidefinite program and then use the PrimalDualSolver to find a solution

  // The semi-definite constraint looks like:
  //
  // [ 1  x_12  x_13  0  0  0  0 ]
  // [     1    x_23  0  0  0  0 ]
  // [            1   0  0  0  0 ]
  // [               s1  0  0  0 ]  >= 0
  // [                  s2  0  0 ]
  // [                     s3  0 ]
  // [                        s4 ]

  // x_11 == 0
  arma::sp_mat A0(7, 7); A0.zeros();
  A0(0, 0) = 1.;

  // x_22 == 0
  arma::sp_mat A1(7, 7); A1.zeros();
  A1(1, 1) = 1.;

  // x_33 == 0
  arma::sp_mat A2(7, 7); A2.zeros();
  A2(2, 2) = 1.;

  // x_12 <= -0.1  <==>  x_12 + s1 == -0.1, s1 >= 0
  arma::sp_mat A3(7, 7); A3.zeros();
  A3(1, 0) = A3(0, 1) = 1.; A3(3, 3) = 2.;

  // -0.2 <= x_12  <==>  x_12 - s2 == -0.2, s2 >= 0
  arma::sp_mat A4(7, 7); A4.zeros();
  A4(1, 0) = A4(0, 1) = 1.; A4(4, 4) = -2.;

  // x_23 <= 0.5  <==>  x_23 + s3 == 0.5, s3 >= 0
  arma::sp_mat A5(7, 7); A5.zeros();
  A5(2, 1) = A5(1, 2) = 1.; A5(5, 5) = 2.;

  // 0.4 <= x_23  <==>  x_23 - s4 == 0.4, s4 >= 0
  arma::sp_mat A6(7, 7); A6.zeros();
  A6(2, 1) = A6(1, 2) = 1.; A6(6, 6) = -2.;

  std::vector<arma::sp_mat> ais({A0, A1, A2, A3, A4, A5, A6});

  SDP<arma::sp_mat> sdp(7, 7 + 4 + 4 + 4 + 3 + 2 + 1, 0);

  for (size_t j = 0; j < 3; j++)
  {
    // x_j4 == x_j5 == x_j6 == x_j7 == 0
    for (size_t i = 0; i < 4; i++)
    {
      arma::sp_mat A(7, 7); A.zeros();
      A(i + 3, j) = A(j, i + 3) = 1;
      ais.emplace_back(A);
    }
  }

  // x_45 == x_46 == x_47 == 0
  for (size_t i = 0; i < 3; i++)
  {
    arma::sp_mat A(7, 7); A.zeros();
    A(i + 4, 3) = A(3, i + 4) = 1;
    ais.emplace_back(A);
  }

  // x_56 == x_57 == 0
  for (size_t i = 0; i < 2; i++)
  {
    arma::sp_mat A(7, 7); A.zeros();
    A(i + 5, 4) = A(4, i + 5) = 1;
    ais.emplace_back(A);
  }

  // x_67 == 0
  arma::sp_mat A(7, 7); A.zeros();
  A(6, 5) = A(5, 6) = 1;
  ais.emplace_back(A);

  std::swap(sdp.SparseA(), ais);

  sdp.SparseB().zeros();
  sdp.SparseB()[0] = sdp.SparseB()[1] = sdp.SparseB()[2] = 1.;
  sdp.SparseB()[3] = -0.2; sdp.SparseB()[4] = -0.4;
  sdp.SparseB()[5] = 1.; sdp.SparseB()[6] = 0.8;

  sdp.C().zeros();
  sdp.C()(0, 2) = sdp.C()(2, 0) = 1.;

  // That took a long time but we finally set up the problem right!  Now we can
  // use the PrimalDualSolver to solve it.
  // ens::PrimalDualSolver could be replaced with ens::LRSDP or other ensmallen
  // SDP solvers.
  PrimalDualSolver solver;
  arma::mat X, Z;
  arma::vec ysparse, ydense;
  // ysparse, ydense, and Z hold the primal and dual variables found during the
  // optimization.
  const double obj = solver.Optimize(sdp, X, ysparse, ydense, Z);

  std::cout << "SDP optimized with objective " << obj << "." << std::endl;
}

Alternate matrix types

All of the examples above (and throughout the rest of the documentation) generally assume that the matrix being optimized has type arma::mat. But ensmallen’s optimizers are capable of optimizing more types than just dense Armadillo matrices. In fact, the full signature of each optimizer’s Optimize() method is this:

template<typename FunctionType, typename MatType>
typename MatType::elem_type Optimize(FunctionType& function,
                                     MatType& coordinates);

The return type, typename MatType::elem_type, is just the numeric type held by the given matrix type. So, for arma::mat, the return type is just double. In addition, optimizers for differentiable functions have a third template parameter, GradType, which specifies the type of the gradient. GradType can be manually specified in the situation where, e.g., a sparse gradient is desired.

It is easy to write a function to optimize, e.g., an arma::fmat. Here is an example, adapted from the SquaredFunction example from the arbitrary function documentation.

Click to collapse/expand example code.

#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // This returns f(x) = 2 |x|^2.
  float Evaluate(const arma::fmat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }

  void Gradient(const arma::fmat& x, arma::fmat& gradient)
  {
    gradient = 4 * x;
  }
};

int main()
{
  // The minimum is at x = [0 0 0].  Our initial point is chosen to be
  // [1.0, -1.0, 1.0].
  arma::fmat x("1.0 -1.0 1.0");

  // Create simulated annealing optimizer with default options.
  // The ens::SA<> type can be replaced with any suitable ensmallen optimizer
  // that is able to handle arbitrary functions.
  ens::L_BFGS optimizer;
  SquaredFunction f; // Create function to be optimized.
  optimizer.Optimize(f, x); // The optimizer will infer arma::fmat!

  std::cout << "Minimum of squared function found with simulated annealing is "
      << x;
}

Note that we have simply changed the SquaredFunction to accept arma::fmat instead of arma::mat as parameters to Evaluate(), and the return type has accordingly been changed to float from double. It would even be possible to optimize functions with sparse coordinates by having Evaluate() take a sparse matrix (i.e. arma::sp_mat).

If it were desired to represent the gradient as a sparse type, the Gradient() function would need to be modified to take a sparse matrix (i.e. arma::sp_mat or similar), and then you could call optimizer.Optimize<SquaredFunction, arma::mat, arma::sp_mat>(f, x); to perform the optimization while using sparse matrix types to represent the gradient. Using sparse MatType or GradType should only be done when it is known that the objective matrix and/or gradients will be sparse; otherwise the code may run very slow!

ensmallen will automatically infer MatType from the call to Optimize(), and check that the given FunctionType has all of the necessary functions for the given MatType, throwing a static_assert error if not. If you would like to disable these checks, define the macro ENS_DISABLE_TYPE_CHECKS before including ensmallen:

#define ENS_DISABLE_TYPE_CHECKS
#include <ensmallen.hpp>

This can be useful for situations where you know that the checks should be ignored. However, be aware that the code may fail to compile and give more confusing and difficult error messages!

Optimizer documentation

ActiveCMAES

An optimizer for separable functions.

Active CMA-ES is a variant of the stochastic search algorithm CMA-ES - Covariance Matrix Adaptation Evolution Strategy. Active CMA-ES actively reduces the uncertainty in unfavourable directions by exploiting the information about bad mutations in the covariance matrix update step. This isn’t for the purpose of accelerating progress, but instead for speeding up the adaptation of the covariance matrix (which, in turn, will lead to faster progress).

Constructors

ActiveCMAES<SelectionPolicyType, TransformationPolicyType>()
ActiveCMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy)
ActiveCMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize)
ActiveCMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy)
ActiveCMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy, stepSize)

The SelectionPolicyType template parameter refers to the strategy used to compute the (approximate) objective function. The FullSelection and RandomSelection classes are available for use; custom behavior can be achieved by implementing a class with the same method signatures. The TransformationPolicyType template parameter refers to transformation strategy used to map decision variables to the desired domain during fitness evaluation and optimization termination. The EmptyTransformation and BoundaryBoxConstraint classes are available for use; custom behavior can be achieved by implementing a class with the same method signatures.

For convenience the following types can be used:

ActiveCMAES<> (equivalent to ActiveCMAES<FullSelection, EmptyTransformation<>>): uses all separable functions to compute objective
ApproxActiveCMAES<> (equivalent to ActiveCMAES<RandomSelection, EmptyTransformation<>>): uses a small amount of separable functions to compute approximate objective

Attributes

type	name	description	default
`size_t`	`lambda`	The population size (0 uses a default size).	`0`
`TransformationPolicyType`	`transformationPolicy`	Instantiated transformation policy used to map the coordinates to the desired domain.	`TransformationPolicyType()`
`size_t`	`batchSize`	Batch size to use for the objective calculation.	`32`
`size_t`	`maxIterations`	Maximum number of iterations.	`1000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`SelectionPolicyType`	`selectionPolicy`	Instantiated selection policy used to calculate the objective.	`SelectionPolicyType()`
`size_t`	`stepSize`	Initial step size	`0`

Attributes of the optimizer may also be changed via the member methods Lambda(), TransformationPolicy(), BatchSize(), MaxIterations(), Tolerance(), and SelectionPolicy().

The selectionPolicy attribute allows an instantiated SelectionPolicyType to be given. The FullSelection policy has no need to be instantiated and thus the option is not relevant when the ActiveCMAES<> optimizer type is being used; the RandomSelection policy has the constructor RandomSelection(fraction) where fraction specifies the percentage of separable functions to use to estimate the objective function. The transformationPolicy attribute allows an instantiated TransformationPolicyType to be given. The EmptyTransformation<MatType> has no need to be instantiated. BoundaryBoxConstraint<MatType> policy has the constructor BoundaryBoxConstraint(lowerBound, upperBound) where lowerBound and lowerBound are the lower bound and upper bound of the coordinates respectively.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// ActiveCMAES with the FullSelection and BoundaryBoxConstraint policies.
BoundaryBoxConstraint b(-1, 1);
ActiveCMAES optimizer(0, b, 32, 200, 1e-4);
optimizer.Optimize(f, coordinates);

// ActiveCMAES with the RandomSelection and BoundaryBoxConstraint policies.
ApproxActiveCMAES<BoundaryBoxConstraint<>> cmaes(0, b, 32, 200, 1e-4);
approxOptimizer.Optimize(f, coordinates);

AdaBelief

An optimizer for differentiable separable functions.

AdaBelief uses a different denominator from Adam, and is orthogonal to other techniques such as recification, decoupled weight decay. The intuition for AdaBelief is to adapt the stepsize according to the “belief” in the current gradient direction.

Constructors

AdaBelief()
AdaBelief(stepSize, batchSize)
AdaBelief(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle)
AdaBelief(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	The exponential decay rate for the 1st moment estimates.	`0.9`
`double`	`beta2`	The exponential decay rate for the 2nd moment estimates.	`0.999`
`double`	`epsilon`	A small constant for numerical stability.	`1e-8`
`size_t`	`max_iterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Beta1(), Beta2(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

AdaBelief optimizer(0.001, 32, 0.9, 0.999, 1e-12, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

AdaBound

An optimizer for differentiable separable functions.

AdaBound is a variant of Adam which employs dynamic bounds on learning rates.

Constructors

AdaBound()
AdaBound(stepSize, batchSize)
AdaBound(stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle)
AdaBound(stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Note that the AdaBound class is based on the AdaBoundType<UpdateRule> class with UpdateRule = AdaBoundUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`finalLr`	The final (SGD) learning rate.	`0.1`
`double`	`gamma`	The convergence speed of the bound functions.	`0.001`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

The attributes of the optimizer may also be modified via the member methods FinalLr(), Gamma(), StepSize(), BatchSize(), Beta1(), Beta2(), Eps(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples

Click to collapse/expand example code.

SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();

AdaBound optimizer(0.001, 2, 0.1, 1e-3, 0.9, 0.999, 1e-8, 500000, 1e-3);
optimizer.Optimize(f, coordinates);

AdaDelta

An optimizer for differentiable separable functions.

AdaDelta is an extension of AdaGrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. Instead of accumulating all past squared gradients, the sum of gradients is recursively defined as a decaying average of all past squared gradients.

Constructors

AdaDelta()
AdaDelta(stepSize)
AdaDelta(stepSize, batchSize)
AdaDelta(stepSize, batchSize, rho, epsilon, maxIterations, tolerance, shuffle)
AdaDelta(stepSize, batchSize, rho, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`1.0`
`size_t`	`batchSize`	Number of points to process in one step.	`32`
`double`	`rho`	Smoothing constant. Corresponding to fraction of gradient to keep at each time step.	`0.95`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-6`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be changed via the member methods StepSize(), BatchSize(), Rho(), Epsilon(), MaxIterations(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);

Adagrad

An optimizer for differentiable separable functions.

AdaGrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. Larger updates for more sparse parameters and smaller updates for less sparse parameters.

Constructors

AdaGrad()
AdaGrad(stepSize)
AdaGrad(stepSize, batchSize)
AdaGrad(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle)
AdaGrad(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Number of points to process in one step.	`32`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`tolerance`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be changed via the member methods StepSize(), BatchSize(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples:

Click to collapse/expand example code.

AdaGrad optimizer(1.0, 1, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);

AdaSqrt

An optimizer for differentiable separable functions.

AdaSqrt is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. Larger updates for more sparse parameters and smaller updates for less sparse parameters. AdaSqrt, removes the square root in the denominator and scales the learning rate by sqrt(T).

Constructors

AdaSqrt()
AdaSqrt(stepSize)
AdaSqrt(stepSize, batchSize)
AdaSqrt(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle)
AdaSqrt(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Number of points to process in one step.	`32`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`tolerance`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples:

Click to collapse/expand example code.

AdaSqrt optimizer(1.0, 1, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates);

Adam

An optimizer for differentiable separable functions.

Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Constructors

Adam()
Adam(stepSize, batchSize)
Adam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
Adam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Note that the Adam class is based on the AdamType<UpdateRule> class with UpdateRule = AdamUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Adam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

AdaMax

An optimizer for differentiable separable functions.

AdaMax is simply a variant of Adam based on the infinity norm.

Constructors

AdaMax()
AdaMax(stepSize, batchSize)
AdaMax(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
AdaMax(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, exactObjective, resetPolicy)

Note that the AdaMax class is based on the AdamType<UpdateRule> class with UpdateRule = AdaMaxUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Beta1(), Beta2(), Eps(), MaxIterations(), Tolerance(), Shuffle(), ExactObjective(), and ResetPolicy().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

AdaMax optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

AMSBound

An optimizer for differentiable separable functions.

AMSBound is a variant of Adam which employs dynamic bounds on learning rates.

Constructors

AMSBound()
AMSBound(stepSize, batchSize)
AMSBound(stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle)
AMSBound(stepSize, batchSize, finalLr, gamma, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Note that the AMSBound class is based on the AdaBoundType<UpdateRule> class with UpdateRule = AdaBoundUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`finalLr`	The final (SGD) learning rate.	`0.1`
`double`	`gamma`	The convergence speed of the bound functions.	`0.001`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

Click to collapse/expand example code.

SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();

AMSBound optimizer(0.001, 2, 0.1, 1e-3, 0.9, 0.999, 1e-8, 500000, 1e-3);
optimizer.Optimize(f, coordinates);

AMSGrad

An optimizer for differentiable separable functions.

AMSGrad is a variant of Adam with guaranteed convergence.

Constructors

AMSGrad()
AMSGrad(stepSize, batchSize)
AMSGrad(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
AMSGrad(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, exactObjective, resetPolicy)

Note that the AMSGrad class is based on the AdamType<UpdateRule> class with UpdateRule = AMSGradUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

AMSGrad optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Augmented Lagrangian

An optimizer for differentiable constrained functions.

The AugLagrangian class implements the Augmented Lagrangian method of optimization. In this scheme, a penalty term is added to the Lagrangian. This method is also called the “method of multipliers”. Internally, the optimizer uses L-BFGS.

Constructors

AugLagrangian(maxIterations, penaltyThresholdFactor, sigmaUpdateFactor)

Attributes

type	name	description	default
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`1000`
`double`	`penaltyThresholdFactor`	When penalty threshold is updated, set it to this multiplied by the penalty.	`0.25`
`double`	`sigmaUpdateFactor`	When sigma is updated, multiply it by this.	`10.0`
`L_BFGS&`	`lbfgs`	Internal l-bfgs optimizer.	`L_BFGS()`

The attributes of the optimizer may also be modified via the member methods MaxIterations(), PenaltyThresholdFactor(), SigmaUpdateFactor() and LBFGS().

Click to collapse/expand example code.

/**
 * Optimize the function.  The value '1' is used for the initial value of each
 * Lagrange multiplier.  To set the Lagrange multipliers yourself, use the
 * other overload of Optimize().
 *
 * @tparam LagrangianFunctionType Function which can be optimized by this
 *     class.
 * @param function The function to optimize.
 * @param coordinates Output matrix to store the optimized coordinates in.
 */
template<typename LagrangianFunctionType>
bool Optimize(LagrangianFunctionType& function,
              arma::mat& coordinates);

/**
 * Optimize the function, giving initial estimates for the Lagrange
 * multipliers.  The vector of Lagrange multipliers will be modified to
 * contain the Lagrange multipliers of the final solution (if one is found).
 *
 * @tparam LagrangianFunctionType Function which can be optimized by this
 *      class.
 * @param function The function to optimize.
 * @param coordinates Output matrix to store the optimized coordinates in.
 * @param initLambda Vector of initial Lagrange multipliers.  Should have
 *     length equal to the number of constraints.
 * @param initSigma Initial penalty parameter.
 */
template<typename LagrangianFunctionType>
bool Optimize(LagrangianFunctionType& function,
              arma::mat& coordinates,
              const arma::vec& initLambda,
              const double initSigma);

Examples

Click to collapse/expand example code.

GockenbachFunction f;
arma::mat coordinates = f.GetInitialPoint();

AugLagrangian optimizer;
optimizer.Optimize(f, coords);

Big Batch SGD

An optimizer for differentiable separable functions.

Big-batch stochastic gradient descent adaptively grows the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation, so the Big Batch SGD optimizer is able to adaptively adjust batch sizes without user oversight.

Constructors

BigBatchSGD<UpdatePolicy>()
BigBatchSGD<UpdatePolicy>(stepSize)
BigBatchSGD<UpdatePolicy>(stepSize, batchSize)
BigBatchSGD<UpdatePolicy>(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, exactObjective)

The UpdatePolicy template parameter refers to the way that a new step size is computed. The AdaptiveStepsize and BacktrackingLineSearch classes are available for use; custom behavior can be achieved by implementing a class with the same method signatures.

For convenience the following typedefs have been defined:

BBS_Armijo = BigBatchSGD<BacktrackingLineSearch>
BBS_BB = BigBatchSGD<AdaptiveStepsize>

Attributes

type	name	description	default
`size_t`	`batchSize`	Initial batch size.	`1000`
`double`	`stepSize`	Step size for each iteration.	`0.01`
`double`	`batchDelta`	Factor for the batch update step.	`0.1`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the batch order is shuffled; otherwise, each batch is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be changed via the member methods BatchSize(), StepSize(), BatchDelta(), MaxIterations(), Tolerance(), Shuffle(), and ExactObjective().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Big-Batch SGD with the adaptive stepsize policy.
BBS_BB optimizer(10, 0.01, 0.1, 8000, 1e-4);
optimizer.Optimize(f, coordinates);

// Big-Batch SGD with backtracking line search.
BBS_Armijo optimizer2(10, 0.01, 0.1, 8000, 1e-4);
optimizer2.Optimize(f, coordinates);

Coordinate Descent (CD)

An optimizer for partially differentiable functions.

Coordinate descent is a technique for minimizing a function by doing a line search along a single direction at the current point in the iteration. The direction (or “coordinate”) can be chosen cyclically, randomly or in a greedy fashion.

Constructors

CD<DescentPolicyType>()
CD<DescentPolicyType>(stepSize, maxIterations)
CD<DescentPolicyType>(stepSize, maxIterations, tolerance, updateInterval)
CD<DescentPolicyType>(stepSize, maxIterations, tolerance, updateInterval, descentPolicy)

The DescentPolicyType template parameter specifies the behavior of CD when selecting the next coordinate to descend with. The RandomDescent, GreedyDescent, and CyclicDescent classes are available for use. Custom behavior can be achieved by implementing a class with the same method signatures.

For convenience, the following typedefs have been defined:

RandomCD (equivalent to CD<RandomDescent>): selects coordinates randomly
GreedyCD (equivalent to CD<GreedyDescent>): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
CyclicCD (equivalent to CD<CyclicDescent>): selects coordinates sequentially

Note: CD used to be called SCD. Use of the name SCD is deprecated, and will be removed in ensmallen 3 and later.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate the algorithm.	`1e-5`
`size_t`	`updateInterval`	The interval at which the objective is to be reported and checked for convergence.	`1e3`
`DescentPolicyType`	`descentPolicy`	The policy to use for selecting the coordinate to descend on.	`DescentPolicyType()`

Attributes of the optimizer may also be modified via the member methods StepSize(), MaxIterations(), Tolerance(), UpdateInterval(), and DescentPolicy().

Note that the default value for descentPolicy is the default constructor for DescentPolicyType.

Examples

Click to collapse/expand example code.

SparseTestFunction f;
arma::mat coordinates = f.GetInitialPoint();

RandomCD randomscd(0.01, 100000, 1e-5, 1e3);
randomscd.Optimize(f, coordinates);

GreedyCD greedyscd(0.01, 100000, 1e-5, 1e3);
greedyscd.Optimize(f, coordinates);

CyclicCD cyclicscd(0.01, 100000, 1e-5, 1e3);
cyclicscd.Optimize(f, coordinates);

CMAES

An optimizer for separable functions.

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a stochastic search algorithm. CMA-ES is a second order approach estimating a positive definite matrix within an iterative procedure using the covariance matrix.

Constructors

CMAES<SelectionPolicyType, TransformationPolicyType>()
CMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy)
CMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize)
CMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy)
CMAES<SelectionPolicyType, TransformationPolicyType>(lambda, transformationPolicy, batchSize, maxIterations, tolerance, selectionPolicy, stepSize)

For convenience the following types can be used:

CMAES<> (equivalent to CMAES<FullSelection, EmptyTransformation<>>): uses all separable functions to compute objective
ApproxCMAES<> (equivalent to CMAES<RandomSelection, EmptyTransformation<>>): uses a small amount of separable functions to compute approximate objective

Attributes

type	name	description	default
`size_t`	`lambda`	The population size (0 uses a default size).	`0`
`TransformationPolicyType`	`transformationPolicy`	Instantiated transformation policy used to map the coordinates to the desired domain.	`TransformationPolicyType()`
`size_t`	`batchSize`	Batch size to use for the objective calculation.	`32`
`size_t`	`maxIterations`	Maximum number of iterations.	`1000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`SelectionPolicyType`	`selectionPolicy`	Instantiated selection policy used to calculate the objective.	`SelectionPolicyType()`
`size_t`	`stepSize`	Initial step size	`0`

Attributes of the optimizer may also be changed via the member methods Lambda(), TransformationPolicy(), BatchSize(), MaxIterations(), Tolerance(), and SelectionPolicy().

The selectionPolicy attribute allows an instantiated SelectionPolicyType to be given. The FullSelection policy has no need to be instantiated and thus the option is not relevant when the CMAES<> optimizer type is being used; the RandomSelection policy has the constructor RandomSelection(fraction) where fraction specifies the percentage of separable functions to use to estimate the objective function. The transformationPolicy attribute allows an instantiated TransformationPolicyType to be given. The EmptyTransformation<MatType> has no need to be instantiated. BoundaryBoxConstraint<MatType> policy has the constructor BoundaryBoxConstraint(lowerBound, upperBound) where lowerBound and lowerBound are the lower bound and upper bound of the coordinates respectively.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// CMAES with the FullSelection and BoundaryBoxConstraint policies.
BoundaryBoxConstraint b(-1, 1);
CMAES optimizer(0, b, 32, 200, 1e-4);
optimizer.Optimize(f, coordinates);

// CMAES with the RandomSelection and BoundaryBoxConstraint policies.
ApproxCMAES<BoundaryBoxConstraint<>> cmaes(0, b, 32, 200, 1e-4);
approxOptimizer.Optimize(f, coordinates);

CNE

An optimizer for arbitrary functions.

Conventional Neural Evolution is an optimizer that works like biological evolution which selects best candidates based on their fitness scores and creates new generation by mutation and crossover of population. The initial population is generated based on a random normal distribution centered at the given starting point.

Constructors

CNE()
CNE(populationSize, maxGenerations)
CNE(populationSize, maxGenerations, mutationProb, mutationSize)
CNE(populationSize, maxGenerations, mutationProb, mutationSize, selectPercent, tolerance)

Attributes

type	name	description	default
`size_t`	`populationSize`	The number of candidates in the population. This should be at least 4 in size.	`500`
`size_t`	`maxGenerations`	The maximum number of generations allowed for CNE.	`5000`
`double`	`mutationProb`	Probability that a weight will get mutated.	`0.1`
`double`	`mutationSize`	The range of mutation noise to be added. This range is between 0 and mutationSize.	`0.02`
`double`	`selectPercent`	The percentage of candidates to select to become the the next generation.	`0.2`
`double`	`tolerance`	The final value of the objective function for termination. If set to negative value, tolerance is not considered.	`1e-5`

Attributes of the optimizer may also be changed via the member methods PopulationSize(), MaxGenerations(), MutationProb(), SelectPercent() and Tolerance().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

CNE optimizer(200, 10000, 0.2, 0.2, 0.3, 1e-5);
optimizer.Optimize(f, coordinates);

DE

An optimizer for arbitrary functions.

Differential Evolution is an evolutionary optimization algorithm which selects best candidates based on their fitness scores and creates new generation by mutation and crossover of population.

Constructors

DE()
DE(populationSize, maxGenerations)
DE(populationSize, maxGenerations, crossoverRate)
DE(populationSize, maxGenerations, crossoverRate, differentialWeight)
DE(populationSize, maxGenerations, crossoverRate, differentialWeight, tolerance)

Attributes

type	name	description	default
`size_t`	`populationSize`	The number of candidates in the population. This should be at least 3 in size.	`100`
`size_t`	`maxGenerations`	The maximum number of generations allowed for DE.	`2000`
`double`	`crossoverRate`	Probability that a candidate will undergo crossover.	`0.6`
`double`	`differentialWeight`	Amplification factor for differentiation.	`0.8`
`double`	`tolerance`	The final value of the objective function for termination. If set to negative value, tolerance is not considered.	`1e-5`

Attributes of the optimizer may also be changed via the member methods PopulationSize(), MaxGenerations(), CrossoverRate(), DifferentialWeight() and Tolerance().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

DE optimizer(200, 1000, 0.6, 0.8, 1e-5);
optimizer.Optimize(f, coordinates);

DemonAdam

An optimizer for differentiable separable functions.

DemonAdam is an Adam based optimizer. DemonAdam is motivated by decaying the total contribution of a gradient to all future updates.

Constructors

DemonAdam()
DemonAdam(stepSize, batchSize)
DemonAdam(stepSize, batchSize, momentum, beta1, beta2, eps, maxIterations, tolerance, shuffle)
DemonAdam(stepSize, batchSize, momentum, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy)

Note that the DemonAdam class is based on the DemonAdamType<UpdateRule> class with UpdateRule = AdamUpdate.

For convenience the following typedefs have been defined:

DemonAdaMax (equivalent to DemonAdamType<AdaMaxUpdate>): DemonAdam that uses the AdaMax update rule.
DemonAMSGrad (equivalent to DemonAdamType<AMSGradUpdate>): DemonAdam that uses the AMSGrad update rule.
DemonNadam (equivalent to DemonAdamType<NadamUpdate>): DemonAdam that uses the Nadam update rule.
NadamUpdate (equivalent to DemonAdamType<NadaMaxUpdate>): DemonAdam that uses the NadaMax update rule.
DemonOptimisticAdam (equivalent to DemonAdamType<OptimisticAdamUpdate>): DemonAdam that uses the OptimisticAdam update rule.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`momentum`	The initial momentum coefficient.	`0.9`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Momentum(), MomentumIterations(), Beta1(), Beta2(), Eps(), MaxIterations(), Tolerance(), Shuffle(), and ResetPolicy().

Examples

Click to collapse/expand example code.

MatyasFunction f;
arma::mat coordinates = f.GetInitialPoint();

DemonAdam optimizer(0.5, 1, 0.9);
optimizer.Optimize(f, coordinates);

DemonSGD

An optimizer for differentiable separable functions.

DemonSGD is an SGD based optimizer. DemonSGD is motivated by decaying the total contribution of a gradient to all future updates.

For convenience ensmallen implements various Adam based versions of the Demon optimizer:

DemonAdam (equivalent to DemonAdamType<AdamUpdate>): DemonAdam that uses the Adam update rule.
DemonAdaMax (equivalent to DemonAdamType<AdaMaxUpdate>): DemonAdam that uses the AdaMax update rule.
DemonAMSGrad (equivalent to DemonAdamType<AMSGradUpdate>): DemonAdam that uses the AMSGrad update rule.
DemonNadam (equivalent to DemonAdamType<NadamUpdate>): DemonAdam that uses the Nadam update rule.
NadamUpdate (equivalent to DemonAdamType<NadaMaxUpdate>): DemonAdam that uses the NadaMax update rule.
DemonOptimisticAdam (equivalent to DemonAdamType<OptimisticAdamUpdate>): DemonAdam that uses the OptimisticAdam update rule.

Constructors

DemonSGD()
DemonSGD(stepSize, batchSize)
DemonSGD(stepSize, batchSize, momentum, maxIterations, tolerance, shuffle)
DemonSGD(stepSize, batchSize, momentum, maxIterations, tolerance, shuffle, resetPolicy)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`momentum`	The initial momentum coefficient.	`0.9`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Momentum(), MomentumIterations(), MaxIterations(), Tolerance(), Shuffle(), and ResetPolicy().

Examples

Click to collapse/expand example code.

MatyasFunction f;
arma::mat coordinates = f.GetInitialPoint();

DemonSGD optimizer(0.5, 1, 0.9);
optimizer.Optimize(f, coordinates);

Eve

An optimizer for differentiable separable functions.

Eve is a stochastic gradient based optimization method with locally and globally adaptive learning rates.

Constructors

Eve()
Eve(stepSize, batchSize)
Eve(stepSize, batchSize, beta1, beta2, beta3, epsilon, clip, maxIterations, tolerance, shuffle)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`beta3`	Exponential decay rate for relative change.	`0.999`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`double`	`clip`	Clipping range to avoid extreme valus.	`10`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Beta1(), Beta2(), Beta3(), Epsilon(), Clip(), MaxIterations(), Tolerance(), Shuffle(), and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Eve optimizer(0.001, 32, 0.9, 0.999, 0.999, 10, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Frank-Wolfe

An optimizer for differentiable functions that may also be constrained.

Frank-Wolfe is a technique to minimize a continuously differentiable convex function f over a compact convex subset D of a vector space. It is also known as conditional gradient method.

Constructors

FrankWolfe<LinearConstrSolverType, UpdateRuleType>(linearConstrSolver, updateRule)
FrankWolfe<LinearConstrSolverType, UpdateRuleType>(linearConstrSolver, updateRule, maxIterations, tolerance)

The LinearConstrSolverType template parameter specifies the constraint domain D for the problem. The ConstrLpBallSolver and ConstrStructGroupSolver<GroupLpBall> classes are available for use; the former restricts D to the unit ball of the specified l-p norm. Other constraint types may be implemented as a class with the same method signatures as either of the existing classes.

The UpdateRuleType template parameter specifies the update rule used by the optimizer. The UpdateClassic and UpdateLineSearch classes are available for use and represent a simple update step rule and a line search based update rule, respectively. The UpdateSpan and UpdateFulLCorrection classes are also available and may be used with the FuncSq function class (which is a squared matrix loss).

For convenience the following typedefs have been defined:

OMP (equivalent to FrankWolfe<ConstrLpBallSolver, UpdateSpan>): a solver for the orthogonal matching pursuit problem
StandardFrankWolfe (equivalent to FrankWolfe<ConstrLpBallSolver, ClassicUpdate>): the standard Frank-Wolfe algorithm with the solution restricted to lie within the unit ball

Attributes

type	name	description	default
`LinearConstrSolverType`	`linearConstrSolver`	Solver for linear constrained problem.	n/a
`UpdateRuleType`	`updateRule`	Rule for updating solution in each iteration.	n/a
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`size_t`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-10`

Attributes of the optimizer may also be changed via the member methods LinearConstrSolver(), UpdateRule(), MaxIterations(), and Tolerance().

Examples:

TODO

FTML (Follow the Moving Leader)

An optimizer for differentiable separable functions.

Follow the Moving Leader (FTML) is an optimizer where recent samples are weighted more heavily in each iteration, so FTML can adapt more quickly to changes.

Constructors

FTML()
FTML(stepSize, batchSize)
FTML(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle)
FTML(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

FTML optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Gradient Descent

An optimizer for differentiable functions.

Gradient Descent is a technique to minimize a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.

Constructors

GradientDescent()
GradientDescent(stepSize)
GradientDescent(stepSize, maxIterations, tolerance)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`size_t`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`

Attributes of the optimizer may also be changed via the member methods StepSize(), MaxIterations(), and Tolerance().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

GradientDescent optimizer(0.001, 0, 1e-15);
optimizer.Optimize(f, coordinates);

Grid Search

An optimizer for categorical functions.

An optimizer that finds the minimum of a given function by iterating through points on a multidimensional grid.

Constructors

GridSearch()

Attributes

The GridSearch class has no configurable attributes.

Note: the GridSearch class can only optimize categorical functions where every parameter is categorical.

Hogwild! (Parallel SGD)

An optimizer for sparse differentiable separable functions.

An implementation of parallel stochastic gradient descent using the lock-free HOGWILD! approach. This implementation requires OpenMP to be enabled during compilation (i.e., -fopenmp specified as a compiler flag).

Note that the requirements for Hogwild! are slightly different than for most differentiable separable functions but it is often possible to use Hogwild! by implementing Gradient() with a template parameter. See the sparse differentiable separable functions documentation for more details.

Constructors

ParallelSGD<DecayPolicyType>(maxIterations, threadShareSize)
ParallelSGD<DecayPolicyType>(maxIterations, threadShareSize, tolerance, shuffle, decayPolicy)

The DecayPolicyType template parameter specifies the policy used to update the step size after each iteration. The ConstantStep class is available for use. Custom behavior can be achieved by implementing a class with the same method signatures.

The default type for DecayPolicyType is ConstantStep, so the shorter type ParallelSGD<> can be used instead of the equivalent ParallelSGD<ConstantStep>.

Attributes

type	name	description	default
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	n/a
`size_t`	`threadShareSize`	Number of datapoints to be processed in one iteration by each thread.	n/a
`double`	`tolerance`	Maximum absolute tolerance to terminate the algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`DecayPolicyType`	`decayPolicy`	An instantiated step size update policy to use.	`DecayPolicyType()`

Attributes of the optimizer may also be modified via the member methods MaxIterations(), ThreadShareSize(), Tolerance(), Shuffle(), and DecayPolicy().

Note that the default value for decayPolicy is the default constructor for the DecayPolicyType.

Examples

Click to collapse/expand example code.

GeneralizedRosenbrockFunction f(50); // 50-dimensional Rosenbrock function.
arma::mat coordinates = f.GetInitialPoint();

ParallelSGD<> optimizer(100000, f.NumFunctions(), 1e-5, true);
optimizer.Optimize(f, coordinates);

IQN

An optimizer for differentiable separable functions.

The Incremental Quasi-Newton belongs to the family of stochastic and incremental methods that have a cost per iteration independent of n. IQN iterations are a stochastic version of BFGS iterations that use memory to reduce the variance of stochastic approximations.

Constructors

IQN()
IQN(stepSize)
IQN(stepSize, batchSize, maxIterations, tolerance)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Size of each batch.	`10`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`size_t`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`

Attributes of the optimizer may also be changed via the member methods StepSize(), BatchSize(), MaxIterations(), and Tolerance().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

IQN optimizer(0.01, 1, 5000, 1e-5);
optimizer.Optimize(f, coordinates);

Katyusha

An optimizer for differentiable separable functions.

Katyusha is a direct, primal-only stochastic gradient method which uses a “negative momentum” on top of Nesterov’s momentum. Two types are available—one that uses a proximal update step, and one that uses the standard update step.

Constructors

KatyushaType<proximal>()
KatyushaType<proximal>(convexity, lipschitz)
KatyushaType<proximal>(convexity, lipschitz, batchSize)
KatyushaType<proximal>(convexity, lipschitz, batchSize, maxIterations, innerIterations, tolerance, shuffle, exactObjective)

The proximal template parameter is a boolean value (true or false) that specifies whether or not the proximal update should be used.

For convenience the following typedefs have been defined:

Katyusha (equivalent to KatyushaType<false>): Katyusha with the standard update step
KatyushaProximal (equivalent to KatyushaType<true>): Katyusha with the proximal update step

Attributes

type	name	description	default
`double`	`convexity`	The regularization parameter.	`1.0`
`double`	`lipschitz`	The Lipschitz constant.	`10.0`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`1000`
`size_t`	`innerIterations`	The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration.	`0`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be changed via the member methods Convexity(), Lipschitz(), BatchSize(), MaxIterations(), InnerIterations(), Tolerance(), Shuffle(), and ExactObjective().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Without proximal update.
Katyusha optimizer(1.0, 10.0, 1, 100, 0, 1e-10, true);
optimizer.Optimize(f, coordinates);

// With proximal update.
KatyushaProximal proximalOptimizer(1.0, 10.0, 1, 100, 0, 1e-10, true);
proximalOptimizer.Optimize(f, coordinates);

L-BFGS

An optimizer for differentiable functions

L-BFGS is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm using a limited amount of computer memory.

Constructors

L_BFGS()
L_BFGS(numBasis, maxIterations)
L_BFGS(numBasis, maxIterations, armijoConstant, wolfe, minGradientNorm, factr, maxLineSearchTrials)
L_BFGS(numBasis, maxIterations, armijoConstant, wolfe, minGradientNorm, factr, maxLineSearchTrials, minStep, maxStep)

Attributes

type	name	description	default
`size_t`	`numBasis`	Number of memory points to be stored (default 10).	`10`
`size_t`	`maxIterations`	Maximum number of iterations for the optimization (0 means no limit and may run indefinitely).	`10000`
`double`	`armijoConstant`	Controls the accuracy of the line search routine for determining the Armijo condition.	`1e-4`
`double`	`wolfe`	Parameter for detecting the Wolfe condition.	`0.9`
`double`	`minGradientNorm`	Minimum gradient norm required to continue the optimization.	`1e-6`
`double`	`factr`	Minimum relative function value decrease to continue the optimization.	`1e-15`
`size_t`	`maxLineSearchTrials`	The maximum number of trials for the line search (before giving up).	`50`
`double`	`minStep`	The minimum step of the line search.	`1e-20`
`double`	`maxStep`	The maximum step of the line search.	`1e20`

Attributes of the optimizer may also be changed via the member methods NumBasis(), MaxIterations(), ArmijoConstant(), Wolfe(), MinGradientNorm(), Factr(), MaxLineSearchTrials(), MinStep(), and MaxStep().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

L_BFGS optimizer(20);
optimizer.Optimize(f, coordinates);

Lookahead

An optimizer for differentiable separable functions.

Lookahead is a stochastic gradient based optimization method which chooses a search direction by looking ahead at the sequence of “fast weights” generated by another optimizer.

Constructors

Lookahead<>()
Lookahead<>(stepSize)
Lookahead<>(stepSize, k)
Lookahead<>(stepSize, k, maxIterations, tolerance, decayPolicy, exactObjective)
Lookahead<>(baseOptimizer, stepSize, k, maxIterations, tolerance, decayPolicy, exactObjective)

Note that Lookahead<> is based on the templated type LookaheadType<BaseOptimizerType, DecayPolicyType> with BaseOptimizerType = Adam and DecayPolicyType = NoDecay.

Any optimizer that implements the differentiable separable functions interface can be paired with the Lookahead optimizer.

Attributes

type	name	description	default
`BaseOptimizerType`	`baseOptimizer`	Optimizer for the forward step.	Adam
`double`	`stepSize`	Step size for each iteration.	`0.5`
`size_t`	`k`	The synchronization period.	`5`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

The attributes of the optimizer may also be modified via the member methods BaseOptimizer(), StepSize(), K(), MaxIterations(), Tolerance(), DecayPolicy() and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Lookahead<> optimizer(0.5, 5, 100000, 1e-5);
optimizer.Optimize(f, coordinates);

LRSDP (low-rank SDP solver)

An optimizer for semidefinite programs.

LRSDP is the implementation of Monteiro and Burer’s formulation of low-rank semidefinite programs (LR-SDP). This solver uses the augmented Lagrangian optimizer to solve low-rank semidefinite programs.

The assumption here is that the solution matrix for the SDP is low-rank. If this assumption is not true, the algorithm should not be expected to converge.

Constructors

LRSDP<SDPType>()

The SDPType template parameter specifies the type of SDP to solve. The SDP<arma::mat> and SDP<arma::sp_mat> classes are available for use; these represent SDPs with dense and sparse C matrices, respectively. The SDP<> class is detailed in the semidefinite program documentation.

Once the LRSDP<> object is constructed, the SDP may be specified by calling the SDP() member method, which returns a reference to the SDPType.

Attributes

The attributes of the LRSDP optimizer may only be accessed via member methods.

type	method name	description	default
`size_t`	`MaxIterations()`	Maximum number of iterations before termination.	`1000`
`AugLagrangian`	`AugLag()`	The internally-held Augmented Lagrangian optimizer.	n/a

Momentum SGD

An optimizer for differentiable separable functions.

Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. This is an SGD variant that uses momentum for its updates. Using momentum updates for parameter learning can accelerate the rate of convergence, specifically in the cases where the surface curves much more steeply (a steep hilly terrain with high curvature).

Constructors

MomentumSGD()
MomentumSGD(stepSize, batchSize)
MomentumSGD(stepSize, batchSize, maxIterations, tolerance, shuffle)
MomentumSGD(stepSize, batchSize, maxIterations, tolerance, shuffle, momentumPolicy, decayPolicy, resetPolicy, exactObjective)

Note that MomentumSGD is based on the templated type SGD<UpdatePolicyType, DecayPolicyType> with UpdatePolicyType= MomentumUpdate and DecayPolicyType = NoDecay.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`MomentumUpdate`	`updatePolicy`	An instantiated `MomentumUpdate`.	`MomentumUpdate()`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`resetPolicy`	Flag that determines whether update policy parameters are reset before every Optimize call.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), MaxIterations(), Tolerance(), Shuffle(), UpdatePolicy(), DecayPolicy(), ResetPolicy(), and ExactObjective().

Note that the MomentumUpdate class has the constructor MomentumUpdate(momentum) with a default value of 0.5 for the momentum.

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
optimizer.Optimize(f, coordinates);

Nadam

An optimizer for differentiable separable functions.

Nadam is a variant of Adam based on NAG (Nesterov accelerated gradient). It uses Nesterov momentum for faster convergence.

Constructors

Nadam()
Nadam(stepSize, batchSize)
Nadam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
Nadam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy)

Note that the Nadam class is based on the AdamType<UpdateRule> class with UpdateRule = NadamUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Nadam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

NadaMax

An optimizer for differentiable separable functions.

NadaMax is a variant of AdaMax based on NAG (Nesterov accelerated gradient). It uses Nesterov momentum for faster convergence.

Constructors

NadaMax()
NadaMax(stepSize, batchSize)
NadaMax(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
NadaMax(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy)

Note that the NadaMax class is based on the AdamType<UpdateRule> class with UpdateRule = NadaMaxUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

NadaMax optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Nesterov Momentum SGD

An optimizer for differentiable separable functions.

Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. This is an SGD variant that uses Nesterov momentum for its updates. Nesterov Momentum application can accelerate the rate of convergence to O(1/k^2).

Constructors

NesterovMomentumSGD()
NesterovMomentumSGD(stepSize, batchSize)
NesterovMomentumSGD(stepSize, batchSize, maxIterations, tolerance, shuffle)
NesterovMomentumSGD(stepSize, batchSize, maxIterations, tolerance, shuffle, momentumPolicy, decayPolicy, resetPolicy, exactObjective)

Note that MomentumSGD is based on the templated type SGD<UpdatePolicyType, DecayPolicyType> with UpdatePolicyType= NesterovMomentumUpdate and DecayPolicyType = NoDecay.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`NesterovMomentumUpdate`	`updatePolicy`	An instantiated `MomentumUpdate`.	`NesterovMomentumUpdate()`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`resetPolicy`	Flag that determines whether update policy parameters are reset before every Optimize call.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Note that the NesterovMomentumUpdate class has the constructor MomentumUpdate(momentum) with a default value of 0.5 for the momentum.

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

NesterovMomentumSGD optimizer(0.01, 32, 100000, 1e-5, true,
    MomentumUpdate(0.5));
optimizer.Optimize(f, coordinates);

MOEA/D-DE

An optimizer for arbitrary multi-objective functions. MOEA/D-DE (Multi Objective Evolutionary Algorithm based on Decomposition - Differential Evolution) is a multi objective optimization algorithm. It works by decomposing the problem into a number of scalar optimization subproblems which are solved simultaneously per generation. MOEA/D in itself is a framework, this particular algorithm uses Differential Crossover followed by Polynomial Mutation to create offsprings which are then decomposed to form a Single Objective Problem. A diversity preserving mechanism is also employed which encourages a varied set of solution.

Constructors

MOEAD<InitPolicyType, DecompPolicyType>()
MOEAD<InitPolicyType, DecompPolicyType>(populationSize, maxGenerations, crossoverProb, neighborProb, neighborSize, distributionIndex, differentialWeight, maxReplace, epsilon, lowerBound, upperBound)

The InitPolicyType template parameter refers to the strategy used to initialize the reference directions.

The following types are available:

Uniform
BayesianBootstrap
Dirichlet

The DecompPolicyType template parameter refers to the strategy used to decompose the weight vectors to form a scalar objective function.

The following types are available:

Tchebycheff
WeightedAverage
PenaltyBoundaryIntersection

For convenience the following types can be used:

DefaultMOEAD (equivalent to MOEAD<Uniform, Tchebycheff>): utilizes Uniform method for weight initialization and Tchebycheff for weight decomposition.
BBSMOEAD (equivalent to MOEAD<BayesianBootstrap, Tchebycheff>): utilizes Bayesian Bootstrap method for weight initialization and Tchebycheff for weight decomposition.
DirichletMOEAD (equivalent to MOEAD<Dirichlet, Tchebycheff>): utilizes Dirichlet sampling for weight init and Tchebycheff for weight decomposition.

Attributes

type	name	description	default
`size_t`	`populationSize`	The number of candidates in the population.	`150`
`size_t`	`maxGenerations`	The maximum number of generations allowed.	`300`
`double`	`crossoverProb`	Probability that a crossover will occur.	`1.0`
`double`	`neighborProb`	The probability of sampling from neighbor.	`0.9`
`size_t`	`neighborSize`	The number of nearest-neighbours to consider per weight vector.	`20`
`double`	`distributionIndex`	The crowding degree of the mutation.	`20`
`double`	`differentialWeight`	Amplification factor of the differentiation.	`0.5`
`size_t`	`maxReplace`	The limit of solutions allowed to be replaced by a child.	`2`
`double`	`epsilon`	Handles numerical stability after weight initialization.	`1E-10`
`double`, `arma::vec`	`lowerBound`	Lower bound of the coordinates on the coordinates of the whole population during the search process.	`0`
`double`, `arma::vec`	`upperBound`	Lower bound of the coordinates on the coordinates of the whole population during the search process.	`1`
`InitPolicyType`	`initPolicy`	Instantiated init policy used to initialize weights.	`InitPolicyType()`
`DecompPolicyType`	`decompPolicy`	Instantiated decomposition policy used to create scalar objective problem.	`DecompPolicyType()`

Attributes of the optimizer may also be changed via the member methods PopulationSize(), MaxGenerations(), CrossoverRate(), NeighborProb(), NeighborSize(), DistributionIndex(), DifferentialWeight(), MaxReplace(), Epsilon(), LowerBound(), UpperBound(), InitPolicy() and DecompPolicy().

Examples:

Click to collapse/expand example code.

SchafferFunctionN1<arma::mat> SCH;
arma::vec lowerBound("-10");
arma::vec upperBound("10");
DefaultMOEAD opt(300, 300, 1.0, 0.9, 20, 20, 0.5, 2, 1E-10, lowerBound, upperBound);
typedef decltype(SCH.objectiveA) ObjectiveTypeA;
typedef decltype(SCH.objectiveB) ObjectiveTypeB;
arma::mat coords = SCH.GetInitialPoint();
std::tuple<ObjectiveTypeA, ObjectiveTypeB> objectives = SCH.GetObjectives();
// obj will contain the minimum sum of objectiveA and objectiveB found on the best front.
double obj = opt.Optimize(objectives, coords);
// Now obtain the best front.
arma::cube bestFront = opt.ParetoFront();

NSGA2

An optimizer for arbitrary multi-objective functions.

NSGA2 (Non-dominated Sorting Genetic Algorithm - II) is a multi-objective optimization algorithm. The algorithm works by generating a candidate population from a fixed starting point. At each stage of optimization, a new population of children is generated. This new population along with its predecessor is sorted using non-domination as the metric. Following this, the population is further segregated into fronts. A new population is generated from these fronts having size equal to that of the starting population.

Constructors

NSGA2()
NSGA2(populationSize, maxGenerations, crossoverProb, mutationProb, mutationStrength, epsilon, lowerBound, upperBound)

Attributes

type	name	description	default
`size_t`	`populationSize`	The number of candidates in the population. This should be at least 4 in size and a multiple of 4.	`100`
`size_t`	`maxGenerations`	The maximum number of generations allowed for NSGA2.	`2000`
`double`	`crossoverProb`	Probability that a crossover will occur.	`0.6`
`double`	`mutationProb`	Probability that a weight will get mutated.	`0.3`
`double`	`mutationStrength`	The range of mutation noise to be added. This range is between 0 and mutationStrength.	`0.001`
`double`	`epsilon`	The value used internally to evaluate approximate equality in crowding distance based sorting.	`1e-6`
`double`, `arma::vec`	`lowerBound`	Lower bound of the coordinates on the coordinates of the whole population during the search process.	`0`
`double`, `arma::vec`	`upperBound`	Lower bound of the coordinates on the coordinates of the whole population during the search process.	`1`

Note that the parameters lowerBound and upperBound are overloaded. Data types of double or arma::mat may be used. If they are initialized as single values of double, then the same value of the bound applies to all the axes, resulting in an initialization following a uniform distribution in a hypercube. If they are initialized as matrices of arma::mat, then the value of lowerBound[i] applies to axis [i]; similarly, for values in upperBound. This results in an initialization following a uniform distribution in a hyperrectangle within the specified bounds.

Attributes of the optimizer may also be changed via the member methods PopulationSize(), MaxGenerations(), CrossoverRate(), MutationProbability(), MutationStrength(), Epsilon(), LowerBound() and UpperBound().

Examples:

Click to collapse/expand example code.

SchafferFunctionN1<arma::mat> SCH;
arma::vec lowerBound("-1000");
arma::vec upperBound("1000");
NSGA2 opt(20, 5000, 0.5, 0.5, 1e-3, 1e-6, lowerBound, upperBound);

typedef decltype(SCH.objectiveA) ObjectiveTypeA;
typedef decltype(SCH.objectiveB) ObjectiveTypeB;

arma::mat coords = SCH.GetInitialPoint();
std::tuple<ObjectiveTypeA, ObjectiveTypeB> objectives = SCH.GetObjectives();

// obj will contain the minimum sum of objectiveA and objectiveB found on the best front.
double obj = opt.Optimize(objectives, coords);
// Now obtain the best front.
arma::cube bestFront = opt.ParetoFront();

OptimisticAdam

An optimizer for differentiable separable functions.

OptimisticAdam is an optimizer which implements the Optimistic Adam algorithm which uses Optmistic Mirror Descent with the Adam Optimizer. It addresses the problem of limit cycling while training GANs (generative adversarial networks). It uses OMD to achieve faster regret rates in solving the zero sum game of training a GAN. It consistently achieves a smaller KL divergence with~ respect to the true underlying data distribution. The implementation here can be used with any differentiable separable function, not just GAN training.

Constructors

OptimisticAdam()
OptimisticAdam(stepSize, batchSize)
OptimisticAdam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle)
OptimisticAdam(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy)

Note that the OptimisticAdam class is based on the AdamType<UpdateRule> class with UpdateRule = OptimisticAdamUpdate.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

OptimisticAdam optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Padam

An optimizer for differentiable separable functions.

Padam is a variant of Adam with a partially adaptive momentum estimation method.

Constructors

Padam()
Padam(stepSize, batchSize)
Padam(stepSize, batchSize, beta1, beta2, partial, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`partial`	Partially adaptive parameter.	`0.25`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

The attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), Beta1(), Beta2(), Partial(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Padam optimizer(0.001, 32, 0.9, 0.999, 0.25, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

PSO

An optimizer for arbitrary functions.

PSO is an evolutionary approach to optimization that is inspired by flocks or birds or fishes. The fundamental analogy is that every creature (particle in a swarm) is at a measurable position of goodness or fitness, and this information can be shared amongst the creatures in the flock, so that iteratively, the entire flock can get close to the global optimum.

Constructors

PSOType<VelocityUpdatePolicy, InitPolicy>()
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles)
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles, lowerBound, upperBound)
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles, lowerBound, upperBound, maxIterations)
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles, lowerBound, upperBound, maxIterations, horizonSize)
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles, lowerBound, upperBound, maxIterations, horizonSize, impTolerance)
PSOType<VelocityUpdatePolicy, InitPolicy>(numParticles, lowerBound, upperBound, maxIterations, horizonSize, impTolerance, exploitationFactor, explorationFactor)

Attributes

type	name	description	default
`size_t`	`numParticles`	numParticles Number of particles in the swarm.	`64`
`double`, `arma::mat`	`lowerBound`	Lower bound of the coordinates of the initial population.	`1`
`double`, `arma::mat`	`upperBound`	Upper bound of the coordinates of the initial population.	`1`
`size_t`	`maxIterations`	Maximum number of iterations allowed.	`3000`
`size_t`	`horizonSize`	Size of the lookback-horizon for computing improvement.	`350`
`double`	`impTolerance`	The final value of the objective function for termination. If set to negative value, tolerance is not considered.	`1e-5`
`double`	`exploitationFactor`	Influence of the personal best of the particle.	`2.05`
`double`	`explorationFactor`	Influence of the neighbours of the particle.	`2.05`

Attributes of the optimizer may also be changed via the member methods NumParticles(), LowerBound(), UpperBound(), MaxIterations(), HorizonSize(), ImpTolerance(),ExploitationFactor(), and ExplorationFactor().

At present, only the local-best variant of PSO is present in ensmallen. The optimizer may be initialized using the class type LBestPSO, which is an alias for PSOType<LBestUpdate, DefaultInit>.

Examples:

Click to collapse/expand example code.

SphereFunction f(4);
arma::vec coordinates = f.GetInitialPoint();

LBestPSO s;
const double result = s.Optimize(f, coordinates)

Click to collapse/expand example code.

RosenbrockFunction f;
arma::vec coordinates = f.GetInitialPoint();

// Setting bounds for the initial swarm population of size 2.
arma::vec lowerBound("50 50");
arma::vec upperBound("60 60");

LBestPSO s(200, lowerBound, upperBound, 3000, 600, 1e-30, 2.05, 2.05);
const double result = s.Optimize(f, coordinates)

Click to collapse/expand example code.

RosenbrockFunction f;
arma::vec coordinates = f.GetInitialPoint();

// Setting bounds for the initial swarm population as type double.
double lowerBound = 50;
double upperBound = 60;

LBestPSO s(64, lowerBound, upperBound, 3000, 400, 1e-30, 2.05, 2.05);
const double result = s.Optimize(f, coordinates)

Primal-dual SDP Solver

An optimizer for semidefinite programs.

A primal-dual interior point method solver. This can solve semidefinite programs.

Constructors

PrimalDualSolver<>(maxIterations)
PrimalDualSolver<>(maxIterations, tau, normXzTol, primalInfeasTol, dualInfeasTol)

Attributes

The PrimalDualSolver<> class has several attributes that are only modifiable as member methods.

type	method name	description	default
`double`	`Tau()`	Value of tau used to compute alpha_hat.	`0.99`
`double`	`NormXZTol()`	Tolerance for the norm of X*Z.	`1e-7`
`double`	`PrimalInfeasTol()`	Tolerance for primal infeasibility.	`1e-7`
`double`	`DualInfeasTol()`	Tolerance for dual infeasibility.	`1e-7`
`size_t`	`MaxIterations()`	Maximum number of iterations before convergence.	`1000`

Optimization

The PrimalDualSolver<> class offers two overloads of Optimize() that optionally return the converged values for the dual variables.

Click to collapse/expand example code.

/**
 * Invoke the optimization procedure, returning the converged values for the
 * primal and dual variables.
 */
template<typename SDPType>
double Optimize(SDPType& s,
                arma::mat& X,
                arma::vec& ySparse,
                arma::vec& yDense,
                arma::mat& Z);

/**
 * Invoke the optimization procedure, and only return the primal variable.
 */
template<typename SDPType>
double Optimize(SDPType& s, arma::mat& X);

Quasi-Hyperbolic Momentum Update SGD (QHSGD)

An optimizer for differentiable separable functions.

Quasi-hyperbolic momentum update SGD (QHSGD) is an SGD-like optimizer with momentum where quasi-hyperbolic terms are added to the parametrization. The update rule for this optimizer is a weighted average of momentum SGD and vanilla SGD.

Constructors

QHSGD()
QHSGD(stepSize, batchSize)
QHSGD(stepSize, batchSize, maxIterations, tolerance, shuffle, exactObjective)

Note that QHSGD is based on the templated type SGD<UpdatePolicyType, DecayPolicyType> with UpdatePolicyType= QHUpdate and DecayPolicyType = NoDecay.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), MaxIterations(), Tolerance(), Shuffle(), and ExactObjective().

Note that the QHUpdate class has the constructor QHUpdate(v, momentum) with a default value of 0.7 for the quasi-hyperbolic term v and 0.999 for the momentum term.

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

QHSGD optimizer(0.01, 32, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

QHAdam

An optimizer for differentiable separable functions.

QHAdam is an optimizer that uses quasi-hyperbolic descent with the Adam optimizer. This replaces the moment estimators of Adam with quasi-hyperbolic terms, and various values of the v1 and v2 parameters are equivalent to the following other optimizers:

When v1 = v2 = 1, QHAdam is equivalent to Adam.
When v1 = 0 and v2 = 1, QHAdam is equivalent to RMSProp.
When v1 = beta1 and v2 = 1, QHAdam is equivalent to Nadam.

Constructors

QHAdam()
QHAdam(stepSize, batchSize)
QHAdam(stepSize, batchSize, v1, v2, beta1, beta2, eps, maxIterations)
QHAdam(stepSize, batchSize, v1, v2, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`v1`	The First Quasi Hyperbolic Term.	`0.7`
`double`	`v2`	The Second Quasi Hyperbolic Term.	`1.00`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

 RosenbrockFunction f;
 arma::mat coordinates = f.GetInitialPoint();

 QHAdam optimizer(0.001, 32, 0.7, 0.9, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
 optimizer.Optimize(f, coordinates);

RMSProp

An optimizer for differentiable separable functions.

RMSProp utilizes the magnitude of recent gradients to normalize the gradients.

Constructors

RMSProp()
RMSProp(stepSize, batchSize)
RMSProp(stepSize, batchSize, alpha, epsilon, maxIterations, tolerance, shuffle)
RMSProp(stepSize, batchSize, alpha, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Number of points to process in each step.	`32`
`double`	`alpha`	Smoothing constant, similar to that used in AdaDelta and momentum methods.	`0.99`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer can also be modified via the member methods StepSize(), BatchSize(), Alpha(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

RMSProp optimizer(1e-3, 1, 0.99, 1e-8, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);

Simulated Annealing (SA)

An optimizer for arbitrary functions.

Simulated Annealing is an stochastic optimization algorithm which is able to deliver near-optimal results quickly without knowing the gradient of the function being optimized. It has a unique hill climbing capability that makes it less vulnerable to local minima. This implementation uses exponential cooling schedule and feedback move control by default, but the cooling schedule can be changed via a template parameter.

Constructors

SA<CoolingScheduleType>(coolingSchedule)
SA<CoolingScheduleType>(coolingSchedule, maxIterations)
SA<CoolingScheduleType>(coolingSchedule, maxIterations, initT, initMoves, moveCtrlSweep, tolerance, maxToleranceSweep, maxMoveCoef, initMoveCoef, gain)

The CoolingScheduleType template parameter implements a policy to update the temperature. The ExponentialSchedule class is available for use; it has a constructor ExponentialSchedule(lambda) where lambda is the cooling speed (default 0.001). Custom schedules may be created by implementing a class with at least the single member method below:

Click to collapse/expand example code.

// Return the next temperature given the current system status.
double NextTemperature(const double currentTemperature,
                       const double currentEnergy);

For convenience, the default cooling schedule is ExponentialSchedule, so the shorter type SA<> may be used instead of the equivalent SA<ExponentialSchedule>.

Attributes

type	name	description	default
`CoolingScheduleType`	`coolingSchedule`	Instantiated cooling schedule (default ExponentialSchedule).	CoolingScheduleType()
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 indicates no limit).	`1000000`
`double`	`initT`	Initial temperature.	`10000.0`
`size_t`	`initMoves`	Number of initial iterations without changing temperature.	`1000`
`size_t`	`moveCtrlSweep`	Sweeps per feedback move control.	`100`
`double`	`tolerance`	Tolerance to consider system frozen.	`1e-5`
`size_t`	`maxToleranceSweep`	Maximum sweeps below tolerance to consider system frozen.	`3`
`double`	`maxMoveCoef`	Maximum move size.	`20`
`double`	`initMoveCoef`	Initial move size.	`0.3`
`double`	`gain`	Proportional control in feedback move control.	`0.3`

Attributes of the optimizer may also be changed via the member methods CoolingSchedule(), MaxIterations(), InitT(), InitMoves(), MoveCtrlSweep(), Tolerance(), MaxToleranceSweep(), MaxMoveCoef(), InitMoveCoef(), and Gain().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SA<> optimizer(ExponentialSchedule(), 1000000, 1000., 1000, 100, 1e-10, 3, 1.5,
    0.5, 0.3);
optimizer.Optimize(f, coordinates);

Simultaneous Perturbation Stochastic Approximation (SPSA)

An optimizer for arbitrary functions.

The SPSA algorithm approximates the gradient of the function by finite differences along stochastic directions.

Constructors

SPSA(alpha, gamma, stepSize, evaluationStepSize, maxIterations, tolerance)

Attributes

type	name	description	default
`double`	`alpha`	Scaling exponent for the step size.	`0.602`
`double`	`gamma`	Scaling exponent for evaluation step size.	`0.101`
`double`	`stepSize`	Scaling parameter for step size (named as ‘a’ in the paper).	`0.16`
`double`	`evaluationStepSize`	Scaling parameter for evaluation step size (named as ‘c’ in the paper).	`0.3`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`

Attributes of the optimizer may also be changed via the member methods Alpha(), Gamma(), StepSize(), EvaluationStepSize(), and MaxIterations().

Examples:

Click to collapse/expand example code.

SphereFunction f(2);
arma::mat coordinates = f.GetInitialPoint();

SPSA optimizer(0.1, 0.102, 0.16, 0.3, 100000, 1e-5);
optimizer.Optimize(f, coordinates);

Stochastic Recursive Gradient Algorithm (SARAH/SARAH+)

An optimizer for differentiable separable functions.

StochAstic Recusive gRadient algoritHm (SARAH), is a variance reducing stochastic recursive gradient algorithm which employs the stochastic recursive gradient, for solving empirical loss minimization for the case of nonconvex losses.

Constructors

SARAHType<UpdatePolicyType>()
SARAHType<UpdatePolicyType>(stepSize, batchSize)
SARAHType<UpdatePolicyType>(stepSize, batchSize, maxIterations, innerIterations, tolerance, shuffle, updatePolicy, exactObjective)

The UpdatePolicyType template parameter specifies the update step used for the optimizer. The SARAHUpdate and SARAHPlusUpdate classes are available for use, and implement the standard SARAH update and SARAH+ update, respectively. A custom update rule can be used by implementing a class with the same method signatures.

For convenience the following typedefs have been defined:

SARAH (equivalent to SARAHType<SARAHUpdate>): the standard SARAH optimizer
SARAH_Plus (equivalent to SARAHType<SARAHPlusUpdate>): the SARAH+ optimizer

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`1000`
`size_t`	`innerIterations`	The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration.	`0`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`UpdatePolicyType`	`updatePolicy`	Instantiated update policy used to adjust the given parameters.	`UpdatePolicyType()`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be changed via the member methods StepSize(), BatchSize(), MaxIterations(), InnerIterations(), Tolerance(), Shuffle(), UpdatePolicy(), and ExactObjective().

Note that the default value for updatePolicy is the default constructor for the UpdatePolicyType.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Standard stochastic variance reduced gradient.
SARAH optimizer(0.01, 1, 5000, 0, 1e-5, true);
optimizer.Optimize(f, coordinates);

// Stochastic variance reduced gradient with Barzilai-Borwein.
SARAH_Plus optimizerPlus(0.01, 1, 5000, 0, 1e-5, true);
optimizerPlus.Optimize(f, coordinates);

Standard SGD

An optimizer for differentiable separable functions.

Stochastic Gradient Descent is a technique for minimizing a function which can be expressed as a sum of other functions. It’s likely better to use any of the other variants of SGD than this class; however, this standard SGD implementation may still be useful in some situations.

Constructors

StandardSGD()
StandardSGD(stepSize, batchSize)
StandardSGD(stepSize, batchSize, maxIterations, tolerance, shuffle, updatePolicy, decayPolicy, resetPolicy, exactObjective)

Note that StandardSGD is based on the templated type SGD<UpdatePolicyType, DecayPolicyType> with UpdatePolicyType= VanillaUpdate and DecayPolicyType = NoDecay.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Batch size to use for each step.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`UpdatePolicyType`	`updatePolicy`	Instantiated update policy used to adjust the given parameters.	`UpdatePolicyType()`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`resetPolicy`	Flag that determines whether update policy parameters are reset before every Optimize call.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

StandardSGD optimizer(0.01, 32, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Stochastic Gradient Descent with Restarts (SGDR)

An optimizer for differentiable separable functions.

SGDR is based on Mini-batch Stochastic Gradient Descent class and simulates a new warm-started run/restart once a number of epochs are performed.

Constructors

SGDR<UpdatePolicyType>()
SGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize)
SGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, updatePolicy)
SGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, updatePolicy, resetPolicy, exactObjective)

The UpdatePolicyType template parameter controls the update policy used during the iterative update process. The MomentumUpdate class is available for use, and custom behavior can be achieved by implementing a class with the same method signatures as MomentumUpdate.

For convenience, the default type of UpdatePolicyType is MomentumUpdate, so the shorter type SGDR<> can be used instead of the equivalent SGDR<MomentumUpdate>.

Attributes

type	name	description	default
`size_t`	`epochRestart`	Initial epoch where decay is applied.	`50`
`double`	`multFactor`	Batch size multiplication factor.	`2.0`
`size_t`	`batchSize`	Size of each mini-batch.	`1000`
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order.	`true`
`UpdatePolicyType`	`updatePolicy`	Instantiated update policy used to adjust the given parameters.	`UpdatePolicyType()`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer can also be modified via the member methods EpochRestart(), MultFactor(), BatchSize(), StepSize(), MaxIterations(), Tolerance(), Shuffle(), UpdatePolicy(), ResetPolicy(), and ExactObjective().

Note that the default value for updatePolicy is the default constructor for the UpdatePolicyType.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SGDR<> optimizer(50, 2.0, 1, 0.01, 10000, 1e-3);
optimizer.Optimize(f, coordinates);

Snapshot Stochastic Gradient Descent with Restarts (SnapshotSGDR)

An optimizer for differentiable separable functions.

SnapshotSGDR simulates a new warm-started run/restart once a number of epochs are performed using the Snapshot Ensembles technique.

Constructors

SnapshotSGDR<UpdatePolicyType>()
SnapshotSGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize)
SnapshotSGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, snapshots, accumulate, updatePolicy)
SnapshotSGDR<UpdatePolicyType>(epochRestart, multFactor, batchSize, stepSize, maxIterations, tolerance, shuffle, snapshots, accumulate, updatePolicy, resetPolicy, exactObjective)

For convenience, the default type of UpdatePolicyType is MomentumUpdate, so the shorter type SnapshotSGDR<> can be used instead of the equivalent SnapshotSGDR<MomentumUpdate>.

Attributes

type	name	description	default
`size_t`	`epochRestart`	Initial epoch where decay is applied.	`50`
`double`	`multFactor`	Batch size multiplication factor.	`2.0`
`size_t`	`batchSize`	Size of each mini-batch.	`1000`
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order.	`true`
`size_t`	`snapshots`	Maximum number of snapshots.	`5`
`bool`	`accumulate`	Accumulate the snapshot parameter.	`true`
`UpdatePolicyType`	`updatePolicy`	Instantiated update policy used to adjust the given parameters.	`UpdatePolicyType()`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer can also be modified via the member methods EpochRestart(), MultFactor(), BatchSize(), StepSize(), MaxIterations(), Tolerance(), Shuffle(), Snapshots(), Accumulate(), UpdatePolicy(), ResetPolicy(), and ExactObjective().

The Snapshots() function returns a std::vector<arma::mat>& (a vector of snapshots of the parameters), not a size_t representing the maximum number of snapshots.

Note that the default value for updatePolicy is the default constructor for the UpdatePolicyType.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SnapshotSGDR<> optimizer(50, 2.0, 1, 0.01, 10000, 1e-3);
optimizer.Optimize(f, coordinates);

SMORMS3

An optimizer for differentiable separable functions.

SMORMS3 is a hybrid of RMSprop, which is trying to estimate a safe and optimal distance based on curvature or perhaps just normalizing the step-size in the parameter space.

Constructors

SMORMS3()
SMORMS3(stepSize, batchSize)
SMORMS3(stepSize, batchSize, epsilon, maxIterations, tolerance)
SMORMS3(stepSize, batchSize, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process at each step.	`32`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-16`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer can also be modified via the member methods StepSize(), BatchSize(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SMORMS3 optimizer(0.001, 1, 1e-16, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);

Standard stochastic variance reduced gradient (SVRG)

An optimizer for differentiable separable functions.

Stochastic Variance Reduced Gradient is a technique for minimizing smooth and strongly convex problems.

Constructors

SVRGType<UpdatePolicyType, DecayPolicyType>()
SVRGType<UpdatePolicyType, DecayPolicyType>(stepSize)
SVRGType<UpdatePolicyType, DecayPolicyType>(stepSize, batchSize, maxIterations, innerIterations)
SVRGType<UpdatePolicyType, DecayPolicyType>(stepSize, batchSize, maxIterations, innerIterations, tolerance, shuffle, updatePolicy, decayPolicy, resetPolicy, exactObjective)

The UpdatePolicyType template parameter controls the update step used by SVRG during the optimization. The SVRGUpdate class is available for use and custom update behavior can be achieved by implementing a class with the same method signatures as SVRGUpdate.

The DecayPolicyType template parameter controls the decay policy used to adjust the step size during the optimization. The BarzilaiBorweinDecay and NoDecay classes are available for use. Custom decay functionality can be achieved by implementing a class with the same method signatures.

For convenience the following typedefs have been defined:

SVRG (equivalent to SVRGType<SVRGUpdate, NoDecay>): the standard SVRG technique
SVRG_BB (equivalent to SVRGType<SVRGUpdate, BarzilaiBorweinDecay>): SVRG with the Barzilai-Borwein decay policy

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Initial batch size.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`1000`
`size_t`	`innerIterations`	The number of inner iterations allowed (0 means n / batchSize). Note that the full gradient is only calculated in the outer iteration.	`0`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the batch order is shuffled; otherwise, each batch is visited in linear order.	`true`
`UpdatePolicyType`	`updatePolicy`	Instantiated update policy used to adjust the given parameters.	`UpdatePolicyType()`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`resetPolicy`	Flag that determines whether update policy parameters are reset before every Optimize call.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), MaxIterations(), InnerIterations(), Tolerance(), Shuffle(), UpdatePolicy(), DecayPolicy(), ResetPolicy(), and ExactObjective().

Note that the default values for the updatePolicy and decayPolicy parameters are simply the default constructors of the UpdatePolicyType and DecayPolicyType classes.

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Standard stochastic variance reduced gradient.
SVRG optimizer(0.005, 1, 300, 0, 1e-10, true);
optimizer.Optimize(f, coordinates);

// Stochastic variance reduced gradient with Barzilai-Borwein.
SVRG_BB bbOptimizer(0.005, batchSize, 300, 0, 1e-10, true, SVRGUpdate(),
    BarzilaiBorweinDecay(0.1));
bbOptimizer.Optimize(f, coordinates);

SPALeRA Stochastic Gradient Descent (SPALeRASGD)

An optimizer for differentiable separable functions.

SPALeRA involves two components: a learning rate adaptation scheme, which ensures that the learning system goes as fast as it can; and a catastrophic event manager, which is in charge of detecting undesirable behaviors and getting the system back on track.

Constructors

SPALeRASGD<DecayPolicyType>()
SPALeRASGD<DecayPolicyType>(stepSize, batchSize)
SPALeRASGD<DecayPolicyType>(stepSize, batchSize, maxIterations, tolerance)
SPALeRASGD<DecayPolicyType>(stepSize, batchSize, maxIterations, tolerance, lambda, alpha, epsilon, adaptRate, shuffle, decayPolicy, resetPolicy, exactObjective)

The DecayPolicyType template parameter controls the decay in the step size during the course of the optimization. The NoDecay class is available for use; custom behavior can be achieved by implementing a class with the same method signatures.

By default, DecayPolicyType is set to NoDecay, so the shorter type SPALeRASGD<> can be used instead of the equivalent SPALeRASGD<NoDecay>.

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.01`
`size_t`	`batchSize`	Initial batch size.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`double`	`lambda`	Page-Hinkley update parameter.	`0.01`
`double`	`alpha`	Memory parameter of the Agnostic Learning Rate adaptation.	`0.001`
`double`	`epsilon`	Numerical stability parameter.	`1e-6`
`double`	`adaptRate`	Agnostic learning rate update rate.	`3.10e-8`
`bool`	`shuffle`	If true, the batch order is shuffled; otherwise, each batch is visited in linear order.	`true`
`DecayPolicyType`	`decayPolicy`	Instantiated decay policy used to adjust the step size.	`DecayPolicyType()`
`bool`	`resetPolicy`	Flag that determines whether update policy parameters are reset before every Optimize call.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), MaxIterations(), Tolerance(), Lambda(), Alpha(), Epsilon(), AdaptRate(), Shuffle(), DecayPolicy(), ResetPolicy(), and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SPALeRASGD<> optimizer(0.05, 1, 10000, 1e-4);
optimizer.Optimize(f, coordinates);

SWATS

An optimizer for differentiable separable functions.

SWATS is an optimizer that uses a simple strategy to switch from Adam to standard SGD when a triggering condition is satisfied. The condition relates to the projection of Adam steps on the gradient subspace.

Constructors

SWATS()
SWATS(stepSize, batchSize)
SWATS(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance)
SWATS(stepSize, batchSize, beta1, beta2, epsilon, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process at each step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`epsilon`	Value used to initialize the mean squared gradient parameter.	`1e-16`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the mini-batch order is shuffled; otherwise, each mini-batch is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer can also be modified via the member methods StepSize(), BatchSize(), Beta1(), Beta2(), Epsilon(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

SWATS optimizer(0.001, 1, 0.9, 0.999, 1e-16, 5000000, 1e-9, true);
optimizer.Optimize(f, coordinates);

WNGrad

An optimizer for differentiable separable functions.

WNGrad is a general nonlinear update rule for the learning rate. WNGrad has near-optimal convergence rates in both the batch and stochastic settings.

Constructors

WNGrad()
WNGrad(stepSize, batchSize)
WNGrad(stepSize, batchSize, maxIterations, tolerance, shuffle)
WNGrad(stepSize, batchSize, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.562`
`size_t`	`batchSize`	Initial batch size.	`32`
`size_t`	`maxIterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the batch order is shuffled; otherwise, each batch is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Attributes of the optimizer may also be modified via the member methods StepSize(), BatchSize(), MaxIterations(), Tolerance(), Shuffle(), ResetPolicy(), and ExactObjective().

Examples

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

WNGrad<> optimizer(0.562, 1, 10000, 1e-4);
optimizer.Optimize(f, coordinates);

Yogi

An optimizer for differentiable separable functions.

Yogi is an optimization algorithm based on Adam with more fine-grained effective learning rate control, which uses additive updates instead of multiplicative updates for the moving average of the squared gradient. In addition, Yogi has similar theoretical guarantees on convergence as Adam.

Constructors

Yogi()
Yogi(stepSize, batchSize)
Yogi(stepSize, batchSize, beta1, beta2, eps, maxIterations)
Yogi(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle, resetPolicy, exactObjective)

Attributes

type	name	description	default
`double`	`stepSize`	Step size for each iteration.	`0.001`
`size_t`	`batchSize`	Number of points to process in a single step.	`32`
`double`	`beta1`	Exponential decay rate for the first moment estimates.	`0.9`
`double`	`beta2`	Exponential decay rate for the weighted infinity norm estimates.	`0.999`
`double`	`eps`	Value used to initialize the mean squared gradient parameter.	`1e-8`
`size_t`	`max_iterations`	Maximum number of iterations allowed (0 means no limit).	`100000`
`double`	`tolerance`	Maximum absolute tolerance to terminate algorithm.	`1e-5`
`bool`	`shuffle`	If true, the function order is shuffled; otherwise, each function is visited in linear order.	`true`
`bool`	`resetPolicy`	If true, parameters are reset before every Optimize call; otherwise, their values are retained.	`true`
`bool`	`exactObjective`	Calculate the exact objective (Default: estimate the final objective obtained on the last pass over the data).	`false`

Examples

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

Yogi optimizer(0.001, 32, 0.9, 0.999, 1e-8, 100000, 1e-5, true);
optimizer.Optimize(f, coordinates);

Callback documentation

Callbacks in ensmallen are methods that are called at various states during the optimization process, which can be used to implement and control behaviors such as:

Changing the learning rate.
Printing of the current objective.
Sending a message when the optimization hits a specific state such us a minimal objective.

Callbacks can be passed as an argument to the Optimize() function:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* callback as the last argument to the
// *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss());

Passing multiple callbacks is just the same as passing a single callback:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* and *EarlyStopAtMinLoss* callback as the last
// argument to the *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss(), EarlyStopAtMinLoss());

It is also possible to pass a callback instantiation that allows accessing of internal callback parameters at a later state:

Click to collapse/expand example code.

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Create an instantiation of the built-in *StoreBestCoordinates* callback,
// which will store the best objective and the corresponding model parameter
// that can be accessed later.
StoreBestCoordinates<> callback;

// Pass an instantiation of the built-in *StoreBestCoordinates* callback as the
// last argument to the *Optimize()* function.
optimizer.Optimize(f, coordinates, callback);

// Print the minimum objective that is stored inside the *StoreBestCoordinates*
// callback that was passed to the *Optimize()* call.
std::cout << callback.BestObjective() << std::endl;

Built-in Callbacks

EarlyStopAtMinLoss

Stops the optimization process if the loss stops decreasing or no improvement has been made.

Constructors

EarlyStopAtMinLoss()
EarlyStopAtMinLoss(patience)
EarlyStopAtMinLoss(func)
EarlyStopAtMinLoss(func,patience)

Attributes

type	name	description	default
`size_t`	`patience`	The number of epochs to wait after the minimum loss has been reached.	`10`
`std::function<double(const arma::mat&)>`	`func`	A callback to return immediate loss evaluated by the function.

Note that for the func argument above, if a different matrix type is desired, instead of using the class EarlyStopAtMinLoss, the class EarlyStopAtMinLossType<MatType> should be used.

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, EarlyStopAtMinLoss());

Another example of using lambda in the constructor.

// Generate random training data and labels.
arma::mat trainingData(5, 100, arma::fill::randu);
arma::Row<size_t> trainingLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));
// Generate a validation set.
arma::mat validationData(5, 100, arma::fill::randu);
arma::Row<size_t> validationLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));

// Create a LogisticRegressionFunction for both the training and validation data.
LogisticRegressionFunction lrfTrain(trainingData, trainingLabels);
LogisticRegressionFunction lrfValidation(validationData, validationLabels);

// Create a callback that will terminate when the validation loss starts to
// increase.
EarlyStopAtMinLoss cb(
    [&](const arma::mat& coordinates)
    {
      // You could also, e.g., print the validation loss here to watch it converge.
      return lrfValidation.Evaluate(coordinates);
    });

arma::mat coordinates = lrfTrain.GetInitialPoint();
SMORMS3 smorms3;
smorms3.Optimize(lrfTrain, coordinates, cb);

GradClipByNorm

One difficulty with optimization is that large parameter gradients can lead an optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work done to get to the current solution. Such large updates during the optimization can cause a numerical overflow or underflow, often referred to as “exploding gradients”. The exploding gradient problem can be caused by: Choosing the wrong learning rate which leads to huge updates in the gradients. Failing to scale a data set leading to very large differences between data points. Applying a loss function that computes very large error values.

A common answer to the exploding gradients problem is to change the derivative of the error before applying the update step. One option is to clip the norm ||g|| of the gradient g before a parameter update. So given the gradient, and a maximum norm value, the callback normalizes the gradient so that its L2-norm is less than or equal to the given maximum norm value.

Constructors

GradClipByNorm(maxNorm)

Attributes

type	name	description	default
`double`	`maxNorm`	The maximum clipping value.

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, GradClipByNorm(0.3));

GradClipByValue

A common answer to the exploding gradients problem is to change the derivative of the error before applying the update step. One option is to clip the parameter gradient element-wise before a parameter update.

Constructors

GradClipByValue(min, max)

Attributes

type	name	description	default
`double`	`min`	The minimum value to clip to.
`double`	`max`	The maximum value to clip to.

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, GradClipByValue(0, 1.3));

PrintLoss

Callback that prints loss to stdout or a specified output stream.

Constructors

PrintLoss()
PrintLoss(output)

Attributes

type	name	description	default
`std::ostream`	`output`	Ostream which receives output from this object.	`stdout`

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, PrintLoss());

ProgressBar

Callback that prints a progress bar to stdout or a specified output stream.

Constructors

ProgressBar()
ProgressBar(width)
ProgressBar(width, output)

Attributes

type	name	description	default
`size_t`	`width`	Width of the bar.	`70`
`std::ostream`	`output`	Ostream which receives output from this object.	`stdout`

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, ProgressBar());

Report

Callback that prints a optimizer report to stdout or a specified output stream.

Constructors

Report()
Report(iterationsPercentage)
Report(iterationsPercentage, output)
Report(iterationsPercentage, output, outputMatrixSize)

Attributes

type	name	description	default
`double`	`iterationsPercentage`	The number of iterations to report in percent, between [0, 1].	`0.1`
`std::ostream`	`output`	Ostream which receives output from this object.	`stdout`
`size_t`	`outputMatrixSize`	The number of values to output for the function coordinates.	`4`

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, Report(0.1));

Click to collapse/expand example output.

Optimization Report
--------------------------------------------------------------------------------

Initial Coordinates:
  -1.2000   1.0000

Final coordinates:
  -1.0490   1.1070

iter          loss          loss change   |gradient|    step size     total time
0             24.2          0             233           1             4.27e-05
100           8.6           15.6          104           1             0.000215
200           5.26          3.35          48.7          1             0.000373
300           4.49          0.767         23.4          1             0.000533
400           4.31          0.181         11.3          1             0.000689
500           4.27          0.0431        5.4           1             0.000846
600           4.26          0.012         2.86          1             0.00101
700           4.25          0.00734       2.09          1             0.00117
800           4.24          0.00971       1.95          1             0.00132
900           4.22          0.0146        1.91          1             0.00148

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.13.0 (Automatically Automated Automation)
armadillo:                    9.900.1 (Nocturnal Misbehaviour)

Function:
Number of functions:          1
Coordinates rows:             2
Coordinates columns:          1

Loss:
Initial                       24.2
Final                         4.2
Change                        20

Optimizer:
Maximum iterations:           1000
Reached maximum iterations:   true
Batchsize:                    1
Iterations:                   1000
Number of epochs:             1001
Initial step size:            1
Final step size:              1
Coordinates max. norm:        233
Evaluate calls:               1000
Gradient calls:               1000
Time (in seconds):            0.00163

StoreBestCoordinates

Callback that stores the model parameter after every epoch if the objective decreased.

Constructors

StoreBestCoordinates<ModelMatType>()

The ModelMatType template parameter refers to the matrix type of the model parameter.

Attributes

The stored model parameter can be accessed via the member method BestCoordinates() and the best objective via BestObjective().

Examples:

Click to collapse/expand example code.

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

StoreBestCoordinates<arma::mat> cb;
optimizer.Optimize(f, coordinates, cb);

std::cout << "The optimized model found by AdaDelta has the "
      << "parameters " << cb.BestCoordinates();

Callback States

Callbacks are called at several states during the optimization process:

At the beginning and end of the optimization process.
After any call to Evaluate() and EvaluateConstraint().
After any call to Gradient() and GradientConstraint().
At the start and end of an epoch.

Each callback provides optimization-relevant information that can be accessed or modified.

BeginOptimization

Called at the beginning of the optimization process.

void BeginOptimization(optimizer, function, coordinates)

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.

EndOptimization

Called at the end of the optimization process.

void EndOptimization(optimizer, function, coordinates)

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.

Evaluate

Called after any call to Evaluate().

bool Evaluate(optimizer, function, coordinates, objective)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`double`	`objective`	Objective value of the current point.

EvaluateConstraint

Called after any call to EvaluateConstraint().

bool EvaluateConstraint(optimizer, function, coordinates, constraint, constraintValue)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`size_t`	`constraint`	The index of the constraint.
`double`	`constraintValue`	Constraint value of the current point.

Gradient

Called after any call to Gradient().

bool Gradient(optimizer, function, coordinates, gradient)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`GradType`	`gradient`	Matrix that holds the gradient.

GradientConstraint

Called after any call to GradientConstraint().

bool GradientConstraint(optimizer, function, coordinates, constraint, gradient)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`size_t`	`constraint`	The index of the constraint.
`GradType`	`gradient`	Matrix that holds the gradient.

BeginEpoch

Called at the beginning of a pass over the data. The objective may be exact or an estimate depending on exactObjective value.

bool BeginEpoch(optimizer, function, coordinates, epoch, objective)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`size_t`	`epoch`	The index of the current epoch.
`double`	`objective`	Objective value of the current point.

EndEpoch

Called at the end of a pass over the data. The objective may be exact or an estimate depending on exactObjective value.

bool EndEpoch(optimizer, function, coordinates, epoch, objective)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`size_t`	`epoch`	The index of the current epoch.
`double`	`objective`	Objective value of the current point.

GenerationalStepTaken

Called after the evolution of a single generation. Intended specifically for MultiObjective Optimizers.

bool GenerationalStepTaken(optimizer, function, coordinates, objectives, frontIndices)

If the callback returns true, the optimization will be terminated.

Attributes

type	name	description
`OptimizerType`	`optimizer`	The optimizer used to update the function.
`FunctionType`	`function`	The function to be optimized.
`MatType`	`coordinates`	The current function parameter.
`ObjectivesVecType`	`objectives`	The set of calculated objectives so far.
`IndicesType`	`frontIndices`	The indices of the members belonging to Pareto Front.

Custom Callbacks

Learning rate scheduling

Setting the learning rate is crucially important when training because it controls both the speed of convergence and the ultimate performance of the model. One of the simplest learning rate strategies is to have a fixed learning rate throughout the training process. Choosing a small learning rate allows the optimizer to find good solutions, but this comes at the expense of limiting the initial speed of convergence. To overcome this tradeoff, changing the learning rate as more epochs have passed is commonly done in model training. The Evaluate method in combination with the StepSize method of the optimizer can be used to update the variables.

Example code showing how to implement a custom callback to change the learning rate is given below.

Click to collapse/expand example code.

class ExponentialDecay
{
  // Set up the exponential decay learning rate scheduler with the user
  // specified decay value.
  ExponentialDecay(const double decay) : decay(decay), learningRate(0) { }


  // Callback function called at the start of the optimization process.
  // In this example we will use this to save the initial learning rate.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void BeginOptimization(OptimizerType& /* optimizer */,
                         FunctionType& /* function */,
                         MatType& /* coordinates */)
  {
    // Save the initial learning rate.
    learningRate = optimizer.StepSize();
  }

  // Callback function called at the end of a pass over the data. We are only
  // interested in the current epoch and the optimizer, we ignore the rest.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  bool EndEpoch(OptimizerType& optimizer,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t epoch,
                const double objective)
  {
    // Update the learning rate.
    optimizer.StepSize() = learningRate * (1.0 - std::pow(decay,
        (double) epoch));

    // Do not terminate the optimization.
    return false;
  }

  double learningRate;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row.
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the StandardSGD optimizer with specified parameters.
  // The ens::StandardSGD type can be replaced with any ensmallen optimizer
  //that can handle differentiable functions.
  StandardSGD optimizer(0.001, 1, 0, 1e-15, true);

  // Use the StandardSGD optimizer with specified parameters to minimize the
  // LinearRegressionFunction and pass the *exponential decay*
  // callback function from above.
  optimizer.Optimize(lrf, lrfParams, ExponentialDecay(0.01));

  // Print the trained model parameter.
  std::cout << lrfParams.t();
}

Early stopping at minimum loss

Early stopping is a technique for controlling overfitting in machine learning models, especially neural networks, by stopping the optimization process before the model has trained for the maximum number of iterations.

Example code showing how to implement a custom callback to stop the optimization when the minimum of loss has been reached is given below.

Click to collapse/expand example code.

#include <ensmallen.hpp>

// This class implements early stopping at minimum loss callback function to
// terminate the optimization process early if the loss stops decreasing.
class EarlyStop
{
 public:
  // Set up the early stop at min loss class, which keeps track of the minimum
  // loss.
  EarlyStop() : bestObjective(std::numeric_limits<double>::max()) { }

  // Callback function called at the end of a pass over the data, which provides
  // the current objective. We are only interested in the objective and ignore
  // the rest.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  bool EndEpoch(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t /* epoch */,
                const double objective)
  {
    // Check if the given objective is lower as the previous objective.
    if (objective < bestObjective)
    {
      // Update the local objective.
      bestObjective = objective;
    }
    else
    {
      // Stop the optimization process.
      return true;
    }

    // Do not stop the optimization process.
    return false;
  }

  // Locally-stored best objective.
  double bestObjective;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row.
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the L_BFGS optimizer with default parameters.
  // The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
  // handle differentiable functions.
  ens::L_BFGS lbfgs;

  // Use the L_BFGS optimizer with default parameters to minimize the
  // LinearRegressionFunction and pass the *early stopping at minimum loss*
  // callback function from above.
  lbfgs.Optimize(lrf, lrfParams, EarlyStop());

  // Print the trained model parameter.
  std::cout << lrfParams.t();
}

Note that we have simply passed an instantiation of EarlyStop the rest is handled inside the optimizer.

ensmallen provides a more complete and general implementation of a early stopping at minimum loss callback function.

History/changelog

ensmallen 2.21.1: “Bent Antenna”

2024-02-15

Fix numerical precision issues for small-gradient L-BFGS scaling factor computations (#392).
Ensure the tests are built with optimisation enabled (#394).

ensmallen 2.21.0: “Bent Antenna”

2023-11-27

Clarify return values for different callback types (#383).
Fix return types of callbacks (#382).
Minor cleanup for printing optimization reports via Report() (#385).

ensmallen 2.20.0: “Stripped Bolt Head”

2023-10-02

Implementation of Active CMAES (#367).
LBFGS: avoid generation of NaNs, and add checks for finite values (#368).
Fix CNE test tolerances (#360).
Rename SCD optimizer, to CD (#379).

ensmallen 2.19.1: “Eight Ball Deluxe”

2023-01-30

Avoid deprecation warnings in Armadillo 11.2+ (#347).

ensmallen 2.19.0: “Eight Ball Deluxe”

2022-04-06

Added DemonSGD and DemonAdam optimizers (#211).
Fix bug with Adam-like optimizers not resetting when resetPolicy is true. (#340).
Add Yogi optimizer (#232).
Add AdaBelief optimizer (#233).
Add AdaSqrt optimizer (#234).
Bump check for minimum supported version of Armadillo (#342).

ensmallen 2.18.2: “Fairmount Bagel”

2022-02-13

Update Catch2 to 2.13.8 (#336).
Fix epoch timing output (#337).

ensmallen 2.18.1: “Fairmount Bagel”

2021-11-19

Accelerate SGD test time (#330).
Fix potential infinite loop in CMAES (#331).
Fix SCD partial gradient test (#332).

ensmallen 2.18.0: “Fairmount Bagel”

2021-10-20

Add gradient value clipping and gradient norm scaling callback (#315).
Remove superfluous CMake option to build the tests (#313).
Bump minimum Armadillo version to 9.800 (#318).
Update Catch2 to 2.13.7 (#322).
Remove redundant template argument for C++20 compatibility (#324).
Fix MOEAD test stability (#327).

ensmallen 2.17.0: “Pachis Din Me Pesa Double”

2021-07-06

CheckArbitraryFunctionTypeAPI extended for MOO support (#283).
Refactor NSGA2 (#263, #304).
Add Indicators for Multiobjective optimizers (#285).
Make Callback flexible for MultiObjective Optimizers (#289).
Add ZDT Test Suite (#273).
Add MOEA-D/DE Optimizer (#269).
Introduce Policy Methods for MOEA/D-DE (#293).
Add Das-Dennis weight initialization method (#295).
Add Dirichlet Weight Initialization (#296).
Improved installation and compilation instructions (#300).
Disable building the tests by default for faster installation (#303).
Modify matrix initialisation to take into account default element zeroing in Armadillo 10.5 (#305).

ensmallen 2.16.2: “Severely Dented Can Of Polyurethane”

2021-03-24

Fix CNE test trials (#267).
Update Catch2 to 2.13.4 (#268).
Fix typos in documentation (#270, #271).
Add clarifying comments in problems/ implementations (#276).

ensmallen 2.16.1: “Severely Dented Can Of Polyurethane”

2021-03-02

Fix test compilation issue when ENS_USE_OPENMP is set (#255).
Fix CNE initial population generation to use normal distribution (#258).
Fix compilation warnings (#259).
Remove AdamSchafferFunctionN2Test test from Adam test suite to prevent spurious issue on some aarch64 (#265).

ensmallen 2.16.0: “Severely Dented Can Of Polyurethane”

2021-02-11

Expand README with example installation and add simple example program showing usage of the L-BFGS optimizer (#248).
Refactor tests to increase stability and reduce random errors (#249).

ensmallen 2.15.1: “Why Can’t I Manage To Grow Any Plants?”

2020-11-05

Fix include order to ensure traits is loaded before reports (#239).

ensmallen 2.15.0: “Why Can’t I Manage To Grow Any Plants?”

2020-11-01

Make a few tests more robust (#228).
Add release date to version information. (#226)
Fix typo in release script (#236).

ensmallen 2.14.2: “No Direction Home”

2020-08-31

Fix implementation of fonesca fleming problem function f1 and f2 type usage and negative signs. (#223)

ensmallen 2.14.1: “No Direction Home”

2020-08-19

Fix release script (remove hardcoded information, trim leading whitespaces introduced by wc -l in MacOS) (#216, #220).
Adjust tolerance for AugLagrangian convergence based on element type (#217).

ensmallen 2.14.0: “No Direction Home”

2020-08-10

Add NSGA2 optimizer for multi-objective functions (#149).
Update automatic website update release script (#207).
Clarify and fix documentation for constrained optimizers (#201).
Fix L-BFGS convergence when starting from a minimum (#201).
Add optimizer summary report callback (#213).

ensmallen 2.13.0: “Automatically Automated Automation”

2020-07-15

Fix CMake package export (#198).
Allow early stop callback to accept a lambda function (#165).

ensmallen 2.12.1: “Stir Crazy”

2020-04-20

Fix total number of epochs and time estimation for ProgressBar callback (#181).
Handle SpSubview_col and SpSubview_row in Armadillo 9.870 (#194).
Minor documentation fixes (#197).

ensmallen 2.12.0: “Stir Crazy”

2020-03-28

Correction in the formulation of sigma in CMA-ES (#183).
Remove deprecated methods from PrimalDualSolver implementation (#185.
Update logo (#186).

ensmallen 2.11.5: “The Poster Session Is Full”

2020-03-11

Change “mathematical optimization” term to “numerical optimization” in the documentation (#177).

ensmallen 2.11.4: “The Poster Session Is Full”

2020-03-03

Require new HISTORY.md entry for each PR. (#171, #172, #175).
Update/fix example documentation (#174).

ensmallen 2.11.3: “The Poster Session Is Full”

2020-02-19

Prevent spurious compiler warnings (#161).
Fix minor memory leaks (#167).
Revamp CMake configuration (#152).

ensmallen 2.11.2: “The Poster Session Is Full”

2020-01-16

Allow callback instantiation for SGD based optimizer (#138).
Minor test stability fixes on i386 (#156).
Fix Lookahead MaxIterations() check. (#159).

ensmallen 2.11.1: “The Poster Session Is Full”

2019-12-28

Fix Lookahead Synchronization period type (#153).

ensmallen 2.11.0: “The Poster Session Is Full”

2019-12-24

Add Lookahead (#138).
Add AdaBound and AMSBound (#137).

ensmallen 2.10.5: “Fried Chicken”

2019-12-13

SGD callback test 32-bit safety (big number) (#143).
Use “arbitrary” and “separable” terms in static function type checks (#145).
Remove ‘using namespace std’ from problems/ files (#147).

ensmallen 2.10.4: “Fried Chicken”

2019-11-18

Add optional tests building. (#141).
Make code samples collapsible in the documentation. (#140).

ensmallen 2.10.3: “Fried Chicken”

2019-09-26

Fix ParallelSGD runtime bug. (#135).
Add additional L-BFGS convergence check (#136).

ensmallen 2.10.2: “Fried Chicken”

2019-09-11

Add release script to rel/ for maintainers (#128).
Fix Armadillo version check (#133).

ensmallen 2.10.1: “Fried Chicken”

2019-09-10

Documentation fix for callbacks (#129.
Compatibility fixes for ensmallen 1.x (#131).

ensmallen 2.10.0: “Fried Chicken”

2019-09-07

All Optimize() functions now take any matrix type; so, e.g., arma::fmat or arma::sp_mat can be used for optimization. See the documentation for more details (#113, #119).
Introduce callback support. Callbacks can be appended as the last arguments of an Optimize() call, and can perform custom behavior at different points during the optimization. See the documentation for more details (#119).
Slight speedups for FrankWolfe optimizer (#127).

ensmallen 1.16.2: “Loud Alarm Clock”

2019-08-12

Fix PSO return type bug (#126).

ensmallen 1.16.1: “Loud Alarm Clock”

2019-08-11

Update HISTORY.md to use Markdown links to the PR and add release names.
Fix PSO return type bug (#124).

ensmallen 1.16.0: “Loud Alarm Clock”

2019-08-09

Add option to avoid computing exact objective at the end of the optimization (#109).
Fix handling of curvature for BigBatchSGD (#118).
Reduce runtime of tests (#118).
Introduce local-best particle swarm optimization, LBestPSO, for unconstrained optimization problems (#86).

ensmallen 1.15.1: “Wrong Side Of The Road”

2019-05-22

Fix -Wreorder in qhadam warning (#115).
Fix -Wunused-private-field warning in spsa (#115).
Add more warning output for gcc/clang (#116).

ensmallen 1.15.0: “Wrong Side Of The Road”

2019-05-14

Added QHAdam and QHSGD optimizers (#81).

ensmallen 1.14.4: “Difficult Crimp”

2019-05-12

Fixes for BigBatchSGD (#91).

ensmallen 1.14.3: “Difficult Crimp”

2019-05-06

Handle eig_sym() failures correctly (#100).

ensmallen 1.14.2: “Difficult Crimp”

2019-03-14

SPSA test tolerance fix (#97).
Minor documentation fixes (#95, #98).
Fix newlines at end of file (#92).

ensmallen 1.14.1: “Difficult Crimp”

2019-03-09

Fixes for SPSA (#87).
Optimized CNE and DE (#90). Changed initial population generation in CNE to be a normal distribution about the given starting point, which should accelerate convergence.

ensmallen 1.14.0: “Difficult Crimp”

2019-02-20

Add DE optimizer (#77).
Fix for Cholesky decomposition in CMAES (#83).

ensmallen 1.13.2: “Coronavirus Invasion”

2019-02-18

Minor documentation fixes (#82).

ensmallen 1.13.1: “Coronavirus Invasion”

2019-01-24

Fix -Wreorder warning (#75).

ensmallen 1.13.0: “Coronavirus Invasion”

2019-01-14

Enhance options for AugLagrangian optimizer (#66).
Add SPSA optimizer (#69).

ensmallen 1.12.2: “New Year’s Party”

2019-01-05

Fix list of contributors.

ensmallen 1.12.1: “New Year’s Party”

2019-01-03

Make sure all files end with newlines.

ensmallen 1.12.0: “New Year’s Party”

2018-12-30

Add link to ensmallen PDF to README.md.
Minor documentation fixes. Remove too-verbose documentation from source for each optimizer (#61).
Add FTML optimizer (#48).
Add SWATS optimizer (#42).
Add Padam optimizer (#46).
Add Eve optimizer (#45).
Add ResetPolicy() to SGD-like optimizers (#60).

ensmallen 1.11.1: “Jet Lag”

2018-11-29

Minor documentation fixes.

ensmallen 1.11.0: “Jet Lag”

2018-11-28

Add WNGrad optimizer.
Fix header name in documentation samples.

ensmallen 1.10.1: “Corporate Catabolism”

2018-11-16

Fixes for GridSearch optimizer.
Include documentation with release.

ensmallen 1.10.0: “Corporate Catabolism”

2018-10-20

Initial release.
Includes the ported optimization framework from mlpack (http://www.mlpack.org/).

ensmallen documentation

Citation

Contents

Function type documentation

Arbitrary functions

Example: squared function optimization

Differentiable functions

Partially differentiable functions

Arbitrary separable functions

Differentiable separable functions

Sparse differentiable separable functions

Categorical functions

Multi-objective functions

Constrained functions

Semidefinite programs

Alternate matrix types

Optimizer documentation

ActiveCMAES

Constructors

Attributes

Examples:

See also:

AdaBelief

Constructors

Attributes

Examples

See also:

AdaBound

Constructors

Attributes

Examples

See also:

AdaDelta

Constructors

Attributes

Examples:

See also:

Adagrad

Constructors

Attributes

Examples:

See also:

AdaSqrt

Constructors

Attributes

Examples:

See also:

Adam

Constructors

Attributes

Examples

See also:

AdaMax

Constructors

Attributes

Examples

See also:

AMSBound

Constructors

Attributes

Examples

See also:

AMSGrad

Constructors

Attributes

Examples

See also:

Augmented Lagrangian

Constructors

Attributes

Examples

See also:

Big Batch SGD

Constructors

Attributes

Examples:

See also:

Coordinate Descent (CD)

Constructors

Attributes