Implementation of Quasi-Hyperbolic Momentum for Adam #81

Merged: 23 commits merged into mlpack:master on May 15, 2019

Conversation

niteya-shah
Contributor

@niteya-shah niteya-shah commented Feb 12, 2019

Hello, this PR implements the update rule of QHAdam as discussed in the paper: https://arxiv.org/pdf/1810.06801v3.pdf
This PR should be straightforward, as it is a simple update-rule change like Nadam.
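
For context, a rough sketch of the QHAdam step this PR adds is shown below. This is a sketch only, not the merged source; the names m, v, mDash, vDash, and iterate follow the snippets quoted later in this review, and m and v are the usual Adam moving averages of the gradient and squared gradient, whose accumulation is omitted here.

// Bias-correct the first and second moment estimates, exactly as in Adam.
const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);
const arma::mat mDash = m / biasCorrection1;
const arma::mat vDash = v / biasCorrection2;

// Quasi-hyperbolic step: v1 interpolates between the raw gradient and mDash
// in the numerator, and v2 interpolates between the squared gradient and
// vDash in the denominator.
iterate -= stepSize * (((1 - v1) * gradient + v1 * mDash) /
    (arma::sqrt((1 - v2) * (gradient % gradient) + v2 * vDash) + epsilon));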

TODO list

  • Implement QHAdam update rule
  • Write Tests for QHAdam
  • Validate the improvements

Empty file, will discuss with assignee about the PR
QHAdam updates and an overloaded constructor for QHAdam
1) Added SGD update QHSGD
2) Fixed some Adam issues wrt QHAdam
@niteya-shah
Contributor Author

@zoq can you please review my PR? The paper also describes QHAdamR and QHAdamW, which I will implement after my other PR is merged.

/**
* Tests the QHadam optimizer using a simple test function.
*/
TEST_CASE("SimpleQHdamTestFunction", "[AdamTest]")
Member

Picky style comment, this should be SimpleQHAdamTestFunction.

/**
* Run QHAdam on logistic regression and make sure the results are acceptable.
*/
TEST_CASE("QHadamLogisticRegressionTest", "[AdamTest]")
Member

See comment above.

/**
* Construct the Quasi Hyperbolic update policy with the given parameters.
* @param v The quasi hyperbolic term.
* @param momentum The momentum ter.
Member

term instead of ter.

* QHSGD). This allows this method to combine the features of many optimisers
* and provide better optimisation control.
*
* TODO: Paper information
Member

Good idea.


/**
*
* TODO: Fill in these details as well on info about algorithm
Member

Good idea 👍

const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);

const auto mDash = m / biasCorrection1;
Member

Do you mind using double here, to be consistent with the rest of the codebase?


iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
(arma::sqrt(((1 - v2) * (gradient % gradient)) +
v2 * vDash) + epsilon ));
Member

Can you remove the extra space at the end.

const auto mDash = m / biasCorrection1;
const auto vDash = v / biasCorrection2;

iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
Member

Do you mind adding a comment here: "QHAdam recovers Adam when ν1 = ν2 = 1."?
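
For reference, with ν1 = ν2 = 1 the quasi-hyperbolic weighting collapses, so the step above reduces to the standard Adam update (a quick check, in the paper's notation):

(1 - \nu_1)\, g_t + \nu_1 \hat{m}_t = \hat{m}_t, \qquad
(1 - \nu_2)\, g_t^2 + \nu_2 \hat{v}_t = \hat{v}_t,

\theta_{t+1} = \theta_t - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.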

// The number of iterations.
double iteration;

//The first quasi-hyperbolic term.
Member

Missing space right after //. Same for v2.

*
* @param epsilon The epsilon value used to initialise the squared gradient
* parameter.
* @param beta1 The smoothing parameter.
Member

Looks like the order is wrong.

@niteya-shah
Contributor Author

@zoq sorry for the TODOs; I had forgotten that I put them in. I have made the changes.

incorrect parameter type of double assigned to variable
@niteya-shah
Contributor Author

@zoq can you review my PR?

/**
* @file adamw.hpp
* @author Niteya Shah
*
Member

Can you add the license and a description?

* @author Niteya Shah
*
*/
#ifndef ENSMALLEN_ADAM_ADAMW_HPP
Member

This is not AdamW

* @param beta1 Exponential decay rate for the first moment estimates.
* @param beta2 Exponential decay rate for the weighted infinity norm
estimates.
* @param eps Value used to initialise the mean squared gradient parameter.
Member

Do you mind renaming this one to epsilon to match the rest of the code?

* @param resetPolicy If true, parameters are reset before every Optimize
* call; otherwise, their values are retained.
*/

Member

We can remove the extra line here.

double& V2() { return optimizer.UpdatePolicy().V2(); }

private:
//! The Stochastic Gradient Descent object with AdamW policy.
Member

We use QHAdam here.

/**
* QHAdam is a optimising strategy based on the Quasi-Hyperbolic step when
* applied to the Adam Optimiser . QH updates can be considered to a weighted
* average of the momentum . QHAdam , based on its paramterisation can recover
Member

Looks like there are some spaces we can remove.

/**
* Construct the QHAdam update policy with the given parameters.
*
* @param v1 The first quasi-hyperbolic term.
Member

The parameter description order doesn't match the one below.

public:
/**
* Construct the Quasi Hyperbolic update policy with the given parameters.
* @param v The quasi hyperbolic term.
Member

Can you add another empty line between the method description and the beginning of the parameter description.

*/
QHUpdate(const double v = 0.7,
const double momentum = 0.999) :
momentum(momentum),
Member

Can you use 4 spaces instead of 2 here.

/**
* @file quasi_hyperbolic_momentum_sgd_test.cpp
* @author Niteya Shah
*
Member

See comment above.


#### See also:

* [Quasi-Hyperbolic Momentom and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)
Member

Should be "Momentum".

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*

QHAdam is an optimizer which implements the QHAdam Adam algorithm
which uses Quasi-Hyperbolic Descent with the Adam Optimizer. This Method is the Adam Variant of the Quasi -
Member

Should be "This method is the Adam variant of the quasi-hyperbolic update for Adam".

Member

@zoq zoq left a comment

Looks good to me, there are some minor style issues which I can fix during the merge process.

@mlpack-bot mlpack-bot bot left a comment

Second approval provided automatically after 24 hours. 👍

Member

@rcurtin rcurtin left a comment

@niteya-shah nice contribution, sorry it took me so long to review this. There are a number of little style issues and documentation bits, but I think they can all be handled during merge. @zoq if you want, I can do the merge and style/doc fixes, just let me know. I saw that you had a couple changes you wanted to make too. In any case, if you merge it, we should add something to HISTORY.md and then I'll run my release script. :)

@@ -508,6 +508,7 @@ The following optimizers can be used with differentiable functions:
- [AdaDelta](#adadelta)
- [AdaGrad](#adagrad)
- [Adam](#adam)
- [QHAdam](#qhadam)
Member

@rcurtin rcurtin May 12, 2019

Just a minor note, maybe we can address it during merge---all the other optimizers are arranged alphabetically, we should probably keep it like that.

@@ -1394,6 +1394,114 @@ double Optimize(arma::mat& X);
* [Semidefinite programming on Wikipedia](https://en.wikipedia.org/wiki/Semidefinite_programming)
* [Semidefinite programs](#semidefinite-programs) (includes example usage of `PrimalDualSolver`)

## Quasi-Hyperbolic Momentum Update
Member

Should this be called Quasi-Hyperbolic Momentum Update SGD (QHSGD)? The anchors will need to be updated. (In fact the qhsgd anchor right now doesn't point anywhere.)


Quasi Hyperbolic Momentum Update is an update policy for SGD where the Quasi Hyperbolic terms are added to the
parametrisation. Simply put QHM’s update rule is a weighted average of momentum’s and plain SGD’s
update rule.
Member

This is a bit confusing for users since it's not described as an optimizer. I would suggest something like this:

Quasi-hyperbolic momentum update SGD (QHSGD) is an SGD-like optimizer with momentum where quasi-hyperbolic terms are added to the parametrization.  The update rule for this optimizer is a weighted average of momentum SGD and vanilla SGD.
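
For readers, an illustrative sketch of that rule is below. It mirrors the QHUpdate policy added in this PR and reuses its v and momentum parameter names, but the exact member variables and layout here are assumed.

// Momentum buffer: exponential moving average of the gradient.
velocity = momentum * velocity + (1 - momentum) * gradient;

// Quasi-hyperbolic step: a v-weighted average of the plain SGD step
// (the raw gradient) and the momentum step (the velocity buffer).
iterate -= stepSize * ((1 - v) * gradient + v * velocity);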

different values of those terms can recover the following Adam Polices
QHAdam recovers Adam when ν1 = ν2 = 1
QHAdam recovers RMSProp when ν1 = 0 and ν2 = 1
QHAdam reovers NAdam when ν1 = β1 and ν2 = 1
Member

I think we can do a little cleanup here too:

QHAdam is an optimizer that uses quasi-hyperbolic descent with the Adam optimizer.  This replaces the moment estimators of Adam with quasi-hyperbolic terms, and different values of the `v1` and `v2` parameters are equivalent to the following other optimizers:

 * When `v1 = v2 = 1`, `QHAdam` is equivalent to `Adam`.
 * When `v1 = 0` and `v2 = 1`, `QHAdam` is equivalent to `RMSProp`.
 * When `v1 = beta1` and `v2 = 1`, `QHAdam` is equivalent to `Nadam`.

I really like the reductions in the documentation.
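
For reference, the update these reductions describe, in the paper's notation (matching the C++ snippets above, with g_t the current gradient and \hat{m}_t, \hat{v}_t the bias-corrected Adam moment estimates), is:

\theta_{t+1} = \theta_t - \alpha \, \frac{(1 - \nu_1)\, g_t + \nu_1\, \hat{m}_t}{\sqrt{(1 - \nu_2)\, g_t^2 + \nu_2\, \hat{v}_t} + \epsilon}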

#### See also:
* [Quasi-Hyperbolic Momentum and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)
* [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
* [SGD](#standard-sgd)
Member

Might be worth also linking to Adam, Nadam, and RMSprop? (And vice versa from those?)

*
* QHAdam can optimize differentiable separable functions.
* For more details, see the documentation on function
* types included with this distribution or on the ensmallen website.
Member

The lines look a little short here, they can probably be streamlined.
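
As a side note for readers of this thread, a minimal usage sketch is given below. It assumes ensmallen's documented interface for differentiable separable functions (NumFunctions, Shuffle, and batched Evaluate/Gradient), and that QHAdam is default-constructible like the other ensmallen optimizers; ToySeparableFunction and all parameter values are hypothetical, for illustration only.

#include <ensmallen.hpp>
#include <cmath>

// A toy differentiable separable objective: f(x) = sum_i ||x - a_i||^2,
// where each column of `data` is one point a_i.  (Hypothetical example.)
class ToySeparableFunction
{
 public:
  explicit ToySeparableFunction(const arma::mat& data) : data(data) { }

  size_t NumFunctions() const { return data.n_cols; }

  // The terms are independent here, so shuffling is a no-op.
  void Shuffle() { }

  double Evaluate(const arma::mat& x, const size_t begin,
                  const size_t batchSize)
  {
    double objective = 0.0;
    for (size_t i = begin; i < begin + batchSize; ++i)
      objective += std::pow(arma::norm(x - data.col(i)), 2.0);
    return objective;
  }

  void Gradient(const arma::mat& x, const size_t begin, arma::mat& gradient,
                const size_t batchSize)
  {
    gradient.zeros(x.n_rows, x.n_cols);
    for (size_t i = begin; i < begin + batchSize; ++i)
      gradient += 2.0 * (x - data.col(i));
  }

 private:
  arma::mat data;
};

int main()
{
  arma::mat data(2, 100, arma::fill::randn);
  ToySeparableFunction f(data);

  arma::mat coordinates(2, 1, arma::fill::randu);
  ens::QHAdam optimizer;  // Default hyperparameters (assumed to exist).
  optimizer.Optimize(f, coordinates);

  // The minimizer of this objective is the mean of the data columns.
  coordinates.print("optimum:");
}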

const bool resetPolicy = true);

/**
* Optimize the given function using QHAdam. The given starting point will be
Member

The spacing seems off on this comment, it should be of the form

/**
 * comment
 */

(note the alignment of the *)

// In case it hasn't been included yet.
#include "qhadam.hpp"

namespace ens {
Member

Spacing seems off here too.

/*
* Tests the Quasi Hyperbolic Momentum SGD update policy.
*/
TEST_CASE("QHSGDSpeedUpTestFunction", "[QHMomentumSGDTest]")
Member

The name SpeedUp probably isn't meaningful here, just QHSGDTestFunction should be fine.

@mlpack-bot

mlpack-bot bot commented May 12, 2019

Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in src/mlpack/core.hpp and COPYRIGHT.txt and you haven't already, please feel free to push a change to this PR---or, if it gets merged before you can, feel free to open another PR.

In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍

1 similar comment from mlpack-bot (May 12, 2019)

@rcurtin
Member

rcurtin commented May 12, 2019

Sorry for the erroneous mlpack-bot message :( (but if you want stickers, as always, we're happy to send them)

@rcurtin rcurtin merged commit 0f17a19 into mlpack:master May 15, 2019
@rcurtin
Member

rcurtin commented May 15, 2019

@niteya-shah thanks for the contribution! I am releasing 1.15.0 with the new support now. I made some style fixes in 7e8108d.
