Implementation of Quasi-Hyperbolic Momentum for Adam #81
Conversation
Empty file, will discuss with the assignee about the PR
QHAdam updates and an overloaded constructor for QHadam
1) Added SGD Update QHSGD 2) fixed some Adam issues wrt QHAdam
@zoq can you please review my PR? The paper also describes QHAdamR and QHAdamW, which I will implement after my other PR is merged.
tests/adam_test.cpp
Outdated
/**
 * Tests the QHadam optimizer using a simple test function.
 */
TEST_CASE("SimpleQHdamTestFunction", "[AdamTest]")
Picky style comment, this should be SimpleQHAdamTestFunction
tests/adam_test.cpp
Outdated
/**
 * Run QHAdam on logistic regression and make sure the results are acceptable.
 */
TEST_CASE("QHadamLogisticRegressionTest", "[AdamTest]")
See comment above.
/**
 * Construct the Quasi Hyperbolic update policy with the given parameters.
 * @param v The quasi hyperbolic term.
 * @param momentum The momentum ter.
term instead of ter.
 * QHSGD). This allows this method to combine the features of many optimisers
 * and provide better optimisation control.
 *
 * TODO: Paper information
Good idea.
/**
 *
 * TODO: Fill in these details as well on info about algorithm
Good idea 👍
const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);

const auto mDash = m / biasCorrection1;
Do you mind using `double` here, to be consistent with the rest of the codebase?
iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
    (arma::sqrt(((1 - v2) * (gradient % gradient)) +
    v2 * vDash) + epsilon ));
Can you remove the extra space at the end.
const auto mDash = m / biasCorrection1;
const auto vDash = v / biasCorrection2;

iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
Do you mind adding a comment here: "QHAdam recovers Adam when ν1 = ν2 = 1"?
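For reference, here is a minimal standalone sketch of the quasi-hyperbolic Adam step being discussed. It is not the PR's merged code; the free-function form, parameter defaults, and names are only illustrative (the quoted diff keeps `m`, `v`, and `iteration` as class members).

```c++
// Minimal sketch of the QHAdam step under discussion (not the merged
// implementation); assumes Armadillo and the usual Adam state (m, v).
#include <armadillo>
#include <cmath>

void QHAdamStep(arma::mat& iterate,
                const arma::mat& gradient,
                arma::mat& m,              // First moment estimate.
                arma::mat& v,              // Second moment estimate.
                const size_t iteration,
                const double stepSize = 0.001,
                const double beta1 = 0.9,
                const double beta2 = 0.999,
                const double v1 = 0.7,     // First quasi-hyperbolic term.
                const double v2 = 1.0,     // Second quasi-hyperbolic term.
                const double epsilon = 1e-8)
{
  // Standard Adam moment updates.
  m = beta1 * m + (1 - beta1) * gradient;
  v = beta2 * v + (1 - beta2) * (gradient % gradient);

  // Bias correction, as in the quoted diff.
  const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
  const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);
  const arma::mat mDash = m / biasCorrection1;
  const arma::mat vDash = v / biasCorrection2;

  // Quasi-hyperbolic step: a weighted average of the raw gradient and the
  // bias-corrected moments.
  const arma::mat numerator = (1 - v1) * gradient + v1 * mDash;
  const arma::mat denominator =
      arma::sqrt((1 - v2) * (gradient % gradient) + v2 * vDash) + epsilon;
  iterate -= stepSize * (numerator / denominator);
}
```

Setting v1 = v2 = 1 reduces the numerator to mDash and the denominator to sqrt(vDash) + epsilon, which is exactly the Adam step, i.e. the equivalence the comment above asks to document.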
// The number of iterations.
double iteration;

//The first quasi-hyperbolic term.
Missing space right after `//`. Same for `v2`.
 *
 * @param epsilon The epsilon value used to initialise the squared gradient
 * parameter.
 * @param beta1 The smoothing parameter.
Looks like the order is wrong.
@zoq sorry for the TODOs, I had forgotten that I had put them in; I have made the changes.
incorrect parameter type of double assigned to variable
@zoq can you review my PR?
/**
 * @file adamw.hpp
 * @author Niteya Shah
 *
Can you add the license and a description?
 * @author Niteya Shah
 *
 */
#ifndef ENSMALLEN_ADAM_ADAMW_HPP
This is not AdamW
 * @param beta1 Exponential decay rate for the first moment estimates.
 * @param beta2 Exponential decay rate for the weighted infinity norm
 *        estimates.
 * @param eps Value used to initialise the mean squared gradient parameter.
Do you mind renaming this one to `epsilon` to match the rest of the code?
 * @param resetPolicy If true, parameters are reset before every Optimize
 *        call; otherwise, their values are retained.
 */
We can remove the extra line here.
double& V2() { return optimizer.UpdatePolicy().V2(); }

private:
//! The Stochastic Gradient Descent object with AdamW policy.
We use `QHAdam` here.
/**
 * QHAdam is a optimising strategy based on the Quasi-Hyperbolic step when
 * applied to the Adam Optimiser . QH updates can be considered to a weighted
 * average of the momentum . QHAdam , based on its paramterisation can recover
Looks like there are some spaces we can remove.
/**
 * Construct the QHAdam update policy with the given parameters.
 *
 * @param v1 The first quasi-hyperbolic term.
The parameter description order doesn't match the one below.
public:
/**
 * Construct the Quasi Hyperbolic update policy with the given parameters.
 * @param v The quasi hyperbolic term.
Can you add another empty line between the method description and the beginning of the parameter description.
 */
QHUpdate(const double v = 0.7,
  const double momentum = 0.999) :
  momentum(momentum),
Can you use 4 spaces instead of 2 here.
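As a side note, here is a hedged sketch of how this update policy would typically be plugged into ensmallen's SGD driver; the `SGD<QHUpdate>` wiring and constructor argument order are assumed from how other ensmallen update policies are exposed, not taken from this PR.

```c++
// Assumed usage: wrap the QHUpdate policy in ensmallen's SGD driver.  The
// SGD constructor argument order shown here is an assumption and may differ.
#include <ensmallen.hpp>

ens::SGD<ens::QHUpdate> qhsgd(
    0.01,                        // Step size.
    32,                          // Batch size.
    100000,                      // Maximum iterations.
    1e-5,                        // Tolerance.
    true,                        // Shuffle the data each epoch.
    ens::QHUpdate(0.7, 0.999));  // v and momentum, matching the defaults above.
```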
/**
 * @file quasi_hyperbolic_momentum_sgd_test.cpp
 * @author Niteya Shah
 *
See comment above.
doc/optimizers.md
Outdated
#### See also:

* [Quasi-Hyperbolic Momentom and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)
Should be "Momentum".
doc/optimizers.md
Outdated
*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*

QHAdam is an optimizer which implements the QHAdam Adam algorithm
which uses Quasi-Hyperbolic Descent with the Adam Optimizer. This Method is the Adam Variant of the Quasi -
Should be "This method is the Adam variant of the quasi-hyperbolic update for Adam".
Looks good to me; there are some minor style issues which I can fix during the merge process.
Second approval provided automatically after 24 hours. 👍
@niteya-shah nice contribution, sorry it took me so long to review this. There are a number of little style issues and documentation bits, but I think they can all be handled during merge. @zoq if you want, I can do the merge and style/doc fixes, just let me know. I saw that you had a couple changes you wanted to make too. In any case, if you merge it, we should add something to HISTORY.md and then I'll run my release script. :)
doc/function_types.md
Outdated
@@ -508,6 +508,7 @@ The following optimizers can be used with differentiable functions:
 - [AdaDelta](#adadelta)
 - [AdaGrad](#adagrad)
 - [Adam](#adam)
 - [QHAdam](#qhadam)
Just a minor note, maybe we can address it during merge---all the other optimizers are arranged alphabetically, we should probably keep it like that.
doc/optimizers.md
Outdated
@@ -1394,6 +1394,114 @@ double Optimize(arma::mat& X);
 * [Semidefinite programming on Wikipedia](https://en.wikipedia.org/wiki/Semidefinite_programming)
 * [Semidefinite programs](#semidefinite-programs) (includes example usage of `PrimalDualSolver`)

## Quasi-Hyperbolic Momentum Update
Should this be called `Quasi-Hyperbolic Momentum Update SGD (QHSGD)`? The anchors will need to be updated. (In fact the `qhsgd` anchor right now doesn't point anywhere.)
doc/optimizers.md
Outdated
Quasi Hyperbolic Momentum Update is an update policy for SGD where the Quasi Hyperbolic terms are added to the
parametrisation. Simply put QHM’s update rule is a weighted average of momentum’s and plain SGD’s
update rule.
This is a bit confusing for users since it's not described as an optimizer. I would suggest something like this:
Quasi-hyperbolic momentum update SGD (QHSGD) is an SGD-like optimizer with momentum where quasi-hyperbolic terms are added to the parametrization. The update rule for this optimizer is a weighted average of momentum SGD and vanilla SGD.
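To make the "weighted average" description concrete, here is a rough sketch of the QHM update rule. It is standalone and illustrative only; the function and variable names are not the PR's API.

```c++
// Rough sketch of the quasi-hyperbolic momentum (QHM) update described above.
// The function and variable names are illustrative, not the PR's API.
#include <armadillo>

void QHMStep(arma::mat& iterate,
             const arma::mat& gradient,
             arma::mat& velocity,            // Exponential average of gradients.
             const double stepSize = 0.01,
             const double momentum = 0.999,
             const double v = 0.7)           // Quasi-hyperbolic term.
{
  // Momentum buffer: exponential moving average of gradients.
  velocity = momentum * velocity + (1 - momentum) * gradient;

  // Weighted average of the plain SGD step and the momentum step:
  // v = 0 recovers vanilla SGD, v = 1 recovers (normalized) momentum SGD.
  iterate -= stepSize * ((1 - v) * gradient + v * velocity);
}
```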
doc/optimizers.md
Outdated
different values of those terms can recover the following Adam Polices
QHAdam recovers Adam when ν1 = ν2 = 1
QHAdam recovers RMSProp when ν1 = 0 and ν2 = 1
QHAdam reovers NAdam when ν1 = β1 and ν2 = 1
I think we can do a little cleanup here too:
QHAdam is an optimizer that uses quasi-hyperbolic descent with the Adam optimizer. This replaces the moment estimators of Adam with quasi-hyperbolic terms, and different values of the `v1` and `v2` parameters are equivalent to the following other optimizers:
* When `v1 = v2 = 1`, `QHAdam` is equivalent to `Adam`.
* When `v1 = 0` and `v2 = 1`, `QHAdam` is equivalent to `RMSProp`.
* When `v1 = beta1` and `v2 = 1`, `QHAdam` is equivalent to `Nadam`.
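A small illustration of those equivalences in code; the constructor argument order (stepSize, batchSize, v1, v2, ...) is assumed here and may not match the final merged API.

```c++
// Illustrative only: the QHAdam constructor argument order
// (stepSize, batchSize, v1, v2, ...) is an assumption and may differ
// from the merged API.
#include <ensmallen.hpp>

int main()
{
  ens::QHAdam adamLike(0.001, 32, 1.0, 1.0);    // v1 = v2 = 1        -> Adam.
  ens::QHAdam rmsPropLike(0.001, 32, 0.0, 1.0); // v1 = 0, v2 = 1     -> RMSProp.
  ens::QHAdam nadamLike(0.001, 32, 0.9, 1.0);   // v1 = beta1, v2 = 1 -> Nadam.
}
```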
I really like the reductions in the documentation.
#### See also:
 * [Quasi-Hyperbolic Momentum and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)
 * [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
 * [SGD](#standard-sgd)
Might be worth also linking to Adam, Nadam, and RMSprop? (And vice versa from those?)
 *
 * QHAdam can optimize differentiable separable functions.
 * For more details, see the documentation on function
 * types included with this distribution or on the ensmallen website.
The lines look a little short here, they can probably be streamlined.
const bool resetPolicy = true);

/**
 * Optimize the given function using QHAdam. The given starting point will be
The spacing seems off on this comment, it should be of the form

/**
 * comment
 */

(note the alignment of the `*`).
// In case it hasn't been included yet.
#include "qhadam.hpp"

namespace ens {
Spacing seems off here too.
/*
 * Tests the Quasi Hyperbolic Momentum SGD update policy.
 */
TEST_CASE("QHSGDSpeedUpTestFunction", "[QHMomentumSGDTest]")
The name `SpeedUp` probably isn't meaningful here, just `QHSGDTestFunction` should be fine.
Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍
Sorry for the erroneous mlpack-bot message :( (but if you want stickers, as always, we're happy to send them)
@niteya-shah thanks for the contribution! I am releasing 1.15.0 with the new support now. I made some style fixes in 7e8108d.
Hello, this PR implements the update rule of QHAdam as discussed in the paper https://arxiv.org/pdf/1810.06801v3.pdf.
This PR should be straightforward, as it is a simple update rule change similar to Nadam.
TODO list