-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FTML optimizer #48
FTML optimizer #48
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. The paper was good; I had not previously known of the connection of Adam to FTL-type techniques.
I think we should add documentation before merge, but I am writing that in all the PRs so maybe I sound like a broken record now. :)
* @endcode | ||
* | ||
* For FTML to work, a DecomposableFunctionType template parameter is | ||
* required. This class must implement the following function: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this documentation across a lot of optimizers, but the addition of function_types.md
probably means we can reduce and centralize a lot of it. Do you think that we could replace this with something like... FTML requires a separable differentiable function to optimize (see <url>).
The only problem with that is that it's not clear what URL to use there. I suppose ideally we'd like to point a user at the website documentation, but, the URL there could change. So I am not sure what the best choice is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for this one I wonder if it would be best to open another issue instead of handling it in this PR.
include/ensmallen_bits/ftml/ftml.hpp
Outdated
const size_t batchSize = 32, | ||
const double beta1 = 0.9, | ||
const double beta2 = 0.999, | ||
const double eps = 1e-8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like for Padam, do you think we should adjust this documentation to point out that it's to avoid divisions by zero? (Or something like that.)
Also for consistency here I guess we should call it eps
since there is a method Epsilon()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. Feel free to merge whenever you are happy with the code. If you want to add something to HISTORY.txt
go ahead, otherwise I will do as part of the next release process.
I'll update |
Implementation of "Follow the Moving Leader in Deep Learning" by Shuai Zheng and James T. Kwok.