Adadelta #1122
Good start! Comment once you've investigated the initial issues for review. On Saturday, September 20, 2014, Mohamed Omran notifications@github.com
Evan Shelhamer |
Hello, I'm currently using this PR in my project (https://github.com/muupan/dqn-in-the-caffe). I think allowing base_lr and lr_policy would be helpful in case AdaDelta does not converge. In my case, using the original AdaDelta caused divergence, so I multiplied the
With respect to the slow tests, I wonder why kNumIters for AdaDeltaSolver is so large (=500), https://github.com/mohomran/caffe/blob/adadelta/src/caffe/test/test_gradient_based_solver.cpp#L566 while for the other solvers it is small (=4). https://github.com/mohomran/caffe/blob/adadelta/src/caffe/test/test_gradient_based_solver.cpp#L390 Are these 500 iterations necessary? |
Thank you for the feedback! |
Initial implementation of the Adadelta solver as proposed in "ADADELTA: An Adaptive Learning Rate Method" (Zeiler, 2012). Motivation: http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html
Performance on the MNIST autoencoder demo is more or less on par with standard SGD+momentum but not as good as the Nesterov solver. The lack of a learning rate does seem to be a problem towards later iterations in that loss/accuracy don't entirely converge, but this could be due to an implementation issue.
(for comparison see: #741 (comment))
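For readers unfamiliar with the method, here is a minimal Python sketch of the per-parameter update rule from the Zeiler paper. It is not the code in this PR: the function names `adadelta_step` and `init_state`, the defaults `rho=0.95` and `eps=1e-6`, and the optional `lr` multiplier are all assumptions made for illustration. With `lr=1.0` the step is the plain paper update, which has no learning rate; other values show one possible way a base_lr-style scale could be applied, as suggested in the discussion.

```python
import math


def init_state(n):
    """Running averages for n parameters (hypothetical helper)."""
    return {"eg2": [0.0] * n, "edx2": [0.0] * n}


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6, lr=1.0):
    """Apply one Adadelta update to the list of floats x and return the new values.

    state["eg2"]  : decaying average of squared gradients, E[g^2]
    state["edx2"] : decaying average of squared updates,   E[dx^2]
    lr            : assumed extra multiplier; 1.0 gives the paper's update
    """
    eg2, edx2 = state["eg2"], state["edx2"]
    new_x = []
    for i, (xi, gi) in enumerate(zip(x, grad)):
        # Accumulate the squared gradient.
        eg2[i] = rho * eg2[i] + (1.0 - rho) * gi * gi
        # Step = -(RMS of past updates / RMS of gradients) * gradient.
        dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * gi
        # Accumulate the squared (unscaled) update.
        edx2[i] = rho * edx2[i] + (1.0 - rho) * dx * dx
        # Apply the step; the lr scaling is this sketch's choice, not the paper's.
        new_x.append(xi + lr * dx)
    return new_x
```

A caller would create the state once with `init_state(len(x))` and then call `adadelta_step(x, grad, state)` every iteration with a freshly computed gradient. In this sketch the update history accumulates the unscaled step, so `lr` only changes how far the parameters move, not the running averages.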
A couple of things to note: