Add proximal adagrad optimizer #5128

sidgoyal78 · 2017-10-26T11:23:49Z

This closes #4687 by adding the implementation of the proximal adagrad optimizer.

The main idea is to modify proximal gradient descent to use the adagrad learning rate scheme:

moment = moment + grad * grad
prox_param = param - learning_rate * grad * (1 / sqrt(moment))
param = sign(prox_param) / (1 + learning_rate * l2) * max { |prox_param| - learning_rate * l1 , 0 }
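
For readers who want to try the update rule outside the framework, here is a minimal NumPy sketch of the three lines above. It is not the operator added in this PR: the function name `proximal_adagrad_step` and its signature are hypothetical, and it assumes `moment` is strictly positive after accumulation (the formula above has no epsilon term, so an element with a zero accumulated gradient would divide by zero).

```python
import numpy as np


def proximal_adagrad_step(param, moment, grad, learning_rate, l1, l2):
    """One proximal adagrad update (hypothetical helper, not the PR's operator).

    Assumes every element of the updated moment is > 0, since the
    formula divides by sqrt(moment) without an epsilon term.
    """
    # Accumulate squared gradients, as in adagrad.
    moment = moment + grad * grad
    # Gradient step with the per-element adagrad learning rate.
    prox_param = param - learning_rate * grad / np.sqrt(moment)
    # Proximal step: soft-threshold by learning_rate * l1,
    # then shrink by 1 / (1 + learning_rate * l2).
    shrunk = np.maximum(np.abs(prox_param) - learning_rate * l1, 0.0)
    param = np.sign(prox_param) / (1.0 + learning_rate * l2) * shrunk
    return param, moment


# Example call with made-up values.
param = np.array([1.0, -0.5])
moment = np.zeros(2)
grad = np.array([2.0, -1.0])
param, moment = proximal_adagrad_step(
    param, moment, grad, learning_rate=0.1, l1=0.5, l2=0.0
)
print(param, moment)
```

With `l1 = 0` and `l2 = 0` the proximal step is the identity and the update reduces to the plain adagrad step; a larger `l1` drives small entries of `prox_param` exactly to zero.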

The paper that proposed Proximal GD: http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf
The paper with details of adagrad: http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

qingqing01

Excellent!

Add proximal adagrad optimizer

2c3e2bb

sidgoyal78 requested review from dzhwinter and qingqing01 October 26, 2017 11:24

qingqing01 approved these changes Oct 26, 2017

sidgoyal78 merged commit 66476fc into PaddlePaddle:develop Oct 26, 2017

sidgoyal78 deleted the proximal_adagrad branch November 16, 2017 19:51
