The paper says:

> where φ are the parameters of the value network, θ the parameters of the policy network, and η and α are Lagrange multipliers. In practice, the policy and value networks share most of their parameters in the form of a shared convolutional network (a ResNet) and recurrent LSTM core, and are optimized together (Fig. 5b in the Appendix) (Mnih et al., 2016). We note, however, that the value network parameters φ are considered fixed for the policy improvement loss, and gradients are not propagated
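The stop-gradient described in the quote can be sketched as follows. This is a minimal illustration (the network names, shapes, and loss form are assumptions, not the paper's code): a shared trunk feeds a policy head (θ) and a value head (φ), and the policy loss uses the value estimate as a baseline with gradients blocked via `.detach()`, so φ is updated only by the value loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

trunk = nn.Linear(4, 8)        # stands in for the shared ResNet + LSTM core
policy_head = nn.Linear(8, 3)  # parameters theta
value_head = nn.Linear(8, 1)   # parameters phi

obs = torch.randn(2, 4)
returns = torch.randn(2, 1)
actions = torch.tensor([0, 2])

features = trunk(obs)
logits = policy_head(features)
values = value_head(features)

# The value estimate enters the policy-improvement loss as a fixed baseline:
# .detach() blocks gradient flow into phi (and into the trunk via the value head).
advantages = (returns - values.detach()).squeeze(-1)
log_probs = torch.log_softmax(logits, dim=-1)
policy_loss = -(advantages * log_probs[torch.arange(2), actions]).mean()

policy_loss.backward()

print(value_head.weight.grad is None)  # True: phi got no gradient from the policy loss
print(trunk.weight.grad is not None)   # True: the shared trunk still trains via the policy head
```

In a full implementation the value loss (e.g. a squared error against the returns) would be added to the objective without the detach, so φ and the shared trunk are still trained, just not through the policy-improvement term.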