The paper says:

> where φ are the parameters of the value network, θ the parameters of the policy network, and η and α are Lagrange multipliers. In practice, the policy and value networks share most of their parameters in the form of a shared convolutional network (a ResNet) and recurrent LSTM core, and are optimized together (Fig. 5b in the Appendix) (Mnih et al., 2016). We note, however, that the value network parameters φ are considered fixed for the policy improvement loss, and gradients are not propagated
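The stop-gradient described in the quote can be sketched as follows. This is a minimal illustration (the network names, shapes, and loss form are assumptions, not the paper's code): a shared trunk feeds a policy head (θ) and a value head (φ), and the policy loss uses the value estimate as a baseline with gradients blocked via `.detach()`, so φ is updated only by the value loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

trunk = nn.Linear(4, 8)        # stands in for the shared ResNet + LSTM core
policy_head = nn.Linear(8, 3)  # parameters theta
value_head = nn.Linear(8, 1)   # parameters phi

obs = torch.randn(2, 4)
returns = torch.randn(2, 1)
actions = torch.tensor([0, 2])

features = trunk(obs)
logits = policy_head(features)
values = value_head(features)

# The value estimate enters the policy-improvement loss as a fixed baseline:
# .detach() blocks gradient flow into phi (and into the trunk via the value head).
advantages = (returns - values.detach()).squeeze(-1)
log_probs = torch.log_softmax(logits, dim=-1)
policy_loss = -(advantages * log_probs[torch.arange(2), actions]).mean()

policy_loss.backward()

print(value_head.weight.grad is None)  # True: phi got no gradient from the policy loss
print(trunk.weight.grad is not None)   # True: the shared trunk still trains via the policy head
```

In a full implementation the value loss (e.g. a squared error against the returns) would be added to the objective without the detach, so φ and the shared trunk are still trained, just not through the policy-improvement term.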