Logging RL results and tracking them with ModelCheckpoint(monitor=...) #4584
Comments
Hi! Thanks for your contribution, great first issue!
I have multiple comments that I did not verify yet, but they might help.

So in summary, I imagine something like this:

```python
# Model
def training_epoch_end(self, outputs):
    # ... compute reward losses
    if self.current_epoch % self.hparams['eval_every'] == 0:
        self.last_eval_mean = ...  # compute the new eval mean here
        self.log("eval_mean", self.last_eval_mean)

# Trainer
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean")])

# or maybe also try
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean", period=self.hparams['eval_every'])])
```
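Since the goal here is to save the model at the highest average reward rather than the lowest loss, it is probably also worth passing `mode="max"` explicitly. A small sketch of that; the `save_top_k` value is chosen here just for illustration:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    monitor="eval_mean",  # must match the key passed to self.log(...)
    mode="max",           # higher reward is better, so keep the maximum
    save_top_k=1,         # keep only the single best checkpoint
)
trainer = Trainer(callbacks=[checkpoint_cb])
```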
Thanks for all of this. It sounds like the fundamental problem may be that with my code I was not logging from `training_epoch_end()`. I will try this and let you know if it works.
I had a very similar issue: in my reinforcement learning framework I wanted to measure the validation performance of my agent. Of course I would do so without a `val_dataloader`, since there is no validation dataset in RL. Maybe pytorch_lightning could at least give a warning once one tries to monitor a metric that is never logged.
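Until such a warning exists, one ad-hoc way to catch the mismatch (just a sketch, not an official recommendation) is to confirm that the monitored key actually shows up among the metrics visible to callbacks:

```python
# trainer.callback_metrics holds everything logged via self.log(...),
# and it is where ModelCheckpoint looks up its monitor key.
print(sorted(trainer.callback_metrics.keys()))

# If the key is missing, ModelCheckpoint(monitor="eval_mean") has nothing
# to compare against and checkpointing on it cannot work.
assert "eval_mean" in trainer.callback_metrics
```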
This issue was moved to a discussion.
You can continue the conversation there.
I am using PyTorch Lightning in an RL setting and want to save a model when it hits a new maximum average reward. I am using the TensorBoard logger: I return my neural network loss in `training_step()`, I save my RL environment rewards in `on_epoch_end()`, and every 5 epochs I also write out another RL reward metric (`eval_mean`) for which I use the best actions rather than sampling from them.

My question is: how can I set my ModelCheckpoint to monitor `eval_mean`, which is only written out every 5 epochs (this seems like it would be a problem)? I would also settle for monitoring `mean_reward`, which is written out every epoch. Right now I can only successfully monitor `policy_loss`, which does not always correspond to higher rewards obtained; setting `monitor` to anything else throws an error.

I know that in the new PL version `self.log()` should be used, but rewriting my code to use it still didn't solve my issue. I have spent a lot of time looking through the docs and for examples of this, but I have found the logging docs quite sparse, and it was difficult to even get everything to log in the first place.

I am using PyTorch Lightning 1.0.5 and PyTorch 1.7.0.
Thank you for any help/guidance.
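For illustration, here is a rough sketch of what the setup described above could look like with `self.log`. This is not the poster's actual code; the module name, the dummy policy network, and the `run_episodes` helper are placeholders invented for the example:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class RLModule(pl.LightningModule):
    def __init__(self, eval_every: int = 5):
        super().__init__()
        self.eval_every = eval_every
        self.policy = torch.nn.Linear(4, 2)  # stand-in for the real policy network

    def training_step(self, batch, batch_idx):
        loss = self.policy(batch).mean()     # stand-in for the real policy loss
        self.log("policy_loss", loss)        # logged every training step
        return loss

    def training_epoch_end(self, outputs):
        # Average episode reward with sampled actions, logged every epoch.
        self.log("mean_reward", self.run_episodes(sample_actions=True))

        # Every `eval_every` epochs, also log the reward obtained with the
        # best (greedy) actions instead of sampled ones.
        if self.current_epoch % self.eval_every == 0:
            self.log("eval_mean", self.run_episodes(sample_actions=False))

    def run_episodes(self, sample_actions: bool) -> float:
        # Placeholder: a real implementation would roll out the agent in the
        # environment and return the average episode reward.
        return 0.0

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# mean_reward is available at the end of every epoch, so it avoids the
# "only written out every 5 epochs" concern; eval_mean can be monitored too,
# as discussed in the comments above.
checkpoint_cb = ModelCheckpoint(monitor="mean_reward", mode="max")
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```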