
Logging RL results and tracking them with ModelCheckpoint(monitor=...) #4584

Closed
TrentBrick opened this issue Nov 9, 2020 · 5 comments


TrentBrick commented Nov 9, 2020

I am using PyTorch Lightning in an RL setting and want to save a model whenever it hits a new maximum average reward. I am using the TensorBoard logger, and I return my neural network loss in training_step() using:

logs = {"policy_loss": pred_loss}
return {'loss':pred_loss, 'log':logs}

I then save my RL environment rewards in on_epoch_end():

self.logger.experiment.add_scalar("mean_reward", np.mean(reward_losses), self.global_step)
self.logger.experiment.add_scalars('rollout_stats', {"std_reward":np.std(reward_losses),
                "max_reward":np.max(reward_losses), "min_reward":np.min(reward_losses)}, self.global_step)

Every 5 epochs I also write out another RL reward metric, for which I take the best actions rather than sampling them:

if self.current_epoch % self.hparams['eval_every'] == 0 and self.logger:
    output = self.collect_rollouts(greedy=True, num_episodes=self.hparams['eval_episodes'])
    reward_losses = output[0]
    self.logger.experiment.add_scalar("eval_mean", np.mean(reward_losses), self.global_step)

My question is: how can I set my ModelCheckpoint to monitor eval_mean, which is only written out every 5 epochs (this seems like it would be a problem)? I would also settle for monitoring mean_reward, which is written out every epoch. Right now I can only successfully monitor policy_loss, which does not always correspond to higher rewards obtained; setting monitor to anything else throws an error.

I know that in the new PL version self.log() should be used, but rewriting my code to use it still didn't solve my issue.
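For reference, the self.log() rewrite I tried was roughly along these lines (a simplified sketch, not my exact code; compute_policy_loss here is just a stand-in for my actual loss computation, and collect_rollouts is my own rollout method from above):

def training_step(self, batch, batch_idx):
    pred_loss = self.compute_policy_loss(batch)  # stand-in for my actual loss computation
    self.log("policy_loss", pred_loss)           # replaces the {'loss': ..., 'log': logs} return dict
    return pred_loss

def on_epoch_end(self):
    reward_losses = self.collect_rollouts()[0]   # my own rollout code
    self.log("mean_reward", float(np.mean(reward_losses)))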

I have spent a lot of time looking through the docs and searching for examples of this, but I found the logging docs quite sparse, and it was difficult even to get everything logging in the first place.

I am using PyTorch Lightning 1.0.5 and PyTorch 1.7.0.

Thank you for any help/guidance.

TrentBrick added the question (Further information is requested) label Nov 9, 2020

github-actions bot commented Nov 9, 2020

Hi! Thanks for your contribution, great first issue!

justusschock self-assigned this Nov 9, 2020
awaelchli (Contributor) commented

I have a few comments that I haven't verified yet, but they might help:

  • If I'm not mistaken, self.log currently only works from within a selection of hooks. I suggest you move the relevant code to training_epoch_end, where self.log should work correctly.
  • Set the monitor key explicitly in ModelCheckpoint(monitor=...).
  • For the problem that you can only update/log every n epochs, I see two solutions: 1) synchronize your ModelCheckpoint with the period parameter so it only runs on the epochs where you update the monitored quantity; 2) cache the last value and log it again in the epochs between your regular interval, so the ModelCheckpoint sees it as unchanged. The second option may even be Lightning's default behavior, but I need to verify.

So in summary, I imagine something like this:

# Model

def training_epoch_end(self, outputs):
    # ... compute reward losses

    if self.current_epoch % self.hparams['eval_every'] == 0:
        self.last_eval_mean = ...  # compute the new eval mean

    self.log("eval_mean", self.last_eval_mean)


# Trainer
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean")])

# or maybe also try
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean", period=self.hparams['eval_every'])])

TrentBrick (Author) commented

Thanks for all of this. It sounds like the fundamental problem may be that I was not logging from training_epoch_end()? I was explicitly setting monitor= to track either eval_mean or mean_reward, but they weren't being detected.

I will try this and let you know if it works.


NotNANtoN commented Nov 22, 2020

I had a very similar issue: in my reinforcement learning framework I wanted to measure the validation performance of my agent. Of course I would do so without a validation dataloader, so I thought I could just set that dataloader to None and define a validation_step myself. Unfortunately, when the validation dataloader is None, validation_step and all the other validation methods are not called at all.
I tried to solve this via a callback on on_train_epoch_end or on_epoch_end. This worked, but in those callbacks the self.log() call has no effect at all - most importantly, there is no feedback from pytorch_lightning that the call was unsuccessful. Luckily I found this thread and moved my validation code into training_epoch_end, which works.
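In case it is useful to anyone else, the shape of what ended up working for me was roughly this (heavily simplified; evaluate_agent is a placeholder for my own evaluation rollouts):

def training_epoch_end(self, outputs):
    eval_returns = self.evaluate_agent(num_episodes=5)  # placeholder for my own evaluation code
    self.log("val_reward", float(np.mean(eval_returns)))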

Maybe pytorch_lightning could at least give a warning when one tries to use self.log() in a place where it has no effect?

awaelchli (Contributor) commented

Regarding the self.log() from callbacks, @tchaton was working on this in #3813 and it should now be working.
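Roughly, that would let you log the eval reward from a callback, something like this (an untested sketch on a newer version; the exact hook signatures can differ slightly between Lightning releases, and collect_rollouts refers to the method from the original post):

import numpy as np
from pytorch_lightning.callbacks import Callback

class EvalRewardLogger(Callback):
    def on_train_epoch_end(self, trainer, pl_module):
        rewards = pl_module.collect_rollouts(greedy=True, num_episodes=5)[0]
        # pl_module.log should now be usable from callback hooks
        pl_module.log("eval_mean", float(np.mean(rewards)))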

Lightning-AI locked and limited conversation to collaborators Feb 9, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
