
Logging RL results and tracking them with ModelCheckpoint(monitor=...) #4584

Closed
TrentBrick opened this issue Nov 9, 2020 · 5 comments


TrentBrick commented Nov 9, 2020

I am using PyTorch Lightning in an RL setting and want to save a model whenever it hits a new maximum average reward. I am using the TensorBoard logger, and I return my neural network loss in training_step() using:

logs = {"policy_loss": pred_loss}
return {'loss':pred_loss, 'log':logs}

I then save my RL environment rewards in on_epoch_end():

self.logger.experiment.add_scalar("mean_reward", np.mean(reward_losses), self.global_step)
self.logger.experiment.add_scalars('rollout_stats', {"std_reward":np.std(reward_losses),
                "max_reward":np.max(reward_losses), "min_reward":np.min(reward_losses)}, self.global_step)

Every 5 epochs I also write out another RL reward metric, for which I take the best actions rather than sampling them:

if self.current_epoch % self.hparams['eval_every'] == 0 and self.logger:
    output = self.collect_rollouts(greedy=True, num_episodes=self.hparams['eval_episodes'])
    reward_losses = output[0]
    self.logger.experiment.add_scalar("eval_mean", np.mean(reward_losses), self.global_step)

My question is: how can I set my ModelCheckpoint to monitor eval_mean, which is only written out every 5 epochs (this seems like it would be a problem)? I would also settle for monitoring mean_reward, which is written out every epoch. Right now I can only successfully monitor policy_loss, which does not always correspond to higher rewards obtained; setting monitor to anything else throws an error.

I know that in the new PL version self.log() should be used, but rewriting my code to use it still didn't solve my issue.
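For reference, the self.log() rewrite I tried was roughly along these lines (a simplified sketch, not my exact code; compute_policy_loss here is just a stand-in for my actual loss computation, and collect_rollouts is my own rollout method from above):

def training_step(self, batch, batch_idx):
    pred_loss = self.compute_policy_loss(batch)  # stand-in for my actual loss computation
    self.log("policy_loss", pred_loss)           # replaces the {'loss': ..., 'log': logs} return dict
    return pred_loss

def on_epoch_end(self):
    reward_losses = self.collect_rollouts()[0]   # my own rollout code
    self.log("mean_reward", float(np.mean(reward_losses)))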

I have spent a lot of time looking through the docs and searching for examples of this, but I found the logging docs quite sparse, and it was difficult even to get everything logging in the first place.

I am using PyTorch Lightning 1.0.5 and PyTorch 1.7.0.

Thank you for any help/guidance.

TrentBrick added the question (Further information is requested) label Nov 9, 2020

github-actions bot commented Nov 9, 2020

Hi! Thanks for your contribution, great first issue!

justusschock self-assigned this Nov 9, 2020
awaelchli (Contributor) commented

I have a few comments that I haven't verified yet, but they might help:

  • If I'm not mistaken, self.log currently only works from within a selection of hooks. I suggest you move the relevant code to training_epoch_end, where self.log should work correctly.
  • Set the monitor key explicitly in ModelCheckpoint(monitor=...).
  • For the problem that you can only update/log every n epochs, I see two solutions: 1) synchronize your ModelCheckpoint with the period parameter so it only runs on the epochs where you update the monitored quantity; 2) cache the last value and log it again in the epochs between your regular interval, so the ModelCheckpoint sees it as unchanged. The second option may even be Lightning's default behavior, but I need to verify.

So in summary, I imagine something like this:

# Model

def training_epoch_end(self, outputs):
    # ... compute reward losses

    if self.current_epoch % self.hparams['eval_every'] == 0:
        self.last_eval_mean = ...  # compute the new eval mean

    self.log("eval_mean", self.last_eval_mean)


# Trainer
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean")])

# or maybe also try
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean", period=self.hparams['eval_every'])])

TrentBrick (Author) commented

Thanks for all of this. It sounds like the fundamental problem may be that I was not logging from training_epoch_end()? I was explicitly setting monitor= to track either eval_mean or mean_reward, but they weren't being detected.

I will try this and let you know if it works.


NotNANtoN commented Nov 22, 2020

I had a very similar issue: in my reinforcement learning framework I wanted to measure the validation performance of my agent. Of course I would do so without a validation dataloader, so I thought I could just set that dataloader to None and define a validation_step myself. Unfortunately, when the validation dataloader is None, validation_step and all the other validation methods are not called at all.
I tried to solve this via a callback on on_train_epoch_end or on_epoch_end. This worked, but in those callbacks the self.log() call has no effect at all - most importantly, there is no feedback from pytorch_lightning that the call was unsuccessful. Luckily I found this thread and moved my validation code into training_epoch_end, which works.
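In case it is useful to anyone else, the shape of what ended up working for me was roughly this (heavily simplified; evaluate_agent is a placeholder for my own evaluation rollouts):

def training_epoch_end(self, outputs):
    eval_returns = self.evaluate_agent(num_episodes=5)  # placeholder for my own evaluation code
    self.log("val_reward", float(np.mean(eval_returns)))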

Maybe pytorch_lightning could at least give a warning when one tries to use self.log() in a place where it has no effect?

awaelchli (Contributor) commented

Regarding the self.log() from callbacks, @tchaton was working on this in #3813 and it should now be working.
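Roughly, that would let you log the eval reward from a callback, something like this (an untested sketch on a newer version; the exact hook signatures can differ slightly between Lightning releases, and collect_rollouts refers to the method from the original post):

import numpy as np
from pytorch_lightning.callbacks import Callback

class EvalRewardLogger(Callback):
    def on_train_epoch_end(self, trainer, pl_module):
        rewards = pl_module.collect_rollouts(greedy=True, num_episodes=5)[0]
        # pl_module.log should now be usable from callback hooks
        pl_module.log("eval_mean", float(np.mean(rewards)))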

Lightning-AI locked and limited conversation to collaborators Feb 9, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
