
improve early stopping verbose logging #6811

Merged: 20 commits into master from feature/early-stopping-verbosity on May 3, 2021

Conversation

@awaelchli (Contributor) commented Apr 3, 2021

What does this PR do?

Part of #6795
When the user sets verbose=True, show detailed information about how early stopping tracks the monitored metric over time.

Adds a message when the metric improves by at least min_delta. It looks like this:

Metric valid_loss improved by 0.076 >= min_delta = 0.075. New best score: 0.310
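
As a usage sketch (not part of this PR's diff; the monitor name and thresholds are illustrative), the message appears when verbose logging is enabled on the callback:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# verbose=True enables the improvement messages shown above
early_stopping = EarlyStopping(monitor="valid_loss", min_delta=0.075, patience=3, verbose=True)
trainer = Trainer(callbacks=[early_stopping])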

For multi-GPU, the message will show on all ranks. It looks like this:

[0] Metric valid_loss improved by 0.076 >= min_delta = 0.075. New best score: 0.310
[1] Metric valid_loss improved by 0.077 >= min_delta = 0.075. New best score: 0.309

If the user logs the metric with sync_dist=True or with torchmetrics, the values are the same on every rank, and early stopping behaves identically across GPUs.
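
A minimal sketch of what that looks like from the user side (compute_loss is a hypothetical helper standing in for the model's own loss computation):

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        # compute_loss is a placeholder for the model's actual loss computation
        loss = self.compute_loss(batch)
        # sync_dist=True reduces the value across processes before logging,
        # so EarlyStopping sees the same number on every rank
        self.log("valid_loss", loss, sync_dist=True)
        return loss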

Also
Fixes #6113: make sure we don't report patience in terms of "epochs", because depending on the Trainer flags the metric may not be checked every epoch.
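
To illustrate the interaction (values here are made up): patience counts validation checks of the monitored metric, not epochs, so its meaning depends on how often the Trainer runs validation:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# patience=3 means "3 validation checks without improvement"; with validation
# running every 2nd epoch, that can correspond to up to 6 training epochs
early_stopping = EarlyStopping(monitor="valid_loss", patience=3, verbose=True)
trainer = Trainer(check_val_every_n_epoch=2, callbacks=[early_stopping])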

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added feature Is an improvement or enhancement callback labels Apr 3, 2021
codecov bot commented Apr 3, 2021

Codecov Report

Merging #6811 (7dd7405) into master (393b252) will decrease coverage by 4%.
The diff coverage is 93%.

@@           Coverage Diff           @@
##           master   #6811    +/-   ##
=======================================
- Coverage      91%     87%    -4%     
=======================================
  Files         200     200            
  Lines       12875   12887    +12     
=======================================
- Hits        11765   11231   -534     
- Misses       1110    1656   +546     

@awaelchli awaelchli added this to the 1.3 milestone Apr 5, 2021
@awaelchli awaelchli marked this pull request as ready for review April 5, 2021 09:28
msg = f"Metric {self.monitor} improved. New best score: {current:.3f}"
return msg

def _log_info(self, msg: str, rank: int) -> None:
@kaushikb11 (Contributor) commented Apr 5, 2021

@awaelchli What's the point of having the rank parameter? Maybe a rank_zero_only=True parameter would make more sense? Or the rank_zero_info utility.

@awaelchli (Contributor, Author) replied Apr 5, 2021

It depends. We can log_info on each rank when verbose, the way I have done it for debug a few lines below.
Basically, I implemented it like this so people can give feedback on what they prefer; happy to go either way.

@kaushikb11 (Contributor) replied

I think it would make sense to display on all ranks only if the current_value is different.

@awaelchli (Contributor, Author) replied

It's different when the user does not use sync_dist=True in logging, but we don't know that.
The best I could do is all_gather the values and then compare. Is it worth it?
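
For reference, a rough sketch of the all_gather-and-compare idea being discussed (plain torch.distributed, assuming an initialized process group; not what this PR implements):

import torch
import torch.distributed as dist

def values_differ_across_ranks(current: torch.Tensor) -> bool:
    # Gather the locally monitored value from every process and report whether
    # they disagree; only then would per-rank messages carry extra information.
    gathered = [torch.zeros_like(current) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, current)
    return any(not torch.equal(t, gathered[0]) for t in gathered)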

@kaushikb11 (Contributor) replied

Might be overkill. I don't have a strong opinion on either log_info or rank_zero_info.
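
For context, a sketch of the two options weighed in this thread; rank_zero_info is an existing Lightning utility, while the per-rank variant mirrors the prefixed output shown in the PR description (function names here are illustrative, not the exact diff):

import logging

from pytorch_lightning.utilities import rank_zero_info

log = logging.getLogger(__name__)

def log_on_each_rank(msg: str, rank: int) -> None:
    # Option A: emit on every process, prefixed with its rank, e.g. "[0] Metric ..."
    log.info(f"[{rank}] {msg}")

def log_on_rank_zero(msg: str) -> None:
    # Option B: emit once, only on the main process
    rank_zero_info(msg)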

jlperla commented Apr 5, 2021

Insofar as this was triggered by my suggestion, let me give you the use case I had in mind.
When I am running a whole bunch of experiments in a single run, I would love to be able to see, in the grid experiments view, the reason stopping occurred. With the early stopping changes discussed, it could be: max_epochs hit, max_time hit, metric divergence, metric hit patience level, metric hit threshold. You would almost always want to see that sort of thing, not just with a verbose flag, and since you would want it in a tight web view it would be nice to have it fairly succinct.

Basically I think this is almost a "return code" sort of situation? Maybe that is even a better way to do it for those GUIs. @williamFalcon basically in the GUI I would want to be able to sort on which of my hyperparameter options diverged, hit max_time, etc., and @luiscape this is also useful for the sorts of return codes from an experiment that we discussed for hyperparameter optimization. In the controller you might want to do different things based on them.

@tchaton (Contributor) commented Apr 6, 2021

(quoting @jlperla's comment above)

Hey @jlperla, sounds like a great idea. EarlyStopping should provide a summary of why it stops the run, which can then be parsed. Maybe in a follow-up PR @awaelchli?

@awaelchli (Contributor, Author) commented

I cannot comment on the Grid integration part at this point. In this PR I propose to add a report for why EarlyStopping is triggered. Currently, the only reason for stopping is that the monitored metric did not improve.

jlperla commented Apr 6, 2021

(quoting @awaelchli's and @tchaton's comments above)

Hey, sounds good. Take a look at this: #6795 (comment). I just really think that the whole early stopping story should be thought of holistically.

To me, stopping at a threshold, hitting max time, hitting the maximum number of epochs, and having the optimizer stall out before it can solve the model (i.e. the patience thing) are all related, and they should be part of the trainer/optimizer options themselves rather than an add-on, since they are so essential.

Honestly, if I could only have one, I would much prefer a "return code" to any logging. That way I can do a whole bunch of runs, put those return codes in a CSV table, and start analyzing the proportion that succeeded (e.g. below the tolerance), etc.; see the sketch after this comment.

The other thing is that this feature is useless if it only happens when verbose=True. I just want to know why it stopped, not get a whole bunch of other stuff in there.
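
To make the suggestion concrete, a purely hypothetical sketch (none of these names exist in Lightning) of a machine-readable stop reason written out to a CSV for later analysis:

import csv
from enum import Enum

class StopReason(Enum):
    # Hypothetical reasons mirroring the cases listed in the comments above
    MAX_EPOCHS = "max_epochs"
    MAX_TIME = "max_time"
    DIVERGENCE = "divergence"
    PATIENCE_EXHAUSTED = "patience_exhausted"
    THRESHOLD_REACHED = "threshold_reached"

def record_run(run_id: str, reason: StopReason, path: str = "runs.csv") -> None:
    # Append one row per run so the proportion of successful runs can be analyzed later
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([run_id, reason.value])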

jlperla commented Apr 6, 2021

Also, why would you only log the stopping reason for the early stopping callback rather than the max_epochs or max_time ones?

I really think this smells like the whole user-added early stopping callback is the wrong design. There is no reason in my mind why max_epochs and max_time belong in the trainer while the other stuff belongs in a separate callback. I think the intent was that it would be easy for users to create their own early stopping logic, but (1) it didn't work out that way because of the difficulty of stopping distributed processes; (2) as soon as you start having return codes etc. for a controller, it shows these things are all linked; (3) I think these features are a lot more consistent than it first appeared (basically, add in the features of an optimizer); and (4) if it is in a separate callback then users need to manually add in all of the CLI options and cannot use pl.Trainer.from_argparse_args(args) to load it. This last one is an issue for anyone paying for cloud services or GPUs (e.g. Grid), since you want it to be as easy as possible for people to stop things, or they will blow through their $$$ and be mad that they have little to show for it.

So my proposal: stop on this and the other issues that spawned out of #6795, and see what would be involved in adding those things as core trainer options. I can't imagine a scenario where you wouldn't want early stopping, checking for divergence, etc., especially if you are paying by the minute.

Users could still write callbacks if they wanted, but I am not sure it is that useful to make it first class. The way most optimizers work is that if a user wants to stop an optimization run they are also able to just return -Inf or Inf or whatever. That is good enough for most scenarios I can think of.

@tchaton tchaton left a comment

LGTM !

@awaelchli awaelchli changed the title Early Stopping verbose logging [blocked by #6868] Early Stopping verbose logging Apr 8, 2021
@mergify mergify bot added the has conflicts label Apr 9, 2021
@awaelchli awaelchli changed the title [blocked by #6868] Early Stopping verbose logging Early Stopping verbose logging Apr 26, 2021
@mergify mergify bot removed the has conflicts label Apr 26, 2021
@awaelchli awaelchli marked this pull request as draft April 26, 2021 10:01
@awaelchli awaelchli changed the title Early Stopping verbose logging improve early stopping verbose logging Apr 27, 2021
@awaelchli awaelchli marked this pull request as ready for review April 27, 2021 17:12
@awaelchli awaelchli added the ready PRs ready to be merged label Apr 30, 2021
@mergify mergify bot removed the has conflicts label Apr 30, 2021
@kaushikb11 kaushikb11 left a comment

LGTM!

@awaelchli awaelchli enabled auto-merge (squash) May 3, 2021 19:55
@awaelchli awaelchli merged commit bf1394a into master May 3, 2021
@awaelchli awaelchli deleted the feature/early-stopping-verbosity branch May 3, 2021 20:20
kaushikb11 pushed a commit to kaushikb11/pytorch-lightning that referenced this pull request May 4, 2021
Labels
callback · feature (Is an improvement or enhancement) · ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Training is cancelled when check_val_every_n_epoch > 1 and early stopping is used.
5 participants