nan metric breaking ModelCheckpoint #2636
Comments
Hi! Thanks for your contribution, great first issue!
@ehsanmok mind sending a PR? 🐰
Assign this to me please.
I can reproduce this issue. I am not exactly clear on what the expected behavior is, though. In @awaelchli's PR for `nan` detection and intervention, training is stopped when the loss or weights contain `nan`. What do we want to do here?
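The comparison pitfall the issue describes can be reproduced in plain Python; this is a minimal sketch of the mechanism, independent of Lightning's actual `ModelCheckpoint` internals:

```python
import math

nan = float("nan")

# Every ordered comparison involving NaN evaluates to False.
print(nan < 1.0)   # False
print(nan > 1.0)   # False
print(1.0 < nan)   # False

# A best-score tracker seeded with NaN therefore never updates,
# so no later score is ever considered an improvement.
best = nan
for score in [0.5, 0.3, 0.2]:
    if score < best:          # always False while best is NaN
        best = score

print(math.isnan(best))       # True: best was never replaced
```

This is why a single `nan` metric early in training permanently blocks checkpointing when the monitored comparison is done with a plain `<` or `>`.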
🐛 Bug
Comparing any number to `float('nan')` is `False` in Python, so if a non-loss metric score is `nan` initially in training, the callback cannot checkpoint any scores after that.

Expected behavior
Ignore a `nan` metric score. This is orthogonal to when grads or weights become `nan`.

Environment
- Installation method (`conda`, `pip`, source): `pip`
Additional context
A previous issue (#1008) wasn't addressed completely.
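One way to get the expected behavior is to treat a `nan` current score as never being an improvement, instead of comparing against it directly. A minimal sketch follows; this is not Lightning's actual implementation, and `is_improvement` and the `mode` parameter are hypothetical names used for illustration:

```python
import math

def is_improvement(current, best, mode="min"):
    """Return True if `current` should replace `best`.

    NaN scores are ignored: a NaN current value is never an
    improvement, and a missing or NaN best value is always
    replaced by any real number.
    """
    if math.isnan(current):
        return False          # skip nan metrics entirely
    if best is None or math.isnan(best):
        return True           # first real score always wins
    return current < best if mode == "min" else current > best

# Even if the first epoch reports nan, later real scores
# are still checkpointed:
best = None
for score in [float("nan"), 0.5, 0.3]:
    if is_improvement(score, best):
        best = score
print(best)  # 0.3
```

Guarding with `math.isnan` also keeps this orthogonal to `nan` grads or weights, which are handled separately by the detection-and-intervention path mentioned above.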