Training will continue... but it does not #5717
Comments
Hi! It would be great if you could provide a reproduction script. You can use the following Colab link with the BoringModel and post it here.
Please feel free to reopen with a reproducible example!
Hi! (It's the very first time I contribute to an issue on GitHub, I hope I have followed the rules correctly.) I ran into the same issue. I tried to illustrate what happens with the BoringModel mentioned above, using a random dataset
and adding a dummy (constant) metric in on_validation_step to early-stop on, roughly as sketched below.
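A minimal sketch of that reproduction (assuming the standard pytorch_lightning API; the metric is logged in validation_step, and the dataset sizes, metric name, and patience value are illustrative rather than the exact code from the comment):

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping


class RandomDataset(Dataset):
    """Random data, standing in for the BoringModel's dataset."""

    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def validation_step(self, batch, batch_idx):
        # Dummy constant metric: with no improvement, EarlyStopping should
        # trigger after `patience` validation epochs.
        self.log("dummy_metric", torch.tensor(1.0))

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


if __name__ == "__main__":
    train_loader = DataLoader(RandomDataset(32, 640), batch_size=2)
    val_loader = DataLoader(RandomDataset(32, 640), batch_size=2)
    early_stop = EarlyStopping(monitor="dummy_metric", patience=3, mode="min")
    trainer = pl.Trainer(max_epochs=20, callbacks=[early_stop])
    trainer.fit(BoringModel(), train_loader, val_loader)
```

Because the monitored value never improves, early stopping is guaranteed to fire after the patience window, which makes it easy to see whether training really stops at that point.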
We can observe that the model stops training because the epoch speed grows insanely fast (as shown in the picture). Moreover, for some reason, tqdm shows the model stopping in the middle of the last epoch (step 340/626).
I hope this helps to solve the issue.
Did this by any chance fix it for you? #6705
@awaelchli I tested on the BoringModel and on my personal application, and it seems that this issue is fixed perfectly (both computation time and logs confirm that the model keeps training after the "stop" signal was raised). Thanks!
Happy to hear that. Thanks for confirming.
❓ Questions and Help
Related to #2644
I tried to set min_steps so that the model will continue training after the warmup + patience. Unfortunately, it does not appear to do that. I see a bunch of log messages like this:
I can see this behavior in the CSV logs as well. Warm-up happens for the first 5 epochs; after that point it runs one step per epoch.
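One way to check this from the CSV logs, assuming the default CSVLogger output (a metrics.csv file with "epoch" and "step" columns; the path below is illustrative):

```python
import pandas as pd

# Count how many distinct training steps were logged per epoch.
metrics = pd.read_csv("lightning_logs/version_0/metrics.csv")
steps_per_epoch = metrics.groupby("epoch")["step"].nunique()
print(steps_per_epoch)  # after the early-stop signal, this drops to ~1 step per epoch
```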
Code
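The original code is not included here; a minimal sketch of the setup described above (min_steps together with EarlyStopping) might look like the following, where the monitored metric name, patience, and step counts are illustrative assumptions:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Illustrative values, not the exact configuration from the report.
early_stop = EarlyStopping(monitor="val_loss", patience=5, mode="min")

trainer = pl.Trainer(
    max_epochs=100,
    # Expectation: even after EarlyStopping signals "stop", training should
    # continue until at least this many optimizer steps have run.
    min_steps=5000,
    callbacks=[early_stop],
)
# trainer.fit(model, train_dataloader, val_dataloader)
```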
What's your environment?