
Fix save_model and load_model #19924

Merged: 1 commit merged into keras-team:master on Jun 26, 2024

Conversation

james77777778 (Contributor)

Should fix #19921

The root cause is that #19852 incorrectly duplicates *.weights.h5 in save_model.
In load_model, we should also fall back to the original behavior if the file system is read-only.

Sorry for the inconvenience!
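
For context, a minimal sketch of the fallback idea in load_model, assuming the weights are stored as model.weights.h5 inside the .keras zip archive; the function names and the temporary-directory fallback used here are illustrative assumptions, not the actual saving_lib implementation:

```python
import os
import tempfile
import zipfile

# Hypothetical weights filename inside the .keras archive, for illustration.
_WEIGHTS_NAME = "model.weights.h5"


def extract_weights_with_fallback(archive_path):
    """Extract the weights file next to the archive when possible, and fall
    back to a writable temporary directory if that location is read-only."""
    preferred_dir = os.path.dirname(os.path.abspath(archive_path))
    try:
        return _extract(archive_path, preferred_dir)
    except OSError:
        # Read-only (or otherwise unwritable) file system: fall back.
        return _extract(archive_path, tempfile.mkdtemp())


def _extract(archive_path, target_dir):
    with zipfile.ZipFile(archive_path) as zf:
        zf.extract(_WEIGHTS_NAME, path=target_dir)
    return os.path.join(target_dir, _WEIGHTS_NAME)
```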

@fchollet (Collaborator) left a comment

LGTM, thanks for the fix.

@google-ml-butler bot added the kokoro:force-run and ready to pull labels on Jun 26, 2024
@codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 67.21%. Comparing base (558d38c) to head (80eccac).
Report is 2 commits behind head on master.

| Files                          | Patch % | Lines                        |
|--------------------------------|---------|------------------------------|
| keras/src/saving/saving_lib.py | 73.91%  | 3 Missing and 3 partials ⚠️  |

❗ There is a different number of reports uploaded between BASE (558d38c) and HEAD (80eccac).

HEAD has 2 uploads less than BASE:

| Flag  | BASE (558d38c) | HEAD (80eccac) |
|-------|----------------|----------------|
| keras | 4              | 2              |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #19924       +/-   ##
===========================================
- Coverage   79.02%   67.21%   -11.81%     
===========================================
  Files         499      499               
  Lines       46436    46448       +12     
  Branches     8548     8550        +2     
===========================================
- Hits        36695    31222     -5473     
- Misses       8015    13567     +5552     
+ Partials     1726     1659       -67     
| Flag             | Coverage Δ                    |
|------------------|-------------------------------|
| keras            | 67.14% <73.91%> (-11.74%) ⬇️  |
| keras-jax        | ?                             |
| keras-numpy      | 57.25% <69.56%> (+0.02%) ⬆️   |
| keras-tensorflow | ?                             |
| keras-torch      | 62.40% <73.91%> (+0.02%) ⬆️   |

Flags with carried forward coverage won't be shown.


@fchollet (Collaborator)

What test can we add to prevent a similar issue in the future?

@fchollet merged commit f5e90a2 into keras-team:master on Jun 26, 2024 (7 of 10 checks passed)
@google-ml-butler bot removed the ready to pull label on Jun 26, 2024
@james77777778 (Contributor, Author)

What test can we add to prevent a similar issue in the future?

The redundant *.weights.h5 detection has been added in keras/src/saving/saving_lib_test.py.
However, I don't have a good idea for how to test the OSError case (for a read-only file system).

@james77777778 (Contributor, Author)

I will try to add the tests for OSError tomorrow. (using os.chmod?!)
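
A hedged sketch of what such a test could look like (the names and assertions here are assumptions, not the test that actually landed): save a model into a temporary directory, mark it read-only with os.chmod, and check that load_model still succeeds. Note that chmod-based read-only checks do not hold when running as root.

```python
import os
import stat

import keras


def test_load_model_from_read_only_directory(tmp_path):
    # Build and save a tiny model into a pytest-provided temporary directory.
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    filepath = str(tmp_path / "model.keras")
    model.save(filepath)

    # Make the directory read-only so that writing next to the archive
    # raises OSError, which should push load_model onto the fallback path.
    os.chmod(tmp_path, stat.S_IRUSR | stat.S_IXUSR)
    try:
        reloaded = keras.saving.load_model(filepath)
        assert reloaded is not None
    finally:
        # Restore permissions so pytest can clean up tmp_path afterwards.
        os.chmod(tmp_path, stat.S_IRWXU)
```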

@WeichenXu123

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

@james77777778 (Contributor, Author)

james77777778 commented Jun 27, 2024

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

Hey @WeichenXu123
I have proposed a fix for this in #19927

The issue may stem from a race condition in load_model.
Is your CI running load_model in multiple processes within a short period of time?
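
To illustrate how such a race could arise (a hypothetical sketch of the failure mode, not the actual saving_lib code): if every load_model call extracted the weights to the same fixed path next to the archive and deleted that file when finished, one process could remove the file while another was still reading it.

```python
import os
import zipfile


def racy_load(archive_path, load_weights_fn):
    # Hypothetical: all processes share one extraction path next to the archive.
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    weights_path = os.path.join(target_dir, "model.weights.h5")

    with zipfile.ZipFile(archive_path) as zf:
        zf.extract("model.weights.h5", path=target_dir)
    try:
        # Another process may delete weights_path between extraction and here,
        # producing "No such file or directory: 'model.weights.h5'".
        return load_weights_fn(weights_path)
    finally:
        if os.path.exists(weights_path):
            os.remove(weights_path)
```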

@WeichenXu123

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

Hey @WeichenXu123 I have proposed a fix for this in #19927

The issue may stem from a race condition in load_model. Is your CI running load_model in multiple processes within a short period of time?

Yes. It only occurs when multiple processes are loading the model concurrently.

@james77777778 (Contributor, Author)

Yes. It only occurs when multiple processes are loading the model concurrently.

Thanks for the information. I have added a test to verify concurrent loading in #19927.
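
For reference, a hedged sketch of what a concurrency test could look like (the actual test is in #19927 and may differ): several worker processes load the same .keras archive at once and the test asserts that none of them fails.

```python
import concurrent.futures

import keras


def _load_ok(filepath):
    # Runs in a worker process; return a picklable bool instead of the model.
    return keras.saving.load_model(filepath) is not None


def test_load_model_concurrently(tmp_path):
    inputs = keras.Input(shape=(2,))
    model = keras.Model(inputs, keras.layers.Dense(1)(inputs))
    filepath = str(tmp_path / "model.keras")
    model.save(filepath)

    # Load the same archive from several processes at the same time; before
    # the fix this could fail with a missing model.weights.h5 error.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(_load_ok, [filepath] * 8))
    assert all(results)
```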


Successfully merging this pull request may close these issues.

Bug in Keras 3.4.0: Loading model error "No such file or directory: 'model.weights.h5'"