
Fix save_model and load_model #19924

Merged: 1 commit merged into keras-team:master on Jun 26, 2024

Conversation

james77777778 (Contributor)

Should fix #19921

The root cause is that #19852 incorrectly duplicates *.weights.h5 in save_model.
In load_model, we should also fall back to the original behavior if the file system is read-only.

Sorry for the inconvenience!
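
For context, a minimal sketch of the fallback idea in load_model, assuming the weights are stored as model.weights.h5 inside the .keras zip archive; the function names and the temporary-directory fallback used here are illustrative assumptions, not the actual saving_lib implementation:

```python
import os
import tempfile
import zipfile

# Hypothetical weights filename inside the .keras archive, for illustration.
_WEIGHTS_NAME = "model.weights.h5"


def extract_weights_with_fallback(archive_path):
    """Extract the weights file next to the archive when possible, and fall
    back to a writable temporary directory if that location is read-only."""
    preferred_dir = os.path.dirname(os.path.abspath(archive_path))
    try:
        return _extract(archive_path, preferred_dir)
    except OSError:
        # Read-only (or otherwise unwritable) file system: fall back.
        return _extract(archive_path, tempfile.mkdtemp())


def _extract(archive_path, target_dir):
    with zipfile.ZipFile(archive_path) as zf:
        zf.extract(_WEIGHTS_NAME, path=target_dir)
    return os.path.join(target_dir, _WEIGHTS_NAME)
```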

@fchollet (Collaborator) left a comment

LGTM, thanks for the fix.

@google-ml-butler bot added the kokoro:force-run and ready to pull labels on Jun 26, 2024
@codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 67.21%. Comparing base (558d38c) to head (80eccac).
Report is 2 commits behind head on master.

| Files                          | Patch % | Lines                        |
|--------------------------------|---------|------------------------------|
| keras/src/saving/saving_lib.py | 73.91%  | 3 Missing and 3 partials ⚠️  |

❗ There is a different number of reports uploaded between BASE (558d38c) and HEAD (80eccac).

HEAD has 2 uploads less than BASE:

| Flag  | BASE (558d38c) | HEAD (80eccac) |
|-------|----------------|----------------|
| keras | 4              | 2              |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #19924       +/-   ##
===========================================
- Coverage   79.02%   67.21%   -11.81%     
===========================================
  Files         499      499               
  Lines       46436    46448       +12     
  Branches     8548     8550        +2     
===========================================
- Hits        36695    31222     -5473     
- Misses       8015    13567     +5552     
+ Partials     1726     1659       -67     
| Flag             | Coverage Δ                    |
|------------------|-------------------------------|
| keras            | 67.14% <73.91%> (-11.74%) ⬇️  |
| keras-jax        | ?                             |
| keras-numpy      | 57.25% <69.56%> (+0.02%) ⬆️   |
| keras-tensorflow | ?                             |
| keras-torch      | 62.40% <73.91%> (+0.02%) ⬆️   |

Flags with carried forward coverage won't be shown.


@fchollet (Collaborator)

What test can we add to prevent a similar issue in the future?

@fchollet merged commit f5e90a2 into keras-team:master on Jun 26, 2024 (7 of 10 checks passed)
@google-ml-butler bot removed the ready to pull label on Jun 26, 2024
@james77777778 (Contributor, Author)

What test can we add to prevent a similar issue in the future?

The redundant *.weights.h5 detection has been added in keras/src/saving/saving_lib_test.py.
However, I don't have a good idea for how to test the OSError case (for a read-only file system).

@james77777778 (Contributor, Author)

I will try to add the tests for OSError tomorrow. (using os.chmod?!)
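
A hedged sketch of what such a test could look like (the names and assertions here are assumptions, not the test that actually landed): save a model into a temporary directory, mark it read-only with os.chmod, and check that load_model still succeeds. Note that chmod-based read-only checks do not hold when running as root.

```python
import os
import stat

import keras


def test_load_model_from_read_only_directory(tmp_path):
    # Build and save a tiny model into a pytest-provided temporary directory.
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    filepath = str(tmp_path / "model.keras")
    model.save(filepath)

    # Make the directory read-only so that writing next to the archive
    # raises OSError, which should push load_model onto the fallback path.
    os.chmod(tmp_path, stat.S_IRUSR | stat.S_IXUSR)
    try:
        reloaded = keras.saving.load_model(filepath)
        assert reloaded is not None
    finally:
        # Restore permissions so pytest can clean up tmp_path afterwards.
        os.chmod(tmp_path, stat.S_IRWXU)
```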

@WeichenXu123

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

@james77777778 (Contributor, Author)

james77777778 commented Jun 27, 2024

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

Hey @WeichenXu123
I have proposed a fix for this in #19927

The issue may stem from a race condition in load_model.
Is your CI running load_model in multiple processes within a short period of time?
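
To illustrate how such a race could arise (a hypothetical sketch of the failure mode, not the actual saving_lib code): if every load_model call extracted the weights to the same fixed path next to the archive and deleted that file when finished, one process could remove the file while another was still reading it.

```python
import os
import zipfile


def racy_load(archive_path, load_weights_fn):
    # Hypothetical: all processes share one extraction path next to the archive.
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    weights_path = os.path.join(target_dir, "model.weights.h5")

    with zipfile.ZipFile(archive_path) as zf:
        zf.extract("model.weights.h5", path=target_dir)
    try:
        # Another process may delete weights_path between extraction and here,
        # producing "No such file or directory: 'model.weights.h5'".
        return load_weights_fn(weights_path)
    finally:
        if os.path.exists(weights_path):
            os.remove(weights_path)
```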

@WeichenXu123

Hi @james77777778, could you check #19921 (comment)? It seems the fix doesn't work.

Hey @WeichenXu123 I have proposed a fix for this in #19927

The issue may stem from a race condition in load_model. Is your CI running load_model in multiple processes within a short period of time?

Yes. It only occurs when multiple processes are loading the model concurrently.

@james77777778 (Contributor, Author)

Yes. It only occurs when multiple processes are loading the model concurrently.

Thanks for the information. I have added a test to verify concurrent loading in #19927.
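
For reference, a hedged sketch of what a concurrency test could look like (the actual test is in #19927 and may differ): several worker processes load the same .keras archive at once and the test asserts that none of them fails.

```python
import concurrent.futures

import keras


def _load_ok(filepath):
    # Runs in a worker process; return a picklable bool instead of the model.
    return keras.saving.load_model(filepath) is not None


def test_load_model_concurrently(tmp_path):
    inputs = keras.Input(shape=(2,))
    model = keras.Model(inputs, keras.layers.Dense(1)(inputs))
    filepath = str(tmp_path / "model.keras")
    model.save(filepath)

    # Load the same archive from several processes at the same time; before
    # the fix this could fail with a missing model.weights.h5 error.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(_load_ok, [filepath] * 8))
    assert all(results)
```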


Successfully merging this pull request may close these issues.

Bug in Keras 3.4.0: Loading model error "No such file or directory: 'model.weights.h5'"