-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
try loading as TF SavedModel instead of HDF5 #91
Conversation
@vahidrezanezhad EDIT: sorry, that was a bullshit question |
Ah, here's the reason:
So I must do the model conversion on Python <= 3.7 but TF >= 2.5 ... |
Ok, it does work. I can also update ocrd-tool.json here to include the changed resource URL. But first I need some place to upload it permanently. How about qurator-data.de? (@kba knows where to find the new archive...) |
BTW, CI failure is because there is no TF2 for Py 3.6 anymore. Config in GH Actions should be changed (back) to 3.7 or 3.8. |
This is what I tried in #90. |
@kba asked me to upload the updated model, but 1. I'd prefer @vahidrezanezhad to do it 2. that he produces the updated models himself. Is this just about - skims the issue comments - being able to load the files with a newer Python/TF version? Then I could maybe be persuaded to do it. Not trying to be a pedant, it's just professional diligence before I upload some variant of some model I don't fully understand myself. |
(Alternatively, I offered @kba access to the relevant resources (our internal git annex + qurator-data.de), then the diligence is up to him.) |
Skimming the PR, it's not only about loading the model, but some other stuff as well? (Cleaning up and a change of a larger code part) |
Alright, I had a look at the individual commits. a. I would upload the data if necessary, |
Initially, yes. But then it turned out there are more incompatibilities (newer Numpy). In order to be able to do the Numpy fixes, I had to understand some of the code, which in turn led to the cosmetic simplifications. The last commit returned to TF2 migration – there is no session management anymore, and no benefit in keeping this IMO. Especially reloading the models for every page seemed unnecessarily inefficient. (And yes, I did test, of course.)
Sure – I would also recommend to first allow @vahidrezanezhad to take a look, even if the changes are more or less cosmetic. Regarding model upload: if you could do that first, then I could update this PR accordingly – adding the new URL to the local ocrd-tool.json, so it would immediately become available as the current model location (independent of old URLs registered in core). (I really just loaded as HDF5 and stored as SavedFormat, nothing fancy.) |
;-) As I will only be uploading the files, that's not my concern. (I would try to test myself if it was - had the tiniest change break something unexpected too often.)
Yeah, I think @kba will give me the files and I'll upload them (or I give him access.) |
He's on vacation btw. |
It would be great if we can put the TF saved model versions of the models on qurator-data already (and possibly HF later as well), for testing. About the other changes and the PR, I would like to have some discussion to see if we can review/merge while Vahid is away. So
Yes please.
To be discussed. |
Still waiting for the data :) @kba, @bertsky?
Yes, that's 100% "in meinem Sinne"/my opinion. I'd rather be not involved, as I am currently not familiar with the code base. |
|
https://qurator-data.de/eynollah/2021-04-25/SavedModel.tar.gz I've created a short README, to document how this file was created, see https://qurator-data.de/eynollah/2021-04-25/. @bertsky Could you post a short code snippet here, how you converted this? Could be useful in the future. @joergleh Relevant to your efforts, probably. |
Many thanks @mikegerber! @bertsky This should be the location for the model(s), rather than https://ocr-d.kba.cloud/2021-04-25.SavedModel.tar.gz, which was just for temporary file transfer, so 00f431a would need to be adapted again. |
@mikegerber it boils down to: import os
from tensorflow.keras.models import load_model
os.chdir('default')
for path in os.listdir():
if not path.endswith('.h5'):
continue
model = load_model(path, compile=False)
model.save(path[:-3], save_format="tf") |
@cneud done! |
Wait! I just found out my "cosmetic" changes are not so cosmetic after all … investigating. |
@bertsky Sorry for the laughing emoji :) Couldn't resist! |
That's just the model name (i.e. last directory inside the archive = directory to unpack). Put your absolute or relative path there.
No worries. I shouldn't have been so bold to begin with! |
I just realized that the conversion .h5 -> TF SavedModel was done using the models from https://qurator-data.de/eynollah/2021-04-25/ where rather the models from https://qurator-data.de/eynollah/2022-04-05/ should have been used. Other than having more comprehensible model names, the |
you should know better than me, but as of Vahid's earlier response I think this only belongs to the (now public) I can convert these models as well, if you like, but this must be done similarly against that branch. |
I see (the mess...). But since that |
c76027b
to
875e4fe
Compare
I removed these parts, rebased and force-pushed. (Sorry for the noise!) |
I have converted these models as well and placed them in the same location for @kba to upload. I suggest merging master into |
Available from https://ocr-d.kba.cloud/2022-04-05.SavedModel.tar.gz |
I refuse to upload 2022-04-05.SavedModel.tar.gz as the original is not fully in our internal data repository, sorry. This must be corrected by Vahid first. Here, I am pedantic because it is not the first time this kind of situation comes up and while I am happy to support, I don't want to clean up all the time. |
Take your time! This PR can wait, and even more so any further action on eynollah_light. (I'll shut up now.) |
I appreciate the boldness, I just think you should have been bold in another PR ;) I know it's more work but for me, while reviewing a PR like this (even if it's just to understand the upload situation), it's more work if the PR contains more than just the changes necessary. However, this is not my project, so it's not up to me, actually. |
Side note: This PR seems to fix the CircleCI tests, it that correct? Great stuff if this is true! |
No. This PR only fixes Python 3.8 onwards. The CI check comes with @cneud's config changes (throwing out broken 3.6, activating 3.7 onwards). |
Note, since we have moved towards Python 3.8 as base version, this is becoming rather urgent. See ocrd_all – where I am now considering switching to this PR version. |
In trying to solve #87, I have converted the models and changed the loader by trying to omit the .h5 suffix.
Unfortunately, I now get a new error:
Given that this seems to be the output layer, I am at a loss here. Creating as draft only.
Ideas anyone?