
Support Translator with persistent cpu cache #1645

Merged
2 commits merged on Mar 25, 2024

Conversation

anterart (Contributor) commented Mar 19, 2024

This change makes it possible to create a Translator instance with a persistent CPU cache, improving the performance of the Translator.unload_model(to_cpu=True) method.
With this change, such an instance is created by passing persist_cpu_cache=True to the constructor; the argument defaults to False, which preserves the previous behavior.
This is achieved by cloning the models and placing the clone in CPU memory when the constructor or Translator.load_model is called.
Note that creating an instance with persist_cpu_cache=True incurs an additional one-time overhead for cloning the models and moving the clone to CPU memory.

This change optimizes the execution of Translator.unload_model(to_cpu=True) and is useful in cases where the Translator is frequently unloaded and reloaded.

Currently, Translator.unload_model(to_cpu=True) on a flan-t5 model takes ~5.5 seconds. With this improvement it takes only ~0.3 seconds.

Allow creating an instance with a persistent CPU cache, improving performance of the Translator.unload_model(to_cpu=True) method.
Instead of adding a keep_cpu_cache argument to the Translator constructor, add an optional keep_cache argument to the Translator.load_model method.
anterart (Contributor, Author) commented Mar 24, 2024

I updated the PR: instead of adding an additional argument to the Translator constructor, I added an optional keep_cache argument (default False) to the Translator.load_model method.
There are currently no tests for this functionality, but I did test it manually.
If the change is welcome, I can add tests.
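The caching idea behind keep_cache can be sketched with a toy simulation. ToyModel and ToyTranslator below are hypothetical stand-ins for illustration only, not the real CTranslate2 classes; the point is that load_model(keep_cache=True) pays the clone cost once up front so that unload_model(to_cpu=True) can reuse the cached CPU copy instead of transferring weights off the device:

```python
import copy
import time


class ToyModel:
    """Stand-in for model weights living on some device (hypothetical)."""

    def __init__(self, device):
        self.device = device

    def to(self, device):
        # Simulate an expensive device transfer.
        time.sleep(0.01)
        return ToyModel(device)


class ToyTranslator:
    """Illustrates the keep_cache pattern: keep a CPU clone around so
    that unloading back to CPU becomes a cheap pointer swap."""

    def __init__(self):
        self.model = None
        self._cpu_cache = None

    def load_model(self, keep_cache=False):
        source = self._cpu_cache if self._cpu_cache is not None else ToyModel("cpu")
        self.model = source.to("cuda")
        # Pay the clone cost once, at load time, if caching is requested.
        self._cpu_cache = copy.deepcopy(source) if keep_cache else None

    def unload_model(self, to_cpu=False):
        if to_cpu and self._cpu_cache is not None:
            self.model = self._cpu_cache       # cheap: reuse the cached clone
        elif to_cpu:
            self.model = self.model.to("cpu")  # slow: transfer from device
        else:
            self.model = None                  # drop the weights entirely
```

This mirrors the trade-off described in the PR: the clone adds overhead at load time, which pays off when the model is unloaded to CPU and reloaded often.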

@minhthuc2502 minhthuc2502 merged commit 5045b04 into OpenNMT:master Mar 25, 2024
17 checks passed
NeonBohdan commented
Can this feature be supported for decoder-only models too?
Model swapping is very cool.

Or are they too big and, in any case, too slow to load?
