Support Translator with persistent cpu cache #1645

anterart · 2024-03-19T20:07:12Z

This change allows creating a Translator instance with a persistent CPU cache, which improves the performance of the Translator.unload_model(to_cpu=True) method.
To create such an instance, call the constructor with persist_cpu_cache=True. The argument defaults to False, which keeps the same behavior as before.
This is achieved by creating a clone of the models and placing it in CPU memory when the constructor or the Translator.load_model method is called.
Note that creating a Translator with persist_cpu_cache=True adds overhead for cloning the models and placing the clone in CPU memory.

This change speeds up the Translator.unload_model(to_cpu=True) method and is useful in cases where the Translator is frequently unloaded and reloaded.

Currently, Translator.unload_model(to_cpu=True) takes ~5.5 seconds for a flant5 model. With this improvement it takes ~0.3 seconds.
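
To illustrate why a persistent CPU clone makes repeated unloads cheaper, here is a toy model in plain Python. It is only a sketch of the idea described above, not the C++ implementation in this PR: `ToyTranslator`, its attributes, and the dictionary standing in for model weights are all hypothetical.

```python
import copy


class ToyTranslator:
    """Hypothetical toy model of the caching idea; not CTranslate2 code."""

    def __init__(self, weights, persist_cpu_cache=False):
        self.persist_cpu_cache = persist_cpu_cache
        self.device_model = weights      # stands in for the model on the GPU
        self.cpu_cache = None            # stands in for the clone in CPU memory
        self.unload_copies = 0           # copies made during unload_model calls
        if persist_cpu_cache:
            # Extra up-front cost: clone once when the instance is created.
            self.cpu_cache = copy.deepcopy(weights)

    def unload_model(self, to_cpu=False):
        if self.device_model is None:
            return
        if to_cpu and self.cpu_cache is None:
            # Slow path: copy the model to CPU on every unload.
            self.cpu_cache = copy.deepcopy(self.device_model)
            self.unload_copies += 1
        if not to_cpu:
            self.cpu_cache = None        # toy choice: a full unload drops the cache
        self.device_model = None

    def load_model(self):
        if self.device_model is not None:
            return
        if self.cpu_cache is None:
            raise RuntimeError("toy model does not simulate reloading from disk")
        self.device_model = copy.deepcopy(self.cpu_cache)
        if not self.persist_cpu_cache:
            self.cpu_cache = None        # without persistence the cache is not kept


def count_unload_copies(persist_cpu_cache, cycles=3):
    """Run several unload/load cycles and report copies made at unload time."""
    translator = ToyTranslator({"layer": [1.0, 2.0]}, persist_cpu_cache)
    for _ in range(cycles):
        translator.unload_model(to_cpu=True)
        translator.load_model()
    return translator.unload_copies
```

In this toy, the non-persistent instance copies the model on every unload, while the persistent one pays for a single clone at construction and then unloads without copying.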

anterart · 2024-03-24T13:00:26Z

I updated the PR: instead of adding an argument to the Translator constructor, I added an optional keep_cache argument, with a default value of False, to the Translator.load_model method.
There are currently no tests for this functionality, but I tested it manually.
If the change is welcome, I can add tests.
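
A rough sketch of how the revised API might be used when swapping a model out and back in. The helper `swap_out_and_in` and its `work_while_unloaded` parameter are hypothetical; the only assumption about `translator` is that it exposes `unload_model(to_cpu=...)` and `load_model(keep_cache=...)` as described in this PR.

```python
def swap_out_and_in(translator, work_while_unloaded=None):
    """Unload to CPU, optionally run other work, then reload keeping the CPU cache.

    `translator` is assumed to expose unload_model(to_cpu=...) and
    load_model(keep_cache=...), as described in this PR.
    """
    translator.unload_model(to_cpu=True)
    try:
        if work_while_unloaded is not None:
            # For example, run a different model while this one is off the device.
            work_while_unloaded()
    finally:
        # keep_cache=True asks the translator to retain its CPU copy, so the
        # next unload_model(to_cpu=True) does not need to rebuild it.
        translator.load_model(keep_cache=True)
```

The `try`/`finally` is a design choice of this sketch so the model is reloaded even if the intermediate work raises. With CTranslate2 installed, one would pass a `ctranslate2.Translator` instance here.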

NeonBohdan · 2024-04-11T21:20:14Z

Can this feature be supported for decoder-only models too?
Model swapping is very cool.

Or are they too big, and too slow to load anyway?

Support translator with persistent cpu cache

f95ee0b

Allows creating an instance with a persistent CPU cache, improving the performance of the Translator.unload_model(to_cpu=True) method.

anterart mentioned this pull request Mar 19, 2024

Translator.unload_model(to_cpu=True) takes long time #1642

Closed

Add keep_cache arg to Translator.load_model method

0bd630e

Instead of adding a keep_cpu_cache arg to the Translator constructor, add the optional keep_cache arg to the Translator.load_model method.

minhthuc2502 reviewed Mar 25, 2024

minhthuc2502 merged commit 5045b04 into OpenNMT:master Mar 25, 2024
17 checks passed
