Support Translator with persistent cpu cache #1645
Merged
This change makes it possible to create a Translator instance with a persistent CPU cache, which improves the performance of the `Translator.unload_model(to_cpu=True)` method.

To create such an instance, call the constructor with the `persist_cpu_cache=True` argument. The argument defaults to `False`, which keeps the same behavior as before.

The cache is implemented by creating a clone of the models and placing it in CPU memory when the constructor or the `Translator.load_model` method is called. Note that creating a translator with `persist_cpu_cache=True` therefore adds overhead for creating the clone of the models and placing it in CPU memory.

This change optimizes the execution of the `Translator.unload_model(to_cpu=True)` method and is useful in use cases where the Translator is often unloaded and loaded. Currently, `Translator.unload_model(to_cpu=True)` takes ~5.5 seconds for a flant5 model. With this improvement it takes ~0.3 seconds.
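
The trade-off described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: `ToyTranslator`, its `copies_made` counter, and the use of `copy.deepcopy` as a stand-in for a device-to-CPU transfer are all assumptions made for illustration. Only the names `persist_cpu_cache`, `load_model`, and `unload_model(to_cpu=True)` come from the description.

```python
import copy


class ToyTranslator:
    """Hypothetical stand-in for a translator; not the real class."""

    def __init__(self, weights, persist_cpu_cache=False):
        self.persist_cpu_cache = persist_cpu_cache
        self.copies_made = 0        # counts simulated device<->CPU transfers
        self._device_model = None   # copy that would live on the device
        self._cpu_model = None      # copy that would live in CPU memory
        self.load_model(weights)

    def _copy(self, model):
        # Assumption: deepcopy stands in for an expensive memory transfer.
        self.copies_made += 1
        return copy.deepcopy(model)

    def load_model(self, weights=None):
        if self._device_model is not None:
            return  # already loaded
        source = weights if weights is not None else self._cpu_model
        if source is None:
            raise RuntimeError("no weights and no CPU copy to load from")
        self._device_model = self._copy(source)
        if self.persist_cpu_cache:
            if self._cpu_model is None:
                # Extra one-time overhead: clone the model into CPU memory.
                self._cpu_model = self._copy(source)
        else:
            # Without the cache, the CPU copy is released once loaded.
            self._cpu_model = None

    def unload_model(self, to_cpu=False):
        if self._device_model is None:
            return  # already unloaded
        if to_cpu and self._cpu_model is None:
            # Slow path: transfer the device model to CPU at unload time.
            self._cpu_model = self._copy(self._device_model)
        if not to_cpu:
            self._cpu_model = None
        self._device_model = None


def transfers_during_unload(translator):
    """Return how many simulated transfers one unload(to_cpu=True) costs."""
    before = translator.copies_made
    translator.unload_model(to_cpu=True)
    return translator.copies_made - before
```

In this sketch, the instance created with `persist_cpu_cache=True` pays for one extra copy in the constructor, and each later `unload_model(to_cpu=True)` only drops the device copy instead of transferring it, which mirrors the reasoning behind the reported speedup.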