local models #233

FelipeAdachi · 2024-02-13T20:09:12Z

LangKit downloads models from the HF Hub, which fails in network-restricted environments. Even though modules such as toxicity, themes, and input_output allow passing the path to a local model, the automatic initialization on import fails before we get the chance to pass the local path in the config.

To address this, this change initializes the models lazily, so they are fetched only when actually needed, i.e. when the UDF is called. When using local models, one can do the following:

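```python
from langkit import LangKitConfig, input_output, themes, toxicity

# Point LangKit at models that were already saved to local paths.
local_config = LangKitConfig(
    toxicity_model_path="local-toxicity-model",
    transformer_name="local-sentence-transformers",
)

# Models are loaded lazily, on the first UDF call, from these local paths.
toxicity.init(config=local_config)
themes.init(config=local_config)
input_output.init(config=local_config)
```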

provided the supported models have already been downloaded and saved to local-toxicity-model and local-sentence-transformers. For reference, these are the HF models currently used in LangKit:

- martin-ha/toxic-comment-model (toxicity)
- sentence-transformers/all-MiniLM-L6-v2 (sentence embeddings)

langkit/examples/Local_Models.ipynb

Co-authored-by: richard-rogers <93153899+richard-rogers@users.noreply.github.com>

jamie256

Left some preliminary comments.

langkit/utils.py

jamie256 · 2024-03-05T17:52:54Z

langkit/toxicity.py

+    if _model_path is None:
+        raise ValueError("Must initialize model path before calling toxicity!")
+    _model.value
+    _toxicity_tokenizer.value
+    _toxicity_pipeline.value


Curious whether this is noticeably slower?

FelipeAdachi · Mar 5, 2024 (edited)

I ran a very simple test, iterating over 5k samples (single-row extraction): init_model() takes 0.0048 ms (4.8e-6 seconds) on average, ignoring the first call, which actually loads the model artifacts.

Update: after recent changes, the lazy initialization check now runs for each row, even within a batch. This was changed to reconcile with the detoxify addition in another PR. If the added latency is prohibitive, we will need to rethink the design proposed in this PR.
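A rough way to reproduce this kind of measurement is sketched below. It is hypothetical: `init_model` here is a stand-in that mimics the guard in the langkit/toxicity.py diff (raise if no path is configured, otherwise load once and cache), `time.sleep` stands in for loading model artifacts, and the scoring functions are placeholders. It times the first call separately from the warm calls and contrasts running the check once per row with once per batch.

```python
import time
from typing import List, Optional

_model_path: Optional[str] = "local-toxicity-model"  # hypothetical configured path
_model: Optional[str] = None


def init_model() -> str:
    """Hypothetical lazy initializer mirroring the guard in the PR diff."""
    global _model
    if _model_path is None:
        raise ValueError("Must initialize model path before calling toxicity!")
    if _model is None:
        time.sleep(0.05)  # stand-in for loading model artifacts from disk
        _model = f"model from {_model_path}"
    return _model


def score_per_row(rows: List[str]) -> List[int]:
    # The lazy check runs once per row, even inside a batch.
    return [len(init_model()) + len(r) for r in rows]  # placeholder score


def score_per_batch(rows: List[str]) -> List[int]:
    # The lazy check runs once for the whole batch.
    model = init_model()
    return [len(model) + len(r) for r in rows]  # placeholder score


rows = [f"sample {i}" for i in range(5000)]

start = time.perf_counter()
init_model()  # the first call pays the (simulated) loading cost
first_call_s = time.perf_counter() - start

start = time.perf_counter()
for _ in rows:
    init_model()  # warm calls only run the guard and return the cached model
avg_warm_s = (time.perf_counter() - start) / len(rows)

print(f"first call: {first_call_s * 1e3:.2f} ms")
print(f"warm call average: {avg_warm_s * 1e3:.6f} ms")
```

Both scoring variants produce the same results; the per-batch variant simply pays the guard cost once instead of once per row.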

langkit/transformer.py


jamie256

LGTM!

Let's file an issue for local model support on other toxicity models.

Also consider a post_init() or some other way to trigger initialization outside of an actual request. We could document that calling predict on these models once, as a first-time call, triggers any downloads or initialization.
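One possible shape for the suggested warm-up hook is sketched below. Everything in it is hypothetical (`LazyModel`, `warm_up`, `warm_up_all`, and `fake_loader` are illustrative names, not LangKit APIs): running a throwaway prediction once at startup triggers the deferred load, so the first real request does not pay the download or initialization cost.

```python
from typing import Callable, Dict, List, Optional


class LazyModel:
    """Hypothetical lazily loaded model with an explicit warm-up hook."""

    def __init__(self, name: str, loader: Callable[[], Callable[[str], float]]) -> None:
        self.name = name
        self._loader = loader
        self._predict_fn: Optional[Callable[[str], float]] = None

    @property
    def loaded(self) -> bool:
        return self._predict_fn is not None

    def predict(self, text: str) -> float:
        # Lazy path: the first predict() call loads the model.
        if self._predict_fn is None:
            self._predict_fn = self._loader()
        return self._predict_fn(text)

    def warm_up(self) -> None:
        # Trigger initialization outside of an actual request by running a
        # throwaway prediction, as suggested in the review comment.
        self.predict("warm-up")


def fake_loader() -> Callable[[str], float]:
    # Placeholder for downloading or loading a real model.
    return lambda text: float(len(text) % 2)


models: List[LazyModel] = [
    LazyModel("toxicity", fake_loader),
    LazyModel("themes", fake_loader),
]


def warm_up_all(models: List[LazyModel]) -> Dict[str, bool]:
    for m in models:
        m.warm_up()
    return {m.name: m.loaded for m in models}


print([m.loaded for m in models])  # [False, False]
print(warm_up_all(models))         # {'toxicity': True, 'themes': True}
```

Keeping the lazy path in `predict` means the warm-up is optional: skipping it only moves the loading cost to the first request.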

felipe207 and others added 5 commits February 9, 2024 14:14

local models

aedeb1c

clear embeddings cache at init, small fixes

295fd26

example and precommit fixes

5fd5bc6

Merge branch 'main' into dev/felipe/local-models

ee936e2

add docs

4cf6a20

richard-rogers reviewed Feb 13, 2024


langkit/examples/Local_Models.ipynb (outdated, resolved)

richard-rogers reviewed Feb 13, 2024


langkit/examples/Local_Models.ipynb (outdated, resolved)

FelipeAdachi and others added 2 commits February 27, 2024 15:05

Update langkit/examples/Local_Models.ipynb

6f823ff

Co-authored-by: richard-rogers <93153899+richard-rogers@users.noreply.github.com>

Update langkit/examples/Local_Models.ipynb

fd40b87

Co-authored-by: richard-rogers <93153899+richard-rogers@users.noreply.github.com>

jamie256 reviewed Mar 5, 2024


felipe207 and others added 4 commits March 5, 2024 18:48

cache refactor

b454988

Merge branch 'main' into dev/felipe/local-models

70e83c9

# Conflicts:
#	langkit/docs/modules.md
#	langkit/toxicity.py

mypy

7cf3572

Merge branch 'main' into dev/felipe/local-models

a5b2330

jamie256 approved these changes Mar 26, 2024


This was referenced Mar 26, 2024

add support for detoxify local models #277 (open)

reduce latency in lazy initialization #278 (open)

FelipeAdachi merged commit a818af7 into main Mar 27, 2024
12 checks passed

FelipeAdachi deleted the dev/felipe/local-models branch March 27, 2024 13:40
