backend: support non-ASCII characters in path to llmodel libs on Windows #2388

cebtenzzre · 2024-05-29T17:45:35Z

There were two problems preventing GPT4All from locating the implementation libraries on Windows if the path contained non-ASCII characters:

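- We were using LoadLibraryExA, which takes the path in the active ANSI code page and so cannot represent characters outside it (such as emoji), instead of LoadLibraryExW, which takes a UTF-16 path
- std::filesystem doesn't treat narrow strings as UTF-8 on Windows (they are decoded using the active code page), so it must be used with wide strings instead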
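To illustrate the std::filesystem problem: on Windows, constructing a std::filesystem::path directly from a std::string decodes it with the active code page, so UTF-8 text (for example a path handed over from Python) can be mangled. Below is a minimal C++ sketch of one general workaround, which tells std::filesystem explicitly that the text is UTF-8. `pathFromUtf8` is a hypothetical helper, not code from this PR; the PR itself keeps paths in their native (wide) representation instead.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): build a path from UTF-8 text.
// fs::path(std::string) would use the active code page on Windows instead.
fs::path pathFromUtf8(const std::string &utf8)
{
#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
    // C++20: a char8_t string is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path is the dedicated UTF-8 factory (deprecated in C++20).
    return fs::u8path(utf8);
#endif
}
```

On POSIX systems the resulting path holds the same bytes as the input, so the helper is harmless there.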

While I was touching this code, I refactored Dlhandle into a proper .cpp/.h split, which greatly improves readability IMO.
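For readers unfamiliar with the pattern, here is a rough sketch of what a Dlhandle class split into a header and an implementation file might look like, shown as one listing with comments marking the split. It is an illustration under assumptions, not the code in this PR: the member names, the `get<T>()` helper, and the LoadLibraryExW flags (`LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR`) are illustrative choices. The part that relates to the first problem above is that the Windows branch passes `path::c_str()`, which is a `const wchar_t *` there, straight to LoadLibraryExW.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <utility>

#ifdef _WIN32
#  ifndef NOMINMAX
#    define NOMINMAX
#  endif
#  include <windows.h>
#else
#  include <dlfcn.h>
#endif

namespace fs = std::filesystem;

// ---- dlhandle.h (sketch) ----
class Dlhandle {
public:
    explicit Dlhandle(const fs::path &libPath);
    ~Dlhandle();

    // Owning handle: copying is disabled, moving transfers ownership.
    Dlhandle(const Dlhandle &) = delete;
    Dlhandle &operator=(const Dlhandle &) = delete;
    Dlhandle(Dlhandle &&other) noexcept : m_handle(std::exchange(other.m_handle, nullptr)) {}
    Dlhandle &operator=(Dlhandle &&other) noexcept
    {
        std::swap(m_handle, other.m_handle); // other's destructor releases our old handle
        return *this;
    }

    // Look up an exported symbol; returns nullptr if it is not found.
    template <typename T>
    T *get(const std::string &symbol) const { return reinterpret_cast<T *>(getRaw(symbol)); }

private:
    void *getRaw(const std::string &symbol) const;
    void *m_handle = nullptr;
};

// ---- dlhandle.cpp (sketch) ----
Dlhandle::Dlhandle(const fs::path &libPath)
{
#ifdef _WIN32
    // path::c_str() is const wchar_t * on Windows, so non-ASCII characters survive.
    HMODULE h = LoadLibraryExW(libPath.c_str(), nullptr,
                               LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR);
    if (!h)
        throw std::runtime_error("LoadLibraryExW failed (error " + std::to_string(GetLastError()) + ")");
    m_handle = reinterpret_cast<void *>(h);
#else
    // On POSIX the native representation is already a byte string.
    m_handle = dlopen(libPath.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!m_handle) {
        const char *err = dlerror();
        throw std::runtime_error(err ? err : "dlopen failed");
    }
#endif
}

Dlhandle::~Dlhandle()
{
    if (!m_handle)
        return; // moved-from object owns nothing
#ifdef _WIN32
    FreeLibrary(reinterpret_cast<HMODULE>(m_handle));
#else
    dlclose(m_handle);
#endif
}

void *Dlhandle::getRaw(const std::string &symbol) const
{
#ifdef _WIN32
    return reinterpret_cast<void *>(GetProcAddress(reinterpret_cast<HMODULE>(m_handle), symbol.c_str()));
#else
    return dlsym(m_handle, symbol.c_str());
#endif
}
```

Usage would look like `Dlhandle lib(dir / "example.dll"); auto *fn = lib.get<int(int)>("example_fn");`, where the library and symbol names are hypothetical. Keeping the platform `#ifdef`s in the .cpp file means callers only ever see the small header.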

Related to #2111 (that issue was about the model path, not the implementation lib path)

Testing

I created a venv named zz😊 (the zz prefix is for tab completion's sake) from within Python, since typing Unicode at the Windows command line is not trivial:

$ python
>>> import sys
>>> sys.argv[1:] = ['zz\U0001F60A']
>>> import venv.__main__

Then I activated it (the ?? glob lets this work even when the console doesn't fully support Unicode):

$ . .\zz??\Scripts\Activate.ps1

Then I installed the Python binding in that venv. With the current release, loading a model fails:

$ python
>>> from gpt4all import GPT4All
>>> x = GPT4All('orca-mini-3b-gguf2-q4_0.gguf') 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\msys64\home\jared\gpt4all\zz😊\Lib\site-packages\gpt4all\gpt4all.py", line 206, in __init__
    self.model = LLModel(self.config["path"], n_ctx, ngl)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\msys64\home\jared\gpt4all\zz😊\Lib\site-packages\gpt4all\_pyllmodel.py", line 218, in __init__
    raise RuntimeError(f"Unable to instantiate model: {'null' if s is None else s.decode()}")
RuntimeError: Unable to instantiate model: Could not find any implementations for build variant: default

But with this PR installed (not in editable mode with -e, or the resolved library path won't be inside the venv), it works:

$ python
>>> from gpt4all import GPT4All
>>> x = GPT4All('orca-mini-3b-gguf2-q4_0.gguf')
>>>

I didn't test with the UI since the installer doesn't let you select a path containing non-ASCII characters, but if the user moves GPT4All to a non-ASCII path after the fact, this should at least give it a chance of working.


cebtenzzre added 3 commits May 29, 2024 13:18

backend: refactor dlhandle.h into oscompat.{cpp,h}

84f5c58

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

llmodel: alias std::filesystem

86165f1

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

llmodel: use wide strings for paths on Windows

7575880

Using the native path representation allows us to manipulate paths and call LoadLibraryEx without mangling non-ASCII characters. Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested a review from manyoso May 29, 2024 17:45

llmodel: prefer built-in std::filesystem functionality

b2393d9

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
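The b2393d9 commit above favors built-in std::filesystem functionality. As a rough illustration of that style (not the commit's actual diff), the sketch below scans one search directory for shared libraries using `directory_iterator` and `path::extension()`, never converting a path to a narrow string, so a directory like zz😊 is handled like any other. The function name and the suffix-only filter are assumptions; the backend's real library naming and search logic are not reproduced here.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Sketch (hypothetical helper): collect the shared libraries in one search
// directory. Paths stay fs::path throughout, so a non-ASCII directory name is
// never round-tripped through a narrow std::string.
std::vector<fs::path> findSharedLibraries(const fs::path &searchDir)
{
#if defined(_WIN32)
    const fs::path suffix = ".dll";
#elif defined(__APPLE__)
    const fs::path suffix = ".dylib";
#else
    const fs::path suffix = ".so";
#endif

    std::vector<fs::path> found;
    std::error_code ec;
    // With the error_code overload, a missing or unreadable directory yields
    // an end iterator (and sets ec) instead of throwing, so the loop is skipped.
    for (const fs::directory_entry &entry : fs::directory_iterator(searchDir, ec)) {
        std::error_code statusEc;
        if (entry.is_regular_file(statusEc) && entry.path().extension() == suffix)
            found.push_back(entry.path());
    }
    // directory_iterator order is unspecified; sort for deterministic results.
    std::sort(found.begin(), found.end());
    return found;
}
```

Comparing `extension()` against another `fs::path` keeps the check in the native encoding on every platform, which is the point of preferring the built-in functionality over hand-rolled string handling.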

manyoso reviewed May 31, 2024

gpt4all-backend/oscompat.cpp (outdated, resolved)

cebtenzzre added 5 commits May 31, 2024 11:30

oscompat: fix string type error

0eba91b

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

backend: rename oscompat back to dlhandle

2027819

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

dlhandle: fix #includes

908c502

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

dlhandle: remove another #include

6d56a4d

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

dlhandle: move dlhandle #include

0246d17

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

manyoso reviewed May 31, 2024

gpt4all-backend/dlhandle.h (resolved)

dlhandle: remove #includes that are covered by dlhandle.h

9e3d549

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

manyoso approved these changes May 31, 2024

llmodel: fix #include order

23803a4

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

manyoso merged commit 4e89a9c into main May 31, 2024
6 of 19 checks passed

ellipsis-dev bot mentioned this pull request Jul 2, 2024

release.json: update release notes for v3.0.0 #2514

Merged
