Performance on Apple silicon #5
@marella I haven't been able to use the LLM again; once we manage to solve the StarCoder quantize issue, I can post the M1 Pro 64GB performance numbers.
Hi, I'm guessing the issue might be related to performance: it is running too slowly, so it takes a long time to print the output. By default:

```python
for token in llm('def fibo(', max_new_tokens=5, stream=True):
    print(token, end='', flush=True)
```

The code used by the starcoder example and this library is the same. The only difference is in how it is built, so building the library from source can help validate whether building locally improves performance, and may prevent the library from getting stuck:

```sh
git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
./scripts/build.sh
```

```python
llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')
```
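To tell whether the locally built library is actually faster, it helps to measure tokens per second around the streaming loop. Here is a minimal sketch of such a helper; the `llm` call in the comment assumes the `from_pretrained` model from above, and the stand-in generator at the bottom exists only so the snippet runs on its own:

```python
import time
from typing import Iterable, Tuple

def tokens_per_second(token_stream: Iterable[str]) -> Tuple[int, float]:
    """Consume a token iterator and return (token_count, tokens/sec)."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, count / elapsed if elapsed > 0 else 0.0

# With ctransformers this would look like:
#   count, tps = tokens_per_second(llm('def fibo(', max_new_tokens=64, stream=True))
# Stand-in generator so the helper is runnable without a model:
count, tps = tokens_per_second(iter(['def', ' fibo', '(', 'n', '):']))
print(count, tps)
```

Running this once against the prebuilt wheel and once against the locally built `.dylib` gives a direct comparison.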
See #8
Context: #1 (comment)
@bgonzalezfractal did you notice any performance improvement just by changing the `threads` parameter? If you don't have the latest quantized models, you can go back to the previous commit using:
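As a rough way to experiment with the `threads` parameter, one can derive a thread count from the available cores and pass it at generation time. This is only a sketch: the reservation heuristic below is an assumption, not a recommendation from the library, and the commented-out `llm(...)` call assumes the model object from the examples above:

```python
import os

def pick_threads(reserve: int = 2) -> int:
    """Pick a thread count, leaving some cores for the system.

    Heuristic only (assumption): on Apple silicon, values near the
    number of performance cores tend to work best; benchmark to confirm.
    """
    total = os.cpu_count() or 1
    return max(1, total - reserve)

threads = pick_threads()
# Assumed usage with ctransformers:
#   for token in llm('def fibo(', max_new_tokens=5, stream=True, threads=threads):
#       print(token, end='', flush=True)
print(threads)
```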
Here you can run the build commands and check:

```sh
cmake -S . -B build
cmake --build build
```