llama.cpp: running examples/chat-13B.sh with the llama-13B model hangs for a long time on startup and does not respond to any input #6084
Comments
If I'm understanding correctly, the issue is that it's very slow.
I'd set this value based on the number of physical CPU cores your system has: --threads — "Using the correct number of threads can greatly improve performance."
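For example, a minimal invocation with an explicit thread count (the model path and the count of 12 are assumptions; adjust them to your system):

```sh
# Run the quantized 13B model with 12 threads, assumed to match 12 physical cores.
./main -m ./models/llama/13B/ggml-model-Q4_K_M.gguf \
       --threads 12 \
       -p "Hello"
```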
Yes, it is very slow. I also tried 12 threads, but it does not help: only one logical core is at 100%, the rest are at ~10%.
Performance depends on the cores in your system; for example, some systems perform better with fewer threads. CLBlast may also help your performance, since then you'll be able to offload work to the GPU.
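A rough sketch of what that could look like, assuming the CLBlast build flag and the -ngl offload option described in the llama.cpp README of that era:

```sh
# Rebuild with CLBlast support (LLAMA_CLBLAST=1 is assumed from the README),
# then offload a number of layers to the GPU with -ngl.
make clean
make LLAMA_CLBLAST=1
./main -m ./models/llama/13B/ggml-model-Q4_K_M.gguf --threads 12 -ngl 32 -p "Hello"
```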
Thanks. I found that I had configured the virtual processors incorrectly for llama; now it works fast with 12 virtual processors.
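For reference, one way to check how many cores are actually visible to the system (standard Linux tools, nothing llama.cpp specific):

```sh
# Number of logical processors visible to the OS.
nproc
# Sockets, cores per socket, and threads per core.
lscpu | grep -E '^(Socket|Core|Thread)'
```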
Following the readme in the repo with the following steps:

```sh
make                                 # in the repo root
python3 convert.py models/llama/13B/
./quantize ./models/llama/13B/ggml-model-f16.gguf ./models/llama/13B/ggml-model-Q4_K_M.gguf Q4_K_M
```
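and then running the chat script from the title; a sketch of that launch step (MODEL and N_THREAD are assumptions about the environment variables the script reads):

```sh
# Point the script at the quantized model and set the thread count.
MODEL=./models/llama/13B/ggml-model-Q4_K_M.gguf \
N_THREAD=12 \
./examples/chat-13B.sh
```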
Then any typing gets no response in the console, and only after a very long time (~5 minutes) does the typed input show up.
After pressing Enter, generation starts, but it is very slow (~1 s per token).
CPU is at 100%, memory usage is ~8 GB.
Any ideas?