-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
~2x perf improvement on Apple Silicon by changing state_shared.has_work access from atomic to mutex/conditional #633
Comments
Tested more with different model sizes, different prompts, and Linux OS. 2x on MBP was an outlier. Now I see different configs have different speedups/slowdowns. So the change as it is cannot be suggested, though it is a place to look at in further perf analysis. |
Can you create a draft pull request anyway (with a note you don't want to merge, just sharing the code with the others), so we can test as well? |
^ did give it a try - works a bit slower on my machine 7B and very small n_predict, but maybe it can be improved - i did a mechanical rewrite without much thinking. its a draft and does not work on windows, so hiding it in my fork, feel free to use : ) |
#710 - this one is a try to change to an existing c thread pool, slight eval timings drop, but cpu usage gets from 700% to 400% on 8 threads, guess cool for mobile usage. also the pr has timings, hope thats useful |
GGUF (Breaking Change to Model Files)
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Discussed in #616
Originally posted by izard March 30, 2023
I profiled on a latest Mac Book Pro machine and found that significantly more time is spent in atomic checks for
state_shared.has_work
in while loops than doing actual work in matrix multiply.So I changed busy waits like:
and setting
has_work
toGot a nice 2x speedup in time/token.
I can't post a patch/pull request because everything I do in spare time still belongs to my employer, but the change is trivial as described above. Probably won't provide much benefit (if any) for other platforms though.
The text was updated successfully, but these errors were encountered: