Multi-threaded quantization #1075

Merged
merged 6 commits into from
Apr 20, 2023

ikawrakow
Contributor

This PR adds multi-threading for quantization.

The gain is very minor for small models (e.g., LLaMA 7B) and simple quantization (Q4_0 and Q4_1), but very significant for large models and the now more elaborate Q4_2 quantization.

quantize-stats now finishes in just 14.5 seconds (7B) or 44 seconds (13B) on my computer for all 3 quantization types. The single-threaded version took 144 seconds (7B) or 242 seconds (13B).
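To illustrate the idea, here is a minimal sketch of chunked multi-threaded quantization. It is not the code from this PR: `quantize_chunk`, `quantize_parallel` and `kBlockSize` are hypothetical names, and the kernel is a simple per-block int8 scaling rather than any of the Q4 formats. It assumes worker threads take fixed-size chunks from a shared counter guarded by a mutex, with the chunk size a multiple of the block size so the result does not depend on the thread count.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical block size for this sketch (not taken from ggml).
constexpr std::size_t kBlockSize = 32;

// Stand-in kernel: scale each block by its max absolute value, round to int8.
void quantize_chunk(const float * src, std::int8_t * dst, std::size_t n) {
    for (std::size_t i = 0; i < n; i += kBlockSize) {
        const std::size_t end = std::min(n, i + kBlockSize);
        float amax = 0.0f;
        for (std::size_t j = i; j < end; ++j) {
            amax = std::max(amax, std::fabs(src[j]));
        }
        const float inv = amax > 0.0f ? 127.0f / amax : 0.0f;
        for (std::size_t j = i; j < end; ++j) {
            dst[j] = static_cast<std::int8_t>(std::lround(src[j] * inv));
        }
    }
}

// Splits the input into chunks and lets nthread threads pull chunks
// from a shared counter until none are left.
std::vector<std::int8_t> quantize_parallel(const std::vector<float> & src, int nthread) {
    assert(nthread >= 1);
    constexpr std::size_t chunk_size = kBlockSize * 512; // whole blocks per chunk
    std::vector<std::int8_t> dst(src.size());
    std::mutex mutex;
    std::size_t next = 0; // first element of the next unclaimed chunk

    auto worker = [&]() {
        while (true) {
            std::size_t first = 0;
            {
                // Claim the next chunk under the lock.
                std::lock_guard<std::mutex> lock(mutex);
                first = next;
                next = std::min(src.size(), next + chunk_size);
            }
            if (first >= src.size()) {
                break;
            }
            const std::size_t n = std::min(chunk_size, src.size() - first);
            // Chunks never overlap, so writing to dst needs no lock.
            quantize_chunk(src.data() + first, dst.data() + first, n);
        }
    };

    std::vector<std::thread> workers;
    for (int t = 1; t < nthread; ++t) {
        workers.emplace_back(worker);
    }
    worker(); // the calling thread works too
    for (auto & w : workers) {
        w.join();
    }
    return dst;
}
```

Calling `quantize_parallel(data, 4)` should give the same output as `quantize_parallel(data, 1)`. The lock is held only while claiming a chunk, so the more expensive the per-chunk work, the more the extra threads pay off, which fits the larger gain reported for Q4_2.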

Not much gain for simple quantizations, but it will be important
for quantizations that require more CPU cycles.
It now does the job in ~14 seconds on my Mac for
Q4_0, Q4_1 and Q4_2. Single-threaded it was taking
more than 2 minutes after adding the more elaborate
version of Q4_2.
@ikawrakow ikawrakow requested review from sw and unbounded April 20, 2023 05:45
@DannyDaemonic
Collaborator

This could make more accurate but slow quantization methods more practical. (See #835.)

@ggerganov ggerganov added the performance Speed related topics label Apr 20, 2023
After changing chunk_size to const int as suggested by
@ggerganov, clang and GCC started to warn me that I don't
need to capture it in the lambda. So, I removed it from the
capture list. But that makes the MSVC build fail. So, I am
making it a constexpr to make every compiler happy.
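A small sketch of the capture situation described in this commit message, not the actual code from the PR: `sum_in_chunks` and its contents are hypothetical. It shows a local `constexpr` constant whose value is read inside a lambda without appearing in the explicit capture list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a vector chunk by chunk; the chunking only serves to use chunk_size.
int sum_in_chunks(const std::vector<int> & v) {
    // Local compile-time constant: its value can be read inside the lambda
    // without listing it in the capture list.
    constexpr int chunk_size = 4;
    int total = 0;

    // chunk_size is deliberately absent from the capture list.
    auto work = [&v, &total]() {
        for (std::size_t first = 0; first < v.size(); first += chunk_size) {
            std::size_t last = first + chunk_size;
            if (last > v.size()) {
                last = v.size();
            }
            for (std::size_t i = first; i < last; ++i) {
                total += v[i];
            }
        }
    };

    work();
    return total;
}
```

This works because the lambda only reads the value of `chunk_size`. Binding it to a reference, for example by passing it to `std::min`, would still require a capture.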
@prusnak
Collaborator

prusnak commented Apr 20, 2023

Please resolve conflicts with the master branch

@ggerganov ggerganov merged commit 38de86a into master Apr 20, 2023
@ggerganov ggerganov deleted the multi-thread-quantize branch April 20, 2023 17:42