perf: parallelize quantization #906

Closed
jon-chuang opened this issue Apr 12, 2023 · 3 comments
Labels
performance Speed related topics

Contributor

jon-chuang commented Apr 12, 2023

```cpp
static void llama_model_quantize_internal(const std::string & fname_inp, const std::string & fname_out, enum llama_ftype ftype) {
```

This function is currently single-threaded, and quantization is quite slow (Vicuna 7B: 65156.31 ms, Vicuna 13B: 129902.48 ms, i.e. roughly 65 s and 130 s).

sw added the performance (Speed related topics) label on Apr 12, 2023
Collaborator

sw commented Apr 12, 2023

@ikawrakow did that in #896 (see kQuantizeQ4 in ggml_extra.cpp), but that is for a new quantization scheme.

From llama.cpp/ggml_extra.cpp, lines 287 to 291 in 6bfb00a:

```cpp
int nthread = std::min(nchunk, int(std::thread::hardware_concurrency()));
std::vector<std::thread> workers(nthread-1);
for (auto& w : workers) w = std::thread(compute);
compute();
for (auto& w : workers) w.join();
```

It did indeed speed things up. This could probably be integrated into llama_model_quantize_internal so that a new cpp module isn't necessary.

Contributor Author

jon-chuang commented Apr 12, 2023

Is the new quantization scheme the one that minimizes MSE against the original weights?

Collaborator

sw commented Apr 22, 2023

Resolved by #1075

sw closed this as completed on Apr 22, 2023