
gpt4all-backend: Add temperature sampling with repetition penalty #541

Closed
wants to merge 1 commit

Conversation

kuvaus (Contributor) commented May 11, 2023

Title: gpt4all-backend: Add compatibility with new sampling algorithms in llama.cpp

Description: This pull request allows the backend to compile against the newest llama.cpp as a submodule.

Changes:

Implemented temperature sampling with repetition penalty as an alternative to the previous llama_sample_top_p_top_k sampling method.

        // Temperature sampling with repetition penalty
        // Penalize repeats among the last repeat_last_n tokens of the context window
        llama_sample_repetition_penalty(
            d_ptr->ctx, &candidates_data,
            promptCtx.tokens.data() + promptCtx.n_ctx - promptCtx.repeat_last_n, promptCtx.repeat_last_n,
            promptCtx.repeat_penalty);
        // Keep at most top_k candidates, then nucleus-filter to top_p (min_keep = 1)
        llama_sample_top_k(d_ptr->ctx, &candidates_data, promptCtx.top_k, 1);
        llama_sample_top_p(d_ptr->ctx, &candidates_data, promptCtx.top_p, 1);
        // Apply temperature, then sample the next token from the remaining candidates
        llama_sample_temperature(d_ptr->ctx, &candidates_data, promptCtx.temp);
        llama_token id = llama_sample_token(d_ptr->ctx, &candidates_data);

This aims to make only the minimal changes needed for the new llama.cpp to compile. In the future, the other sampling methods need to be added and the context expanded to include the new sampling parameters.
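
For reference, the candidates_data array that these calls operate on is built from the raw logits, following the pattern in llama.cpp's examples/main.cpp. The sketch below is illustrative of that pattern under the May 2023 API, not a verbatim excerpt from this PR's diff:

        // Sketch (assumed, not from this PR): build the candidate list from the logits
        // of the last evaluated token before running the sampler chain above.
        const float *logits  = llama_get_logits(d_ptr->ctx);
        const int    n_vocab = llama_n_vocab(d_ptr->ctx);

        std::vector<llama_token_data> candidates;
        candidates.reserve(n_vocab);
        for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
            // llama_token_data holds { id, logit, p }
            candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
        }
        llama_token_data_array candidates_data = { candidates.data(), candidates.size(), false };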

Note
Compared to examples/main.cpp, this omits the Tail Free and Locally Typical samplers to avoid extending the context with extra parameters. Neither was used by or included in the old llama_sample_top_p_top_k either.

        // Samplers from examples/main.cpp that are omitted here:
        llama_sample_tail_free(ctx, &candidates_p, tfs_z);
        llama_sample_typical(ctx, &candidates_p, typical_p);
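
For comparison, the full sampler chain in examples/main.cpp at that time ran roughly as follows (paraphrased from memory of the May 2023 API, with the required min_keep arguments written out; variable names here are illustrative):

        // Approximate examples/main.cpp chain, including the two samplers omitted by this PR
        llama_sample_repetition_penalty(ctx, &candidates_p, last_tokens, last_n, repeat_penalty);
        llama_sample_top_k      (ctx, &candidates_p, top_k, 1);
        llama_sample_tail_free  (ctx, &candidates_p, tfs_z, 1);
        llama_sample_typical    (ctx, &candidates_p, typical_p, 1);
        llama_sample_top_p      (ctx, &candidates_p, top_p, 1);
        llama_sample_temperature(ctx, &candidates_p, temp);
        llama_token id = llama_sample_token(ctx, &candidates_p);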

Note
llama.cpp needs to be included as a submodule because it now generates version info from git automatically via its CMakeLists.txt. See ggerganov/llama.cpp#1232 and ggerganov/llama.cpp#1289.

AndriyMulyar requested a review from manyoso on May 11, 2023 17:56
kuvaus (Contributor, Author) commented May 11, 2023

Compared to nomic-ai/gpt4all-chat#219, this adds size_t min_keep = 1 to the llama_sample_top_k and llama_sample_top_p function calls. This is exactly the same as in examples/main.cpp of llama.cpp.
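
For context, the relevant declarations in llama.h at that point looked roughly like this (paraphrased, not quoted verbatim from the pinned revision); min_keep bounds how many candidates must survive each filter:

        // Paraphrased llama.h sampling declarations (circa May 2023)
        LLAMA_API void llama_sample_top_k(struct llama_context * ctx, llama_token_data_array * candidates,
                                          int k, size_t min_keep);
        LLAMA_API void llama_sample_top_p(struct llama_context * ctx, llama_token_data_array * candidates,
                                          float p, size_t min_keep);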

kuvaus (Contributor, Author) commented May 11, 2023

Update: A short test with llama.cpp-master-cf348a6 using MPT models suggests the new sampling gives very odd results after the first sentence compared to llama_sample_top_p_top_k. Definitely not ready as is.

kuvaus (Contributor, Author) commented May 20, 2023

See #642 instead.

kuvaus closed this on May 20, 2023