
gpt4all-backend: Add temperature sampling with repetition penalty #541

Closed
wants to merge 1 commit

Conversation

kuvaus (Contributor) commented May 11, 2023

Title: gpt4all-backend: Add compatibility with new sampling algorithms in llama.cpp

Description: This pull request allows the backend to compile against the newest llama.cpp as a submodule.

Changes:

Implemented temperature sampling with repetition penalty as an alternative to the previous llama_sample_top_p_top_k sampling method.

        // Temperature sampling with repetition penalty
        // Penalize repeats among the last repeat_last_n tokens of the context window
        llama_sample_repetition_penalty(
            d_ptr->ctx, &candidates_data,
            promptCtx.tokens.data() + promptCtx.n_ctx - promptCtx.repeat_last_n, promptCtx.repeat_last_n,
            promptCtx.repeat_penalty);
        // Keep at most top_k candidates, then nucleus-filter to top_p (min_keep = 1)
        llama_sample_top_k(d_ptr->ctx, &candidates_data, promptCtx.top_k, 1);
        llama_sample_top_p(d_ptr->ctx, &candidates_data, promptCtx.top_p, 1);
        // Apply temperature, then sample the next token from the remaining candidates
        llama_sample_temperature(d_ptr->ctx, &candidates_data, promptCtx.temp);
        llama_token id = llama_sample_token(d_ptr->ctx, &candidates_data);

This aims to make only the minimal changes needed for the new llama.cpp to compile. In the future, the other sampling methods need to be added and the context expanded to include the new sampling parameters.
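
For reference, the candidates_data array that these calls operate on is built from the raw logits, following the pattern in llama.cpp's examples/main.cpp. The sketch below is illustrative of that pattern under the May 2023 API, not a verbatim excerpt from this PR's diff:

        // Sketch (assumed, not from this PR): build the candidate list from the logits
        // of the last evaluated token before running the sampler chain above.
        const float *logits  = llama_get_logits(d_ptr->ctx);
        const int    n_vocab = llama_n_vocab(d_ptr->ctx);

        std::vector<llama_token_data> candidates;
        candidates.reserve(n_vocab);
        for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
            // llama_token_data holds { id, logit, p }
            candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
        }
        llama_token_data_array candidates_data = { candidates.data(), candidates.size(), false };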

Note
Compared to examples/main.cpp, this omits the Tail Free and Locally Typical samplers to avoid extending the context with extra parameters. Neither was used by or included in the old llama_sample_top_p_top_k either.

        // Samplers from examples/main.cpp that are omitted here:
        llama_sample_tail_free(ctx, &candidates_p, tfs_z);
        llama_sample_typical(ctx, &candidates_p, typical_p);
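
For comparison, the full sampler chain in examples/main.cpp at that time ran roughly as follows (paraphrased from memory of the May 2023 API, with the required min_keep arguments written out; variable names here are illustrative):

        // Approximate examples/main.cpp chain, including the two samplers omitted by this PR
        llama_sample_repetition_penalty(ctx, &candidates_p, last_tokens, last_n, repeat_penalty);
        llama_sample_top_k      (ctx, &candidates_p, top_k, 1);
        llama_sample_tail_free  (ctx, &candidates_p, tfs_z, 1);
        llama_sample_typical    (ctx, &candidates_p, typical_p, 1);
        llama_sample_top_p      (ctx, &candidates_p, top_p, 1);
        llama_sample_temperature(ctx, &candidates_p, temp);
        llama_token id = llama_sample_token(ctx, &candidates_p);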

Note
llama.cpp needs to be included as a submodule because it now generates version info from git automatically via its CMakeLists.txt. See ggerganov/llama.cpp#1232 and ggerganov/llama.cpp#1289.

AndriyMulyar requested a review from manyoso on May 11, 2023 17:56
kuvaus (Contributor, Author) commented May 11, 2023

Compared to nomic-ai/gpt4all-chat#219, this adds size_t min_keep = 1 to the llama_sample_top_k and llama_sample_top_p function calls. This is exactly the same as in examples/main.cpp of llama.cpp.
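
For context, the relevant declarations in llama.h at that point looked roughly like this (paraphrased, not quoted verbatim from the pinned revision); min_keep bounds how many candidates must survive each filter:

        // Paraphrased llama.h sampling declarations (circa May 2023)
        LLAMA_API void llama_sample_top_k(struct llama_context * ctx, llama_token_data_array * candidates,
                                          int k, size_t min_keep);
        LLAMA_API void llama_sample_top_p(struct llama_context * ctx, llama_token_data_array * candidates,
                                          float p, size_t min_keep);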

kuvaus (Contributor, Author) commented May 11, 2023

Update: A short test with llama.cpp-master-cf348a6 using MPT models suggests the new sampling gives very odd results after the first sentence compared to llama_sample_top_p_top_k. Definitely not ready as is.

kuvaus (Contributor, Author) commented May 20, 2023

See #642 instead.

kuvaus closed this on May 20, 2023