Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Top-K Sampling Support #59

Open
mikemykhaylov opened this issue Dec 15, 2024 · 1 comment
Open

Top-K Sampling Support #59

mikemykhaylov opened this issue Dec 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@mikemykhaylov
Copy link

Hello,

I find that for some models, like Mistral Nemo, it is very beneficial to restrict the number of considered completions to improve the coherence of the output. However, compared to llama.cpp, MLX does not seem to support top K sampling in LM Studio. It does look like the underlying library supports that, so implementing that would be much appreciated.

llama.cpp models support Top-K
Screenshot 2024-12-15 at 12 29 11

MLX models do not support Top-K
Screenshot 2024-12-15 at 12 29 44

@neilmehta24
Copy link
Member

Thanks for bringing this to our attention. The MLX core library does indeed support a top-k matrix operation, but the MLX LLM library does not support top-k sampling. Here are the supported generation/sampling options as of today https://github.com/ml-explore/mlx-examples/blob/dfa4dd6/llms/mlx_lm/utils.py#L200-L215 . Please track this issue ml-explore/mlx-examples#1167 for adding support in mlx_lm

@neilmehta24 neilmehta24 added the enhancement New feature or request label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants