Top-K Sampling Support #59

mikemykhaylov · 2024-12-15T19:30:07Z

Hello,

I find that for some models, like Mistral Nemo, restricting the number of candidate tokens considered at each step noticeably improves the coherence of the output. However, unlike llama.cpp models, MLX models do not seem to support top-K sampling in LM Studio. It does look like the underlying library supports it, so implementing it would be much appreciated.

llama.cpp models support Top-K

MLX models do not support Top-K

neilmehta24 · 2024-12-16T17:07:23Z

Thanks for bringing this to our attention. The MLX core library does indeed support a top-k matrix operation, but the MLX LLM library (mlx_lm) does not support top-k sampling. The generation/sampling options supported as of today are listed here: https://github.com/ml-explore/mlx-examples/blob/dfa4dd6/llms/mlx_lm/utils.py#L200-L215. Please track ml-explore/mlx-examples#1167 for adding support in mlx_lm.
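For anyone following along, here is a rough sketch of what top-k sampling does to a single step's logits. It is plain standard-library Python for illustration only, not the mlx_lm or LM Studio implementation; the function name `top_k_sample` and its parameters are hypothetical.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample a token index from the k highest-scoring logits.

    Illustrative helper only; not part of mlx_lm or LM Studio.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits; every other token is discarded.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits (subtract the max for stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


# Example: with k=2 only the two most likely tokens can ever be drawn.
logits = [2.0, 0.5, 1.5, -1.0]
token = top_k_sample(logits, k=2)
```

With `k=1` this reduces to greedy decoding, and with `k=len(logits)` it is ordinary temperature sampling, which is why a small `k` tends to trim the low-probability tail that hurts coherence.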

neilmehta24 mentioned this issue Dec 16, 2024

[Feature Request] Top-K sampling support in mlx_lm.utils.generate_step ml-explore/mlx-examples#1167

Closed

neilmehta24 added the enhancement (New feature or request) label Dec 16, 2024
