🚀 The feature, motivation and pitch
Speculative decoding allows emitting multiple tokens per sequence by speculating future tokens, scoring their likelihood with the target LLM, and then accepting each speculative token based on its likelihood. This process is laid out in the following diagram:

[Diagram: speculative decoding flow — draft proposal, target-model scoring, token acceptance]
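A minimal sketch of that loop, for orientation. Here `draft_propose` and `target_probs` are hypothetical stand-ins for the proposer and the target model, not vLLM APIs:

```python
import random

def speculative_step(prefix, draft_propose, target_probs, k=4):
    """One speculative decoding step (sketch).

    draft_propose(prefix, k) -> list of (token, q_prob): the cheap
        proposer's k speculative tokens with its probability for each.
    target_probs(tokens) -> mapping from token id to probability under
        the target LLM (in practice all k positions are scored in one
        batched forward pass, not one call per token).
    """
    proposals = draft_propose(prefix, k)
    accepted = []
    for token, q in proposals:
        p = target_probs(prefix + accepted)[token]
        # Accept based on how likely the target model finds the token.
        if random.random() < min(1.0, p / q):
            accepted.append(token)
        else:
            break  # the first rejection ends the speculative run
    return accepted
```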
The problem with rejection sampling is that it holds a very high bar for quality: it is lossless, guaranteeing that the output distribution matches the target model's, even if that means rejecting plausible speculative tokens.
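For reference, this is the per-token rule rejection sampling applies (a self-contained sketch; `p` and `q` are the target- and draft-model distributions over the vocabulary):

```python
import random

def rejection_sample(token, p, q):
    """Lossless acceptance rule from standard speculative decoding.

    token: the draft-proposed token id
    p: target-model probabilities over the vocab (list of floats)
    q: draft-model probabilities over the vocab (list of floats)

    Returns (accepted, emitted_token). The emitted token is distributed
    exactly according to p, which is why plausible tokens can still be
    rejected.
    """
    # Accept with probability min(1, p/q); small guard against q == 0.
    if random.random() < min(1.0, p[token] / max(q[token], 1e-12)):
        return True, token
    # On rejection, resample from the residual max(0, p - q), normalized,
    # so the overall output distribution still equals p.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    emitted = random.choices(range(len(p)), weights=weights)[0]
    return False, emitted
```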
This issue is a request to implement Medusa's typical acceptance routine in vLLM. Typical acceptance trades off output quality to increase the acceptance rate. See "Choice of threshold in typical acceptance" in the Medusa blogpost for more information.
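Typical acceptance replaces the lossless rule with an entropy-aware threshold: a token is accepted when its target-model probability exceeds min(ε, δ·exp(−H(p))), where H(p) is the entropy of the target distribution. A sketch of the criterion as described by Medusa (the ε and δ defaults below are illustrative, not the paper's; see the blogpost for tuning guidance):

```python
import math

def typical_accept(token, p, epsilon=0.09, delta=0.3):
    """Medusa-style typical acceptance (sketch).

    Accepts the draft token if its target-model probability exceeds
    min(epsilon, delta * exp(-H(p))). In high-entropy (uncertain)
    contexts the bar drops, so more plausible tokens get through; the
    trade-off is that the output distribution no longer exactly matches
    the target model's.
    """
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    threshold = min(epsilon, delta * math.exp(-entropy))
    return p[token] > threshold
```

Note that this rule only needs the target-model probabilities, not the draft distribution, which is part of what makes it cheap to apply to heads like Medusa's that don't produce a full proposal distribution.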
vLLM users should be able to toggle between different acceptance routines; they can use rejection sampling for tasks that require higher quality, or typical acceptance when speedup is more important.
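As an illustration of what that toggle might look like (assuming vLLM's speculative decoding arguments `speculative_model` and `num_speculative_tokens`; the `acceptance_method` argument is hypothetical, not an existing vLLM parameter):

```python
from vllm import LLM, SamplingParams

# Hypothetical toggle: the acceptance_method name and values are
# illustrative only, not part of vLLM at the time of this issue.
llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",
    speculative_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    num_speculative_tokens=4,
    acceptance_method="typical",  # or "rejection" for lossless output
)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(temperature=0.7))
```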
NOTE: This acceptance routine should work with other proposal types (Eagle, draft model, ngram, and others), not just Medusa. The speculative decoding framework in vLLM may need improvements to the rejection sampling interface to support this.
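One way the interface improvement could go: factor the accept/reject decision behind a common base class so any proposer can pair with either routine. A hypothetical sketch, not vLLM's actual class hierarchy:

```python
from abc import ABC, abstractmethod

import torch

class AcceptanceSampler(ABC):
    """Hypothetical common interface for acceptance routines."""

    @abstractmethod
    def accept(self, target_probs: torch.Tensor,
               draft_probs: torch.Tensor,
               draft_tokens: torch.Tensor) -> torch.Tensor:
        """Return a boolean mask of accepted speculative tokens.

        target_probs: [batch, k, vocab] target-model distributions
        draft_probs:  [batch, k, vocab] proposer distributions (may be
                      unused by routines such as typical acceptance)
        draft_tokens: [batch, k] proposed token ids
        """

class TypicalAcceptanceSampler(AcceptanceSampler):
    def __init__(self, epsilon: float = 0.09, delta: float = 0.3):
        self.epsilon = epsilon  # illustrative defaults only
        self.delta = delta

    def accept(self, target_probs, draft_probs, draft_tokens):
        # Entropy of the target distribution at each position.
        entropy = -(target_probs * target_probs.clamp_min(1e-9).log()).sum(-1)
        threshold = torch.minimum(
            torch.full_like(entropy, self.epsilon),
            self.delta * torch.exp(-entropy),
        )
        token_probs = target_probs.gather(
            -1, draft_tokens.unsqueeze(-1)).squeeze(-1)
        return token_probs > threshold
```

With an interface like this, the proposer (Medusa heads, Eagle, a draft model, or ngram lookup) only has to supply token ids, and a full draft distribution becomes optional rather than required.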
Alternatives
No response
Additional context
vLLM's rejection sampler is implemented here: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/rejection_sampler.py