08. Sampling

Supported Samplers

Samplers alter the raw token probabilities during response generation. Users can tune them to adjust the outputs they get.

Note

Sampling is not a catch-all solution if your generations are behaving the wrong way! The cause can also lie in the prompt, frontend, model, etc. Please do not set arbitrary sampler values without first understanding what they do!

Penalties

Repetition Penalty -

API request field: repetition_penalty
Default: 1.0 - Off
Description: Multiplicative method of preventing repetition of previous tokens in the context.
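
As an illustration of what "multiplicative" means here, the sketch below uses the common formulation in which a positive logit is divided by the penalty and a negative logit is multiplied by it. This is an assumption about the general technique, not a statement of what the backend does internally; the function name and the plain-list logits are hypothetical.

```python
def apply_repetition_penalty(logits, context_tokens, penalty=1.0):
    """Return a copy of logits with a multiplicative penalty on seen tokens.

    Assumed formulation: divide positive logits, multiply negative ones,
    so a penalty above 1.0 always lowers the score of a seen token.
    """
    out = list(logits)
    for tok in set(context_tokens):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, context_tokens=[0, 1, 1], penalty=2.0)
unchanged = apply_repetition_penalty(logits, context_tokens=[0, 1, 1], penalty=1.0)
```

With the default of 1.0 the logits come back unchanged, which is why 1.0 counts as "Off".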

Frequency Penalty -

API request field: frequency_penalty
Default: 0.0 - Off
Description: A constant penalty added each time a token is sampled, reducing the probability of that specific token.

Presence Penalty -

API request field: presence_penalty
Default: 0.0 - Off
Description: Additive method of preventing repetition of previous tokens in the context. Encourages new ideas to be generated. Unlike frequency penalty, this is applied once per token, no matter how often that token appears. TL;DR: repetition penalty, but additive.
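
The difference between the two additive penalties can be sketched as follows. This assumes the widely used formulation in which the frequency penalty scales with the number of occurrences and the presence penalty is subtracted once; the function name is hypothetical and the backend's actual implementation may differ.

```python
from collections import Counter


def apply_additive_penalties(logits, context_tokens,
                             frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of logits with additive penalties on seen tokens."""
    out = list(logits)
    for tok, count in Counter(context_tokens).items():
        out[tok] -= frequency_penalty * count  # grows with every occurrence
        out[tok] -= presence_penalty           # one-off, regardless of count
    return out


logits = [1.0, 1.0, 1.0]
result = apply_additive_penalties(
    logits, context_tokens=[0, 0, 0, 1],
    frequency_penalty=0.5, presence_penalty=0.25,
)
```

Token 0 appears three times and is pushed down much further than token 1, which appears once; token 2 never appears and is left alone.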

Penalty Range -

Note

Unlike in other backends, a value of 0 disables penalties entirely!

API request field: penalty_range, repetition_range, or repetition_penalty_range
Default: -1
- When frequency OR presence penalty is enabled, a penalty_range value of -1 applies the penalty only to the output tokens. A lower range is advised.
- Otherwise, a penalty_range value of -1 means the max sequence length.
Description: Number of tokens to look back over when applying penalties.
- For frequency and presence penalty, this should be a low value to avoid "backing the model into a corner" when selecting similar tokens, resulting in large amounts of synonym repeats (aka "thesaurus mode").
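
The range rules above can be sketched as a window over the token history. This is a simplified illustration: the function name, the split into prompt and output tokens, and the `additive_enabled` flag are all hypothetical, and the whole history stands in for "max sequence length".

```python
def penalty_window(prompt_tokens, output_tokens, penalty_range=-1,
                   additive_enabled=False):
    """Pick the tokens a penalty would look at (illustrative only)."""
    if penalty_range == 0:
        return []  # 0 disables penalties entirely
    history = list(prompt_tokens) + list(output_tokens)
    if penalty_range < 0:
        # -1: output tokens only when frequency/presence penalty is on,
        # otherwise the whole sequence.
        return list(output_tokens) if additive_enabled else history
    # Positive value: look back over the last penalty_range tokens.
    return history[-penalty_range:]


prompt, output = [1, 2, 3], [4, 5]
disabled = penalty_window(prompt, output, penalty_range=0)
output_only = penalty_window(prompt, output, penalty_range=-1, additive_enabled=True)
last_four = penalty_window(prompt, output, penalty_range=4)
```

A small positive range keeps the penalised set short, which is the point of the "low value" advice for frequency and presence penalty.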

Alphabet Soup

Top-P -

API request field: top_p
Default: 1.0 - Off

Min-P -

API request field: min_p
Default: 0.0 - Off

Top-K -

API request field: top_k
Default: 0.0 - Off

Top-A -

API request field: top_a
Default: 0.0 - Off

Miscellaneous

Temperature -

API request field: temperature
Default: 1.0 - Off
Description: A constant that divides the logits before the softmax calculation. A higher temperature means more randomness when choosing the next token.
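
To make the effect of temperature concrete, here is a minimal sketch assuming the standard formulation, where logits are divided by the temperature before softmax. The function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(logits, temperature=0.5)
neutral = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)
```

The top token's probability shrinks as the temperature rises, so the distribution flattens and less likely tokens get picked more often.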

Temp last -

API request field: temp_last
Default: false - Off
Description: Places temperature application last in the sampling stack. Necessary for min-P sampling.
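
Why the ordering matters can be sketched with a toy example. It assumes the common definition of min-P (keep tokens whose probability is at least min_p times the top token's probability), which this page does not spell out; all function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits, min_p):
    """Mask tokens whose probability is below min_p * top probability."""
    probs = softmax(logits)
    cutoff = min_p * max(probs)
    return [x if p >= cutoff else NEG_INF for x, p in zip(logits, probs)]


def apply_temperature(logits, temperature):
    return [x / temperature for x in logits]


def surviving(logits):
    return [i for i, x in enumerate(logits) if x != NEG_INF]


logits = [4.0, 2.0, 0.0]
temp_first = min_p_filter(apply_temperature(logits, 2.0), min_p=0.2)
temp_last = apply_temperature(min_p_filter(logits, min_p=0.2), 2.0)
```

With temperature applied first, the flattened distribution lets token 1 pass the min-P cutoff; with temperature last, the cutoff is judged on the untouched distribution and only token 0 survives.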

Typical -

API request field: typical
Default: 1.0 - Off

Tail-free Sampling -

API request field: tfs
Default: 1.0 - Off

Logit bias -

API request field: logit_bias
Default: None - Off
Example: [{"1": 50}, {"2": 75}] - An array of bias objects
Description: Adds a positive or negative value to a specific token's score to change how likely that token is to occur. Format: {"token": bias}, where bias ranges from -100 to 100.
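
A minimal sketch of how such a bias array could be applied, assuming the bias is simply added to the matching logit and that the string keys are token IDs. The function name and the range check are illustrative, not the backend's actual code.

```python
def apply_logit_bias(logits, logit_bias=None):
    """Return a copy of logits with each listed token's bias added."""
    out = list(logits)
    for entry in logit_bias or []:
        for token, bias in entry.items():
            if not -100 <= bias <= 100:
                raise ValueError(f"bias for token {token} is outside -100..100")
            out[int(token)] += bias  # keys arrive as strings in JSON
    return out


logits = [0.0, 1.0, 2.0]
biased = apply_logit_bias(logits, [{"1": 50}, {"2": -100}])
```

A large positive bias makes a token far more likely, while a bias of -100 makes it very unlikely to be chosen.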

Mirostat mode -

API request field: mirostat_mode
Default: 0 - Off
- ExLlamaV2 only applies mirostat when mirostat_mode = 2

Mirostat tau -

API request field: mirostat_tau
Default: 1.5 - Off unless mirostat_mode = 2

Mirostat eta -

API request field: mirostat_eta
Default: 0.1 - Off unless mirostat_mode = 2
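
For reference, the request fields on this page can be combined in a single JSON body. The sketch below only builds and prints the payload; the "prompt" field is a hypothetical placeholder not documented on this page, and the values are illustrative, not recommendations.

```python
import json

payload = {
    "prompt": "Once upon a time",  # hypothetical field, not listed on this page
    "temperature": 0.8,
    "temp_last": True,
    "min_p": 0.05,
    "repetition_penalty": 1.1,
    "penalty_range": 1024,
    "logit_bias": [{"1": 50}, {"2": 75}],
    "mirostat_mode": 0,  # left at the default, so tau and eta are unused
}

body = json.dumps(payload, indent=2)
print(body)
```

Any field left out of the request falls back to the default listed for it above.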
