Commit b11e357

docs: add llm token rate limit integration steps
1 parent 8e89592 commit b11e357

3 files changed: +394 -4 lines changed

Lines changed: 194 additions & 0 deletions

#### Integration Steps

To integrate the token rate limiting function provided by Sentinel, complete the following steps:

1. Prepare a Redis instance.

2. Configure and initialize Sentinel's runtime environment.
   1. Only initialization from a YAML file is supported.

3. Instrument the code (define resources) with the fixed resource type `ResourceType=ResTypeCommon` and `TrafficType=Inbound`.

4. Load the rules as described by the configuration file below. The rule configuration items include: resource name, rate limiting strategy, specific rule items, Redis configuration, error code, and error message. The following is an example of rule configuration; the meaning of each field is detailed in "Configuration File Description" below.

```go
// Load a single rule: at most 1000 total tokens per 60 seconds.
_, err = llmtokenratelimit.LoadRules([]*llmtokenratelimit.Rule{
	{
		Resource: ".*", // regular expression; ".*" matches every resource
		Strategy: llmtokenratelimit.FixedWindow,
		SpecificItems: []llmtokenratelimit.SpecificItem{
			{
				// Identify requests by header; ".*" is a global match.
				Identifier: llmtokenratelimit.Identifier{
					Type:  llmtokenratelimit.Header,
					Value: ".*",
				},
				KeyItems: []llmtokenratelimit.KeyItem{
					{
						Key: ".*",
						Token: llmtokenratelimit.Token{
							Number:        1000,
							CountStrategy: llmtokenratelimit.TotalTokens,
						},
						Time: llmtokenratelimit.Time{
							Unit:  llmtokenratelimit.Second,
							Value: 60,
						},
					},
				},
			},
		},
	},
})
```

5. Optional: Create an LLM instance and embed it into the provided adapter.


#### Configuration File Description
49+
50+
Overall rule configuration
51+
52+
| Configuration Item | Type | Required | Default Value | Description |
53+
| :----------------- | :------------------- | :------- | :------------------ | :----------------------------------------------------------- |
54+
| enabled | bool | No | false | Whether to enable the LLM Token rate limiting function. Values: false (disable), true (enable) |
55+
| rules | array of rule object | No | nil | Rate limiting rules |
56+
| redis | object | No | | Redis instance connection information |
57+
| errorCode | int | No | 429 | Error code. Will be changed to 429 if set to 0 |
58+
| errorMessage | string | No | "Too Many Requests" | Error message |
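
The defaulting behaviour described for `errorCode` and `errorMessage` can be sketched as follows. This is an illustration only: the `overallConfig` type and `withDefaults` function are hypothetical names, and treating an empty message as "use the default" is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// Defaults taken from the table above.
const (
	defaultErrorCode    = 429
	defaultErrorMessage = "Too Many Requests"
)

// overallConfig is a hypothetical stand-in for the top-level settings.
type overallConfig struct {
	Enabled      bool
	ErrorCode    int
	ErrorMessage string
}

// withDefaults returns a copy of c in which a zero error code becomes 429
// and (by assumption) an empty message becomes the default message.
func withDefaults(c overallConfig) overallConfig {
	if c.ErrorCode == 0 {
		c.ErrorCode = defaultErrorCode
	}
	if c.ErrorMessage == "" {
		c.ErrorMessage = defaultErrorMessage
	}
	return c
}

func main() {
	c := withDefaults(overallConfig{Enabled: true})
	fmt.Println(c.ErrorCode, c.ErrorMessage) // 429 Too Many Requests
}
```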
59+
60+
rule configuration
61+
62+
| Configuration Item | Type | Required | Default Value | Description |
63+
| :----------------- | :--------------------------- | :------- | :-------------- | :----------------------------------------------------------- |
64+
| resource | string | No | ".*" | Rule resource name, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |
65+
| strategy | string | No | "fixed-window" | Rate limiting strategy. Values: fixed-window, peta (predictive error temporal allocation) |
66+
| encoding | object | No | | Token encoding method, **exclusively for peta rate limiting strategy** |
67+
| specificItems | array of specificItem object | Yes | | Specific rule items |
68+
69+
encoding configuration
70+
71+
| Configuration Item | Type | Required | Default Value | Description |
72+
| :----------------- | :----- | :------- | :------------ | :-------------------- |
73+
| provider | string | No | "openai" | Model provider |
74+
| model | string | No | "gpt-4" | Model name |
75+
76+
specificItem configuration
77+
78+
| Configuration Item | Type | Required | Default Value | Description |
79+
| :----------------- | :---------------------- | :------- | :------------ | :------------------------------------------- |
80+
| identifier | object | No | | Request identifier |
81+
| keyItems | array of keyItem object | Yes | | Key-value information for rule matching |
82+
83+
identifier configuration
84+
85+
| Configuration Item | Type | Required | Default Value | Description |
86+
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
87+
| type | string | No | "all" | Request identifier type. Values: all (global rate limiting), header |
88+
| value | string | No | ".*" | Request identifier value, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |
89+
90+
keyItem configuration
91+
92+
| Configuration Item | Type | Required | Default Value | Description |
93+
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
94+
| key | string | No | ".*" | Specific rule item value, supporting regular expressions. Values: ".*" (global match), user-defined regular expressions |
95+
| token | object | Yes | | Token quantity and calculation strategy configuration |
96+
| time | object | Yes | | Time unit and cycle configuration |
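
Several fields above (`resource`, `identifier.value`, `key`) accept regular expressions. The sketch below shows one way such a pattern could be checked with Go's standard `regexp` package. The `fullMatch` helper is hypothetical, and anchoring the pattern so that it must match the whole string is an assumption of this sketch; the source does not say whether Sentinel anchors its patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether s matches pattern in its entirety.
// Wrapping the pattern in ^(?:...)$ is an assumption of this sketch.
func fullMatch(pattern, s string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	cases := []struct{ pattern, value string }{
		{".*", "anything"},         // global match
		{"user-[0-9]+", "user-42"}, // user-defined pattern, matching value
		{"user-[0-9]+", "admin"},   // user-defined pattern, non-matching value
	}
	for _, c := range cases {
		ok, err := fullMatch(c.pattern, c.value)
		fmt.Printf("pattern=%q value=%q match=%v err=%v\n", c.pattern, c.value, ok, err)
	}
}
```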

token configuration

| Configuration Item | Type   | Required | Default Value   | Description                                                  |
| :----------------- | :----- | :------- | :-------------- | :----------------------------------------------------------- |
| number             | int    | Yes      |                 | Token quantity, greater than or equal to 0                   |
| countStrategy      | string | No       | "total-tokens"  | Token calculation strategy. Values: input-tokens, output-tokens, total-tokens |
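
As an illustration of how `countStrategy` might select the number charged against the quota, consider the sketch below. The `usage` type and `countTokens` function are hypothetical, and defining total tokens as input plus output is an assumption (it matches how LLM providers commonly report usage, but the source does not define it).

```go
package main

import "fmt"

// usage is a hypothetical record of the tokens consumed by one LLM call.
type usage struct {
	Input  int
	Output int
}

// countTokens returns the number of tokens to charge for u under the given
// countStrategy. An empty strategy falls back to the documented default,
// "total-tokens".
func countTokens(strategy string, u usage) (int, error) {
	switch strategy {
	case "input-tokens":
		return u.Input, nil
	case "output-tokens":
		return u.Output, nil
	case "total-tokens", "":
		return u.Input + u.Output, nil // assumption: total = input + output
	default:
		return 0, fmt.Errorf("unknown countStrategy %q", strategy)
	}
}

func main() {
	u := usage{Input: 120, Output: 80}
	for _, s := range []string{"input-tokens", "output-tokens", "total-tokens"} {
		n, _ := countTokens(s, u)
		fmt.Println(s, n)
	}
}
```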

time configuration

| Configuration Item | Type   | Required | Default Value | Description                                                  |
| :----------------- | :----- | :------- | :------------ | :----------------------------------------------------------- |
| unit               | string | Yes      |               | Time unit. Values: second, minute, hour, day                 |
| value              | int    | Yes      |               | Time value, greater than or equal to 0                       |

redis configuration

| Configuration Item | Type                 | Required | Default Value                        | Description                                                  |
| :----------------- | :------------------- | :------- | :----------------------------------- | :----------------------------------------------------------- |
| addrs              | array of addr object | No       | [{name: "127.0.0.1", port: 6379}]    | Redis node services, **see notes below**                     |
| username           | string               | No       | Empty string                         | Redis username                                               |
| password           | string               | No       | Empty string                         | Redis password                                               |
| dialTimeout        | int                  | No       | 0                                    | Maximum waiting time for establishing a Redis connection, unit: milliseconds |
| readTimeout        | int                  | No       | 0                                    | Maximum waiting time for a Redis server response, unit: milliseconds |
| writeTimeout       | int                  | No       | 0                                    | Maximum time for sending command data to the network connection, unit: milliseconds |
| poolTimeout        | int                  | No       | 0                                    | Maximum waiting time for getting an idle connection from the connection pool, unit: milliseconds |
| poolSize           | int                  | No       | 10                                   | Number of connections in the connection pool                 |
| minIdleConns       | int                  | No       | 5                                    | Minimum number of idle connections in the connection pool    |
| maxRetries         | int                  | No       | 3                                    | Maximum number of retries for failed operations              |

addr configuration

| Configuration Item | Type   | Required | Default Value  | Description                                                  |
| :----------------- | :----- | :------- | :------------- | :----------------------------------------------------------- |
| name               | string | No       | "127.0.0.1"    | Redis node service name, a complete [FQDN](https://en.wikipedia.org/wiki/Fully_qualified_domain_name) with service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
| port               | int    | No       | 6379           | Redis node service port                                      |
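
The following sketch shows how `addrs` entries could be turned into `host:port` endpoints and how the number of entries relates to the client mode described in the Notes section (fewer than two addresses means standalone mode). The `addr` type and the `endpoints` and `mode` helpers are hypothetical and only restate the documented rule; they are not Sentinel's code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// addr mirrors one entry of the addrs array.
type addr struct {
	Name string
	Port int
}

// endpoints renders addrs as "host:port" strings, falling back to the
// documented default of 127.0.0.1:6379 when no address is configured.
func endpoints(addrs []addr) []string {
	if len(addrs) == 0 {
		addrs = []addr{{Name: "127.0.0.1", Port: 6379}}
	}
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		out = append(out, net.JoinHostPort(a.Name, strconv.Itoa(a.Port)))
	}
	return out
}

// mode restates the rule from the Notes section: cluster mode needs at
// least two addresses; otherwise the client is treated as standalone.
func mode(addrs []addr) string {
	if len(addrs) >= 2 {
		return "cluster"
	}
	return "standalone"
}

func main() {
	single := []addr{{Name: "127.0.0.1", Port: 6379}}
	cluster := []addr{
		{Name: "redis-0.my-ns.svc.cluster.local", Port: 6379},
		{Name: "redis-1.my-ns.svc.cluster.local", Port: 6379},
	}
	fmt.Println(mode(single), endpoints(single))
	fmt.Println(mode(cluster), endpoints(cluster))
}
```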


#### Overall Configuration File Example

```YAML
version: "v1"
sentinel:
  app:
    name: sentinel-go-demo
  log:
    metric:
      maxFileCount: 7
  llmTokenRatelimit:
    enabled: true
    rules:
      - resource: ".*"
        strategy: "fixed-window"
        specificItems:
          - identifier:
              type: "header"
              value: ".*"
            keyItems:
              - key: ".*"
                token:
                  number: 1000
                  countStrategy: "total-tokens"
                time:
                  unit: "second"
                  value: 60

    errorCode: 429
    errorMessage: "Too Many Requests"

    redis:
      addrs:
        - name: "127.0.0.1"
          port: 6379
      username: "redis"
      password: "redis"
      dialTimeout: 5000
      readTimeout: 5000
      writeTimeout: 5000
      poolTimeout: 5000
      poolSize: 10
      minIdleConns: 5
      maxRetries: 3
```


#### Notes

- PETA uses tiktoken to estimate input token consumption, which requires the `Byte Pair Encoding (BPE)` dictionary to be downloaded or preconfigured
  - Online mode
    - tiktoken downloads the encoding files online on first use
  - Offline mode
    - Prepare pre-cached tiktoken encoding files (**not directly downloaded files, but files processed by tiktoken**) in advance, and specify their directory via the `TIKTOKEN_CACHE_DIR` environment variable
- Rule deduplication
  - In keyItems, if only the number differs, the latest number is retained after deduplication
  - In specificItems, only deduplicated keyItems are retained
  - In resource, only the latest resource is retained
- Redis configuration
  - **If the connected Redis is in cluster mode, addrs must contain at least 2 addresses; otherwise the client defaults to Redis standalone mode, causing rate limiting to fail**
