Commit df91b66

daall authored and lguohan committed
Configurable drop counters HLD (#434)
1 parent 6798624 commit df91b66

2 files changed, +383 -0 lines changed
@@ -0,0 +1,383 @@
# Configurable Drop Counters in SONiC

# High Level Design Document
#### Rev 0.2

# Table of Contents
* [List of Tables](#list-of-tables)
* [List of Figures](#list-of-figures)
* [Revision](#revision)
* [About this Manual](#about-this-manual)
* [Scope](#scope)
* [Definitions/Abbreviation](#definitionsabbreviation)
* [1 Overview](#1-overview)
* [2 Requirements](#2-requirements)
  - [2.1 Functional Requirements](#21-functional-requirements)
  - [2.2 Configuration and Management Requirements](#22-configuration-and-management-requirements)
  - [2.3 Scalability Requirements](#23-scalability-requirements)
  - [2.4 Supported Debug Counters](#24-supported-debug-counters)
* [3 Design](#3-design)
  - [3.1 CLI (and usage example)](#31-cli-and-usage-example)
    - [3.1.1 Displaying available counter capabilities](#311-displaying-available-counter-capabilities)
    - [3.1.2 Displaying current counter configuration](#312-displaying-current-counter-configuration)
    - [3.1.3 Displaying the current counts](#313-displaying-the-current-counts)
    - [3.1.4 Clearing the counts](#314-clearing-the-counts)
    - [3.1.5 Configuring counters from the CLI](#315-configuring-counters-from-the-cli)
  - [3.2 Config DB](#32-config-db)
    - [3.2.1 DEBUG_COUNTER Table](#321-debug_counter-table)
    - [3.2.2 DEBUG_COUNTER_DROP_REASON Table](#322-debug_counter_drop_reason-table)
  - [3.3 State DB](#33-state-db)
    - [3.3.1 DEBUG_COUNTER_CAPABILITIES Table](#331-debug_counter_capabilities-table)
    - [3.3.2 SAI APIs](#332-sai-apis)
  - [3.4 Counters DB](#34-counters-db)
  - [3.5 SWSS](#35-swss)
    - [3.5.1 SAI APIs](#351-sai-apis)
  - [3.6 syncd](#36-syncd)
* [4 Flows](#4-flows)
  - [4.1 General Flow](#41-general-flow)
* [5 Warm Reboot Support](#5-warm-reboot-support)
* [6 Unit Tests](#6-unit-tests)
* [7 Open Questions](#7-open-questions)

# List of Tables
* [Table 1: Abbreviations](#definitionsabbreviation)

# List of Figures
* [Figure 1: General Flow](#41-general-flow)

# Revision
| Rev |   Date   |   Author    | Change Description |
|:---:|:--------:|:-----------:|--------------------|
| 0.1 | 07/30/19 | Danny Allen | Initial version    |
| 0.2 | 09/03/19 | Danny Allen | Review updates     |

# About this Manual
This document provides an overview of the implementation of configurable packet drop counters in SONiC.

# Scope
This document describes the high level design of the configurable drop counter feature.

# Definitions/Abbreviation
| Abbreviation | Description     |
|--------------|-----------------|
| RX           | Receive/ingress |
| TX           | Transmit/egress |

# 1 Overview
The main goal of this feature is to improve packet drop visibility in SONiC by providing a mechanism to count and classify packet drops by reason.

The second goal is to let users track the drop reasons that matter for their scenario. Because different users have different priorities, and because priorities change over time, the feature must be easy to configure.

We will accomplish both goals by adding support for SAI debug counters to SONiC.
* Support for creating and configuring port-level and switch-level debug counters will be added to orchagent and syncd.
* A CLI tool will be provided for users to manage and configure their own drop counters.

# 2 Requirements

## 2.1 Functional Requirements
1. CONFIG_DB can be configured to create debug counters
2. STATE_DB can be queried for debug counter capabilities
3. Users can access drop counter information via a CLI tool
    1. Users can see what capabilities are available to them
        1. Types of counters (i.e. port-level and/or switch-level)
        2. Number of counters
        3. Supported drop reasons
    2. Users can see what types of drops each configured counter contains
    3. Users can add and remove drop reasons from each counter
    4. Users can read the current value of each counter
    5. Users can assign aliases to counters
    6. Users can clear counters

## 2.2 Configuration and Management Requirements
Configuration of the drop counters can be done via:
* config_db.json
* minigraph.xml
* CLI

## 2.3 Scalability Requirements
Users must be able to use all debug counters and drop reasons provided by the underlying hardware.

Interacting with debug counters will not interfere with existing hardware counters (e.g. portstat). Likewise, interacting with existing hardware counters will not interfere with debug counter behavior.

## 2.4 Supported Debug Counters
* PORT_INGRESS_DROPS: port-level ingress drop counters
* PORT_EGRESS_DROPS: port-level egress drop counters
* SWITCH_INGRESS_DROPS: switch-level ingress drop counters
* SWITCH_EGRESS_DROPS: switch-level egress drop counters

# 3 Design

## 3.1 CLI (and usage example)
The CLI tool will provide the following functionality:
* See available drop counter capabilities: `show drops available`
* See drop counter config: `show drops config`
* Show drop counts: `show drops`
* Clear drop counters: `sonic-clear drops`
* Initialize a new drop counter: `config drops init`
* Add drop reasons to a drop counter: `config drops add`
* Remove drop reasons from a drop counter: `config drops remove`
* Delete a drop counter: `config drops delete`

### 3.1.1 Displaying available counter capabilities
```
$ show drops available
TYPE            FREE  IN-USE
--------------  ----  ------
PORT_INGRESS    2     1
PORT_EGRESS     2     1
SWITCH_INGRESS  1     1
SWITCH_EGRESS   2     0

PORT_INGRESS:
    L2_ANY
    SMAC_MULTICAST
    SMAC_EQUALS_DMAC
    INGRESS_VLAN_FILTER
    EXCEEDS_L2_MTU
    SIP_CLASS_E
    SIP_LINK_LOCAL
    DIP_LINK_LOCAL
    UNRESOLVED_NEXT_HOP
    DECAP_ERROR

PORT_EGRESS:
    L2_ANY
    L3_ANY
    A_CUSTOM_REASON

SWITCH_INGRESS:
    L2_ANY
    SMAC_MULTICAST
    SMAC_EQUALS_DMAC
    SIP_CLASS_E
    SIP_LINK_LOCAL
    DIP_LINK_LOCAL

SWITCH_EGRESS:
    L2_ANY
    L3_ANY
    A_CUSTOM_REASON
    ANOTHER_CUSTOM_REASON

$ show drops available --type=PORT_EGRESS
TYPE            TOTAL  FREE  IN-USE
--------------  -----  ----  ------
PORT_EGRESS     3      2     1

PORT_EGRESS:
    L2_ANY
    L3_ANY
    A_CUSTOM_REASON
```

### 3.1.2 Displaying current counter configuration
```
$ show drops config
Counter   Alias     Group  Type            Reasons              Description
--------  --------  -----  --------------  -------------------  --------------
DEBUG_0   RX_LEGIT  LEGIT  PORT_INGRESS    SMAC_EQUALS_DMAC     Legitimate port-level RX pipeline drops
                                           INGRESS_VLAN_FILTER
DEBUG_1   TX_LEGIT  LEGIT  PORT_EGRESS     EGRESS_VLAN_FILTER   Legitimate port-level TX pipeline drops
DEBUG_2   RX_LEGIT  LEGIT  SWITCH_INGRESS  TTL                  Legitimate switch-level RX pipeline drops
```

### 3.1.3 Displaying the current counts

```
$ show drops
IFACE            STATE    RX_ERR      RX_DRP    RX_DISC    RX_LEGIT    TX_ERR    TX_DRP    TX_DISC    TX_LEGIT
---------------  -------  ----------  --------  ---------  ----------  --------  --------  ---------  ----------
Ethernet0        U        0           0         1500       1500        0         0         0          0
Ethernet4        U        0           0         300        250         0         0         0          0
Ethernet8        U        0           0         0          0           0         0         0          0
Ethernet12       U        0           0         1200       400         0         0         0          0

DEVICE           STATE    RX_LEGIT
---------------  -------  ----------
ABCDEFG-123-XYZ  U        2000

$ show drops --type=PORT
IFACE       STATE    RX_ERR      RX_DRP    RX_DISC    RX_LEGIT    TX_ERR    TX_DRP    TX_DISC    TX_LEGIT
----------  -------  ----------  --------  ---------  ----------  --------  --------  ---------  ----------
Ethernet0   U        0           0         1500       1500        0         0         0          0
Ethernet4   U        0           0         300        250         0         0         0          0
Ethernet8   U        0           0         0          0           0         0         0          0
Ethernet12  U        0           0         1200       400         0         0         0          0

$ show drops --group "LEGIT"
IFACE            STATE    RX_LEGIT    TX_LEGIT
---------------  -------  ----------  ----------
Ethernet0        U        1500        0
Ethernet4        U        250         0
Ethernet8        U        0           0
Ethernet12       U        400         0

DEVICE           STATE    RX_LEGIT
---------------  -------  ----------
ABCDEFG-123-XYZ  U        2000
```

### 3.1.4 Clearing the counts
```
$ sonic-clear drops
```

### 3.1.5 Configuring counters from the CLI
```
$ config drops init --counter="DEBUG_3" --alias="TX_LEGIT" --group="LEGIT" --type="SWITCH_EGRESS" --desc="Legitimate switch-level TX pipeline drops" --reasons=["L2_ANY", "L3_ANY"]
Initializing DEBUG_3 as TX_LEGIT...

Counter  Alias     Group  Type           Reasons                Description
-------  --------  -----  -------------  ---------------------  -----------
DEBUG_3  TX_LEGIT  LEGIT  SWITCH_EGRESS  L2_ANY                 Legitimate switch-level TX pipeline drops
                                         L3_ANY

$ config drops add --counter="DEBUG_3" --reasons=["A_CUSTOM_REASON", "ANOTHER_CUSTOM_REASON"]
Configuring DEBUG_3...

Counter  Alias     Group  Type           Reasons                Description
-------  --------  -----  -------------  ---------------------  -----------
DEBUG_3  TX_LEGIT  LEGIT  SWITCH_EGRESS  L2_ANY                 Legitimate switch-level TX pipeline drops
                                         L3_ANY
                                         A_CUSTOM_REASON
                                         ANOTHER_CUSTOM_REASON

$ config drops remove --counter="DEBUG_3" --reasons=["A_CUSTOM_REASON"]
Configuring DEBUG_3...

Counter  Alias     Group  Type           Reasons                Description
-------  --------  -----  -------------  ---------------------  -----------
DEBUG_3  TX_LEGIT  LEGIT  SWITCH_EGRESS  L2_ANY                 Legitimate switch-level TX pipeline drops
                                         L3_ANY
                                         ANOTHER_CUSTOM_REASON

$ config drops delete --counter="DEBUG_3"
```
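
The `add` and `remove` commands above amount to set operations on a counter's drop reasons, checked against what the hardware supports. The following is a minimal Python sketch of that idea, assuming the Config DB and State DB tables from sections 3.2 and 3.3 are available as plain dictionaries. The helper names `add_reasons` and `remove_reasons` are hypothetical and are not part of any SONiC utility.

```python
def add_reasons(config_db, capabilities, counter, reasons):
    """Add drop reasons to a counter, rejecting reasons its type does not support."""
    counter_type = config_db["DEBUG_COUNTER"][counter]["type"]
    supported = set(capabilities[counter_type]["reasons"])
    unsupported = sorted(set(reasons) - supported)
    if unsupported:
        raise ValueError(f"{counter_type} does not support: {', '.join(unsupported)}")
    table = config_db.setdefault("DEBUG_COUNTER_DROP_REASON", {})
    for reason in reasons:
        # Keys follow the "<counter>|<reason>" format used in section 3.2.2.
        table[f"{counter}|{reason}"] = {}


def remove_reasons(config_db, counter, reasons):
    """Remove drop reasons from a counter; reasons that are not set are ignored."""
    table = config_db.get("DEBUG_COUNTER_DROP_REASON", {})
    for reason in reasons:
        table.pop(f"{counter}|{reason}", None)


# Illustrative in-memory stand-ins for Config DB and State DB (assumed layout).
config_db = {
    "DEBUG_COUNTER": {
        "DEBUG_3": {"alias": "TX_LEGIT", "type": "SWITCH_EGRESS_DROPS", "group": "LEGIT"}
    }
}
capabilities = {
    "SWITCH_EGRESS_DROPS": {
        "total": 2,
        "used": 1,
        "reasons": ["L2_ANY", "L3_ANY", "A_CUSTOM_REASON", "ANOTHER_CUSTOM_REASON"],
    }
}

add_reasons(config_db, capabilities, "DEBUG_3", ["L2_ANY", "L3_ANY"])
add_reasons(config_db, capabilities, "DEBUG_3", ["A_CUSTOM_REASON", "ANOTHER_CUSTOM_REASON"])
remove_reasons(config_db, "DEBUG_3", ["A_CUSTOM_REASON"])
configured = sorted(config_db["DEBUG_COUNTER_DROP_REASON"])
```

After these three calls, `configured` holds the same three reasons that the final table in the CLI example shows for DEBUG_3.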

## 3.2 Config DB
Two new tables will be added to Config DB:
* DEBUG_COUNTER to store general debug counter metadata
* DEBUG_COUNTER_DROP_REASON to store drop reasons for debug counters that have been configured to track packet drops

### 3.2.1 DEBUG_COUNTER Table
Example:
```
{
    "DEBUG_COUNTER": {
        "DEBUG_0": {
            "alias": "PORT_RX_LEGIT",
            "type": "PORT_INGRESS_DROPS",
            "desc": "Legitimate port-level RX pipeline drops",
            "group": "LEGIT"
        },
        "DEBUG_1": {
            "alias": "PORT_TX_LEGIT",
            "type": "PORT_EGRESS_DROPS",
            "desc": "Legitimate port-level TX pipeline drops",
            "group": "LEGIT"
        },
        "DEBUG_2": {
            "alias": "SWITCH_RX_LEGIT",
            "type": "SWITCH_INGRESS_DROPS",
            "desc": "Legitimate switch-level RX pipeline drops",
            "group": "LEGIT"
        }
    }
}
```

### 3.2.2 DEBUG_COUNTER_DROP_REASON Table
Example:
```
{
    "DEBUG_COUNTER_DROP_REASON": {
        "DEBUG_0|SMAC_EQUALS_DMAC": {},
        "DEBUG_0|INGRESS_VLAN_FILTER": {},
        "DEBUG_1|EGRESS_VLAN_FILTER": {},
        "DEBUG_2|TTL": {}
    }
}
```
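
Because drop reasons live in their own table, keyed as `<counter>|<reason>`, a consumer has to join the two tables to get a per-counter view like the one `show drops config` prints. The sketch below shows one way to do that in Python on plain dictionaries. The function name `build_counter_view` is hypothetical, and the real CLI may do this differently.

```python
from collections import defaultdict


def build_counter_view(config_db):
    """Join DEBUG_COUNTER metadata with the reasons in DEBUG_COUNTER_DROP_REASON."""
    reasons = defaultdict(list)
    for key in config_db.get("DEBUG_COUNTER_DROP_REASON", {}):
        # Split "DEBUG_0|SMAC_EQUALS_DMAC" into counter name and drop reason.
        counter, reason = key.split("|", 1)
        reasons[counter].append(reason)

    view = {}
    for name, meta in config_db.get("DEBUG_COUNTER", {}).items():
        # Copy the metadata and attach a sorted list of reasons (empty if none).
        view[name] = dict(meta, reasons=sorted(reasons.get(name, [])))
    return view


# Illustrative data mirroring the two examples above (abbreviated).
config_db = {
    "DEBUG_COUNTER": {
        "DEBUG_0": {"alias": "PORT_RX_LEGIT", "type": "PORT_INGRESS_DROPS", "group": "LEGIT"},
        "DEBUG_1": {"alias": "PORT_TX_LEGIT", "type": "PORT_EGRESS_DROPS", "group": "LEGIT"},
        "DEBUG_2": {"alias": "SWITCH_RX_LEGIT", "type": "SWITCH_INGRESS_DROPS", "group": "LEGIT"},
    },
    "DEBUG_COUNTER_DROP_REASON": {
        "DEBUG_0|SMAC_EQUALS_DMAC": {},
        "DEBUG_0|INGRESS_VLAN_FILTER": {},
        "DEBUG_1|EGRESS_VLAN_FILTER": {},
        "DEBUG_2|TTL": {},
    },
}

view = build_counter_view(config_db)
```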

## 3.3 State DB
State DB will store information about:
* What types of drop counters are available on this device
* How many drop counters are available on this device
* What drop reasons are supported by this device

### 3.3.1 DEBUG_COUNTER_CAPABILITIES Table
Example:
```
{
    "DEBUG_COUNTER_CAPABILITIES": {
        "SWITCH_INGRESS_DROPS": {
            "total": 3,
            "used": 1,
            "reasons": ["L2_ANY", "L3_ANY", "SMAC_EQUALS_DMAC"]
        },
        "SWITCH_EGRESS_DROPS": {
            "total": 3,
            "used": 1,
            "reasons": ["L2_ANY", "L3_ANY"]
        }
    }
}
```

This information will be populated by the orchestrator (described later) on startup.
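
The table stores only `total` and `used`, so the FREE column in `show drops available` would presumably be derived as `total - used`. A small Python sketch under that assumption follows; `capability_rows` and `format_table` are hypothetical helpers, and the table layout is only an approximation of the CLI output.

```python
def capability_rows(capabilities):
    """Return (type, total, free, in_use) tuples, where free = total - used."""
    rows = []
    for counter_type, info in sorted(capabilities.items()):
        total, used = int(info["total"]), int(info["used"])
        rows.append((counter_type, total, total - used, used))
    return rows


def format_table(rows):
    """Render the rows as a left-aligned text table with a dashed separator."""
    header = ("TYPE", "TOTAL", "FREE", "IN-USE")
    table = [header] + [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = ["  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip() for row in table]
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Illustrative data copied from the example above.
capabilities = {
    "SWITCH_INGRESS_DROPS": {"total": 3, "used": 1, "reasons": ["L2_ANY", "L3_ANY", "SMAC_EQUALS_DMAC"]},
    "SWITCH_EGRESS_DROPS": {"total": 3, "used": 1, "reasons": ["L2_ANY", "L3_ANY"]},
}

rows = capability_rows(capabilities)
table = format_table(rows)
print(table)
```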

### 3.3.2 SAI APIs
We will use the following SAI APIs to get this information:
* `sai_query_attribute_enum_values_capability` to query support for different types of counters
* `sai_object_type_get_availability` to query the number of available debug counters

## 3.4 Counters DB
The contents of the drop counters will be added to Counters DB by flex counters.

Additionally, we will add two mappings from debug counter names to the appropriate port or switch stat index, called COUNTERS_DEBUG_NAME_PORT_STAT_MAP and COUNTERS_DEBUG_NAME_SWITCH_STAT_MAP, respectively.
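
To illustrate how a reader of Counters DB could use the port-level map, the Python sketch below resolves debug counter names to stat names and then reads the per-port values. Only COUNTERS_DEBUG_NAME_PORT_STAT_MAP comes from this design. The use of COUNTERS_PORT_NAME_MAP and `COUNTERS:<oid>` keys is an assumption about the existing Counters DB layout, and the stat names, OIDs, and the function `port_drop_counts` are placeholders.

```python
def port_drop_counts(counters_db, counter_names):
    """Return {port: {debug_counter_name: count}} for the requested counters."""
    stat_map = counters_db["COUNTERS_DEBUG_NAME_PORT_STAT_MAP"]
    counts = {}
    for port, oid in counters_db["COUNTERS_PORT_NAME_MAP"].items():
        stats = counters_db.get(f"COUNTERS:{oid}", {})
        # A stat that has not been published yet is reported as 0.
        counts[port] = {
            name: int(stats.get(stat_map[name], 0)) for name in counter_names
        }
    return counts


# In-memory stand-in for Counters DB; stat names and OIDs are hypothetical.
counters_db = {
    "COUNTERS_DEBUG_NAME_PORT_STAT_MAP": {
        "DEBUG_0": "HYPOTHETICAL_PORT_STAT_0",
        "DEBUG_1": "HYPOTHETICAL_PORT_STAT_1",
    },
    "COUNTERS_PORT_NAME_MAP": {
        "Ethernet0": "oid:0x1000000000002",
        "Ethernet4": "oid:0x1000000000003",
    },
    "COUNTERS:oid:0x1000000000002": {
        "HYPOTHETICAL_PORT_STAT_0": "1500",
        "HYPOTHETICAL_PORT_STAT_1": "0",
    },
    "COUNTERS:oid:0x1000000000003": {"HYPOTHETICAL_PORT_STAT_0": "250"},
}

counts = port_drop_counts(counters_db, ["DEBUG_0", "DEBUG_1"])
```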

## 3.5 SWSS
A new orchestrator will be created to handle debug counter creation and configuration. Specifically, this orchestrator will support:
* Creating a new counter
* Deleting existing counters
* Adding drop reasons to an existing counter
* Removing a drop reason from a counter

### 3.5.1 SAI APIs
This orchestrator will interact with the following SAI Debug Counter APIs:
* `sai_create_debug_counter_fn` to create/configure new drop counters.
* `sai_remove_debug_counter_fn` to delete/free up drop counters that are no longer being used.
* `sai_get_debug_counter_attribute_fn` to gather information about counters that have been configured (e.g. index, drop reasons, etc.).
* `sai_set_debug_counter_attribute_fn` to re-configure drop reasons for counters that have already been created.
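
One way to picture the orchestrator's job is as a reconciliation between the reasons configured in Config DB and the reasons already applied to hardware. The Python sketch below only plans the actions and makes no SAI calls. The function `plan_actions` and the action labels are hypothetical, and the suggested correspondence of `create`, `set`, and `remove` to the create, set-attribute, and remove APIs listed above is an assumption, not the orchagent implementation.

```python
def plan_actions(desired, applied):
    """Compare desired and applied {counter: set_of_reasons} and list the actions.

    Returns (action, counter, reasons) tuples, where action is one of
    "remove", "create", or "set".
    """
    actions = []
    # Counters that exist in hardware but are no longer configured are removed.
    for counter in sorted(applied.keys() - desired.keys()):
        actions.append(("remove", counter, frozenset()))
    for counter in sorted(desired):
        wanted = frozenset(desired[counter])
        if counter not in applied:
            actions.append(("create", counter, wanted))
        elif wanted != frozenset(applied[counter]):
            # Reasons were added or removed; re-configure the existing counter.
            actions.append(("set", counter, wanted))
    return actions


desired = {"DEBUG_0": {"L2_ANY"}, "DEBUG_1": {"L3_ANY", "TTL"}}
applied = {"DEBUG_1": {"L3_ANY"}, "DEBUG_2": {"L2_ANY"}}
actions = plan_actions(desired, applied)
```

Planning removals first is a deliberate choice in this sketch: on hardware with few debug counters, freeing a counter before creating a new one avoids running out of resources partway through an update.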

## 3.6 syncd
Flex counter will be extended to support switch-level SAI counters.

# 4 Flows
## 4.1 General Flow
![Figure 1: General Flow](./drop_counters_general_flow.png)

The overall workflow is shown above in Figure 1.

(1) Users configure drop counters using the CLI. Configurations are stored in the DEBUG_COUNTER Config DB table.

(2) The debug counter orchestrator in orchagent subscribes to the Config DB table. When the configuration changes, it uses the SAI debug counter API to configure the drop counters.

(3) The debug counter orchestrator publishes counter configurations to Flex Counter DB.

(4) Syncd subscribes to Flex Counter DB and sets up flex counters. Flex counters periodically query ASIC counters and publish the data to Counters DB.

(5) The CLI uses Counters DB to satisfy CLI requests.

(6) (not shown) The CLI uses State DB to display hardware capabilities (e.g. how many counters are available, supported drop reasons, etc.)

# 5 Warm Reboot Support
On resource-constrained platforms, debug counters will be deleted prior to warm reboot and re-installed when orchagent starts back up. This is intended to conserve hardware resources during the warm reboot.

# 6 Unit Tests
A separate test plan will be uploaded and reviewed by the community. It will include virtual switch tests to verify that ASIC_DB is configured correctly, as well as pytest tests to verify overall system correctness.

# 7 Open Questions
- How common an operation is configuring a drop counter? Is it usually done only at startup, or will users update counters frequently?