Commit 6cbd733

SONiC Entity MIB Extension HLD (#657)
* Create extension-to-physical-entity-mib.md
* fix typo
* Update review for physical entity
* 1. add section to describe entPhysicalIsFRU implementation; 2. add more detail about database change; 3. add one more case in regression test to cover new platform API
* fix format
* Update entPhysicalIndex generating rule
* Change description to explain why we define the new rule for generating entPhysicalIndex

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
1 parent d5b29be commit 6cbd733

1 file changed

+374
-0
lines changed

@@ -0,0 +1,374 @@
# SONiC Entity MIB Extension #

### Revision ###

| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Kebo Liu | Initial version |
| 0.2 | | Junchao Chen | Address community review comments |



## 1. Overview

The Entity MIB contains several groups of MIB objects, including the entityPhysical group and the entityLogical group. SONiC currently implements only part of the entityPhysical group, following RFC 2737. Because entityPhysical is the most commonly used group, this extension focuses on it and leaves the other groups for future implementation. The entityPhysical group contains a single table, "entPhysicalTable", which identifies the physical components of the system. The MIB objects of the entityPhysical group are listed below:

    EntPhysicalEntry ::= SEQUENCE {
        entPhysicalIndex          PhysicalIndex,
        entPhysicalDescr          SnmpAdminString,
        entPhysicalVendorType     AutonomousType,
        entPhysicalContainedIn    INTEGER,
        entPhysicalClass          PhysicalClass,
        entPhysicalParentRelPos   INTEGER,
        entPhysicalName           SnmpAdminString,
        entPhysicalHardwareRev    SnmpAdminString,
        entPhysicalFirmwareRev    SnmpAdminString,
        entPhysicalSoftwareRev    SnmpAdminString,
        entPhysicalSerialNum      SnmpAdminString,
        entPhysicalMfgName        SnmpAdminString,
        entPhysicalModelName      SnmpAdminString,
        entPhysicalAlias          SnmpAdminString,
        entPhysicalAssetID        SnmpAdminString,
        entPhysicalIsFRU          TruthValue
    }

Detailed information about the MIB objects inside entPhysicalTable can be found in section 3 of RFC 2737.

## 2. Current Entity MIB implementation in SONiC
SONiC currently implements only the following MIB objects in the table:

    entPhysicalDescr          SnmpAdminString,
    entPhysicalClass          PhysicalClass,
    entPhysicalName           SnmpAdminString,
    entPhysicalHardwareRev    SnmpAdminString,
    entPhysicalFirmwareRev    SnmpAdminString,
    entPhysicalSoftwareRev    SnmpAdminString,
    entPhysicalSerialNum      SnmpAdminString,
    entPhysicalMfgName        SnmpAdminString,
    entPhysicalModelName      SnmpAdminString,

At present, the only physical entities implemented are transceivers and their DOM sensors (temperature, voltage, RX power, TX power and TX bias). Their MIB info can be fetched with snmpwalk:

    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1000 = STRING: "SFP/SFP+/SFP28 for Ethernet0"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1001 = STRING: "DOM Temperature Sensor for Ethernet0"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1002 = STRING: "DOM Voltage Sensor for Ethernet0"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1011 = STRING: "DOM RX Power Sensor for Ethernet0/1"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1012 = STRING: "DOM TX Bias Sensor for Ethernet0/1"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1013 = STRING: "DOM TX Power Sensor for Ethernet0/1"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1021 = STRING: "DOM RX Power Sensor for Ethernet0/2"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1022 = STRING: "DOM TX Bias Sensor for Ethernet0/2"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1023 = STRING: "DOM TX Power Sensor for Ethernet0/2"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1031 = STRING: "DOM RX Power Sensor for Ethernet0/3"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1032 = STRING: "DOM TX Bias Sensor for Ethernet0/3"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1033 = STRING: "DOM TX Power Sensor for Ethernet0/3"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1041 = STRING: "DOM RX Power Sensor for Ethernet0/4"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1042 = STRING: "DOM TX Bias Sensor for Ethernet0/4"
    SNMPv2-SMI::mib-2.47.1.1.1.1.2.1043 = STRING: "DOM TX Power Sensor for Ethernet0/4"

## 3. A new extension to Entity MIB implementation
This extension aims to implement all the objects in the entityPhysical group.

It also plans to add more physical entities: thermal sensors, fans and their tachometers, PSUs, PSU fans, and the sensors contained in a PSU.

One point worth highlighting is that the current implementation does not implement the "entPhysicalContainedIn" object, so there is no way to reflect the physical location of the components. This extension fixes that, so that all the MIB entries can be organized hierarchically, as in the chart below:

    Chassis -
             |--MGMT (Chassis)
             |     |--CPU package Sensor/T(x) (Temperature sensor)
             |     |--CPU Core Sensor/T(x) (Temperature sensor)
             |     |--Board AMB temp/T(x) (Temperature sensor)
             |     |--Ports AMB temp/T(x) (Temperature sensor)
             |     |--ASIC (Switch device)
             |              |--ASIC/T(x) (Temperature sensor)
             |--FAN(x) (Fan)
             |              |-- FAN/F(y) (Fan sensor)
             |--PS(x) (Power supply)
             |              |-- FAN/F(y) (Fan sensor)
             |              |-- power-mon/T(y) (Temperature sensor)
             |              |-- power-mon/ VOLTAGE  (Voltage sensor)
             |--Ethernet x/y … cable (Port module)
                            |--DOM Temperature Sensor for Ethernet(x)  (Temperature sensor)
                            |--DOM Voltage Sensor for Ethernet(x)  (Voltage sensor)
                            |--DOM RX Power Sensor for Ethernet(x)/(y) (Power sensor)
                            |--DOM TX Bias Sensor for Ethernet(x)/(y) (Bias sensor)


## 4. The data source of the MIB entries

Thermalctld, xcvrd and psud collect physical device information and write it to the state DB. The PSU_INFO, FAN_INFO and TEMPERATURE_INFO tables now provide the information needed for the MIB entries.

Thermal sensor MIB entries will be populated from TEMPERATURE_INFO, fan MIB entries from FAN_INFO, and PSU-related entries from PSU_INFO.

The already implemented cable and cable DOM sensor entries get their data from the TRANSCEIVER_INFO and TRANSCEIVER_DOM_SENSOR tables, which are maintained by xcvrd.

### 4.1 entPhysicalParentRelPos implementation

entPhysicalParentRelPos is an indication of the relative position of this 'child' component among all its 'sibling' components. Sibling components are defined as entPhysicalEntries which share the same instance values of each of the entPhysicalContainedIn and entPhysicalClass objects.

The current SONiC implementation has the following issues:

1. The current platform API carries no position information. Take fans as an example: fan objects are stored as a list in the chassis object, but the list index does not reflect the physical fan position. There are two problems: the list can be initialized in an arbitrary order, and the order might change when a fan is removed from or inserted into the switch.
2. All thermal objects are currently stored in the chassis object, but not all of them are direct children of the chassis. For example, PSU thermal and SFP module thermal objects have a parent device that is not the chassis.

To provide reliable data for entPhysicalParentRelPos, a few changes will be made to the platform API.

First, a new API will be added to the DeviceBase class to get the relative position in the parent device:

```python
class DeviceBase(object):
    def get_position_in_parent(self):
        """
        Retrieves 1-based relative physical position in parent device
        Returns:
            integer: The 1-based relative physical position in parent device
        """
        raise NotImplementedError
```
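
As a rough illustration of how a vendor might override this method, the sketch below uses a stand-in `DeviceBase` and a hypothetical `ExampleFan` class. Both the class name and the `slot` constructor argument are assumptions made for this example and are not part of the platform API. It only shows that the reported position comes from the physical slot and not from the order of the list that holds the objects.

```python
class DeviceBase(object):
    """Stand-in for the platform API base class described above."""

    def get_position_in_parent(self):
        raise NotImplementedError


class ExampleFan(DeviceBase):
    """Hypothetical vendor fan class, used for illustration only."""

    def __init__(self, slot):
        # `slot` is the 1-based physical slot. It is assumed to come from
        # vendor-specific hardware data, not from the list order.
        self._slot = slot

    def get_position_in_parent(self):
        return self._slot


# The list order is deliberately scrambled: the position follows the
# physical slot, not the list index.
fans = [ExampleFan(3), ExampleFan(1), ExampleFan(2)]
positions = [fan.get_position_in_parent() for fan in fans]
```

With this approach the value stays stable even if the platform code builds its fan list in a different order after a fan is removed or inserted.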

Second, a thermal list will be added to PsuBase and SfpBase to reflect the actual hierarchy. Vendors should initialize the thermal list for PSU and SFP properly. The thermal control daemon should also retrieve thermal objects from PSU and SFP.

```python
import sys

from . import device_base


class PsuBase(device_base.DeviceBase):
    def __init__(self):
        self._thermal_list = []

    def get_num_thermals(self):
        """
        Retrieves the number of thermals available on this PSU

        Returns:
            An integer, the number of thermals available on this PSU
        """
        return len(self._thermal_list)

    def get_all_thermals(self):
        """
        Retrieves all thermals available on this PSU

        Returns:
            A list of objects derived from ThermalBase representing all thermals
            available on this PSU
        """
        return self._thermal_list

    def get_thermal(self, index):
        """
        Retrieves thermal unit represented by (0-based) index <index>

        Args:
            index: An integer, the index (0-based) of the thermal to
            retrieve

        Returns:
            An object derived from ThermalBase representing the specified thermal
        """
        thermal = None

        try:
            thermal = self._thermal_list[index]
        except IndexError:
            sys.stderr.write("THERMAL index {} out of range (0-{})\n".format(
                             index, len(self._thermal_list)-1))

        return thermal


class SfpBase(device_base.DeviceBase):
    def __init__(self):
        self._thermal_list = []

    def get_num_thermals(self):
        """
        Retrieves the number of thermals available on this SFP

        Returns:
            An integer, the number of thermals available on this SFP
        """
        return len(self._thermal_list)

    def get_all_thermals(self):
        """
        Retrieves all thermals available on this SFP

        Returns:
            A list of objects derived from ThermalBase representing all thermals
            available on this SFP
        """
        return self._thermal_list

    def get_thermal(self, index):
        """
        Retrieves thermal unit represented by (0-based) index <index>

        Args:
            index: An integer, the index (0-based) of the thermal to
            retrieve

        Returns:
            An object derived from ThermalBase representing the specified thermal
        """
        thermal = None

        try:
            thermal = self._thermal_list[index]
        except IndexError:
            sys.stderr.write("THERMAL index {} out of range (0-{})\n".format(
                             index, len(self._thermal_list)-1))

        return thermal
```

A new database table will be added to store the position information. The new table is discussed in section 4.4.

### 4.2 entPhysicalContainedIn implementation

According to the RFC, entPhysicalContainedIn holds the value of entPhysicalIndex for the physical entity which 'contains' this physical entity. A value of zero indicates that this physical entity is not contained in any other physical entity.

The platform API already stores platform devices in a hierarchy, and this hierarchy can be used to implement entPhysicalContainedIn. For example, the chassis object has a list of PSU objects, and each PSU object has a list of PSU fan objects. The parent device can be deduced from this information, so no new platform API is needed.

Thermalctld, psud and xcvrd will collect the parent device name and save it to the database (see section 4.4). The SNMP agent will use the parent device name to look up the parent sub ID and fill it into the entPhysicalContainedIn field.
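
The lookup step can be sketched as follows. This is a minimal illustration, not the SNMP agent's actual code: the function name `resolve_contained_in`, the entity names and the `name_to_index` mapping are all hypothetical. Falling back to 0 for an unknown parent is also an assumption made here; a real agent might prefer to skip or log such an entry.

```python
def resolve_contained_in(parent_name, name_to_index):
    """Return the entPhysicalContainedIn value for an entity.

    parent_name   -- parent device name read from the database (may be empty)
    name_to_index -- mapping of entity name to its entPhysicalIndex

    0 means "not contained in any other physical entity". Returning 0 for
    a parent that is not in the mapping is an assumption of this sketch.
    """
    if not parent_name:
        return 0
    return name_to_index.get(parent_name, 0)


# Hypothetical entity names and index values, for illustration only.
name_to_index = {
    "chassis 1": 1,
    "PSU 1": 601000000,
    "PSU 1 FAN 1": 601020100,
}

psu_fan_parent = resolve_contained_in("PSU 1", name_to_index)
chassis_parent = resolve_contained_in("", name_to_index)
```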

### 4.3 entPhysicalIsFRU implementation

The entPhysicalIsFRU object indicates whether or not this physical entity is considered a 'field replaceable unit' by the vendor. If this object contains the value 'true(1)' then this entPhysicalEntry identifies a field replaceable unit. For all entPhysicalEntries which represent components that are permanently contained within a field replaceable unit, the value 'false(2)' should be returned for this object.

A new platform API, DeviceBase.is_replaceable, will be added to provide this information. Vendors should override this method in order to support entPhysicalIsFRU.

```python
class DeviceBase(object):
    def is_replaceable(self):
        """
        Indicate whether this device is replaceable.
        Returns:
            bool: True if it is replaceable.
        """
        raise NotImplementedError
```

A new field, is_replaceable, will be added to the FAN_INFO, PSU_INFO, TEMPERATURE_INFO and TRANSCEIVER_INFO tables, and to the new FAN_DRAWER_INFO table (see section 4.4 for details). Thermalctld, psud and xcvrd will collect this information and save it to the database.
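
For illustration, the conversion from the stored is_replaceable value to the TruthValue encoding quoted above (true(1), false(2)) might look like the sketch below. The helper name `to_ent_physical_is_fru` is hypothetical, and the assumption that the database returns the strings "True" or "False" is not confirmed by this document.

```python
TRUTH_VALUE_TRUE = 1   # TruthValue true(1)
TRUTH_VALUE_FALSE = 2  # TruthValue false(2)


def to_ent_physical_is_fru(is_replaceable):
    """Map an is_replaceable value to the entPhysicalIsFRU TruthValue.

    Accepts a bool, or a string such as "True"/"False" (the string form is
    an assumption about how the value is stored in the database).
    """
    if isinstance(is_replaceable, str):
        is_replaceable = is_replaceable.strip().lower() == "true"
    return TRUTH_VALUE_TRUE if is_replaceable else TRUTH_VALUE_FALSE


fru_from_bool = to_ent_physical_is_fru(True)
fru_from_string = to_ent_physical_is_fru("False")
```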

### 4.4 Database change

New fields 'current', 'power' and 'is_replaceable' will be added to the PSU_INFO table:

    ; Defines information for a psu
    key                     = PSU_INFO|psu_name              ; information for the psu
    ; field                 = value
    ...
    current                 = FLOAT                          ; current of the psu
    power                   = FLOAT                          ; power of the psu
    is_replaceable          = BOOLEAN                        ; indicate if the psu is replaceable

A new field 'is_replaceable' will be added to the FAN_INFO table:

    ; Defines information for a fan
    key                     = FAN_INFO|fan_name              ; information for the fan
    ; field                 = value
    ...
    is_replaceable          = BOOLEAN                        ; indicate if the fan is replaceable

A new field 'is_replaceable' will be added to the TEMPERATURE_INFO table:

    ; Defines information for a thermal object
    key                     = TEMPERATURE_INFO|object_name   ; name of the thermal object(CPU, ASIC, optical modules...)
    ; field                 = value
    ...
    is_replaceable          = BOOLEAN                        ; indicate if the thermal is replaceable

A new field 'is_replaceable' will be added to the TRANSCEIVER_INFO table:

    ; Defines Transceiver information for a port
    key                     = TRANSCEIVER_INFO|ifname        ; information for SFP on port
    ; field                 = value
    ...
    is_replaceable          = BOOLEAN                        ; indicate if the SFP is replaceable

Currently, only the fan drawer name is stored in the FAN_INFO table, which is not enough to describe all the attributes of a fan drawer. A new table, FAN_DRAWER_INFO, will be added. Thermalctld is responsible for saving data to the FAN_DRAWER_INFO table. The table definition is:

    ; Defines information for a fan drawer
    key                     = FAN_DRAWER_INFO|object_name    ; name of the fan drawer object
    ; field                 = value
    presence                = BOOLEAN                        ; presence of the fan drawer
    model                   = STRING                         ; model name of the fan drawer
    serial                  = STRING                         ; serial number of the fan drawer
    status                  = BOOLEAN                        ; status of the fan drawer
    led_status              = STRING                         ; led status of the fan drawer
    is_replaceable          = BOOLEAN                        ; indicate if the fan drawer is replaceable

As discussed in sections 4.1 and 4.2, more information is needed in the database to implement entPhysicalParentRelPos and entPhysicalContainedIn. One option is to add this information to existing tables such as PSU_INFO and FAN_INFO. However, these two MIB objects describe the relationship between physical entities, whereas tables like PSU_INFO store the attributes of a single physical entity, so we prefer to store the relationship information in a new table. A new table, PHYSICAL_ENTITY_INFO, will be added:

    ; Defines information to store physical entity relationship
    key                     = PHYSICAL_ENTITY_INFO|object_name   ; name of the entity object
    ; field                 = value
    position_in_parent      = INTEGER                            ; physical position in parent device
    parent_name             = STRING                             ; name of parent device

The data in PHYSICAL_ENTITY_INFO will be collected by thermalctld, psud and xcvrd.
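
As a sketch of what a daemon could assemble for one PHYSICAL_ENTITY_INFO row, the example below builds the key and field values from a device object. The helper `build_physical_entity_entry`, the `FakePsuFan` class and the names "PSU 1" and "PSU 1 FAN 1" are hypothetical. Storing the position as a string is an assumption based on Redis hash values being strings, and the actual write to the state DB is left out on purpose.

```python
class FakePsuFan(object):
    """Hypothetical device object standing in for a platform API fan."""

    def get_name(self):
        return "PSU 1 FAN 1"

    def get_position_in_parent(self):
        return 1


def build_physical_entity_entry(device, parent_name):
    """Return (key, fields) for one PHYSICAL_ENTITY_INFO row.

    The field names follow the table definition above. Values are kept as
    strings, which is an assumption of this sketch.
    """
    key = "PHYSICAL_ENTITY_INFO|{}".format(device.get_name())
    fields = {
        "position_in_parent": str(device.get_position_in_parent()),
        "parent_name": parent_name,
    }
    return key, fields


key, fields = build_physical_entity_entry(FakePsuFan(), "PSU 1")
```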

### 4.5 entPhysicalIndex implementation

The existing rule for generating entPhysicalIndex is too simple, and there is a risk that two different entities end up with the same entPhysicalIndex. A new rule for generating entPhysicalIndex is therefore defined:

```
For non-port entities, entPhysicalIndex is generated as follows:
The entPhysicalIndex is divided into 3 layers:
    1. Module layer which includes modules located on system (e.g. fan drawer, PSU)
    2. Device layer which includes system devices (e.g. fan)
    3. Sensor layer which includes system sensors (e.g. temperature sensor, fan sensor)
The entPhysicalIndex is a 9-digit number, and each digit is defined as follows:
    Digit 1: Module Type
    Digit 2~3: Module Index
    Digit 4~5: Device Type
    Digit 6~7: Device Index
    Digit 8: Sensor Type
    Digit 9: Sensor Index

Module Type values:
    2 - Management
    5 - Fan Drawer
    6 - PSU
Device Type values:
    01 - PS
    02 - Fan
    24 - Power Monitor (temperature, power, current, voltage...)
    99 - Chassis Thermals
Sensor Type values:
    1 - Temperature
    2 - Fan Tachometers
    3 - Power
    4 - Current
    5 - Voltage

e.g. 501000000 means the first fan drawer, 502020100 means the first fan of the second fan drawer

Because ifindex is used to generate the port entPhysicalIndex, and ifindex might be larger than 99 and so cannot be held in 2 digits, a different scheme is used for ports.

For port entities, the entPhysicalIndex is a 10-digit number, and each digit is defined as follows:
    Digit 1: 1
    Digit 2~8: ifindex
    Digit 9: Sensor Type
    Digit 10: Sensor Index

Port Sensor Type values:
    1 - Temperature
    2 - TX Power
    3 - RX Power
    4 - TX BIAS
    5 - Voltage
```
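
The digit layout above can be expressed as simple positional arithmetic. The following is a minimal sketch under stated assumptions, not the SNMP agent's implementation: the function names `non_port_index` and `port_index` are hypothetical, and unused layers are set to 0, which is inferred from the two examples in the rule.

```python
def _check(value, upper, label):
    """Reject values that do not fit in the digits reserved for them."""
    if not 0 <= value <= upper:
        raise ValueError("{} {} out of range 0-{}".format(label, value, upper))


def non_port_index(module_type, module_index=0, device_type=0,
                   device_index=0, sensor_type=0, sensor_index=0):
    """Build the 9-digit entPhysicalIndex for a non-port entity."""
    _check(module_type, 9, "module type")
    _check(module_index, 99, "module index")
    _check(device_type, 99, "device type")
    _check(device_index, 99, "device index")
    _check(sensor_type, 9, "sensor type")
    _check(sensor_index, 9, "sensor index")
    return (module_type * 10**8 + module_index * 10**6 +
            device_type * 10**4 + device_index * 10**2 +
            sensor_type * 10 + sensor_index)


def port_index(ifindex, sensor_type=0, sensor_index=0):
    """Build the 10-digit entPhysicalIndex for a port entity."""
    _check(ifindex, 9999999, "ifindex")  # 7 digits are reserved for ifindex
    _check(sensor_type, 9, "sensor type")
    _check(sensor_index, 9, "sensor index")
    return 10**9 + ifindex * 100 + sensor_type * 10 + sensor_index


first_fan_drawer = non_port_index(5, 1)             # module type 5, index 01
first_fan_of_second_drawer = non_port_index(5, 2, 2, 1)
port_temperature_sensor = port_index(1, 1, 1)       # ifindex 1, temperature, sensor 1
```

The two non-port values reproduce the examples given in the rule (501000000 and 502020100). The largest value the port scheme can produce is 1999999999, which stays below 2147483647, the upper bound of PhysicalIndex in RFC 2737.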

## 5. Entity MIB extension test

### 5.1 Unit test

The SNMP unit test for sensors (https://github.com/Azure/sonic-snmpagent/blob/master/tests/test_sensor.py) will be extended to cover all the newly added MIB objects and physical components.

### 5.2 Community regression test

New test cases will be added to cover the new MIB entries:

1. Get temperature sensor MIB info and cross-check it with the TEMPERATURE_INFO table.
2. Get fan MIB info and cross-check it with the FAN_INFO table.
3. Get PSU-related MIB info and cross-check it with PSU_INFO and related tables.
4. Remove/add DB entries in the related tables to check whether the MIB info is updated correctly.
5. Currently, each platform API is tested by sonic-mgmt, see [here](https://github.com/Azure/sonic-mgmt/tree/master/tests/platform_tests/api). A regression test case will be added for each newly added platform API to verify it.
