The inefficiency is in SfpBase, xcvr_mem_maps, etc., and it also affects xcvrd, since both xcvrd and sfputil use the same SfpBase APIs, such as get_transceiver_info, get_transceiver_bulk_status, and get_transceiver_threshold_info. On a device with 30 front-panel ports and 30 QSFP-DD transceivers, I've seen pmon CPU usage reach up to 35% with a period of 10-20 sec. pmon usage can get progressively worse as the number of front-panel ports grows.
Steps to reproduce the issue:
Plug in a CMIS cable, e.g. QSFP-DD
Run sfputil, e.g. sfputil show eeprom -p Ethernet0
Describe the results you received:
root@r-leopard-58:/home/admin# time sfputil show eeprom -p Ethernet0
Cannot get Module EEPROM data: Invalid argument
Ethernet0: SFP EEPROM detected
Active Firmware Version: 0.0
CMIS Revision: 4.0
Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
Specification compliance: passive_copper_media_interface
Vendor Date Code(YYYY-MM-DD Lot): 2020-12-19
Vendor Name: Mellanox
Vendor OUI: 00-02-c9
Vendor PN: MCP1660-W00AE30
Vendor Rev: A3
Vendor SN: MT2051VS03513
real 0m4.875s
user 0m1.179s
sys 0m0.562s
In comparison:
QSFP28
root@r-leopard-58:/home/admin# time sfputil show eeprom -p Ethernet248
Ethernet248: SFP EEPROM detected
Application Advertisement: N/A
Connector: No separable connector
Encoding: 64B/66B
Extended Identifier: Power Class 1 Module (1.5W max.), No CLEI code present in Page 02h, No CDR in TX, No CDR in RX
Extended RateSelect Compliance: Unknown
Identifier: QSFP28 or later
Length Cable Assembly(m): 2.0
Nominal Bit Rate(100Mbs): 255
Specification compliance:
10/40G Ethernet Compliance Code: Unknown
Extended Specification Compliance: 100GBASE-CR4, 25GBASE-CR CA-25G-L or 50GBASE-CR2 with RS
Fibre Channel Link Length: Unknown
Fibre Channel Speed: Unknown
Fibre Channel Transmission Media: Unknown
Fibre Channel Transmitter Technology: Unknown
Gigabit Ethernet Compliant Codes: 1000BASE-CX
SAS/SATA Compliance Codes: Unknown
SONET Compliance Codes: Unknown
Vendor Date Code(YYYY-MM-DD Lot): 2016-12-31
Vendor Name: Mellanox
Vendor OUI: 00-02-c9
Vendor PN: MCP7H00-G01AR
Vendor Rev: A1
Vendor SN: MT1710VS04177
real 0m0.691s
user 0m0.275s
sys 0m0.110s
Triage
A single get_transceiver_info() call results in 31 calls to read_eeprom, and on many platforms read_eeprom uses either a subprocess call or file open/read operations, which makes it extremely slow. Calling get_transciever_domI() can add 40+ further calls to read_eeprom.
Note: These stats were taken for MSN4700 platform
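A call count like the one above can be gathered by wrapping read_eeprom with a counter. The following is only a minimal sketch: FakeSfp, its byte layout, and the count_calls helper are hypothetical stand-ins for illustration, not the real platform code. It assumes a read_eeprom(offset, num_bytes) method on the SFP object.

```python
import functools
from collections import Counter


def count_calls(obj, method_name, counter):
    """Wrap obj.<method_name> so that every call is tallied in counter."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        counter[method_name] += 1
        return original(*args, **kwargs)

    # The instance attribute shadows the class method for this object only.
    setattr(obj, method_name, wrapper)


class FakeSfp:
    """Hypothetical stand-in for a platform SFP object (not the real SfpBase)."""

    def __init__(self):
        self._eeprom = bytes(range(256))

    def read_eeprom(self, offset, num_bytes):
        return self._eeprom[offset:offset + num_bytes]

    def get_transceiver_info(self):
        # Illustrative per-field reads: one read_eeprom call per field.
        return {
            "identifier": self.read_eeprom(0, 1),
            "revision": self.read_eeprom(1, 1),
            "vendor_name": self.read_eeprom(129, 16),
        }


calls = Counter()
sfp = FakeSfp()
count_calls(sfp, "read_eeprom", calls)
info = sfp.get_transceiver_info()
print("read_eeprom calls:", calls["read_eeprom"])
```

On a real device the same wrapper could be applied to the platform's SFP object before calling the API under test, assuming its read_eeprom takes the same arguments.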
SfpBase, Xcvr_Api, MemMap and the associated classes must be optimized.
The ideal optimization target is a drastic reduction in the number of read_eeprom calls.
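One possible way to cut the number of reads is to fetch the EEPROM in page-sized blocks and serve the individual field reads from memory. This is a sketch under stated assumptions, not the approach SfpBase or xcvr_api takes: CachedEepromReader, the 128-byte block size, and the flat offset addressing are all hypothetical, and real CMIS paging is more involved.

```python
class CachedEepromReader:
    """Serve small field reads from page-sized blocks that are fetched once."""

    PAGE_SIZE = 128  # assumed block size, for illustration only

    def __init__(self, raw_read):
        # raw_read(offset, num_bytes) is the slow underlying read, for
        # example a platform's read_eeprom (assumed signature).
        self._raw_read = raw_read
        self._pages = {}
        self.raw_calls = 0

    def _page(self, index):
        if index not in self._pages:
            self.raw_calls += 1
            data = self._raw_read(index * self.PAGE_SIZE, self.PAGE_SIZE)
            self._pages[index] = bytes(data)
        return self._pages[index]

    def read(self, offset, num_bytes):
        """Return num_bytes starting at offset, crossing pages if needed."""
        out = bytearray()
        end = offset + num_bytes
        while offset < end:
            index, start = divmod(offset, self.PAGE_SIZE)
            chunk = self._page(index)[start:start + (end - offset)]
            if not chunk:
                raise IOError("short read at offset %d" % offset)
            out += chunk
            offset += len(chunk)
        return bytes(out)

    def invalidate(self):
        """Drop cached blocks, e.g. before each poll of dynamic DOM values."""
        self._pages.clear()


backing = bytes(range(256))  # fake EEPROM contents
reader = CachedEepromReader(lambda off, n: backing[off:off + n])
fields = [reader.read(0, 1), reader.read(1, 1), reader.read(129, 16)]
spanning = reader.read(120, 16)  # crosses the boundary between two blocks
print("raw reads:", reader.raw_calls)
```

Here four field reads cost two raw reads. Static data such as vendor information can stay cached while the module is present, but monitored values change over time, so a cache like this would need to be invalidated on every polling cycle and on module removal or insertion.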
@dgsudharsan there is an inherent issue where the mlnx platform makes several ethtool command calls via process calls, which makes sfputil much slower on the mlnx platform. Do you still see the issue after this fix?
That fix significantly reduces the response time but the current approach still involves making multiple file open and read calls. I think SfpBase and the others can be optimized to reduce read_eeprom calls.