-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mirroring] Missing per-vendor validation of mirror session queue parameter #8189
Comments
orchagent crashed for this issue. @shi-su please work with @raphaelt-nvidia on this issue. |
Here is suggested code to add to MirrorOrch::createEntry that would address this:
My doubt is about the line
Should it be '>' or '>='? I chose '>' because of the attribute's description:
For example, if the valid values for queue are 0-14, I would expect get_switch_attribute to return 14. Or would you expect 15? |
I am not sure about this, but I felt it should be '>=' since it seems to be defined as the maximum number and 0-14 includes 15 values. In an extreme case that it supports nothing, the return value should be 0, otherwise, it could not cover it. This is just my feeling, we need to check. Another concern about this change is that it seems to check for |
I see that many instances of calling sai_switch_api->get_switch_attribute occur in initialization routines, so I agree with your second comment. If you wish me to supply the code, I would like to wait until there is a definite ruling on my original question, so that we can test it with a conforming SAI implementation. |
I tried your proposed fix. Interestingly, for the scenario that valid values for queue are 0-14, the |
The 16 is a bug in our SAI implementation that has been changed to 15 - not sure when that goes upstream. My question here is whether it should actually be changed to 14. |
Per my understanding, 15 makes better sense. Yet this question should better be answered by someone who is familiar with the SAI definition. |
How do we identify and get the attention of the people who should decide? It seems to me that if the decision is 15, then the comment "Maximum traffic classes limit" should be changed and clarified. Also say something about the lower bound. Is the assumption that 0 is the lowest valid value true for all platforms? |
@zhangyanzhao Could you please help get attention from someone who has expertise in the SAI definition? This seems to go beyond my knowledge set. |
Ping |
Ack and let me see how can help on the SAI part. |
Description
It is possible to configure a value for queue in a mirror session that is outside the range supported by the switch vendor. The flow is that MirrorOrch::createEntry calls activateSession which calls status = sai_mirror_api->create_mirror_session. If it fails, e.g. due to invalid queue, activateSession calls handleSaiCreateStatus, which in any failure case initiates exiting orchagent.
I think the solution involves orchagent comparing the user-supplied value of queue with SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES as part of its earlier validations, and not attempting to create the session if this validation fails.
Steps to reproduce the issue:
Describe the results you received:
ERR syncd#SDK: [SPAN.ERR] SWITCH_PRIO 15 is outside valid range 0-14.
ERR syncd#SDK: [SPAN.ERR] __span_add failed, err: Parameter Error.
ERR syncd#SDK: [SAI_MIRROR.ERR] mlnx_sai_mirror.c[2647]- mlnx_create_mirror_session: Error creating mirror session
ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_INVALID_PARAMETER
ERR syncd#SDK: :- processQuadEvent: attr: SAI_MIRROR_SESSION_ATTR_TC: 15
ERR swss#orchagent: :- create: create status: SAI_STATUS_INVALID_PARAMETER
ERR swss#orchagent: :- activateSession: Failed to activate mirroring session ms6
ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_MIRROR, status: SAI_STATUS_INVALID_PARAMETER
Describe the results you expected:
A single error in log from orchagent only, no exiting the process.
Output of
show version
:SONiC Software Version: SONiC.202106.0-ebc962c22
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: ebc962c
Build date: Thu Jun 24 17:59:46 UTC 2021
Built by: raphaelt@r-build-sonic05
Platform: x86_64-mlnx_msn2410-r0
HwSKU: ACS-MSN2410
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1921X01546
Uptime: 13:55:39 up 2:44, 2 users, load average: 1.58, 1.65, 1.65
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx 202106.0-ebc962c22 d00568dad081 961MB
docker-syncd-mlnx latest d00568dad081 961MB
docker-snmp 202106.0-ebc962c22 69263d7ff0ca 454MB
docker-snmp latest 69263d7ff0ca 454MB
docker-dhcp-relay 202106.0-ebc962c22 12bd15bd2785 420MB
docker-dhcp-relay latest 12bd15bd2785 420MB
docker-teamd 202106.0-ebc962c22 92f76cb0c626 424MB
docker-teamd latest 92f76cb0c626 424MB
docker-nat 202106.0-ebc962c22 36858e717963 427MB
docker-nat latest 36858e717963 427MB
docker-router-advertiser 202106.0-ebc962c22 34fc1983f5d3 413MB
docker-router-advertiser latest 34fc1983f5d3 413MB
docker-platform-monitor 202106.0-ebc962c22 74dd0063f256 738MB
docker-platform-monitor latest 74dd0063f256 738MB
docker-lldp 202106.0-ebc962c22 ce9a9e42fecd 453MB
docker-lldp latest ce9a9e42fecd 453MB
docker-macsec 202106.0-ebc962c22 89d2ebe2ddaa 427MB
docker-macsec latest 89d2ebe2ddaa 427MB
docker-database 202106.0-ebc962c22 32ce5d38e191 413MB
docker-database latest 32ce5d38e191 413MB
docker-orchagent 202106.0-ebc962c22 c6254d27f85d 442MB
docker-orchagent latest c6254d27f85d 442MB
docker-sonic-telemetry 202106.0-ebc962c22 bd04168564d8 501MB
docker-sonic-telemetry latest bd04168564d8 501MB
docker-sonic-mgmt-framework 202106.0-ebc962c22 a5ec5adee9cd 570MB
docker-sonic-mgmt-framework latest a5ec5adee9cd 570MB
docker-fpm-frr 202106.0-ebc962c22 5436d8f7e983 442MB
docker-fpm-frr latest 5436d8f7e983 442MB
docker-sflow 202106.0-ebc962c22 0d353410dcb2 425MB
docker-sflow latest 0d353410dcb2 425MB
Output of
show techsupport
:Additional information you deem important (e.g. issue happens only occasionally):
The text was updated successfully, but these errors were encountered: