Update snmpd.conf.j2 prolong agentXTimeout to avoid timeout failure in high CPU utilization scenario (#21350)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it
Fixes #21314.
Increase the timeout for requests between snmpd and the SNMP AgentX.

In SONiC SNMP AgentX, the MIB updaters and the AgentX client share the same asyncio (coroutine) event loop.
While the MIB updaters are refreshing the SNMP values, the AgentX client cannot respond to requests from snmpd.
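
The effect of sharing one event loop can be illustrated with a minimal asyncio sketch. It is not the SONiC implementation: `mib_updater`, `agentx_responder` and `measure_delay` are hypothetical names, and `time.sleep` stands in for updater work that does not yield to the loop.

```python
import asyncio
import time


async def mib_updater(busy_seconds: float) -> None:
    """Hypothetical updater that holds the event loop while it works."""
    # time.sleep() is synchronous, so no other coroutine can run meanwhile.
    time.sleep(busy_seconds)


async def agentx_responder(sent_at: float) -> float:
    """Hypothetical responder: returns how long the request had to wait."""
    return time.monotonic() - sent_at


async def measure_delay(busy_seconds: float) -> float:
    """Schedule the updater, then the responder, on the same loop."""
    sent_at = time.monotonic()
    updater = asyncio.create_task(mib_updater(busy_seconds))
    responder = asyncio.create_task(agentx_responder(sent_at))
    await updater
    return await responder


if __name__ == "__main__":
    delay = asyncio.run(measure_delay(0.2))
    print(f"responder waited about {delay:.2f}s behind the updater")
```

In this sketch the responder cannot answer until the updater finishes, so any updater run longer than the snmpd timeout would make that attempt fail.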

By default, snmpd allows 1s (timeout) * 5 (retries) for each request.

When CPU utilization is high, the MIB updaters run slowly, and a 1s timeout is not enough even with 5 retries.
Hence the values are updated to 5s (timeout) * 4 (retries), a time window of 20s, so that SNMP requests can still be handled even at 100% CPU utilization.
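
The arithmetic behind the two windows can be written out as a small sketch. It follows the timeout * retries model used in this description; `response_window` is a hypothetical helper, and whether snmpd counts the initial attempt separately from the retries is not covered here.

```python
def response_window(timeout_s: int, retries: int) -> int:
    """Total time window in seconds, modelled as timeout * retries."""
    return timeout_s * retries


old_window = response_window(timeout_s=1, retries=5)  # snmpd defaults
new_window = response_window(timeout_s=5, retries=4)  # values set by this change

if __name__ == "__main__":
    print(f"default window: {old_window}s, new window: {new_window}s")
```

Under this model the total window grows from 5s to 20s. Each individual attempt also gets 5s instead of 1s, which matters because a slow updater run has to finish within a single timeout for the reply to arrive.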

##### Work item tracking
- Microsoft ADO **30112399**:

#### How I did it
Override the snmpd default values (see https://linux.die.net/man/5/snmpd.conf):

- agentXTimeout: 1 (default value) -> 5
- agentXRetries: 5 (default value) -> 4
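
One way to sanity-check a rendered configuration is to parse the two directives out of its text. This is only a sketch: `read_agentx_settings` is a hypothetical helper, `SAMPLE` copies the lines from this change rather than a real rendered file, and the parser assumes plain integer values and exact directive spelling.

```python
def read_agentx_settings(conf_text: str) -> dict:
    """Collect agentXTimeout / agentXRetries values from snmpd.conf text."""
    wanted = ("agentXTimeout", "agentXRetries")
    settings = {}
    for raw_line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            settings[parts[0]] = int(parts[1])
    return settings


# Sample text mirroring the lines touched by this change.
SAMPLE = """\
#
# Run as an AgentX master agent
master agentx
agentXTimeout 5
agentXRetries 4
"""

if __name__ == "__main__":
    print(read_agentx_settings(SAMPLE))
```

A directive missing from the returned dictionary would mean the rendered file does not set it, in which case snmpd falls back to its default.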

#### How to verify it

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
Tested on a Cisco chassis with test_snmp_cpu.py, which drives CPU utilization to 100% and checks that SNMP requests still succeed.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305
- [x] 202405

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [x] 202405
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
mssonicbld authored Jan 8, 2025
1 parent e11ffe5 commit 3ebcedb
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions dockers/docker-snmp/snmpd.conf.j2
@@ -194,6 +194,9 @@ trapsink {{ v3SnmpTrapIp }}:{{ v3SnmpTrapPort }}{% if v3SnmpTrapVrf != 'None' %}
#
# Run as an AgentX master agent
master agentx
agentXTimeout 5
agentXRetries 4

# internal socket to allow extension to other docker containers
# Currently the other container using this is docker-fpm-frr
# make sure this line matches bgp:/etc/snmp/frr.conf