Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[counters] Improve performance by polling only configured ports buffer queue/pg counters #2360

Merged

Conversation

nazariig
Copy link
Collaborator

@nazariig nazariig commented Jun 28, 2022

commit 6f1199afc3458e526c26b5d7c498fdaf810fbaca
Author: Shlomi Bitton <shlomibi@nvidia.com>
Date:   Sun Jan 2 16:55:58 2022 +0000

Filter unconfigured ports buffers queue/pg counters configurations on init.
If no buffer configurations available, no counters will be created.
Allow creating/removing counters on runtime if buffer PG/Queue is created or removed.
New UT added to verify new flow.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com

Propagating #2143 with resolved merge conflicts
Depends on: sonic-net/sonic-utilities#2239

What I did
Currently in SONiC all ports queue and pg counters are created by default with the max possible amount of counters.
This feature change this behavior to poll only configured counters provided by the config DB BUFFER_PG and BUFFER_QUEUE tables.
If no tables are present in the DB, no counters will be created for ports.
Filter the unwanted queues/pgs returned by SAI API calls and skip the creation of these queue/pg counters.
Also allow creating/removing counters on runtime if buffer PG/Queue is configured or removed.

Why I did it
Improve performance by filtering unconfigured queue/pg counters on init.

How I verified it
Check after enabling the counters, if configured counters created in Counters DB according to the configurations.
Add/Remove buffer PG/Queue configurations and observe the corresponding counters created/removed accordingly.
New UT added to verify this flow.

Details if related

  • N/A

@nazariig nazariig requested a review from liat-grozovik June 28, 2022 17:06
@nazariig nazariig requested a review from dprital June 28, 2022 17:22
@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from 1c2c36e to a56fb77 Compare June 29, 2022 12:08
@nazariig
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from a56fb77 to 30c9089 Compare June 30, 2022 08:19
@nazariig
Copy link
Collaborator Author

nazariig commented Jul 4, 2022

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nazariig
Copy link
Collaborator Author

nazariig commented Jul 4, 2022

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Contributor

yxieca commented Jul 5, 2022

@nazariig can you look into the test failures?

@nazariig
Copy link
Collaborator Author

nazariig commented Jul 8, 2022

@nazariig can you look into the test failures?

@yxieca sure. Will do.

@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from 30c9089 to 7917273 Compare July 9, 2022 13:35
@nazariig
Copy link
Collaborator Author

nazariig commented Jul 9, 2022

@yxieca looks like the tests are unstable because each run is totally different story

@nazariig
Copy link
Collaborator Author

nazariig commented Jul 9, 2022

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from 7917273 to 46487d5 Compare July 12, 2022 10:14
@nazariig
Copy link
Collaborator Author

The issue is observed with: p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails.

Looks like the test has been executed twice:

2022-07-12T13:47:06.9867130Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:20.2662519Z p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED [ 98%]
2022-07-12T13:47:24.5227387Z p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED [ 98%]
2022-07-12T13:47:28.8312874Z p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED [ 98%]
2022-07-12T13:47:42.0593122Z p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED          [ 99%]
2022-07-12T13:48:16.0895897Z p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED          [ 99%]
2022-07-12T13:48:28.2509772Z p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED [ 99%]
2022-07-12T13:48:39.3870004Z p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED     [ 99%]
2022-07-12T13:48:39.3884297Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED [ 99%]
2022-07-12T13:49:18.4729702Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED [ 99%]
2022-07-12T13:49:18.4743042Z p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails ERROR [ 99%]
2022-07-12T13:50:00.5598863Z p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED [100%]
2022-07-12T13:50:00.5599324Z 
2022-07-12T13:50:00.5599755Z ==================================== ERRORS ====================================
2022-07-12T13:50:00.5600312Z _ ERROR at teardown of TestP4RTAcl.test_AclRuleAddWithoutTableDefinitionFails __
2022-07-12T13:50:00.5600557Z 
2022-07-12T13:50:00.5609259Z tp = <class 'ValueError'>, value = None, tb = None
2022-07-12T13:50:00.5609494Z 
2022-07-12T13:50:00.5609895Z     def reraise(tp, value, tb=None):
2022-07-12T13:50:00.5610103Z         try:
2022-07-12T13:50:00.5610252Z             if value is None:
2022-07-12T13:50:00.5610401Z                 value = tp()
2022-07-12T13:50:00.5610606Z             if value.__traceback__ is not tb:
2022-07-12T13:50:00.5610798Z                 raise value.with_traceback(tb)
2022-07-12T13:50:00.5610972Z >           raise value
2022-07-12T13:50:00.5611039Z 
2022-07-12T13:50:00.5611357Z /usr/lib/python3/dist-packages/six.py:703: 
2022-07-12T13:50:00.5611593Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-07-12T13:50:00.5611926Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612122Z     raise value
2022-07-12T13:50:00.5612408Z /usr/lib/python3/dist-packages/six.py:703: in reraise
2022-07-12T13:50:00.5612598Z     raise value
2022-07-12T13:50:00.5612759Z conftest.py:1828: in testlog
2022-07-12T13:50:00.5613091Z     dvs.runcmd(f"logger -t pytest === finish test {request.node.nodeid} ===")
2022-07-12T13:50:00.5613320Z conftest.py:599: in runcmd
2022-07-12T13:50:00.5613497Z     res = self.ctn.exec_run(cmd)
2022-07-12T13:50:00.5613936Z /usr/local/lib/python3.8/dist-packages/docker/models/containers.py:198: in exec_run
2022-07-12T13:50:00.5614556Z     exec_output = self.client.api.exec_start(
2022-07-12T13:50:00.5615014Z /usr/local/lib/python3.8/dist-packages/docker/utils/decorators.py:19: in wrapped
2022-07-12T13:50:00.5615265Z     return f(self, resource_id, *args, **kwargs)
2022-07-12T13:50:00.5615739Z /usr/local/lib/python3.8/dist-packages/docker/api/exec_api.py:169: in exec_start
2022-07-12T13:50:00.5616203Z     return self._read_from_socket(res, stream, tty=tty, demux=demux)
2022-07-12T13:50:00.5616886Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:422: in _read_from_socket
2022-07-12T13:50:00.5617313Z     return consume_socket_output(gen, demux=demux)
2022-07-12T13:50:00.5617942Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:137: in consume_socket_output
2022-07-12T13:50:00.5618340Z     return six.binary_type().join(frames)
2022-07-12T13:50:00.5618932Z /usr/local/lib/python3.8/dist-packages/docker/api/client.py:416: in <genexpr>
2022-07-12T13:50:00.5619382Z     gen = (data for (_, data) in gen)
2022-07-12T13:50:00.5620025Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:94: in frames_iter_no_tty
2022-07-12T13:50:00.5620457Z     (stream, n) = next_frame_header(socket)
2022-07-12T13:50:00.5621111Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:66: in next_frame_header
2022-07-12T13:50:00.5621524Z     data = read_exactly(socket, 8)
2022-07-12T13:50:00.5622094Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:51: in read_exactly
2022-07-12T13:50:00.5622691Z     next_data = read(socket, n - len(data))
2022-07-12T13:50:00.5623089Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-07-12T13:50:00.5623301Z 
2022-07-12T13:50:00.5623640Z socket = <socket.SocketIO object at 0x7fb093f4a7f0>, n = 8
2022-07-12T13:50:00.5623754Z 
2022-07-12T13:50:00.5623905Z     def read(socket, n=4096):
2022-07-12T13:50:00.5624084Z         """
2022-07-12T13:50:00.5624370Z         Reads at most n bytes from socket
2022-07-12T13:50:00.5624656Z         """
2022-07-12T13:50:00.5624877Z     
2022-07-12T13:50:00.5625166Z         recoverable_errors = (errno.EINTR, errno.EDEADLK, errno.EWOULDBLOCK)
2022-07-12T13:50:00.5625354Z     
2022-07-12T13:50:00.5625533Z         if six.PY3 and not isinstance(socket, NpipeSocket):
2022-07-12T13:50:00.5625736Z >           select.select([socket], [], [])
2022-07-12T13:50:00.5625950Z E           ValueError: filedescriptor out of range in select()
2022-07-12T13:50:00.5626056Z 
2022-07-12T13:50:00.5626382Z /usr/local/lib/python3.8/dist-packages/docker/utils/socket.py:31: ValueError
2022-07-12T13:50:00.5626776Z ---------------------------- Captured stdout setup -----------------------------
2022-07-12T13:50:00.5626990Z remove extra link dummy
2022-07-12T13:50:00.5627324Z ---------------------------- Captured stderr setup -----------------------------
2022-07-12T13:50:00.5627596Z Exception ignored in: <function ApplDbValidator.__del__ at 0x7fb09542b820>
2022-07-12T13:50:00.5627829Z Traceback (most recent call last):
2022-07-12T13:50:00.5628044Z   File "/agent/_work/1/s/tests/conftest.py", line 164, in __del__
2022-07-12T13:50:00.5628275Z     neighbors = self.get_keys(self.NEIGH_TABLE)
2022-07-12T13:50:00.5628507Z   File "/agent/_work/1/s/tests/dvslib/dvs_database.py", line 115, in get_keys
2022-07-12T13:50:00.5628762Z     table = swsscommon.Table(self.db_connection, table_name)
2022-07-12T13:50:00.5629400Z   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2335, in __init__
2022-07-12T13:50:00.5629768Z     _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
2022-07-12T13:50:00.5630175Z RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

Locally test passes:

root@sonic-vs:~# pytest --imgname=docker-sonic-vs:latest --force-flaky --log-cli-level=info -vvv -k "TestWatermark or TestP4RTL3 or TestP4RTAcl or TestP4RTMirror"
========================================================================= test session starts =========================================================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /root/work/debug/counters/sonic-swss/tests
plugins: flaky-3.7.0
collected 601 items / 587 deselected / 14 selected

test_watermark.py::TestWatermark::test_telemetry_period PASSED                                                                                                  [  7%]
test_watermark.py::TestWatermark::test_lua_plugins SKIPPED (This test is not stable enough)                                                                     [ 14%]
test_watermark.py::TestWatermark::test_clear SKIPPED (This test is not stable enough)                                                                           [ 21%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass PASSED                                                                                [ 28%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass PASSED                                                                                   [ 35%]
p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteAddWithInvalidNexthopFail PASSED                                                                                     [ 42%]
p4rt/test_l3.py::TestP4RTL3::test_IPv6RouteAddWithInvalidWcmpFail PASSED                                                                                        [ 50%]
p4rt/test_l3.py::TestP4RTL3::test_PruneAndRestoreNextHop PASSED                                                                                                 [ 57%]
p4rt/test_l3.py::TestP4RTL3::test_PruneNextHopOnWarmBoot PASSED                                                                                                 [ 64%]
p4rt/test_l3.py::TestP4RTL3::test_CreateWcmpMemberForOperUpWatchportOnly PASSED                                                                                 [ 71%]
p4rt/test_l3.py::TestP4RTL3::test_RemovePrunedWcmpGroupMember PASSED                                                                                            [ 78%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRulesAddUpdateDelPass SKIPPED (p4orch is not enabled)                                                               [ 85%]
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails PASSED                                                                           [ 92%]
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete PASSED                                                                           [100%]

========================================================================== warnings summary ===========================================================================

...

p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass
p4rt/test_p4rt_acl.py::TestP4RTAcl::test_AclRuleAddWithoutTableDefinitionFails
p4rt/test_p4rt_mirror.py::TestP4RTMirror::test_MirrorSessionAddModifyAndDelete
  /usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.__del__ at 0x7fe15c72d040>

  Traceback (most recent call last):
    File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in __del__
      neighbors = self.get_keys(self.NEIGH_TABLE)
    File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys
      table = swsscommon.Table(self.db_connection, table_name)
    File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in __init__
      _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
  RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===Flaky Test Report===

test_telemetry_period passed 1 out of the required 1 times. Success!
test_IPv4RouteWithNexthopAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv6WithWcmpRouteAddUpdateDeletePass passed 1 out of the required 1 times. Success!
test_IPv4RouteAddWithInvalidNexthopFail passed 1 out of the required 1 times. Success!
test_IPv6RouteAddWithInvalidWcmpFail passed 1 out of the required 1 times. Success!
test_PruneAndRestoreNextHop passed 1 out of the required 1 times. Success!
test_PruneNextHopOnWarmBoot passed 1 out of the required 1 times. Success!
test_CreateWcmpMemberForOperUpWatchportOnly passed 1 out of the required 1 times. Success!
test_RemovePrunedWcmpGroupMember passed 1 out of the required 1 times. Success!
test_AclRuleAddWithoutTableDefinitionFails passed 1 out of the required 1 times. Success!
test_MirrorSessionAddModifyAndDelete passed 1 out of the required 1 times. Success!

===End Flaky Test Report===
=============================================== 11 passed, 3 skipped, 587 deselected, 47 warnings in 604.22s (0:10:04) ================================================

@nazariig
Copy link
Collaborator Author

@qiluo-msft / @yxieca i can't reproduce this issue in my environment. It is definitely not related to the current code changes. Can you please assist here?

@qiluo-msft
Copy link
Contributor

@prsunny do you have insight on vstest failures?

@prabhataravind
Copy link
Contributor

@prsunny do you have insight on vstest failures?

It appears to be the same as #2373 / #2350

@nazariig
Copy link
Collaborator Author

/usr/local/lib/python3.8/dist-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function ApplDbValidator.del at 0x7fe15c72d040>

Traceback (most recent call last):
File "/root/work/debug/counters/sonic-swss/tests/conftest.py", line 159, in del
neighbors = self.get_keys(self.NEIGH_TABLE)
File "/root/work/debug/counters/sonic-swss/tests/dvslib/dvs_database.py", line 115, in get_keys
table = swsscommon.Table(self.db_connection, table_name)
File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2173, in init
_swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

@prabhataravind could be triggered by ApplDbValidator.__del__

@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from 46487d5 to f1de7ce Compare July 12, 2022 23:10
@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch 2 times, most recently from ffbfa93 to 4b8ded6 Compare August 19, 2022 11:41
@nazariig
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from 4b8ded6 to fcdde2c Compare August 19, 2022 15:44
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
@nazariig nazariig force-pushed the master-buffer-queues-pgs-counters branch from fcdde2c to c728f8c Compare August 19, 2022 16:08
@nazariig nazariig requested a review from bingwang-ms August 21, 2022 22:09
@liat-grozovik liat-grozovik merged commit a4b8992 into sonic-net:master Aug 25, 2022
@liat-grozovik
Copy link
Collaborator

@nazariig can you please check that it is cherrypickable for 202205? if not, please create a PR against 202205

@yxieca
Copy link
Contributor

yxieca commented Aug 26, 2022

@nazariig this change cannot be cherry-picked into 202205 cleanly. Can you raise new PR?

dprital added a commit to dprital/sonic-buildimage that referenced this pull request Aug 29, 2022
Update sonic-swss submodule pointer to include the following:
* Dynamic port configuration - add port buffer cfg to the port ref counter ([sonic-net#2194](sonic-net/sonic-swss#2194))
* tlm_teamd: Filter portchannel subinterface events from STATE_DB LAG_TABLE ([sonic-net#2408](sonic-net/sonic-swss#2408))
* [counters] Improve performance by polling only configured ports buffer queue/pg counters ([sonic-net#2360](sonic-net/sonic-swss#2360))
* added support for Xsight platform ([sonic-net#2426](sonic-net/sonic-swss#2426))
* [ci][asan] add DVS tests run with ASAN ([sonic-net#2363](sonic-net/sonic-swss#2363))
* Handle dual ToR neighbor miss scenario ([sonic-net#2151](sonic-net/sonic-swss#2151))
* Upstream new development on p4orch ([sonic-net#2237](sonic-net/sonic-swss#2237))
* [lgtm] Fix dependency ([sonic-net#2419](sonic-net/sonic-swss#2419))
* [muxorch] Returning true if nbr in skip_neighbor_ in isNeighborActive() ([sonic-net#2415](sonic-net/sonic-swss#2415))
* [macsec]: Set MTU for MACsec ([sonic-net#2398](sonic-net/sonic-swss#2398))
* Delete Invalid if condition in intfsorch.cpp ([sonic-net#2411](sonic-net/sonic-swss#2411))

Signed-off-by: dprital <drorp@nvidia.com>
prsunny added a commit that referenced this pull request Sep 19, 2022
…ts buffer queue/pg counters (#2360)"

This reverts commit a4b8992.
prsunny added a commit that referenced this pull request Sep 23, 2022
…ts buffer queue/pg counters (#2360)" (#2458)

*Revert "[counters] Improve performance by polling only configured ports buffer queue/pg counters (#2360)" (#2458)
This reverts commit a4b8992.
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Sep 29, 2022
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Oct 10, 2022
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Oct 11, 2022
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Oct 20, 2022
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Oct 21, 2022
vadymhlushko-mlnx added a commit to vadymhlushko-mlnx/sonic-swss that referenced this pull request Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants