Add UT for orchagent watchdog (sonic-net#8306)
### Description of PR
Add UT for orchagent watchdog.

Summary:
The SWSS service is adding a watchdog mechanism that emits keepalive messages and raises an alert when swss has a problem.
This PR adds a new UT to cover the watchdog mechanism.
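
To make the keepalive idea concrete, here is a minimal sketch of a heartbeat-based watchdog. It is not the swss implementation: the `KeepaliveWatchdog` class, its method names, and the injectable `clock` parameter are hypothetical, and the 60-second timeout is taken from the comment in the test below.

```python
import time


class KeepaliveWatchdog:
    """Hypothetical monitor: flags a process as stuck when its heartbeats stop."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def heartbeat(self):
        # Called whenever the monitored process reports that it is alive.
        self._last_beat = self._clock()

    def is_stuck(self):
        # True once no heartbeat has arrived within the timeout window.
        return self._clock() - self._last_beat > self.timeout_s


# Usage with a fake clock so the example runs instantly.
now = [0.0]
watchdog = KeepaliveWatchdog(timeout_s=60, clock=lambda: now[0])
now[0] = 30.0
watchdog.heartbeat()
now[0] = 80.0
healthy = not watchdog.is_stuck()  # 50 s since the last heartbeat
now[0] = 100.0
stuck = watchdog.is_stuck()        # 70 s since the last heartbeat
```

A paused process stops calling `heartbeat()`, which is the condition the new UT provokes on purpose.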

### Type of change

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)


### Back port request
- [ ] 201911
- [ ] 202012
- [ ] 202205

### Approach
#### What is the motivation for this PR?
Add a new UT to test the watchdog mechanism and protect it from regressions caused by code changes.

#### How did you do it?
Pause the orchagent process with the `kill -STOP` command and check whether the watchdog sends an alert.
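
The pause/resume step can be tried locally with a throwaway child process. This is a rough, Linux-only sketch and not part of the PR: `proc_state` and `wait_for_state` are hypothetical helpers that read the state letter from `/proc/<pid>/stat`, and a sleeping Python child stands in for orchagent.

```python
import os
import signal
import subprocess
import sys
import time


def proc_state(pid):
    # Linux-only: the field after the closing ")" in /proc/<pid>/stat is the state letter.
    with open("/proc/{}/stat".format(pid)) as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout_s=5.0):
    # Signal delivery is asynchronous, so poll briefly for one of the wanted states.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc_state(pid) in wanted:
            return True
        time.sleep(0.05)
    return False


child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
try:
    os.kill(child.pid, signal.SIGSTOP)         # same effect as `kill -STOP <pid>`
    paused = wait_for_state(child.pid, "T")    # T = stopped
    os.kill(child.pid, signal.SIGCONT)         # same effect as `kill -CONT <pid>`
    resumed = wait_for_state(child.pid, "SR")  # S = sleeping, R = running
finally:
    child.kill()
    child.wait()
```

A stopped process keeps its PID and memory but is never scheduled, so it cannot send keepalives until it receives `SIGCONT`.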

#### How did you verify/test it?
Ran the new UT manually.
Passed PR validation.

#### Any platform specific information?
No

#### Supported testbed topology if it's a new test case?
Any

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
liuh-80 authored and parmarkj committed Oct 3, 2023
1 parent 984c587 commit ab90a3f
Showing 2 changed files with 64 additions and 0 deletions.
1 change: 1 addition & 0 deletions .azure-pipelines/pr_test_scripts.yaml
@@ -67,6 +67,7 @@ t0:
  - test_interfaces.py
  - test_procdockerstatsd.py
  - database/test_db_scripts.py
  - system_health/test_watchdog.py

t0-2vlans:
  - dhcp_relay/test_dhcp_relay.py
63 changes: 63 additions & 0 deletions tests/system_health/test_watchdog.py
@@ -0,0 +1,63 @@
import logging
import pytest
import time
from tests.common.helpers.assertions import pytest_assert

pytestmark = [
    pytest.mark.disable_loganalyzer,
    pytest.mark.topology('any')
]

logger = logging.getLogger(__name__)

SLEEP_TIME = 10


@pytest.fixture
def pause_orchagent(duthost):
    # find orchagent pid
    pid = duthost.shell(
        r"pgrep orchagent",
        module_ignore_errors=True)['stdout']
    logger.info('Get orchagent pid: {}'.format(pid))

    # pause orchagent and clear syslog
    duthost.shell(r"sudo kill -STOP {}".format(pid), module_ignore_errors=True)
    duthost.shell(r"sudo truncate -s 0 /var/log/syslog", module_ignore_errors=True)

    yield

    # resume orchagent and clear syslog
    duthost.shell(r"sudo kill -CONT {}".format(pid), module_ignore_errors=True)
    duthost.shell(r"sudo truncate -s 0 /var/log/syslog", module_ignore_errors=True)


def test_orchagent_watchdog(duthosts, enum_rand_one_per_hwsku_hostname, pause_orchagent):
    duthost = duthosts[enum_rand_one_per_hwsku_hostname]

    result = duthost.shell(
        r"docker exec -i swss sh -c 'test -f /etc/supervisor/watchdog_processes && echo exist'",
        module_ignore_errors=True)['stdout']
    logger.info('Check watchdog exist: {}'.format(result))
    if result != 'exist':
        pytest.skip("Skip orchagent watchdog test.")

    # wait for the watchdog to emit an alert; the orchagent watchdog timeout is 60 seconds
    WATCHDOG_TIMEOUT = 120
    current_attempt = 0
    while (True):
        time.sleep(SLEEP_TIME)
        alert = duthost.shell(
            r"sudo cat /var/log/syslog | grep 'is stuck in namespace'",
            module_ignore_errors=True)['stdout']
        logger.info('Get alert from host: {}'.format(alert))
        if "orchagent" in str(alert):
            return
        else:
            # fail once the polling budget (WATCHDOG_TIMEOUT, twice the watchdog timeout) is used up
            if current_attempt >= WATCHDOG_TIMEOUT/SLEEP_TIME:
                pytest_assert(
                    False,
                    "orchagent watchdog was not triggered after {} seconds".format(WATCHDOG_TIMEOUT))
            else:
                current_attempt += 1
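
The wait loop in the test follows a common poll-until-timeout pattern. As an illustration only, it could be factored into a small helper such as the one below. `poll_until` and its parameters are hypothetical names, not an existing sonic-mgmt API, and the syslog check is replaced by a stub so the sketch runs without a DUT.

```python
import time


def poll_until(condition, timeout_s, interval_s, sleep=time.sleep):
    """Hypothetical helper: call condition() every interval_s seconds
    until it returns True or timeout_s seconds have been waited."""
    waited = 0
    while waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        if condition():
            return True
    return False


# Usage with a stubbed check and a no-op sleep so the example runs instantly.
checks = iter([False, False, True])  # stands in for "alert found in syslog"
found = poll_until(lambda: next(checks), timeout_s=120, interval_s=10,
                   sleep=lambda _: None)
never = poll_until(lambda: False, timeout_s=120, interval_s=10,
                   sleep=lambda _: None)
```

Returning a boolean instead of asserting inside the loop would let the caller make a single `pytest_assert(found, ...)` call with the failure message.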
