Collect ICMP packet loss information #14

Merged Feb 1, 2022 (23 commits)

@zjswhhh zjswhhh commented Jan 3, 2022

Description of PR

Summary:
Fixes # (issue)

This PR is to collect ICMP packet loss information.

sign-off: Jing Zhang zhangjing@microsoft.com

Type of change

  • Bug fix
  • New feature
  • Doc/Design
  • Unit test

Approach

What is the motivation for this PR?

When ICMP heartbeat loss happens, we want to know how long it lasts. We also want to collect the packet loss ratio information.

How did you do it?

  • Post link prober state change events and their timestamps to the state DB whenever the link prober enters or leaves the unknown state;
  • Post the link prober packet loss ratio to the state DB every 300 ms.
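
As a rough illustration of the two bullets above, the following Python sketch models the bookkeeping for a single port. It is not the linkmgrd implementation: the class `LinkProberStats` and its method names are hypothetical, the field names are taken from the `LINK_PROBE_STATS` entries shown in this description, and the 300 ms posting timer and the actual state DB write are left out.

```python
from datetime import datetime, timedelta

# Timestamp layout matching the entries shown in this description,
# e.g. "2022-Jan-26 03:13:05.366900" (month name assumes the default C locale).
TIME_FORMAT = "%Y-%b-%d %H:%M:%S.%f"


class LinkProberStats:
    """Hypothetical in-memory stand-in for one LINK_PROBE_STATS|<port> entry."""

    def __init__(self):
        self.fields = {"pck_loss_count": 0, "pck_expected_count": 0}
        self.in_unknown = False

    def on_heartbeat(self, received):
        # Every expected heartbeat raises the expected count;
        # a missed one also raises the loss count.
        self.fields["pck_expected_count"] += 1
        if not received:
            self.fields["pck_loss_count"] += 1

    def on_state_change(self, is_unknown, now):
        # Record a timestamp only on a transition into or out of the unknown state.
        if is_unknown and not self.in_unknown:
            self.fields["link_prober_unknown_start"] = now.strftime(TIME_FORMAT)
        elif not is_unknown and self.in_unknown:
            self.fields["link_prober_unknown_end"] = now.strftime(TIME_FORMAT)
        self.in_unknown = is_unknown

    def reset_counts(self):
        # Zero the counters but keep the recorded timestamps.
        self.fields["pck_loss_count"] = 0
        self.fields["pck_expected_count"] = 0


stats = LinkProberStats()
start = datetime(2022, 1, 26, 3, 13, 5, 366900)
stats.on_heartbeat(received=True)
stats.on_heartbeat(received=False)
stats.on_state_change(is_unknown=True, now=start)
stats.on_state_change(is_unknown=False, now=start + timedelta(seconds=270))
print(stats.fields)
```

Keeping two raw counters rather than a precomputed ratio means a reader of the table can derive the ratio over any window by differencing two snapshots.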

How did you verify/test it?

Tested on a dual-ToR testbed.

Table entries created as expected:

$ redis-cli -n 6 
127.0.0.1:6379[6]> KEYS *LINK_PROBE*
1) "LINK_PROBE_STATS|Ethernet48"
2) "LINK_PROBE_STATS|Ethernet84"
      ... ...

Before ICMP responder was turned on:

~$ redis-cli -n 6 HGETALL "LINK_PROBE_STATS|Ethernet44" 
1) "pck_loss_count"
2) "24"
3) "pck_expected_count"
4) "24"
5) "link_prober_unknown_start"
6) "2022-Jan-26 02:59:58.138248"

After running a link_failure test case, link_prober_unknown_start and link_prober_unknown_end are updated:

~$ redis-cli -n 6 HGETALL "LINK_PROBE_STATS|Ethernet44"
1) "pck_loss_count"
2) "612"
3) "pck_expected_count"
4) "840"
5) "link_prober_unknown_start"
6) "2022-Jan-26 03:13:05.366900"
7) "link_prober_unknown_end"
8) "2022-Jan-26 03:17:35.446580"
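
The entry above is enough to answer the two questions from the motivation: how long the loss lasted and what the loss ratio was. A minimal Python sketch follows, assuming the timestamp layout shown in the output; the helper name `summarize` is hypothetical and not part of this PR.

```python
from datetime import datetime

# Layout of the timestamps in the output above (assumes the default C locale for "Jan").
TIME_FORMAT = "%Y-%b-%d %H:%M:%S.%f"


def summarize(stats):
    """Return (loss ratio, seconds spent in the unknown state) for one entry."""
    expected = int(stats["pck_expected_count"])
    lost = int(stats["pck_loss_count"])
    # Guard against division by zero right after a reset.
    ratio = lost / expected if expected else 0.0
    start = datetime.strptime(stats["link_prober_unknown_start"], TIME_FORMAT)
    end = datetime.strptime(stats["link_prober_unknown_end"], TIME_FORMAT)
    return ratio, (end - start).total_seconds()


# Values copied from the HGETALL output above.
entry = {
    "pck_loss_count": "612",
    "pck_expected_count": "840",
    "link_prober_unknown_start": "2022-Jan-26 03:13:05.366900",
    "link_prober_unknown_end": "2022-Jan-26 03:17:35.446580",
}
ratio, seconds = summarize(entry)
print(f"loss ratio {ratio:.3f}, unknown for {seconds:.3f} s")
# prints "loss ratio 0.729, unknown for 270.080 s"
```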

After resetting the packet loss counts:

~$ redis-cli -n 4 HSET "MUX_CABLE|Ethernet44" pck_loss_data_reset reset 
(integer) 0

~$ redis-cli -n 6 HGETALL "LINK_PROBE_STATS|Ethernet44"
1) "pck_loss_count"
2) "0"
3) "pck_expected_count"
4) "0"
5) "link_prober_unknown_start"
6) "2022-Jan-26 03:13:05.366900"
7) "link_prober_unknown_end"
8) "2022-Jan-26 03:17:35.446580"

~$ redis-cli -n 6 HGETALL "LINK_PROBE_STATS|Ethernet44"
1) "pck_loss_count"
2) "0"
3) "pck_expected_count"
4) "6"
5) "link_prober_unknown_start"
6) "2022-Jan-26 03:13:05.366900"
7) "link_prober_unknown_end"
8) "2022-Jan-26 03:17:35.446580"
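
The reset shown above can be scripted for several ports. The sketch below only builds the `redis-cli` argument lists, reusing the database index, key pattern, field, and value from the `HSET` command above; the function names are hypothetical, and nothing is executed against Redis.

```python
import shlex


def build_reset_command(port):
    """Build the redis-cli arguments that request a packet loss reset for one port."""
    return [
        "redis-cli", "-n", "4",
        "HSET", f"MUX_CABLE|{port}",
        "pck_loss_data_reset", "reset",
    ]


def build_reset_commands(ports):
    """Build one reset command per port."""
    return [build_reset_command(port) for port in ports]


for command in build_reset_commands(["Ethernet44", "Ethernet48"]):
    # shlex.join quotes the key so the "|" is not treated as a shell pipe.
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a device where `redis-cli` is available.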

Any platform specific information?

Documentation

@lolyu lolyu left a comment

Do we need a method to reset the packet loss ratio other than restarting the linkmgrd?

zjswhhh commented Jan 6, 2022

> Do we need a method to reset the packet loss ratio other than restarting the linkmgrd?

Good question. I think we do. Once ping is working again, we can simply reset the counts so that the ratio is more meaningful.

@zjswhhh zjswhhh marked this pull request as draft January 6, 2022 20:49
lolyu and others added 8 commits January 24, 2022 08:44
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Summary:
Fixes # (issue)

This PR makes linkmgrd subscribe to ROUTE_TABLE events in State DB and react accordingly:
- If either of the two default route states is 'na', linkmgrd should switch to standby.
- If both are 'ok', linkmgrd probes the mux state; what happens next depends on the linkmgrd state machine and the probing response.
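
The rule above can be sketched as a small decision function. This is an illustration only, not the linkmgrd code: the function name, the argument names, the returned labels, and the `"no_action"` fallback for any other combination are all assumptions.

```python
def react_to_default_routes(first_state, second_state):
    """Map the two default route states to the reaction described above."""
    states = (first_state, second_state)
    if "na" in states:
        # Either default route missing: switch to standby.
        return "switch_to_standby"
    if all(state == "ok" for state in states):
        # Both routes present: probe the mux state and let the state machine decide.
        return "probe_mux_state"
    # Assumption: any other value triggers no reaction.
    return "no_action"


print(react_to_default_routes("ok", "na"))
print(react_to_default_routes("ok", "ok"))
```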

Type of change:
New feature

Motivation for this PR:
To make linkmgrd subscribe to State DB route events and handle the resulting switchovers.

Documentation: 
Related PR: sonic-net/sonic-swss#2009
@zjswhhh zjswhhh force-pushed the collectIcmpLossInfo branch from 4a2405c to f3b5966 Compare January 24, 2022 08:54
@zjswhhh zjswhhh marked this pull request as ready for review January 24, 2022 09:07
@zjswhhh zjswhhh force-pushed the collectIcmpLossInfo branch from 2121f60 to bd885ee Compare January 24, 2022 09:49
@zjswhhh zjswhhh force-pushed the collectIcmpLossInfo branch from 1559298 to 14aea6a Compare January 26, 2022 03:28
@zjswhhh zjswhhh requested review from yxieca and lolyu January 26, 2022 19:40
@zjswhhh zjswhhh merged commit bcd74b4 into sonic-net:master Feb 1, 2022
@zjswhhh zjswhhh deleted the collectIcmpLossInfo branch February 1, 2022 17:52
zjswhhh added a commit to sonic-net/sonic-utilities that referenced this pull request Feb 25, 2022
… ICMP packet loss data (#2046)

Stemming from sonic-net/sonic-linkmgrd#14
sign-off: Jing Zhang zhangjing@microsoft.com

#### What I did
Added support to retrieve and reset ICMP packet loss data in state db for muxcable.  

#### How I did it
Changes made in show/muxcable.py and config/muxcable.py

#### How to verify it
- Added unit tests. 
- Tested the command lines on testbeds. 
- Ran dualtor_io/test_link_failure.py. 

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)
```show muxcable packetloss <port_name>```
```
admin@str2-7050cx3-acs-07:~$ show muxcable packetloss Ethernet96 
PORT        COUNT                 VALUE
----------  ------------------  -------
Ethernet96  pck_loss_count        10439
Ethernet96  pck_expected_count    11406
PORT        EVENT                      TIME
----------  -------------------------  ---------------------------
Ethernet96  link_prober_unknown_start  2022-Jan-27 19:47:17.819699
Ethernet96  link_prober_unknown_end    2022-Jan-27 22:28:36.736928
```
```config muxcable packetloss reset <port_name>```
```
admin@str2-7050cx3-acs-07:~$ sudo config muxcable packetloss reset Ethernet96
admin@str2-7050cx3-acs-07:~$ show muxcable packetloss Ethernet96
PORT        COUNT                 VALUE
----------  ------------------  -------
Ethernet96  pck_expected_count        0
Ethernet96  pck_loss_count            0
PORT        EVENT                      TIME
----------  -------------------------  ---------------------------
Ethernet96  link_prober_unknown_start  2022-Jan-27 19:47:17.819699
Ethernet96  link_prober_unknown_end    2022-Jan-27 22:28:36.736928
```
```config muxcable packetloss reset all```
```
admin@str2-7050cx3-acs-07:~$ sudo config muxcable packetloss reset all
admin@str2-7050cx3-acs-07:~$ show muxcable packetloss Ethernet68
PORT        COUNT                 VALUE
----------  ------------------  -------
Ethernet68  pck_loss_count            0
Ethernet68  pck_expected_count        3
PORT        EVENT                      TIME
----------  -------------------------  ---------------------------
Ethernet68  link_prober_unknown_start  2022-Jan-27 19:47:17.702760
Ethernet68  link_prober_unknown_end    2022-Jan-27 22:28:36.756113

```
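
The count table printed by `show muxcable packetloss` can be post-processed into a loss ratio. The Python sketch below assumes the whitespace-separated layout shown in the output above; the helper `parse_counts` is hypothetical and is not part of show/muxcable.py.

```python
def parse_counts(output):
    """Extract the integer counters from `show muxcable packetloss` output."""
    counts = {}
    for line in output.splitlines():
        parts = line.split()
        # Counter rows have exactly three columns with an integer last;
        # headers, separators, and four-column event rows are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            counts[parts[1]] = int(parts[2])
    return counts


# Sample copied from the first command output above.
sample = """\
PORT        COUNT                 VALUE
----------  ------------------  -------
Ethernet96  pck_loss_count        10439
Ethernet96  pck_expected_count    11406
PORT        EVENT                      TIME
----------  -------------------------  ---------------------------
Ethernet96  link_prober_unknown_start  2022-Jan-27 19:47:17.819699
Ethernet96  link_prober_unknown_end    2022-Jan-27 22:28:36.736928
"""

counts = parse_counts(sample)
expected = counts["pck_expected_count"]
loss_ratio = counts["pck_loss_count"] / expected if expected else 0.0
print(counts, f"{loss_ratio:.2%}")
```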
zjswhhh added a commit to zjswhhh/sonic-utilities that referenced this pull request Mar 4, 2022
… ICMP packet loss data (sonic-net#2046)
zjswhhh added a commit to sonic-net/sonic-utilities that referenced this pull request Mar 11, 2022
… ICMP packet loss data (#2046) (#2094)
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
… ICMP packet loss data (#2046)