LAG keepalive script to reduce lacp session wait during warm-reboot #2806
Conversation
@stepanblyschak please review. You might find this interesting, since you are also focused on reducing the LACP session wait period during warm reboot. This is a little hacky, but it takes advantage of the fact that packets can still be sent even after teamd and syncd are down, as the physical ports are still up and sockets can still function. It would help if you could also test this in your setup and share the time saved in your tests.
@vaibhavhd This is an interesting finding. The original motivation for stopping teamd prior to syncd was to have the ability to send a last LACPDU before syncd removes the netdevs from the kernel. Since, in your observation, kernel netdevs live until kexec, I believe this is no longer true. At least for Nvidia it has not been true since sonic-net/sonic-buildimage#2572. But I do not know whether other vendors' syncd stop removes netdevs or not. Can we swap the teamd/syncd shutdown order to achieve similar results? BTW, a more general question: I know you're using the "control plane assistant" to reply to ARP/NDP while the DUT undergoes warm-reboot. Have you considered the same for LACP?
@stepanblyschak, I tried the 201911 image on a 2700 and I still see that the last LACPDU is sent when kexec is issued. Can you re-evaluate on your end? I think that even if the teamd/syncd sequence is swapped, this hack remains helpful, as it can send LACPDUs even after teamd is down.
Great question! I have been pondering that idea as well for the last few weeks. It might be possible, but not with the same VXLAN tunnel that we use with CPA: the VXLAN tunnel is for VLAN ports, while here we are talking about LAG ports, which aren't part of a VLAN. There are other, more complicated ways of getting that done that I haven't tried yet. This keepalive hack lets us reclaim the whole shutdown-path time.
I've tested these changes on 202211 and observe a 9-10 sec improvement on Nvidia. Thanks.
scripts/lag_keepalive.py (outdated)
from scapy.config import conf
conf.ipv6_enabled = False
from scapy.all import sendp, sniff
from swsssdk import ConfigDBConnector
swsssdk is now deprecated. Use swsscommon instead.
FYI, on 202211:
admin@arc-switch1004:~$ python3 -c "from swsssdk import ConfigDBConnector"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'swsssdk'
Addressed this, thanks for catching it.
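For reference, the replacement could look roughly like the sketch below. The guarded import and the helper name lag_members_from_keys are illustrative assumptions, not the actual PR code; swsscommon is only present inside a SONiC image, and CONFIG_DB PORTCHANNEL_MEMBER keys use the "lag|member" form.

```python
# Sketch of the swsssdk -> swsscommon migration. The try/except is an
# assumption so the file can be imported outside a SONiC image for testing.
try:
    from swsscommon.swsscommon import ConfigDBConnector  # supported API
except ImportError:
    ConfigDBConnector = None  # not available outside a SONiC image

def lag_members_from_keys(keys):
    """Group CONFIG_DB PORTCHANNEL_MEMBER keys such as
    'PortChannel0001|Ethernet0' into {lag: [member, ...]}."""
    lags = {}
    for key in keys:
        lag, member = key.split("|", 1)
        lags.setdefault(lag, []).append(member)
    return lags
```

On a device, the keys would come from a connected ConfigDBConnector (get_table('PORTCHANNEL_MEMBER')); the pure-Python helper keeps the parsing testable without Redis.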
debug "Starting lag_keepalive to send LACPDUs ..."
timeout 300 python ${LAG_KEEPALIVE_SCRIPT} &
# give the lag_keepalive script a chance to get ready (30s) and collect one LACPDU before going down (30s)
sleep 60
Can we run the lag_keepalive.py script in the foreground, and inside lag_keepalive.py do a fork() and run lag_keepalive() with a timeout? That way we would only wait until the LACPDUs are collected, probably much less than 60 sec.
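The fork-with-timeout idea above could be sketched like this (the function name and timeouts are illustrative, not the actual PR code):

```python
import os
import time

def run_with_timeout_in_child(fn, timeout_sec):
    """Fork a child that repeatedly calls fn() until the deadline, then exits.

    The parent returns immediately with the child's pid, so the caller
    (e.g. the warm-reboot script) is not blocked by the keepalive loop.
    """
    pid = os.fork()
    if pid == 0:
        # child: run the keepalive work until the deadline
        deadline = time.monotonic() + timeout_sec
        while time.monotonic() < deadline:
            fn()
        os._exit(0)  # never fall back into the parent's code path
    return pid
```

This is Linux-only (os.fork); the shell-level `timeout 300 ... &` in the PR achieves a similar effect from outside the script.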
@vaibhavhd this is a critical issue for our 202211 release, can you take care of the review and merge?
@stepanblyschak I preferred not to run this in the foreground, to avoid any odd chance of the caller getting hung by the called process.
Additionally, the wait here has to be a minimum of 30s to collect an LACPDU in the worst case (the LACP slow-rate transmit interval is 30s). The additional 30s is just a buffer. In my observation some platforms are very slow, and just importing sendp and sniff from scapy.all takes around 10-15s.
This wait can be optimized (if possible) in future PRs. What do you think?
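One possible future optimization of that fixed 60s wait, sketched under the assumption that the script forks (all names here are hypothetical): the child signals readiness over a pipe as soon as it has collected its LACPDUs, so the parent blocks only for the actual collection time.

```python
import os
import time

def start_keepalive(collect_fn, keepalive_fn, max_runtime=300):
    """Fork a keepalive child; the parent returns as soon as the child
    has collected its LACPDUs, instead of sleeping a fixed 60s."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        frames = collect_fn()   # e.g. sniff one LACPDU per LAG member port
        os.write(w, b"R")       # signal readiness to the parent
        os.close(w)
        deadline = time.monotonic() + max_runtime
        while time.monotonic() < deadline:
            keepalive_fn(frames)
        os._exit(0)
    os.close(w)
    os.read(r, 1)               # parent blocks only until collection is done
    os.close(r)
    return pid
```

On a fast platform the parent would unblock as soon as one LACPDU per port is seen, well under the current 60s.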
@vaibhavhd Ok, agree
…-reboot (#2828) Cherry-pick of #2806 for the 202012 branch (see the PR description below for details).
@StormLiangMS can you cherry-pick to 202211?
What I did
A new mechanism is added here to reduce the LAG flap issue during hitless upgrades.
Problem being solved:
During warm upgrades, the T0 goes down and with that the LACP session wait time starts.
If the wait time to refresh the LACP session is > 90s, the T1 initiates LAG teardown, and as a result dataplane impact is seen.
This script makes sure that LACPDUs continue to be sent on the going-down path.
How time is saved with this mechanism:
The LACP session wait period previously started when the teamd container went down.
The new LACP session wait period starts when kexec in the current kernel is issued and the new kernel boots up.
Implementation:
When warm-reboot starts, capture LACPDUs sent from all LAG member ports.
For this, allow 60s of prep + collection time.
Start sending LACPDUs at a ~1s interval.
The last LACPDU is sent after all containers are down and kexec is issued.
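The implementation steps above boil down to a capture-then-resend loop. A minimal sketch (send_fn is a stand-in for scapy's sendp(frame, iface=port); the port names and frame bytes are illustrative):

```python
import time

def keepalive_loop(captured, send_fn, duration_sec, interval_sec=1.0):
    """Re-send the last captured LACPDU on every LAG member port each interval.

    captured: {port_name: raw_frame_bytes}, one frame per member port,
              recorded during the 60s collection window.
    send_fn:  injected sender; on a device this would wrap scapy's sendp().
    """
    deadline = time.monotonic() + duration_sec
    sent = 0
    while time.monotonic() < deadline:
        for port, frame in captured.items():
            send_fn(port, frame)
            sent += 1
        time.sleep(interval_sec)
    return sent
```

Injecting send_fn keeps the loop testable without root privileges or real interfaces; because the raw sockets outlive teamd and syncd, this loop can keep running until kexec.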
Results:
Tested this on different platforms and images. Some results for time saved:
BRCM: 201811 -> 202012 --- 18s
BRCM: 202012 -> 202012 --- 20s
MLNX: 201911 -> 202205 --- 10s