From 96ffa97191cb2ab0cd297c4155a4d0675ff4bf85 Mon Sep 17 00:00:00 2001 From: tianshangfei Date: Sun, 9 Oct 2022 00:32:07 +0800 Subject: [PATCH 1/5] teamd warm-restart in fast mode hld Signed-off-by: tianshangfei --- ...warm-restart_with_LACP_in_fast_mode_HLD.md | 110 ++++++++++++++++++ .../images/structure_of_teamd_container.svg | 4 + doc/lag/images/teamd_smooth_upgrade.svg | 4 + doc/lag/images/teamd_smooth_upgrade_flow.svg | 4 + ...eamd_smooth_upgrade_module_interaction.svg | 4 + 5 files changed, 126 insertions(+) create mode 100644 doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md create mode 100644 doc/lag/images/structure_of_teamd_container.svg create mode 100644 doc/lag/images/teamd_smooth_upgrade.svg create mode 100644 doc/lag/images/teamd_smooth_upgrade_flow.svg create mode 100644 doc/lag/images/teamd_smooth_upgrade_module_interaction.svg diff --git a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md new file mode 100644 index 0000000000..5e580714ed --- /dev/null +++ b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md @@ -0,0 +1,110 @@ +# Teamd warm-restart with LACP in fast mode HLD # + +## Table of Content + +### Revision + +### Scope + +This design supports teamd warm-restart in fast mode. + +### Definitions/Abbreviations + +NA + +### Overview + +We expect that the restart of teamd docker should not cause link flapping or any traffic loss. All lags at data plane should remain the same. But it's hard to implement in some scenarios. + +During teamd warm-restart, the control plane remains up for a maximum of 90 seconds in LACP slow mode. However, in LACP fast mode, the control plane can only remain up for 3 seconds. This is because LACPDUs are sent every second. LACP protocol considers a LAG to be down if three LACPDUs are not received. 
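The 90-second and 3-second figures above follow from the LACPDU interval multiplied by the number of missed LACPDUs that the protocol tolerates. The sketch below only illustrates that arithmetic. The 30-second slow interval is the standard LACP slow periodic time, and the constant and function names are hypothetical, not part of teamd.

```python
# Illustrative only: names are hypothetical and not taken from teamd.
FAST_PERIODIC_TIME_S = 1     # fast mode: one LACPDU per second (stated above)
SLOW_PERIODIC_TIME_S = 30    # slow mode: standard LACP slow periodic time
MISSED_LACPDUS_BEFORE_DOWN = 3


def lacp_timeout_seconds(fast_mode: bool) -> int:
    """Return how long a LAG survives without receiving any LACPDU."""
    period = FAST_PERIODIC_TIME_S if fast_mode else SLOW_PERIODIC_TIME_S
    return period * MISSED_LACPDUS_BEFORE_DOWN


if __name__ == "__main__":
    print("fast mode budget:", lacp_timeout_seconds(fast_mode=True), "s")
    print("slow mode budget:", lacp_timeout_seconds(fast_mode=False), "s")
```

Any teamd restart that takes longer than the fast-mode budget is therefore visible to the peer.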
+ +A teamd container cannot be restarted that quickly, so a teamd warm-restart in LACP fast mode always brings the LAG down and leaves the kernel LAG state inconsistent. However, in a data center, it is necessary to set LACP to fast mode to ensure faster link convergence and less traffic loss. + +Therefore, supporting teamd warm-restart in LACP fast mode is very important. With this feature, we can support teamd bug hotfixes and smooth upgrades. + +This design supports teamd warm-restart in fast mode by switching between an active and a standby teamd container. + +### Requirements + +Support teamd warm-restart in LACP fast mode + +The LACP protocol is not modified, and LACP interaction is not affected + +### Limitations + +This design does not support the warm-reboot process. During a warm-reboot the kernel is reset, so the local end cannot continue to interact with the peer, and the peer becomes aware of the warm-reboot. + +During the teamd warm-restart process, no modification of the teamd-related configuration is allowed. + +### Architecture Design + +NA + +### High-Level Design +During the teamd warm-restart process, a new teamd container is created, and the old and new teamd containers need to be fully synchronized before the old teamd container is killed. After the warm-restart action, only the new teamd container is running. + +![teamd smooth update](/doc/lag/images/teamd_smooth_upgrade.svg) + +The teamd container contains multiple processes, such as teamd, teammgrd, teamdctl, and teamsyncd. The teamd process sends LACPDUs to and receives LACPDUs from the peer through the port, and updates the kernel module (team.ko) status via netlink. teamsyncd receives netlink events and converts them into ASIC configuration. 
+ +![teamd structure](/doc/lag/images/structure_of_teamd_container.svg) + +During the teamd warm-restart process, the modules interact as follows: +![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg) + + + +The flow of teamd warm-restart: +![The flow of teamd smooth upgrade](/doc/lag/images/teamd_smooth_upgrade_flow.svg) + +The flow of sonic_installer or warm_restart is: +1. rename the teamd container to teamd_bak +2. create a new teamd container +3. wait until the teamd process in the new teamd container is ready +4. use SIGUSR1 to stop the old teamd process (the teamd and teamd_bak containers can share files, so the LACPDU record files can be passed from the old container to the new one) +5. delete the teamd_bak container +6. use SIGUSR2 to make the new teamd process apply its data + +The flow of the teamd process is: +1. If teamd starts with the warm-restart flag, it does not send LACPDUs and does not change the parameters of the kernel module, but it records in memory any data that is inconsistent with the kernel parameters. + +2. When SIGUSR2 is received, teamd compares the recorded data with the kernel parameters, applies it to the kernel and ASIC, starts sending LACPDUs, and updates the kernel parameters in real time when the LACP status changes. + +3. 
When SIGUSR1 is received, the existing design is reused: teamd generates the LACPDU record file and exits. + +### SAI API + +NA + +### Configuration and management + +NA + +#### Manifest (if the feature is an Application Extension) + +NA + +#### CLI/YANG model Enhancements + +NA + +#### Config DB Enhancements + +NA + +### Warmboot and Fastboot Design Impact + +NA + +### Restrictions/Limitations + +### Testing Requirements/Design +NA + +#### Unit Test cases + +#### System Test cases + +### Open/Action items - if any + +NA \ No newline at end of file diff --git a/doc/lag/images/structure_of_teamd_container.svg b/doc/lag/images/structure_of_teamd_container.svg new file mode 100644 index 0000000000..0e8309f377 --- /dev/null +++ b/doc/lag/images/structure_of_teamd_container.svg @@ -0,0 +1,4 @@ + + + +
[Figure: structure_of_teamd_container.svg. Labels: Kernel, team.ko, teamd container, teamd process, teamd, teammgrd, teamsyncd, teamdctl, port, ASIC, lacpdu, netlink, netlink event, ASIC configuration]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade.svg b/doc/lag/images/teamd_smooth_upgrade.svg new file mode 100644 index 0000000000..d5b3d3a8d5 --- /dev/null +++ b/doc/lag/images/teamd_smooth_upgrade.svg @@ -0,0 +1,4 @@ + + + +
[Figure: teamd_smooth_upgrade.svg. Labels: preparation stage (backup old teamd docker image; upload new teamd docker image; sonic_installer update docker teamd); In-between state; Waiting for ready state; Waiting for new teamd container ready (both old and new teamd containers exist, only the old one reprocessing requests); Role switching state; Role switch state (the old teamd container is terminated with SIGUSR1, the new teamd container is enabled with SIGUSR2); Complete smooth upgrade; teamd container (OLD); teamd container (NEW); lag; sonic_installer / warm_restart teamd; SIGUSR1; SIGUSR2]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_flow.svg b/doc/lag/images/teamd_smooth_upgrade_flow.svg new file mode 100644 index 0000000000..46f17137fa --- /dev/null +++ b/doc/lag/images/teamd_smooth_upgrade_flow.svg @@ -0,0 +1,4 @@ + + + +
[Figure: teamd_smooth_upgrade_flow.svg. Labels: sonic_installer/warm_restart; rename teamd to teamd_bak; systemctl restart teamd (creat new teamd); teamdctl test teamd ready; kill -USR1teamd (stop teamd); kill -USR2 teamd (apply data); docker rm -f teamd_bak; teamd_bak container; teamd container; teamd; teammgrd; teamsyncd; create process and setwarm_restart flag (/usr/bin/teamd -w -o -t ...); record lacpdu to file; read lacpdu file; lacpdu_process; apply data; team_refresh() sync with kernel; warm_restart sync to APPL_DB; teamsync 70 seconds; kernel; port link change; port info; lacp protocal; lacpdu; LACP actor; LACP partner; Lacpdu is not received for more than 3 packets and lags down]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg b/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg new file mode 100644 index 0000000000..95c6f50a19 --- /dev/null +++ b/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg @@ -0,0 +1,4 @@ + + + +
[Figure: teamd_smooth_upgrade_module_interaction.svg. Labels: local node; partner node; Kernel team.ko; ASIC; master teamd container; master teamd container(old-teamd); backup teamd container(teamd); new master teamd container(teamd); lacpdu; netlink; netlink change; netlink event; ASIC configuration; X; SIGUSR1; SIGUSR2; Phase 1 Waiting for ready state; Phase 2 Role change state; Phase 3 Complete smooth upgrage]
\ No newline at end of file From ea669fa3975604d28a99d218cd0085a7addeb46a Mon Sep 17 00:00:00 2001 From: tianshangfei Date: Mon, 10 Oct 2022 21:43:42 +0800 Subject: [PATCH 2/5] update some detailes Signed-off-by: tianshangfei --- ...warm-restart_with_LACP_in_fast_mode_HLD.md | 66 ++++++++++--------- doc/lag/images/teamd_smooth_upgrade.svg | 2 +- doc/lag/images/teamd_smooth_upgrade_flow.svg | 2 +- ...eamd_smooth_upgrade_module_interaction.svg | 2 +- ...amd_smooth_upgrade_module_interaction1.svg | 4 ++ ...amd_smooth_upgrade_module_interaction2.svg | 4 ++ doc/lag/images/upgrade_scenarios.svg | 4 ++ 7 files changed, 51 insertions(+), 33 deletions(-) create mode 100644 doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg create mode 100644 doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg create mode 100644 doc/lag/images/upgrade_scenarios.svg diff --git a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md index 5e580714ed..5eb47250a4 100644 --- a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md +++ b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md @@ -6,27 +6,26 @@ ### Scope -This design supports teamd warm-restart in fast mode. - -### Definitions/Abbreviations - -NA +Teamd warm-restart with LACP in slow mode is already supported in the community version. This design is an enhancement to support teamd warm-restart with LACP in fast mode. +This design is unable to handle teamd unplanned restart, That is to say, If the teamd container is restarted by restarting the teamd service, The peer side is affected, resulting in link flapping and traffic loss. ### Overview -We expect that the restart of teamd docker should not cause link flapping or any traffic loss. All lags at data plane should remain the same. But it's hard to implement in some scenarios. +We expect the restart of teamd docker should not cause link flapping or any traffic loss. 
All lags at data plane should remain the same. But it's hard to implement in some scenarios. During teamd warm-restart, the control plane remains up for a maximum of 90 seconds in LACP slow mode. However, in LACP fast mode, the control plane can only remain up for 3 seconds. This is because LACPDUs are sent every second. LACP protocol considers a LAG to be down if three LACPDUs are not received. Teamd containers are not restarted that fast, so teamd warm-restart in LACP fast mode always results in lag down and the kernel LAG state in mess. However, in a data center, it is necessary to set LACP to fast mode to ensure faster link convergence and less traffic loss. -Therefore, supporting teamd warm-restart in LACP fast mode is very important. With this feature, we can support teamd bug hotfix and smooth upgrades. +Let's take a look at the scenario shown in the figure below, there are multiple LAGs between switch_a and switch 1-n and LACP in fast mode, We can only control switch_a, other devices belong to other organizations and are not under our control. we want to upgrade the teamd container of switch_a without causing link flapping on other devices. Therefore, supporting teamd warm-restart in LACP fast mode is very important. With this feature, we can support teamd bug hotfixes and smooth upgrades. + +![teamd upgread scenarios](/doc/lag/images/upgrade_scenarios.svg) -This design supports teamd warm-restart in fast mode by switching between active and standby teamd container. +This design supports teamd warm-restart by switching between active and standby teamd container. ### Requirements -Support warm-restart teamd in LACP fast mode +Support warm-restart teamd in LACP fast/slow mode LACP protocol is not modified, Lacp interaction is not affected @@ -38,40 +37,46 @@ During the teamd warm-restart process, no modification of the teamd-related conf ### Architecture Design -NA +This feature does not change the existing SONiC architecture. 
### High-Level Design -During the teamd warm-restart process, a new Teamd container is created, and the old and new Teamd containers need to be fully synchronized before the old Teamd container is killed. After the warm-restart action, only the new teamd is run. -![teamd smooth update](/doc/lag/images/teamd_smooth_upgrade.svg) +During the teamd warm-restart process, a new Teamd container is created, and the old and new teamd containers need to be fully synchronized before the old teamd container is killed. After the sonic_installer upgrade_docker teamd action, only the new teamd container is left in the system. -Teamd container contains multiple processes, such as teamd, teammgrd, teamdctl, teamsyncd etc. teamd process can send and receive LACPDUs with the peer through the port. teamd can update kernel module (team.ko) status via netlink. teamsyncd can receive netlink events and convert them to ASIC as configuration. +The process is divided into the following stages: +stage 1: Waiting for ready state. We need to wait for all LAG processes to be created in the new teamd container and wait for processes to synchronize with the kernel module. +stage 2: Role change state. The old LAG processes are terminated with SIGUSR1 and the new LAG processes are enabled with SIGUSR2. +stage 3: Warm-restart finish state. The old teamd container will exit, and only the new teamd container is left. -![ teamd structure](/doc/lag/images/structure_of_teamd_container.svg) +![teamd smooth update](/doc/lag/images/teamd_smooth_upgrade.svg) -During the teamd warm-restart process, the modules interact as follows -![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg) +The Teamd container contains multiple processes, such as teamd, teammgrd, teamdctl, teamsyncd, etc. teamd process can send and receive LACPDUs with the peer through the port. teamd can update the kernel module (team.ko) status via Netlink. 
teamsyncd can receive Netlink events and convert them to ASIC as configuration through SWSS. +![ teamd structure](/doc/lag/images/structure_of_teamd_container.svg) +During the teamd warm-restart process, the module interactions change as follows: +stage 1: Waiting for ready state. The new teamd only receives data, and the interaction with the teamd container is one-way. The new teamd container does not modify the parameters of the kernel module and ASIC -The flow of teamd warm-restart -![The flow of teamd smooth upgrade](/doc/lag/images/teamd_smooth_upgrade_flow.svg) +![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg) -The flow of sonic_installer or warm_restart is : -1. rename teamd container to teamd_bak container -2. create new teamd container -3. wait for teamd process ok in new teamd container -4. use SIGUSR1 to stop old teamd process.(teamd and teamd_bak container can share files so that the record files of lacpdu can be passed from the old container to the new one ) -5. delete teamd_bak container -6. use SIGUSR2 to apply data with new teamd process +stage 2: Role change state. Old teamd processes receive SIGUSR1 and send the last LACPDU to the partner, Interacting with the new teamd container will become two-way. The new teamd container start sending LACPDUs, and start modifying the parameters of the kernel module and ASIC +![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg) -The flow of teamd process is: -1. If teamd start with the warm-restart flag, lacpdu is not sent and the parameters of the kernel module are not changed, but inconsistent data with kernel parameters need to be recorded in memory. +stage 3: Warm-restart finish state. The old teamd container exits without setting parameters for the kernel or ASIC. +![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg) -2. 
When SIGUSR2 is received, compare the data and set it to the kernel and ASIC, start sending lacpdu, and update the kernel parameters in real time when the lacp status changes -3. When SIGUSR1 is received, the old design is reused, LACPDU record file is generated, and exit +The flow of the teamd warm-restart process is as follows: +* a1. rename the teamd container to teamd_bak +* a2. create a new teamd container. (teamd flow: if teamd starts with the warm-restart flag, it does not send LACPDUs and does not change the parameters of the kernel module, but it records in memory any data that is inconsistent with the kernel parameters.) +* a3. wait until the teamd processes in the new teamd container are ready. Run teamdctl to get the status of the new teamd. +* b1. use SIGUSR1 to stop the old teamd processes. The teamd and teamd_bak containers can share files, so the LACPDU record files can be passed from the old container to the new one. (teamd flow: when SIGUSR1 is received, the existing design is reused: teamd generates the LACPDU record file and exits.) +* b2. wait for SIGUSR1 processing to finish, which must take less than 3 seconds. Otherwise, the LAG is considered down. +* b3. use SIGUSR2 to make the new teamd processes apply their data. (teamd flow: when SIGUSR2 is received, teamd compares its data with the kernel, applies the new data to the kernel and ASIC, starts sending LACPDUs, and updates the kernel parameters in real time when the LACP status changes.) +* c1. delete the teamd_bak container +![The flow of teamd smooth upgrade](/doc/lag/images/teamd_smooth_upgrade_flow.svg) +The process of sonic_installer rollback_docker is the same as that for warm-restart above. ### SAI API NA @@ -99,7 +104,8 @@ NA ### Restrictions/Limitations ### Testing Requirements/Design -NA + +Use the regular LAG testbed with the LAGs configured for LACP fast mode. Running sonic_installer upgrade_docker --warm teamd docker-teamd.gz -y on the DUT must not cause link flapping or any traffic loss. 
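As one possible starting point for unit tests, the ordering of steps a1 to c1 and the 3-second limit between SIGUSR1 and SIGUSR2 can be modelled without touching real containers. The sketch below is an assumption-laden illustration: `Step`, `FLOW`, `run_flow` and the injected `execute` and `clock` callables are hypothetical names, and the action strings are descriptions, not real commands.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

HANDOVER_BUDGET_S = 3.0  # fast-mode limit stated in step b2


@dataclass
class Step:
    label: str   # step id used in this document (a1 ... c1)
    action: str  # human-readable description, not a real command


FLOW: List[Step] = [
    Step("a1", "rename teamd container to teamd_bak"),
    Step("a2", "create new teamd container with the warm-restart flag"),
    Step("a3", "wait until teamd in the new container is ready"),
    Step("b1", "send SIGUSR1 to the old teamd processes"),
    Step("b2", "wait for SIGUSR1 processing"),
    Step("b3", "send SIGUSR2 to the new teamd processes"),
    Step("c1", "delete teamd_bak container"),
]


def run_flow(execute: Callable[[Step], None],
             clock: Callable[[], float] = time.monotonic) -> float:
    """Run all steps in order; return the seconds spent from b1 to b3."""
    start = end = 0.0
    for step in FLOW:
        if step.label == "b1":
            start = clock()   # the old teamd stops sending LACPDUs here
        execute(step)
        if step.label == "b3":
            end = clock()     # the new teamd starts sending LACPDUs here
    gap = end - start
    if gap >= HANDOVER_BUDGET_S:
        raise RuntimeError(f"handover took {gap:.2f}s; the LAG may go down")
    return gap


if __name__ == "__main__":
    executed: List[str] = []
    run_flow(lambda step: executed.append(step.label))
    print(executed)
```

Injecting the clock lets a test simulate a slow handover and check that it is reported, without waiting in real time.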
#### Unit Test cases diff --git a/doc/lag/images/teamd_smooth_upgrade.svg b/doc/lag/images/teamd_smooth_upgrade.svg index d5b3d3a8d5..e339cf5efc 100644 --- a/doc/lag/images/teamd_smooth_upgrade.svg +++ b/doc/lag/images/teamd_smooth_upgrade.svg @@ -1,4 +1,4 @@ -
[Figure, previous version of teamd_smooth_upgrade.svg removed by this patch. Labels: preparation stage; In-between state; Waiting for ready state; Waiting for new teamd container ready; Role switching state; Role switch state; Complete smooth upgrade; teamd container (OLD); teamd container (NEW); lag; sonic_installer / warm_restart teamd; SIGUSR1; SIGUSR2]
\ No newline at end of file +
[Figure: teamd_smooth_upgrade.svg. Labels: Before teamd warm-restart; Stage 1 Waiting for ready state; Stage 2 Role change state; Stage 3 Warm-restart finish state; teamd container (OLD); teamd container (NEW); lag; sonic_installer; SIGUSR1; SIGUSR2]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_flow.svg b/doc/lag/images/teamd_smooth_upgrade_flow.svg index 46f17137fa..d0a6912424 100644 --- a/doc/lag/images/teamd_smooth_upgrade_flow.svg +++ b/doc/lag/images/teamd_smooth_upgrade_flow.svg @@ -1,4 +1,4 @@ -
[Figure, previous version of teamd_smooth_upgrade_flow.svg removed by this patch. Labels: sonic_installer/warm_restart; rename teamd to teamd_bak; systemctl restart teamd (creat new teamd); teamdctl test teamd ready; kill -USR1teamd (stop teamd); kill -USR2 teamd (apply data); docker rm -f teamd_bak; teamd_bak container; teamd container; teamd; teammgrd; teamsyncd; create process and setwarm_restart flag (/usr/bin/teamd -w -o -t ...); record lacpdu to file; read lacpdu file; lacpdu_process; apply data; team_refresh() sync with kernel; warm_restart sync to APPL_DB; teamsync 70 seconds; kernel; port link change; port info; lacp protocal; lacpdu; LACP actor; LACP partner; Lacpdu is not received for more than 3 packets and lags down]
\ No newline at end of file +
[Figure: teamd_smooth_upgrade_flow.svg. Labels: Stage 1 Waiting for read state; Stage 2 Role change state; Stage 3 warm-restart finish state; a1.rename teamd to teamd_bak; a2. create new teamd container; a3.wait for teamd ready; b1. send SIGUSR1; b2. wait for SIGUSR1 processing; b3. send SIGUSR2; c1 stop teamd_bak; sonic_installer; teamd_bak container; teamd container; teamd; teammgrd; teamsyncd; create process and setwarm_restart flag (/usr/bin/teamd -w -o -t ...); record lacpdu to file; lacpdu_process; apply the data; sync with kernel team_refresh(); warm_restart sync to APPL_DB; kernel; port link change; lacp protocal; lacpdu; LACP actor; LACP partner; The LAG considered down, If more three LACPDUs are not received]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg b/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg index 95c6f50a19..3a6c7e10e8 100644 --- a/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg +++ b/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg @@ -1,4 +1,4 @@ -
[Figure, previous version of teamd_smooth_upgrade_module_interaction.svg removed by this patch. Labels: local node; partner node; Kernel team.ko; ASIC; master teamd container; master teamd container(old-teamd); backup teamd container(teamd); new master teamd container(teamd); lacpdu; netlink; netlink change; netlink event; ASIC configuration; X; SIGUSR1; SIGUSR2; Phase 1 Waiting for ready state; Phase 2 Role change state; Phase 3 Complete smooth upgrage]
\ No newline at end of file +
[Figure: teamd_smooth_upgrade_module_interaction.svg. Labels: Stage 1 Waiting for ready state; local node; partner node; old teamd container(teamd_bak); new teamd container(teamd); Kernel team.ko; ASIC; lacpdu; netlink change; netlink event; ASIC configuration; X]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg b/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg new file mode 100644 index 0000000000..c420cb4b66 --- /dev/null +++ b/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg @@ -0,0 +1,4 @@ + + + +
[Figure: teamd_smooth_upgrade_module_interaction1.svg. Labels: Stage 2 Role change state; local node; partner node; old teamd container(teamd_bak); new teamd container(teamd); Kernel team.ko; ASIC; lacpdu; netlink change; netlink event; ASIC configuration; SIGUSR1; SIGUSR2]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg b/doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg new file mode 100644 index 0000000000..88f9831ec8 --- /dev/null +++ b/doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg @@ -0,0 +1,4 @@ + + + +
[Figure: teamd_smooth_upgrade_module_interaction2.svg. Labels: Stage 3 Warm-restart finish state; local node; partner node; new teamd container(teamd); Kernel team.ko; ASIC; lacpdu; netlink change; netlink event; ASIC configuration]
\ No newline at end of file diff --git a/doc/lag/images/upgrade_scenarios.svg b/doc/lag/images/upgrade_scenarios.svg new file mode 100644 index 0000000000..8aded9d7a8 --- /dev/null +++ b/doc/lag/images/upgrade_scenarios.svg @@ -0,0 +1,4 @@ + + + +
[Figure: upgrade_scenarios.svg. Labels: switch a (teamd); switch 1; switch 2; ...; switch n]
\ No newline at end of file From 5f1326fc5feb2f6f2aab77806f99159acb8f8daf Mon Sep 17 00:00:00 2001 From: tianshangfei Date: Tue, 11 Oct 2022 16:44:48 +0800 Subject: [PATCH 3/5] update Signed-off-by: tianshangfei --- ...warm-restart_with_LACP_in_fast_mode_HLD.md | 27 +++++++++++-------- doc/lag/images/teamd_smooth_upgrade_flow.svg | 2 +- doc/lag/images/upgrade_scenarios.svg | 2 +- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md index 5eb47250a4..3e82572cce 100644 --- a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md +++ b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md @@ -6,16 +6,15 @@ ### Scope -Teamd warm-restart with LACP in slow mode is already supported in the community version. This design is an enhancement to support teamd warm-restart with LACP in fast mode. -This design is unable to handle teamd unplanned restart, That is to say, If the teamd container is restarted by restarting the teamd service, The peer side is affected, resulting in link flapping and traffic loss. - +Teamd warm-restart with LACP in slow mode is already supported as described in [SONiC_Warmboot](/doc/warm-reboot/system-warmboot.md). This design is an enhancement to support teamd warm-restart with LACP in fast mode. +This design is not support teamd container unplanned restart. ### Overview We expect the restart of teamd docker should not cause link flapping or any traffic loss. All lags at data plane should remain the same. But it's hard to implement in some scenarios. During teamd warm-restart, the control plane remains up for a maximum of 90 seconds in LACP slow mode. However, in LACP fast mode, the control plane can only remain up for 3 seconds. This is because LACPDUs are sent every second. LACP protocol considers a LAG to be down if three LACPDUs are not received. 
-Teamd containers are not restarted that fast, so teamd warm-restart in LACP fast mode always results in lag down and the kernel LAG state in mess. However, in a data center, it is necessary to set LACP to fast mode to ensure faster link convergence and less traffic loss. +Teamd container is not restarted that fast, so teamd warm-restart in LACP fast mode always results in LAG down and the kernel LAG state in mess. However, in a data center, it is necessary to set LACP to fast mode to ensure faster link convergence and less traffic loss. Let's take a look at the scenario shown in the figure below, there are multiple LAGs between switch_a and switch 1-n and LACP in fast mode, We can only control switch_a, other devices belong to other organizations and are not under our control. we want to upgrade the teamd container of switch_a without causing link flapping on other devices. Therefore, supporting teamd warm-restart in LACP fast mode is very important. With this feature, we can support teamd bug hotfixes and smooth upgrades. @@ -27,7 +26,7 @@ This design supports teamd warm-restart by switching between active and standby Support warm-restart teamd in LACP fast/slow mode -LACP protocol is not modified, Lacp interaction is not affected +LACP protocol is not modified, LACP interaction is not affected ### Limitations @@ -44,9 +43,12 @@ This feature does not change the existing SONiC architecture. During the teamd warm-restart process, a new Teamd container is created, and the old and new teamd containers need to be fully synchronized before the old teamd container is killed. After the sonic_installer upgrade_docker teamd action, only the new teamd container is left in the system. The process is divided into the following stages: -stage 1: Waiting for ready state. We need to wait for all LAG processes to be created in the new teamd container and wait for processes to synchronize with the kernel module. -stage 2: Role change state. 
The old LAG processes are terminated with SIGUSR1 and the new LAG processes are enabled with SIGUSR2. -stage 3: Warm-restart finish state. The old teamd container will exit, and only the new teamd container is left. + +* stage 1: Waiting for ready state. We need to wait for all LAG processes to be created in the new teamd container and wait for processes to synchronize with the kernel module. + +* stage 2: Role change state. The old LAG processes are terminated with SIGUSR1 and the new LAG processes are enabled with SIGUSR2. + +* stage 3: Warm-restart finish state. The old teamd container will exit, and only the new teamd container is left. ![teamd smooth update](/doc/lag/images/teamd_smooth_upgrade.svg) @@ -55,14 +57,17 @@ The Teamd container contains multiple processes, such as teamd, teammgrd, teamdc ![ teamd structure](/doc/lag/images/structure_of_teamd_container.svg) During the teamd warm-restart process, the module interactions change as follows: -stage 1: Waiting for ready state. The new teamd only receives data, and the interaction with the teamd container is one-way. The new teamd container does not modify the parameters of the kernel module and ASIC + +* stage 1: Waiting for ready state. The new teamd only receives data, and the interaction with the teamd container is one-way. The new teamd container does not modify the parameters of the kernel module and ASIC ![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg) -stage 2: Role change state. Old teamd processes receive SIGUSR1 and send the last LACPDU to the partner, Interacting with the new teamd container will become two-way. The new teamd container start sending LACPDUs, and start modifying the parameters of the kernel module and ASIC +* stage 2: Role change state. Old teamd processes receive SIGUSR1 and send the last LACPDU to the partner, Interacting with the new teamd container will become two-way after receive SIGUSR2. 
The new teamd container start sending LACPDUs, and start modifying the parameters of the kernel module and ASIC + ![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg) -stage 3: Warm-restart finish state. The old teamd container exits without setting parameters for the kernel or ASIC. +* stage 3: Warm-restart finish state. The old teamd container exits without setting parameters for the kernel or ASIC. + ![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction2.svg) diff --git a/doc/lag/images/teamd_smooth_upgrade_flow.svg b/doc/lag/images/teamd_smooth_upgrade_flow.svg index d0a6912424..b340d251e3 100644 --- a/doc/lag/images/teamd_smooth_upgrade_flow.svg +++ b/doc/lag/images/teamd_smooth_upgrade_flow.svg @@ -1,4 +1,4 @@ -
[SVG text labels, teamd_smooth_upgrade_flow.svg, removed version. Components: teamd container and teamd_bak container (each with teamd, teammgrd, teamsyncd), sonic_installer, warm_restart, kernel, LACP actor, LACP partner, lacp protocol, lacpdu_process. Notes: record lacpdu to file; apply the data; sync to APPL_DB; sync with kernel via team_refresh(); port link change; create process and set warm_restart flag (/usr/bin/teamd -w -o -t ...); the LAG is considered down if more than three LACPDUs are not received. Steps: a1 rename teamd to teamd_bak; a2 create new teamd container; a3 wait for teamd ready; b1 send SIGUSR1; b2 wait for SIGUSR1 processing; b3 send SIGUSR2; c1 stop teamd_bak. Stages: 1 Waiting for ready state; 2 Role change state; 3 warm-restart finish state.]
\ No newline at end of file +
[SVG text labels, teamd_smooth_upgrade_flow.svg, added version. Components: teamd container and teamd_bak container (each with teamd, teammgrd, teamsyncd), sonic_installer, warm_restart, kernel, LACP actor, LACP partner, lacp protocol, lacpdu_process. Notes: record LACPDU to file; apply the data; sync to APPL_DB; sync with kernel via team_refresh(); port link change; create process and set warm_restart flag (/usr/bin/teamd -w -o -t ...); the LAG is considered down if more than three LACPDUs are not received. Steps: a1 rename teamd to teamd_bak; a2 create new teamd container; a3 wait for teamd ready; b1 send SIGUSR1; b2 wait for SIGUSR1 processing; b3 send SIGUSR2; c1 stop teamd_bak. Stages: 1 Waiting for ready state; 2 Role change state; 3 warm-restart finish state.]
\ No newline at end of file diff --git a/doc/lag/images/upgrade_scenarios.svg b/doc/lag/images/upgrade_scenarios.svg index 8aded9d7a8..837e32da1b 100644 --- a/doc/lag/images/upgrade_scenarios.svg +++ b/doc/lag/images/upgrade_scenarios.svg @@ -1,4 +1,4 @@ -
[SVG text labels, upgrade_scenarios.svg, removed version: switch a (running teamd) connected to switch 1, switch 2, ..., switch n.]
\ No newline at end of file +
[SVG text labels, upgrade_scenarios.svg, added version: switch a (running teamd) connected to switch 1, switch 2, ..., switch n.]
\ No newline at end of file From ff0b7da4a71ff0101dae5b207c3ca46f90bf77e4 Mon Sep 17 00:00:00 2001 From: tianshangfei Date: Tue, 11 Oct 2022 20:00:22 +0800 Subject: [PATCH 4/5] update Signed-off-by: tianshangfei --- doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md index 3e82572cce..af5220f093 100644 --- a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md +++ b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md @@ -7,7 +7,9 @@ ### Scope Teamd warm-restart with LACP in slow mode is already supported as described in [SONiC_Warmboot](/doc/warm-reboot/system-warmboot.md). This design is an enhancement to support teamd warm-restart with LACP in fast mode. -This design is not support teamd container unplanned restart. + +unplanned restart for Teamd docker is out of scope. + ### Overview We expect the restart of teamd docker should not cause link flapping or any traffic loss. All lags at data plane should remain the same. But it's hard to implement in some scenarios. @@ -52,7 +54,7 @@ The process is divided into the following stages: ![teamd smooth update](/doc/lag/images/teamd_smooth_upgrade.svg) -The Teamd container contains multiple processes, such as teamd, teammgrd, teamdctl, teamsyncd, etc. teamd process can send and receive LACPDUs with the peer through the port. teamd can update the kernel module (team.ko) status via Netlink. teamsyncd can receive Netlink events and convert them to ASIC as configuration through SWSS. +The teamd container contains multiple processes, such as teamd, teammgrd, teamdctl, teamsyncd, etc. teamd process can send and receive LACPDUs with the peer through the port. teamd can update the kernel module (team.ko) status via Netlink. teamsyncd can receive Netlink events and convert them to ASIC as configuration through SWSS. 
![ teamd structure](/doc/lag/images/structure_of_teamd_container.svg) From cda2dae8f14d7966bff4f1da4a38b692e8da94a6 Mon Sep 17 00:00:00 2001 From: tianshangfei Date: Wed, 12 Oct 2022 17:51:54 +0800 Subject: [PATCH 5/5] Update cli, test, signal according to review Signed-off-by: tianshangfei --- ...warm-restart_with_LACP_in_fast_mode_HLD.md | 66 ++++++++++++++++--- doc/lag/images/teamd_smooth_upgrade.svg | 2 +- doc/lag/images/teamd_smooth_upgrade_flow.svg | 2 +- ...amd_smooth_upgrade_module_interaction1.svg | 2 +- 4 files changed, 60 insertions(+), 12 deletions(-) diff --git a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md index af5220f093..b0a96aad35 100644 --- a/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md +++ b/doc/lag/Teamd_warm-restart_with_LACP_in_fast_mode_HLD.md @@ -2,8 +2,17 @@ ## Table of Content -### Revision - + * [Scope](#scope) + * [Overview](#overview) + * [Limitations](#limitations) + * [High-Level Design](#high-level-design) + * [CLI/YANG model Enhancements](#cliyang-model-enhancements) + * [Testing Requirements/Design](#testing-requirementsdesign) +### Revision + + | Rev | Date | Author | Change Description | + |:---:|:-----------:|:------------------:|-----------------------------------| + | 0.1 | 10/12/2022 | timstian | Initial version | ### Scope Teamd warm-restart with LACP in slow mode is already supported as described in [SONiC_Warmboot](/doc/warm-reboot/system-warmboot.md). This design is an enhancement to support teamd warm-restart with LACP in fast mode. @@ -42,13 +51,13 @@ This feature does not change the existing SONiC architecture. ### High-Level Design -During the teamd warm-restart process, a new Teamd container is created, and the old and new teamd containers need to be fully synchronized before the old teamd container is killed. After the sonic_installer upgrade_docker teamd action, only the new teamd container is left in the system. 
+During the teamd warm-restart process, a new teamd container is created, and the old and new teamd containers must be fully synchronized before the old teamd container is killed. After the warm_restart upgrade_docker teamd action, only the new teamd container is left in the system. The process is divided into the following stages: * stage 1: Waiting for ready state. We need to wait for all LAG processes to be created in the new teamd container and wait for processes to synchronize with the kernel module. -* stage 2: Role change state. The old LAG processes are terminated with SIGUSR1 and the new LAG processes are enabled with SIGUSR2. +* stage 2: Role change state. The old LAG processes are terminated with SIGUSR1 and the new LAG processes are enabled with SIGRTMIN. * stage 3: Warm-restart finish state. The old teamd container will exit, and only the new teamd container is left. @@ -64,7 +73,7 @@ During the teamd warm-restart process, the module interactions change as follows ![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction.svg) -* stage 2: Role change state. Old teamd processes receive SIGUSR1 and send the last LACPDU to the partner, Interacting with the new teamd container will become two-way after receive SIGUSR2. The new teamd container start sending LACPDUs, and start modifying the parameters of the kernel module and ASIC +* stage 2: Role change state. The old teamd processes receive SIGUSR1 and send a last LACPDU to the partner. After the new teamd container receives SIGRTMIN, its interaction becomes two-way: it starts sending LACPDUs and starts modifying the parameters of the kernel module and ASIC. ![teamd smooth update module interaction](/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg) @@ -79,11 +88,11 @@ The flow of the teamd warm-restart process is as follows: * a3. wait for the teamd processes to ok in the new teamd container. Run teamdctl to get the new teamd status. 
* b1. use SIGUSR1 to stop the old teamd processes. teamd and teamd_bak containers can share files so that the record files of LACPDU can be passed from the old container to the new one. (teamd flow: When SIGUSR1 is received, the old design is reused, the LACPDU record file is generated, and exit) * b2. wait for SIGUSR1 processing, which must less than 3 seconds. Otherwise, LAG considers down. -* b3. use SIGUSR2 to apply data with the new teamd processes. (teamd flow: When SIGUSR2 is received, We will compare the data with the kernel and set the new data to the kernel and ASIC, start sending LACPDU, and update the kernel parameters in real-time when the LACP status changes) +* b3. use SIGRTMIN to apply data with the new teamd processes. (teamd flow: when SIGRTMIN is received, the new teamd compares its data with the kernel, writes the new data to the kernel and ASIC, starts sending LACPDUs, and updates the kernel parameters in real time when the LACP status changes) * c1. delete teamd_bak container ![The flow of teamd smooth upgrade](/doc/lag/images/teamd_smooth_upgrade_flow.svg) -The process of sonic_installer rollback_docker is the same as that for warm-restart above. +The process of warm_restart rollback_docker is the same as that for warm-restart above. ### SAI API NA @@ -98,8 +107,44 @@ NA #### CLI/YANG model Enhancements -NA +Several commands will be added. + +* warm_restart --help + +Usage: warm_restart [OPTIONS] COMMAND [ARGS]... + docker upgrade manager + +Options: + --help Show this message and exit. 
+ +Commands: + rollback_docker Rollback docker image to previous version + upgrade_docker Upgrade docker image from local binary or URL + +* warm_restart upgrade_docker --help +Usage: warm_restart upgrade_docker [OPTIONS] URL + + Upgrade docker image from local binary or URL + +Options: + -y, --yes + --cleanup_image Clean up old docker image + --skip_check Skip task check for docker upgrade + --view_check Enforce asic view check for docker upgrade + --tag TEXT Tag for the new docker image + --warm Perform warm upgrade + --help Show this message and exit. + +* warm_restart rollback_docker --help + +Usage: warm_restart rollback_docker [OPTIONS] + + Rollback docker image to previous version + +Options: + -y, --yes + --help Show this message and exit. #### Config DB Enhancements NA @@ -112,8 +157,11 @@ NA ### Testing Requirements/Design -Same as regular LAG testbed. LAG Configures LACP in fast mode, run sonic_installer upgrade_docker --warm teamd docker-teamd.gz -y on DUT will not cause link flapping or any traffic loss. +Same as the regular LAG testbed, with the LAGs configured for LACP in fast mode. + +* Running the warm_restart upgrade_docker or rollback_docker command on the DUT is not expected to cause link flapping or traffic loss. +* If a physical link goes down during the warm_restart process, the LAG state is expected to remain consistent. #### Unit Test cases #### System Test cases diff --git a/doc/lag/images/teamd_smooth_upgrade.svg b/doc/lag/images/teamd_smooth_upgrade.svg index e339cf5efc..bd2ac3de86 100644 --- a/doc/lag/images/teamd_smooth_upgrade.svg +++ b/doc/lag/images/teamd_smooth_upgrade.svg @@ -1,4 +1,4 @@ -
[SVG text labels, teamd_smooth_upgrade.svg, removed version. Panels: Before teamd warm-restart; Stage 1 Waiting for ready state; Stage 2 Role change state; Stage 3 Warm-restart finish state. Each panel shows teamd container (OLD) and/or teamd container (NEW) with their lag processes. Stage 2 is driven by sonic_installer, which sends SIGUSR1 to the old container and SIGUSR2 to the new container.]
\ No newline at end of file +
[SVG text labels, teamd_smooth_upgrade.svg, added version. Panels: Before teamd warm-restart; Stage 1 Waiting for ready state; Stage 2 Role change state; Stage 3 Warm-restart finish state. Each panel shows teamd container (OLD) and/or teamd container (NEW) with their lag processes. Stage 2 is driven by warm_restart, which sends SIGUSR1 to the old container and SIGRTMIN to the new container.]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_flow.svg b/doc/lag/images/teamd_smooth_upgrade_flow.svg index b340d251e3..a07dbf902c 100644 --- a/doc/lag/images/teamd_smooth_upgrade_flow.svg +++ b/doc/lag/images/teamd_smooth_upgrade_flow.svg @@ -1,4 +1,4 @@ -
[SVG text labels, teamd_smooth_upgrade_flow.svg, removed version. Components: teamd container and teamd_bak container (each with teamd, teammgrd, teamsyncd), sonic_installer, warm_restart, kernel, LACP actor, LACP partner, lacp protocol, lacpdu_process. Notes: record LACPDU to file; apply the data; sync to APPL_DB; sync with kernel via team_refresh(); port link change; create process and set warm_restart flag (/usr/bin/teamd -w -o -t ...); the LAG is considered down if more than three LACPDUs are not received. Steps: a1 rename teamd to teamd_bak; a2 create new teamd container; a3 wait for teamd ready; b1 send SIGUSR1; b2 wait for SIGUSR1 processing; b3 send SIGUSR2; c1 stop teamd_bak. Stages: 1 Waiting for ready state; 2 Role change state; 3 warm-restart finish state.]
\ No newline at end of file +
[SVG text labels, teamd_smooth_upgrade_flow.svg, added version. Components: teamd container and teamd_bak container (each with teamd, teammgrd, teamsyncd), warm_restart (in place of sonic_installer), kernel, LACP actor, LACP partner, lacp protocol, lacpdu_process. Notes: record LACPDU to file; apply the data; sync to APPL_DB; sync with kernel via team_refresh(); port link change; create process and set warm_restart flag (/usr/bin/teamd -w -o -t ...); the LAG is considered down if more than three LACPDUs are not received. Steps: a1 rename teamd to teamd_bak; a2 create new teamd container; a3 wait for teamd ready; b1 send SIGUSR1; b2 wait for SIGUSR1 processing; b3 send SIGRTMIN; c1 stop teamd_bak. Stages: 1 Waiting for ready state; 2 Role change state; 3 warm-restart finish state.]
\ No newline at end of file diff --git a/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg b/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg index c420cb4b66..68269cb2f9 100644 --- a/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg +++ b/doc/lag/images/teamd_smooth_upgrade_module_interaction1.svg @@ -1,4 +1,4 @@ -
[SVG text labels, teamd_smooth_upgrade_module_interaction1.svg, removed version. Title: Stage 2 Role change state. Local node: old teamd container (teamd_bak), new teamd container (teamd), Kernel team.ko, ASIC. Remote: partner node. Arrows: lacpdu, netlink change, netlink event, ASIC configuration. Signals: SIGUSR1 to the old container, SIGUSR2 to the new container.]
\ No newline at end of file +
[SVG text labels, teamd_smooth_upgrade_module_interaction1.svg, added version. Title: Stage 2 Role change state. Local node: old teamd container (teamd_bak), new teamd container (teamd), Kernel team.ko, ASIC. Remote: partner node. Arrows: lacpdu, netlink change, netlink event, ASIC configuration. Signals: SIGUSR1 to the old container, SIGRTMIN to the new container.]
\ No newline at end of file