Commit 69d76c5

Add design for basic validation of swss state consistency
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
1 parent dccc2c9 commit 69d76c5

1 file changed

+22
-4
lines changed

doc/warm-reboot/swss_warm_restart.md

+22-4
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33

44
Table of Contents
55
=================
6+
67
* [Overview](#overview)
78
* [Input Data for swss](#input-data-for-swss)
89
* [configDB](#configdb)
@@ -11,13 +12,16 @@ Table of Contents
1112
* [BGP and fpmsyncd](#bgp-and-fpmsyncd)
1213
* [JSON files](#json-files)
1314
* [Syncd](#syncd)
14-
* [Swss state restore](#swss-state-restore)
15+
* [SWSS state restore](#swss-state-restore)
1516
* [PORT, VLAN and INTF](#port-vlan-and-intf)
1617
* [ARP, LAG and route data in orchagent](#arp-lag-and-route-data-in-orchagent)
1718
* [QoS, Buffer, CRM, PFC WD and ACL data in orchagent](#qos-buffer-crm-pfc-wd-and-acl-data-in-orchagent)
1819
* [COPP, Tunnel and Mirror data in orchagent](#copp-tunnel-and-mirror-data-in-orchagent)
1920
* [FDB and port state in orchagent](#fdb-and-port-state-in-orchagent)
2021
* [OID for switch default objects in orchagent\.](#oid-for-switch-default-objects-in-orchagent)
22+
* [SWSS state consistency validation](#swss-state-consistency-validation)
23+
* [Pre\-restart state validation](#pre-restart-state-validation)
24+
* [Post\-restore state validation](#post-restore-state-validation)
2125
* [SWSS state sync up](#swss-state-sync-up)
2226
* [ARP sync up](#arp-sync-up)
2327
* [port state sync up](#port-state-sync-up)
@@ -72,7 +76,7 @@ For copp, tunnel and mirror related configurations, they are loaded from json f
7276
FDB and Port state notifications come from ASIC, syncd relays the data to orchagent.
7377
Orchagent also gets info for the objects created by the ASIC by default, e.g. the port list, hw lanes and queues.
7478

75-
# Swss state restore
79+
# SWSS state restore
7680
During swss warm restart, the state of swss should be restored. It is assumed that all data in APPDB has either been restored or been kept intact.
7781

7882
## PORT, VLAN and INTF
@@ -92,12 +96,26 @@ Orchagent fetch the existing data from configDB at startup.
9296
These configurations are loaded into APPDB from JSON files and then received by orchagent at startup.
9397

9498
## FDB and port state in orchagent
95-
The FDB data is restored from APPDB by orchagent.
96-
TODO: Port state restore.
99+
Both the FDB and port state data are restored from APPDB by orchagent.
97100

98101
## OID for switch default objects in orchagent.
99102
Orchagent relies on SAI get api to fetch the OID data from syncd for switch default objects.
100103

104+
# SWSS state consistency validation
105+
After swss state restore, the state of each swss process, especially orchagent, should be consistent with its state before the restart.
106+
For now, it is assumed that configDB does not change during the whole warm restart window, so the state of orchagent is mainly driven by APPDB data changes. The following basic pre-restart and post-restore validation could be applied.
107+
108+
## Pre-restart state validation
109+
A "restart prepare" request is sent to orchagent. If there is no pending data in the SyncMap (m_toSync) of any application consumer in orchagent, OrchDaemon sets a flag to stop processing any further APPDB data changes and returns success for the "restart prepare"
110+
request. Otherwise, failure should be returned for the request to indicate that orchagent has an unfulfilled dependency and is not ready for warm restart.
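
The "restart prepare" handshake described above can be sketched as follows. This is only an illustrative model, not the orchagent implementation: `Consumer`, `OrchDaemonSketch`, `restart_prepare` and `on_appdb_change` are hypothetical names, and the `to_sync` dict merely stands in for the per-consumer SyncMap (m_toSync).

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Hypothetical stand-in for one application consumer in orchagent."""
    table_name: str
    # Models the SyncMap (m_toSync): entries received but not yet applied.
    to_sync: dict = field(default_factory=dict)


class OrchDaemonSketch:
    """Hypothetical model of the pre-restart check; not the real OrchDaemon."""

    def __init__(self, consumers):
        self.consumers = list(consumers)
        self.frozen = False  # set once "restart prepare" succeeds

    def pending_tables(self):
        # Tables whose consumer still holds unprocessed entries.
        return [c.table_name for c in self.consumers if c.to_sync]

    def restart_prepare(self):
        # Fail if any consumer has pending data (unfulfilled dependency).
        if self.pending_tables():
            return False
        # Otherwise stop accepting further APPDB changes and report success.
        self.frozen = True
        return True

    def on_appdb_change(self, table_name, key, value):
        # After a successful "restart prepare", changes are no longer processed.
        if self.frozen:
            return False
        for c in self.consumers:
            if c.table_name == table_name:
                c.to_sync[key] = value
                return True
        return False
```

In this sketch a caller would invoke `restart_prepare()` and proceed with the warm restart only when it returns `True`. The same `pending_tables()` check could be reused for the post-restore validation, since both phases require every SyncMap to be empty.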
111+
112+
The existing ProducerStateTable/ConsumerStateTable implementation should be updated so that only the consumer side modifies the actual table.
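
One way to picture the "only the consumer side modifies the actual table" requirement is the small in-memory model below. It is a sketch under stated assumptions, not the swss-common code: `StateTableSketch` and its methods are hypothetical, and plain dicts replace the Redis structures. The producer only records pending operations; the actual table changes when the consumer pops them.

```python
class StateTableSketch:
    """Hypothetical in-memory model of a producer/consumer state table pair."""

    def __init__(self):
        self.pending = {}  # staging area, written only by the producer side
        self.table = {}    # actual table, written only by the consumer side

    def producer_set(self, key, fields):
        # Producer records the request; the actual table is left untouched.
        self.pending[key] = ("SET", dict(fields))

    def producer_del(self, key):
        # A later operation on the same key replaces the earlier pending one.
        self.pending[key] = ("DEL", {})

    def consumer_pop(self):
        # Consumer drains the staging area and applies each operation.
        ops = list(self.pending.items())
        self.pending.clear()
        for key, (op, fields) in ops:
            if op == "SET":
                self.table[key] = fields
            else:
                self.table.pop(key, None)
        return ops
```

With this split, the actual table reflects only what the consumer has already picked up, so an entry still sitting in the staging area is visibly "not yet consumed" rather than looking as if it had been applied.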
113+
114+
## Post-restore state validation
115+
After swss state restore, as in the pre-restart phase, no pending data should exist in the SyncMap (m_toSync) of any application consumer. This check should be done before swss state sync up.
116+
117+
*More exhaustive validation beyond this is to be designed and implemented.*
118+
101119
# SWSS state sync up
102120
During the restart window, dynamic data like ARP, port state, FDB, LAG and route may be changed. Orchagent needs to sync up with the latest network state.
103121
