doc/warm-reboot/swss_warm_restart.md
+22-4
@@ -3,6 +3,7 @@
3
3
4
4
Table of Contents
5
5
=================
6
+
6
7
*[Overview](#overview)
7
8
*[Input Data for swss](#input-data-for-swss)
8
9
*[configDB](#configdb)
@@ -11,13 +12,16 @@ Table of Contents
11
12
*[BGP and fpmsyncd](#bgp-and-fpmsyncd)
12
13
*[JSON files](#json-files)
13
14
*[Syncd](#syncd)
14
-
*[Swss state restore](#swss-state-restore)
15
+
*[SWSS state restore](#swss-state-restore)
15
16
*[PORT, VLAN and INTF](#port-vlan-and-intf)
16
17
*[ARP, LAG and route data in orchagent](#arp-lag-and-route-data-in-orchagent)
17
18
*[QoS, Buffer, CRM, PFC WD and ACL data in orchagent](#qos-buffer-crm-pfc-wd-and-acl-data-in-orchagent)
18
19
*[COPP, Tunnel and Mirror data in orchagent](#copp-tunnel-and-mirror-data-in-orchagent)
19
20
*[FDB and port state in orchagent](#fdb-and-port-state-in-orchagent)
20
21
*[OID for switch default objects in orchagent\.](#oid-for-switch-default-objects-in-orchagent)
22
+
*[SWSS state consistency validation](#swss-state-consistency-validation)
23
+
*[Pre\-restart state validation](#pre-restart-state-validation)
24
+
*[Post\-restore state validation](#post-restore-state-validation)
21
25
*[SWSS state sync up](#swss-state-sync-up)
22
26
*[ARP sync up](#arp-sync-up)
23
27
*[port state sync up](#port-state-sync-up)
@@ -72,7 +76,7 @@ For copp, tunnel and mirror related configurations, they are loaded from json f
72
76
FDB and port state notifications come from the ASIC; syncd relays the data to orchagent.
73
77
Orchagent also gets info for the objects created by the ASIC by default, e.g. the port list, hardware lanes and queues.
74
78
75
-
# Swss state restore
79
+
# SWSS state restore
76
80
During swss warm restart, the state of swss should be restored. It is assumed that all data in APPDB has either been restored or been kept intact.
77
81
78
82
## PORT, VLAN and INTF
@@ -92,12 +96,26 @@ Orchagent fetch the existing data from configDB at startup.
92
96
These configurations are loaded into APPDB from JSON files and then received by orchagent at startup.
93
97
94
98
## FDB and port state in orchagent
95
-
The FDB data is restored from APPDB by orchagent.
96
-
TODO: Port state restore.
99
+
Both the FDB and port state data are restored from APPDB by orchagent.
97
100
98
101
## OID for switch default objects in orchagent.
99
102
Orchagent relies on the SAI get API to fetch the OID data from syncd for switch default objects.
100
103
104
+
# SWSS state consistency validation
105
+
After swss state restore, the state of each swss process, especially orchagent, should be consistent with its state before the restart.
106
+
For now, it is assumed that configDB does not change during the whole warm restart window, so the state of orchagent is mainly driven by APPDB data changes. The following basic pre-restart and post-restore validations can be applied.
107
+
108
+
## Pre-restart state validation
109
+
A "restart prepare" request is sent to orchagent. If there is no pending data in the SyncMap (m_toSync) of any application consumer in orchagent, OrchDaemon sets a flag to stop processing any further APPDB data changes and returns success for the "restart prepare"
110
+
request. Otherwise failure should be returned for the request to indicate that there is un-fullfilled dependency in orchagent which is not ready to do warm restart.
111
+
112
+
The existing ProducerStateTable/ConsumerStateTable implementation should be updated so that only the consumer side modifies the actual table.
113
+
114
+
## Post-restore state validation
115
+
After swss state restore, as in the pre-restart phase, no pending data should remain in the SyncMap (m_toSync) of any application consumer. This check should be done before swss state sync up.
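
One way to picture this check as a gate in front of the sync up phase is sketched below. It is only an illustration under stated assumptions: `pending_tables`, `post_restore_check` and the dictionary that maps table names to sync maps are hypothetical, and the table names are examples rather than a list taken from this document.

```python
def pending_tables(consumers):
    """Return the names of tables whose sync map still holds entries."""
    return sorted(name for name, to_sync in consumers.items() if to_sync)


def post_restore_check(consumers):
    """Gate for the sync up phase: raise if restored data is still pending."""
    leftover = pending_tables(consumers)
    if leftover:
        raise RuntimeError("pending entries after restore: " + ", ".join(leftover))
    return True


# Hypothetical snapshots of each consumer's sync map after state restore.
restored = {"PORT_TABLE": {}, "VLAN_TABLE": {}}
partially = {"PORT_TABLE": {}, "ROUTE_TABLE": {"10.0.0.0/24": "SET"}}

ok = post_restore_check(restored)   # every map is empty, sync up may start
stuck = pending_tables(partially)   # names the consumers that block sync up
```

Reporting which tables are still pending, instead of a bare pass or fail, makes it easier to see which dependency was not satisfied during restore.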
116
+
117
+
*More exhaustive validation beyond this is to be designed and implemented.*
118
+
101
119
# SWSS state sync up
102
120
During the restart window, dynamic data such as ARP, port state, FDB, LAG and routes may change. Orchagent needs to sync up with the latest network state.