-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added changes to handle dependency check in FdbSyncd and FpmSyncd for warm-boot #1556
Changes from 9 commits
5530e69
cd04985
12d940a
d56fa46
b5c0e7c
486d26a
421f8e6
7fa56be
a3a9d04
a951e6b
a6b5521
8fbd131
787b1b8
08a3908
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,8 +9,24 @@ | |
#include "netmsg.h" | ||
#include "warmRestartAssist.h" | ||
|
||
// The timeout value (in seconds) for fdbsyncd reconcilation logic | ||
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 30 | ||
/* | ||
* The timeout value (in seconds) for fdbsyncd reconcilation logic | ||
* This timers avoids indefinite wait for orchagent reconcillation | ||
*/ | ||
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 600 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the purpose of increasing this timer. @zhenggen-xu can you also review? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The timer seems to be big, so we need 10 minutes for FDB to sync? how many entries and how do we get to that number? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the max timer to wait for orchagent reconciliation. This will not be hit in normal circumstances. This is just to avoid indefinite wait on orchagent to reconcile and hence a large value. The code will not wait for this timer once the orchagent had reconciled. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds like there were many timers, could you make a inline comments in the code to describe the relationships between different timers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Reconciliation and restoration timers are not actually related, except that the reconciliation starts after restoration ends. And the warm-start timer is like fallback timeout to avoid indefinite wait for reconciliation. Will try to include some more description there There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me, we are mixing things here. FDBSYNC_RECON_WAIT_TIME in your code is the timer to wait for orchagent to be RECONSILED, that does not depending on other application timers. I don't see why it should be 120 based on fpmsyncd. Do we have a test log that showing the events in sequence? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FDBSYNC_RECON_WAIT_TIME is not the timer to wait for orchagent to reconcile. FDBSYNC_RECON_WAIT_TIME is the time for control plane ( BGP) to re-establish and reconcile its data ( routes) which are then programmed to fpmsyncd/fdbsyncd and hence the time given is same as fpmsyncd. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See your fdbsyncd.cpp in your PR: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did not understand the comment completely. Can you please elaborate a little ? FDBSYNC_RECON_WAIT_TIME is the time we are waiting for the control plane ( in this case BGP ) to converge after restart and after this we also need to check if orchagent has reconciled after warm-reboot. Hence when this timer FDBSYNC_RECON_WAIT_TIME expires, we check if orchagent is reconciled and hereon we keep checking for orchagent to be reconciled so that we can mark fdbsyncd as reconciled and it can start updating the APP-DB. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code or even comments in the code does not specify the relationship between this FDBSYNC_RECON_WAIT_TIME timer and BGP timer. If this process has dependency on BGP reconciliation, we should make it explicit, e,g, fpmsyncd could write a flag to somewhere (e,g stateDB) and this process can pick it up. Having a hardcoded timer here to assume the BGP has the same timer (BGP timer, which is configurable) does not seem to be a good idea. |
||
|
||
/* Time to wait to run fdb reconcillation after fdbsyncd replay. | ||
* fdbsyncd will wait for this time and for orchagnet to reconcile | ||
* before reconciling itself | ||
*/ | ||
#define FDBSYNC_RECON_WAIT_TIME 120 | ||
|
||
/* | ||
* This is the MAX time in seconds, fdbsyncd will wait after warm-reboot | ||
* for the interface entries to be recreated in kernel before attempting to | ||
* write the FDB data to kernel | ||
*/ | ||
#define INTF_RESTORE_MAX_WAIT_TIME 180 | ||
|
||
namespace swss { | ||
|
||
|
@@ -43,7 +59,8 @@ class FdbSync : public NetMsg | |
|
||
virtual void onMsg(int nlmsg_type, struct nl_object *obj); | ||
|
||
bool isFdbRestoreDone(); | ||
bool isIntfRestoreDone(); | ||
bool isReadyToReconcile(); | ||
|
||
AppRestartAssist *getRestartAssist() | ||
{ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,6 +35,8 @@ int main(int argc, char **argv) | |
Selectable *temps; | ||
int ret; | ||
Select s; | ||
SelectableTimer replayCheckTimer(timespec{0, 0}); | ||
SelectableTimer reconCheckTimer(timespec{0, 0}); | ||
|
||
using namespace std::chrono; | ||
|
||
|
@@ -45,7 +47,31 @@ int main(int argc, char **argv) | |
if (sync.getRestartAssist()->isWarmStartInProgress()) | ||
{ | ||
sync.getRestartAssist()->readTablesToMap(); | ||
|
||
steady_clock::time_point starttime = steady_clock::now(); | ||
while (!sync.isIntfRestoreDone()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
CPU is wasted on waiting. Could you subscribe Redis? #WontFix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As this is not a continuous busy wait ( sleep is present), this should not cause the cpu to be continuously busy. Also there is nothing for fdbsyncd to do until the interface info is populated to kernel after system warm-reboot, hence it needs to wait till such time. |
||
{ | ||
duration<double> time_span = | ||
duration_cast<duration<double>>(steady_clock::now() - starttime); | ||
int pasttime = int(time_span.count()); | ||
|
||
if (pasttime > INTF_RESTORE_MAX_WAIT_TIME) | ||
{ | ||
SWSS_LOG_INFO("timed-out before all interface data was replayed to kernel!!!"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if intf is not restored after max_wait_time? Shouldn't we abort to avoid more issues? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. System will proceed further. Some mac programming to kernel might fail because underlying interface is not yet created. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we could not restore interface, why we should proceed further and get into some limbo state that may or may not have critical issues. I would suggest we abort to bring user's attention. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interfaces will be eventually restored. The only impact will be that warm-reboot might not be hitless and there will be traffic loss seen. Not sure if we need to go for full abort and impact everything and all traffic. Requesting @prsunny to comment on this too There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Some mac programming to kernel might fail because underlying interface is not yet created." so this condition will be recovered by someone later? Again, if it is a critical condition, we should raise/abort so we don't get into limbo state. |
||
break; | ||
} | ||
sleep(1); | ||
} | ||
SWSS_LOG_NOTICE("Starting ReconcileTimer"); | ||
sync.getRestartAssist()->startReconcileTimer(s); | ||
replayCheckTimer.setInterval(timespec{1, 0}); | ||
replayCheckTimer.start(); | ||
s.addSelectable(&replayCheckTimer); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove extra blank line #Closed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will do |
||
else | ||
{ | ||
sync.getRestartAssist()->warmStartDisabled(); | ||
sync.m_reconcileDone = true; | ||
} | ||
|
||
netlink.registerGroup(RTNLGRP_LINK); | ||
|
@@ -67,14 +93,51 @@ int main(int argc, char **argv) | |
{ | ||
s.select(&temps); | ||
|
||
if(temps == (Selectable *)sync.getFdbStateTable()) | ||
if (temps == (Selectable *)sync.getFdbStateTable()) | ||
{ | ||
sync.processStateFdb(); | ||
} | ||
else if (temps == (Selectable *)sync.getCfgEvpnNvoTable()) | ||
{ | ||
sync.processCfgEvpnNvo(); | ||
} | ||
else if (temps == &replayCheckTimer) | ||
{ | ||
if (sync.getFdbStateTable()->empty() && sync.getCfgEvpnNvoTable()->empty()) | ||
{ | ||
sync.getRestartAssist()->appDataReplayed(); | ||
SWSS_LOG_NOTICE("FDB Replay Complete, Start Reconcile Check"); | ||
reconCheckTimer.setInterval(timespec{FDBSYNC_RECON_WAIT_TIME, 0}); | ||
reconCheckTimer.start(); | ||
s.addSelectable(&reconCheckTimer); | ||
} | ||
else | ||
{ | ||
replayCheckTimer.setInterval(timespec{1, 0}); | ||
// re-start replay check timer | ||
replayCheckTimer.start(); | ||
} | ||
|
||
} | ||
else if (temps == &reconCheckTimer) | ||
{ | ||
if (sync.isReadyToReconcile()) | ||
{ | ||
if (sync.getRestartAssist()->isWarmStartInProgress()) | ||
{ | ||
sync.m_reconcileDone = true; | ||
sync.getRestartAssist()->stopReconcileTimer(s); | ||
sync.getRestartAssist()->reconcile(); | ||
SWSS_LOG_NOTICE("VXLAN FDB VNI Reconcillation Complete (Event)"); | ||
} | ||
} | ||
else | ||
{ | ||
reconCheckTimer.setInterval(timespec{1, 0}); | ||
// Restart and check again in 1 sec | ||
reconCheckTimer.start(); | ||
} | ||
} | ||
else | ||
{ | ||
/* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,9 @@ using namespace swss; | |
* Default warm-restart timer interval for routing-stack app. To be used only if | ||
* no explicit value has been defined in configuration. | ||
*/ | ||
const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 120; | ||
const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 600; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the purpose of increasing this timeout? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This timer is now similar to the fallback timer in fdbsyncd. This will take effect only to avoid indefinite wait on orchagent getting reconciled. The default reconcile timer is still 120 seconds as defined by the other macro There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above, we should well define these two timers: DEFAULT_ROUTING_RESTART_INTERVAL and DEFAULT_ROUTING_RECON_CHECK_INTERVAL in document/code-comments. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure will add comment in the code |
||
|
||
const uint32_t DEFAULT_ROUTING_RECON_CHECK_INTERVAL = 120; | ||
|
||
// Wait 3 seconds after detecting EOIU reached state | ||
// TODO: support eoiu hold interval config | ||
|
@@ -67,6 +69,10 @@ int main(int argc, char **argv) | |
SelectableTimer eoiuCheckTimer(timespec{0, 0}); | ||
// After eoiu flags are detected, start a hold timer before starting reconciliation. | ||
SelectableTimer eoiuHoldTimer(timespec{0, 0}); | ||
|
||
// Timers to wait for dependent reconcillation | ||
SelectableTimer reconcileCheckTimer(timespec{0, 0}); | ||
SelectableTimer reconcileHoldTimer(timespec{0, 0}); | ||
/* | ||
* Pipeline should be flushed right away to deal with state pending | ||
* from previous try/catch iterations. | ||
|
@@ -85,13 +91,14 @@ int main(int argc, char **argv) | |
{ | ||
/* Obtain warm-restart timer defined for routing application */ | ||
time_t warmRestartIval = sync.m_warmStartHelper.getRestartTimer(); | ||
warmStartTimer.setInterval(timespec{DEFAULT_ROUTING_RESTART_INTERVAL, 0}); | ||
if (!warmRestartIval) | ||
{ | ||
warmStartTimer.setInterval(timespec{DEFAULT_ROUTING_RESTART_INTERVAL, 0}); | ||
reconcileCheckTimer.setInterval(timespec{DEFAULT_ROUTING_RECON_CHECK_INTERVAL, 0}); | ||
} | ||
else | ||
{ | ||
warmStartTimer.setInterval(timespec{warmRestartIval, 0}); | ||
reconcileCheckTimer.setInterval(timespec{warmRestartIval, 0}); | ||
} | ||
|
||
/* Execute restoration instruction and kick off warm-restart timer */ | ||
|
@@ -100,6 +107,9 @@ int main(int argc, char **argv) | |
warmStartTimer.start(); | ||
s.addSelectable(&warmStartTimer); | ||
SWSS_LOG_NOTICE("Warm-Restart timer started."); | ||
reconcileCheckTimer.start(); | ||
s.addSelectable(&reconcileCheckTimer); | ||
SWSS_LOG_NOTICE("reconcileCheckTimer timer started "); | ||
} | ||
|
||
// Also start periodic eoiu check timer, first wait 5 seconds, then check every 1 second | ||
|
@@ -108,6 +118,10 @@ int main(int argc, char **argv) | |
s.addSelectable(&eoiuCheckTimer); | ||
SWSS_LOG_NOTICE("Warm-Restart eoiuCheckTimer timer started."); | ||
} | ||
else | ||
{ | ||
sync.m_warmStartHelper.setState(WarmStart::WSDISABLED); | ||
} | ||
|
||
while (true) | ||
{ | ||
|
@@ -122,17 +136,26 @@ int main(int argc, char **argv) | |
* select() loop. | ||
* Note: route reconciliation always succeeds, it will not be done twice. | ||
*/ | ||
if (temps == &warmStartTimer || temps == &eoiuHoldTimer) | ||
if (temps == &warmStartTimer || temps == &eoiuHoldTimer || temps == &reconcileHoldTimer) | ||
{ | ||
bool readyToReconcile = false; | ||
if (temps == &warmStartTimer) | ||
{ | ||
readyToReconcile = true; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In case of orchagent still not ready, should we just abort? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. reconcile should take care of cleaning up stale entries. So not sure abort is necessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there was a reason we wait for "sync.isReadyToReconcile", I assume it was a must condition for reconcile to be working as expected. Again, if that condition is broken, we should make it visible to user. This probably won't happen in normal case, if it does, we should have information for user to debug, so raise or abort could help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. again same as earlier, orchagent will eventually reconcile. If it does not, it would have its own mechanism to recover/abort. Here we are making effort to reconcile the system to pre-warm-reboot state. Hence if we continue, we would reconcile prematurely and hence warm-reboot may not be hitless. However if we abort, we would impact everything and the full traffic will be hit. |
||
SWSS_LOG_NOTICE("Warm-Restart timer expired."); | ||
} | ||
else if (temps == &reconcileHoldTimer) | ||
{ | ||
readyToReconcile = true; | ||
SWSS_LOG_NOTICE("Warm-Restart reconcile hold timer expired."); | ||
} | ||
else | ||
{ | ||
readyToReconcile = sync.isReadyToReconcile(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will basically invalidate the eoiu design for fast reconciliation. Understood we probably have to rely on orchagent reconciliation status, then the DEFAULT_ROUTING_RECON_CHECK_INTERVAL should be reduced to way smaller to take advantage of eoui. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. DEFAULT_ROUTING_RECON_CHECK_INTERVAL timer is only fallback timer. Is will only come into action when the replay/reconcile does not happen in given time. Hence it should not affect eoui handling. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. eoiuHoldTimer is one-shot timer as is in the code, it could be triggered very early like after a few seconds, at that time readyToReconcile could be false (likely) , then the code will fall back to the reconcileCheckTimer (if timer not configured, then DEFAULT_ROUTING_RECON_CHECK_INTERVAL =120 seconds). This is what I meant invalidate the eoiu design. Example: eoui takes 10 seconds to finish, the orchagent takes 15 seconds to get readyToReconcile state, we still need wait at least 120 seconds to reconcile. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got your point. When eoiuHoldTimer expires, that means control plane ( BGP ) has converged. So now we just need to wait for orchangent reconcile and proceed once that is done. To implement this, will restart reconCheck timer with 2 seconds when eoiuHoldTimer expires, to check if orchagent has reconciled. Will implement this change. |
||
SWSS_LOG_NOTICE("Warm-Restart EOIU hold timer expired."); | ||
} | ||
if (sync.m_warmStartHelper.inProgress()) | ||
|
||
if (sync.m_warmStartHelper.inProgress() && readyToReconcile) | ||
{ | ||
sync.m_warmStartHelper.reconcile(); | ||
SWSS_LOG_NOTICE("Warm-Restart reconciliation processed."); | ||
|
@@ -174,6 +197,30 @@ int main(int argc, char **argv) | |
{ | ||
s.removeSelectable(&eoiuCheckTimer); | ||
} | ||
} | ||
else if (temps == &reconcileCheckTimer) | ||
{ | ||
SWSS_LOG_NOTICE("reconcile Check timer Expired "); | ||
if (sync.m_warmStartHelper.inProgress()) | ||
{ | ||
if (sync.isReadyToReconcile()) | ||
{ | ||
reconcileHoldTimer.setInterval(timespec{2, 0}); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the hold-timer for? Why do we need wait another 2 seconds before reconcile happens? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Once fpmsyncd reconcile is done, this timer checks if orchagent is reconciled and we are ready to start updating the reconciled fpmsyncd entries into the APP-DB. Since did not want to check too frequently, kept the timer at 2 seconds. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. reconcileHoldTimer is one-shot timer in the code, it just means we wait one time 2 seconds then do reconcile. Question was, why need wait another 2 seconds? what did we wait for? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. reconcileHoldTimer is one shot timer which is fired once the control plane reconciliation is done. At this time if orchagnet is reconciled, we continue with fpnsyncd reconcillation. However when reconcileHoldTimer expires if orchangent is still not reconciled, we fire reconcileHoldTimer for another 2 seconds and re-check if the orchagent is reconciled yet. This continues till it finds orchagent has reconciled. The idea is it wait for minimum of control plane ( BGP ) reconciliation time ( default 120 seconds) and also check if orchagent is reconciled before reconciling fpmsyncd. Both conditions should be met. Orchagent reconcile can occur earlier or later based on system scale, however we need to wait for minimum control plane ( BGP ) reconcillation time, before reconciling fpmsyncd. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some correction in the above explanation, reconcileCheckTimer ( not reconcileHoldTimer) is fired again for 2 seconds to re-check if orchagent has reconciled. Hence reconcileHoldTimer does not need 2 seconds. Will change it to 1 seconds like eoiu hold timer. |
||
reconcileHoldTimer.start(); | ||
s.addSelectable(&reconcileHoldTimer); | ||
SWSS_LOG_NOTICE("Warm-Restart started reconcile hold timer "); | ||
s.removeSelectable(&reconcileCheckTimer); | ||
continue; | ||
} | ||
// re-start check timer for every 2 seconds now on | ||
reconcileCheckTimer.setInterval(timespec{2, 0}); | ||
reconcileCheckTimer.start(); | ||
SWSS_LOG_NOTICE("Warm-Restart reconcileCheckTimer restarted"); | ||
} | ||
else | ||
{ | ||
s.removeSelectable(&reconcileCheckTimer); | ||
} | ||
} | ||
else if (!warmStartEnabled || sync.m_warmStartHelper.isReconciled()) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FdbSync::isIntfRestoreDone() , FdbSync::isReadyToReconcile() and RouteSync::isReadyToReconcile() are doing similar tasks, it seems we could make them into library calls with different input parameters. Later PR is probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure .. will consider in future updates to the code