[supervisor] Add patch to prevent 'supervisorctl start' command from hanging if system time has rolled backward #1311
+ # If system clock has moved backward, reset self.laststart to current system time
+ if now < self.laststart:
+     self.laststart = now;
I am OK with this particular fix. However, I think the time travel can cause other problems in the code.
I checked the supervisord code, and it looks like there are other places that save the current time to a value and use it later, like self.delay, so the back-off process could have the same problem.
If a process is in the backoff state and the system clock moves backward, the clock has to catch up before the delay expires, so the process won't restart for a long time.
Another example is last_dispatch.
This does not sound like a trivial problem. I also see that some unit tests have been added. I think we need to fix all the problems I described above and add unit test code.
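To illustrate the back-off concern, here is a minimal sketch of one possible guard, assuming a stored deadline of the form `now + backoff`. The helper `clamp_deadline` and its parameters are hypothetical and are not part of Supervisor; the sketch only shows how a backward clock jump stretches the wait and how clamping the deadline could bound it.

```python
def clamp_deadline(deadline, now, max_wait):
    """Return a deadline that is never more than max_wait seconds past now.

    Hypothetical helper for illustration only; not Supervisor code.
    """
    return min(deadline, now + max_wait)


# A 3-second back-off scheduled at t=1000 gives a deadline of 1003.
backoff = 3.0
deadline = 1000.0 + backoff

# Normal case: one second later, the deadline is left unchanged.
normal = clamp_deadline(deadline, now=1001.0, max_wait=backoff)

# Clock rolled back to t=400: unclamped, the wait would be 603 seconds.
unclamped_wait = deadline - 400.0
clamped = clamp_deadline(deadline, now=400.0, max_wait=backoff)
clamped_wait = clamped - 400.0
```

Another option, on Python 3.3 or later, would be to measure such intervals with `time.monotonic()`, which is not affected by system clock adjustments.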
Need a more comprehensive fix and unit test code.
* github: [minigraph]: Set hostname in all default minigraphs to 'sonic' (sonic-net#1333) Install sonic-platform-common package in platform-monitor docker for ledd (sonic-net#1330) Prevent supervisor from restarting configdb-load.sh (sonic-net#1324) [scripts]: Fix issues with checking status of the DB. Use one approach everywhere. (sonic-net#1323) [Arista7260cx3] Add platform specific reboot tool (sonic-net#1318) Install azure cli into docker-sonic-mgmt (sonic-net#1322) [sonic-py-swsssdk]: Update submodule pointer (sonic-net#1319) [supervisor] Add patch to prevent 'supervisorctl start' command from hanging if system time has rolled backward (sonic-net#1311) Move platform-specific hardware plugin base packages to sonic-platform-common submodule (sonic-net#1301) [baseimage]: Add missing dependency of igb & ixgbe (sonic-net#1316) [snmpagent]: Update sonic-snmpagent submodule (sonic-net#1314) Run docker containers with /tmp and /var/tmp mounted to tmpfs (sonic-net#1313) [Broadcom]: Update Boradcom SAI package to 3.0.3.3-3 (sonic-net#1312) [submodule]: Update sairedis (sonic-net#1310) [snmpagent]: Update sonic-snmpagent submodule (sonic-net#1308) [baseimage]: add mkfs.ext3 and fsck.ext3 in initrd to support ext3 partition (sonic-net#1306) [submodule]: update sonic-sairedis to enable syncd-rpc (sonic-net#1304) [device]: Fix Mellanox sku check (sonic-net#1303) Add support for Accton AS7712-32X platform (sonic-net#1299) [build]: build libsaithrift-dev and docker-ptf-[platform] (sonic-net#1300) [libsaithrift-dev]: Enable building libsaithrift-dev and pythonthrift libraries (sonic-net#1296) [Platform] Update switch configuration files and download link for Ingrasys S9130-32X/S9230-64X (sonic-net#1295) [Delta]: Add psuutil support for ag9032v1 (sonic-net#1298) Revert "[Dell S6100, Z9100] psusutil sysfs attribute changes for hwmon (sonic-net#1264)" (sonic-net#1297) [Dell S6100, Z9100] psusutil sysfs attribute changes for hwmon (sonic-net#1264) [Platform]As7712-32x update for 
sensors test (sonic-net#1292) Revert "[DHCP relay]: Add patch to always undef VLAN_TCI_PRESENT so as not to treat VLAN-tagged packets differently (sonic-net#1254)" (sonic-net#1291) [[submodule]: Update swss-common (sonic-net#1289) [baseimage]: Install sysfsutils package into SONiC host system (sonic-net#1290) Add caclmgrd and related files to translate and install control plane ACL rules (sonic-net#1240) [mellanox]: Update Mellanox buffers configuration (sonic-net#1263) [platform]: chmod 0644 for *.mk files (sonic-net#1284) [arista]: Update Arista platform modules and mount libraries to snmp docker (sonic-net#1283) [platform]: chmod a+x for debian/rules for platform-modules-delta (sonic-net#1282) Let debootstrap uses the same sources link as apt (sonic-net#1279) [doc]: update sonic-buildimage clone instructions (sonic-net#1278) [image]: Explicitly specify kernel_version as string (sonic-net#1280) Disable autosuspend for USB devices, preventing usb drives to be stopped and then renamed (sonic-net#1275) [platform]: As7712 32x add fancontrol (sonic-net#1270) [Platform] Add psuutil support for Ingrasys S9130-32X (sonic-net#1273) [submodules]: Update swss and utilitiles modules (sonic-net#1276) [Platform] Add psuutil and update submodule for Ingrasys S9100-32X, S8810-32Q, S9200-64X on master branch (sonic-net#1271) [centec]: support sai1.0 (sonic-net#1268) [build]: add build badge for nephos platform (sonic-net#1267) [build]: allow to use http(s) proxy in the build (sonic-net#1265) [Accton AS7816-64X] Add new platform and device for AS7816-64X. 
(sonic-net#1260) [Platform] Add Ingrasys S9130-32X and S9230-64X with Nephos Switch ASIC (sonic-net#1245) Add 'make reset' target with warning prompt to reset git repo and submodules (sonic-net#1258) [sudoers] Add 'docker ps' to READ_ONLY_CMDS (sonic-net#1259) Add set/get lpmode and mode_rst feature for qsfp (sonic-net#1261) [build] allow user to override the default number of build jobs (sonic-net#1255) [build] make second Accton Debian package extra package of the first one (sonic-net#1257) [arista] Delete sysfs entries for all Arista Digital Power Monitor/Management devices (sonic-net#1256) [DHCP relay]: Add patch to always undef VLAN_TCI_PRESENT so as not to treat VLAN-tagged packets differently (sonic-net#1254) [snmp]: Save S/N in state DB prior to starting service (sonic-net#1246) [device/accton] Correct exception function name (sonic-net#1249) [DHCP relay]: Fix circuit ID and remote ID bugs (sonic-net#1248) [sonic-py-swsssdk]: Update submodule pointer (sonic-net#1253) [swss]: update swss submodule (sonic-net#1244) [broadcom]: update sai to 3.0.3.3-1 (sonic-net#1243)
If the system time rolls backward after `supervisorctl start <process_name>` has been called but while the process is still in the `STARTING` state (i.e., it has not yet entered the `RUNNING` state), then, depending on how far backward the system time has rolled, the `supervisorctl start <process_name>` command can hang for a very long time waiting for the system time to reach `self.laststart + startsecs`.

This patch creates a temporary workaround to mitigate this issue by resetting `self.laststart` to the current system time if it is ever determined that the system time has rolled backward.

I have opened an issue on the Supervisor GitHub repo (Supervisor/supervisor#1043). Once the Supervisor folks merge an official solution, I will remove this patch and pull in the latest upstream changes.
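The effect of the workaround can be sketched with a small stand-in class. `ProcessSketch` and `seconds_until_running` are hypothetical names used only for illustration and do not reproduce Supervisor's actual code; only `laststart` and `startsecs` come from the description above.

```python
class ProcessSketch:
    """Hypothetical stand-in for a supervised process; not Supervisor's class."""

    def __init__(self, startsecs):
        self.startsecs = startsecs
        self.laststart = 0.0

    def start(self, now):
        # Record the time at which the process entered STARTING.
        self.laststart = now

    def seconds_until_running(self, now, guard=True):
        # If the system clock has moved backward, reset laststart to now.
        if guard and now < self.laststart:
            self.laststart = now
        # Time left before now reaches laststart + startsecs.
        return max(0.0, self.laststart + self.startsecs - now)


proc = ProcessSketch(startsecs=10)
proc.start(now=1000.0)

# Clock rolls back from 1000 to 500 while the process is still STARTING.
wait_without_guard = proc.seconds_until_running(now=500.0, guard=False)
wait_with_guard = proc.seconds_until_running(now=500.0)
```

Without the guard the caller would wait 510 seconds for the clock to catch up; with it, the wait is bounded by `startsecs` again.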