Add lock to config reload/load_minigraph #3475
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
You may need to check all the references of …
LGTM.
Should we cherry-pick this into other release branches?
Hi @bingwang-ms @zjswhhh, I added another commit to improve the log. Please help review, thanks!
lgtm.
What I did
In some cases, if multiple config reload/load_minigraph operations run in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so that they do not run in parallel. The file lock is bound to /etc/sonic/reload.lock.
This is to fix issue: #19855
Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com
How I did it
Add a flock utility and decorate reload and load_minigraph with try_lock to ensure the lock is acquired before reload/load_minigraph runs.
How to verify it
UT and on testbed.
New command output (if the output of a command-line utility has changed)
reload with locking success

    # config reload
    Acquired lock on /etc/sonic/reload.lock
    Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
    Disabling container monitoring ...
    Stopping SONiC target ...
    Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
    Running command: /usr/local/bin/db_migrator.py -o migrate
    Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
    Restarting SONiC target ...
    Enabling container monitoring ...
    Reloading Monit configuration ...
    Reinitializing monit daemon
    Released lock on /etc/sonic/reload.lock

reload with locking failure

    # config reload
    Failed to acquire lock on /etc/sonic/reload.lock
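The failure case in the output above relies on flock(2) semantics: an exclusive lock requested with LOCK_NB fails immediately, instead of waiting, when another open file description already holds it. The sketch below shows this with Python's fcntl module. It is an illustration only: try_acquire is a hypothetical helper, not part of sonic-utilities, and a temporary file stands in for /etc/sonic/reload.lock so it runs without root. Because flock locks belong to open file descriptions, two separate open() calls conflict even inside one process, which lets a single script simulate two parallel reloads.

```python
import fcntl
import os
import tempfile


def try_acquire(path):
    """Hypothetical helper: open `path` and try a non-blocking exclusive flock.

    Returns the open file object on success, or None if another open file
    description already holds the lock.
    """
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()  # lock is busy; give up immediately instead of waiting
        return None
    return f


# Temporary path standing in for /etc/sonic/reload.lock.
lock_path = os.path.join(tempfile.mkdtemp(), "reload.lock")

first = try_acquire(lock_path)    # first "config reload" takes the lock
second = try_acquire(lock_path)   # parallel "config reload" is rejected
print("first acquired:", first is not None)
print("second acquired:", second is not None)

fcntl.flock(first, fcntl.LOCK_UN)  # first reload finishes and unlocks
first.close()
third = try_acquire(lock_path)    # a later reload can now proceed
print("third acquired:", third is not None)
third.close()
```

Running it prints True, False, True for the three attempts: the second caller is turned away while the first holds the lock, and succeeds once the lock is released.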
Cherry-pick PR to 202405: #3497
What I did
In some cases, if multiple config reload/load_minigraph operations run in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so that they do not run in parallel.
The file lock is bound to /etc/sonic/reload.lock.
This is to fix issue: #19855
Signed-off-by: Longxiang Lyu lolv@microsoft.com
How I did it
Add a flock utility and decorate reload and load_minigraph with try_lock to ensure the lock is acquired before reload/load_minigraph runs.
How to verify it
UT and on testbed.
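The approach under "How I did it" (a flock utility plus a try_lock decorator on reload and load_minigraph) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code merged in this PR: the FileLock class, its acquire/release methods, and config_reload are hypothetical names, and the try_lock signature is assumed. Only the lock path and the log messages are taken from the PR description. A temporary path replaces /etc/sonic/reload.lock so the sketch runs unprivileged.

```python
import fcntl
import functools
import os
import tempfile


class FileLock:
    """Hypothetical minimal exclusive flock wrapper (not the PR's class)."""

    def __init__(self, path):
        self.path = path
        self._f = None

    def acquire(self):
        # Open a fresh file description and request a non-blocking exclusive lock.
        f = open(self.path, "a")
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            f.close()  # another holder has the lock
            return False
        self._f = f
        return True

    def release(self):
        if self._f is not None:
            fcntl.flock(self._f, fcntl.LOCK_UN)
            self._f.close()
            self._f = None


def try_lock(lock):
    """Decorator (assumed signature): run the command only while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not lock.acquire():
                print(f"Failed to acquire lock on {lock.path}")
                return None  # a real CLI might exit with a non-zero status instead
            print(f"Acquired lock on {lock.path}")
            try:
                return func(*args, **kwargs)
            finally:
                # Always release, even if the command raises.
                lock.release()
                print(f"Released lock on {lock.path}")
        return wrapper
    return decorator


# Temporary path instead of /etc/sonic/reload.lock.
lock = FileLock(os.path.join(tempfile.mkdtemp(), "reload.lock"))


@try_lock(lock)
def config_reload():
    return "reloaded"  # hypothetical placeholder for the actual reload steps


result_ok = config_reload()      # acquires, runs, releases

holder = FileLock(lock.path)     # simulate a parallel reload holding the lock
holder.acquire()
result_busy = config_reload()    # lock busy: prints the failure message
holder.release()
```

Requesting the lock with LOCK_NB makes a second reload fail fast with the "Failed to acquire lock" message rather than queue behind the first, and releasing in a finally block drops the lock even if the command raises. The kernel also releases a flock once every descriptor for that open file is closed, which normally happens when the holding process exits, so a crashed reload should not leave a stale lock.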
New command output (if the output of a command-line utility has changed)