Fixed determine/process reboot-cause service dependency #17406
Signed-off-by: anamehra <anamehra@cisco.com>
@StormLiangMS can you merge this for 202305?
@yxieca please help cherry-pick to 202205.
@anamehra please help raise a ticket for the 202205 branch directly. I don't think the automation will cherry-pick a PR from feature branch to feature branch.
@StormLiangMS Please help review/merge this to 202305.
@anamehra, it looks like the automation is not able to cleanly pick this PR into the 202405 branch. Can you please submit a separate PR directly against the 202405 branch and link this PR to it?
Hi @anamehra, I just realized that this PR you raised (#17406)
Hi @gechiang, I raised this for 202405 only, due to a discussion in the community forum. There is a PR with the 'Uphold' fix, and I am waiting on responses to my queries on that before I raise a PR on master. My original PR in master was initially rejected in favor of using Uphold.
…utomatically (#19415)
#### Why I did it
src/sonic-host-services
```
* 02d9b55 - (HEAD -> master, origin/master, origin/HEAD) Added support to render template format of `delayed` flag on Feature Table. (#135) (28 hours ago) [abdosi]
* 60fdfea - Fixed determine/process reboot-cause service dependency (#17406) (#132) (13 days ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog

…utomatically (#19551)
#### Why I did it
src/sonic-host-services
```
* aea0bef - (HEAD -> 202405, origin/202405) Ignore sonic_platform package fileNotFoundError on non-chassis vs platforms (#133) (#140) (4 minutes ago) [mssonicbld]
* 0e7e4d5 - Added support to render template format of `delayed` flag on Feature Table. (#135) (#137) (11 days ago) [mssonicbld]
* 235c2a4 - Fixed determine/process reboot-cause service dependency (#17406) (#132) (13 days ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fixes #16990 for the 202305/202205 branches.
Note: This PR is for 202305 and 202205. For master, a new PR will be raised that uses a newer systemd option (`Upholds=`), available on Debian bookworm, to restart the processes after a dependency failure.
1. The determine-reboot-cause and process-reboot-cause services do not start if the database service fails to start on the first attempt. Even if the database service succeeds on the next attempt, these reboot-cause services still do not start.
2. The process-reboot-cause service also does not restart when the docker or database service restarts, which leads to an empty reboot-cause history.
3. deploy-mg from sonic-mgmt also triggers a docker service restart, which causes the issue described in 2 above. The docker restart also makes determine-reboot-cause restart, which creates an additional reboot-cause file in the history and modifies the last reboot cause.
This PR fixes these issues by making both services start once their dependency is met after an earlier dependency failure, making both services restart when the database service restarts, and preventing the last reboot reason from being processed twice.
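The duplicate-processing guard can be illustrated with a minimal sketch. This is not the code from this PR; it only shows one possible way to make a restarted service skip work it already did for the current boot. The function name, the marker file, and the `boot_id` argument are all hypothetical.

```python
import os
import tempfile


def should_process_reboot_cause(marker_path: str, boot_id: str) -> bool:
    """Return True only the first time this is called for a given boot.

    Both arguments are hypothetical: marker_path is a file that records the
    boot for which the reboot cause was already processed, and boot_id is
    any string that changes on every reboot.
    """
    try:
        with open(marker_path, "r", encoding="utf-8") as marker:
            if marker.read().strip() == boot_id:
                # Same boot: this is a service restart, not a new reboot.
                return False
    except FileNotFoundError:
        pass  # No marker yet, so nothing has been processed so far.

    # Write to a temporary file and rename it so the marker updates atomically.
    directory = os.path.dirname(marker_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(boot_id)
    os.replace(tmp_path, marker_path)
    return True
```

On Linux, one candidate for `boot_id` would be the contents of `/proc/sys/kernel/random/boot_id`, which changes on each boot; whether the actual services use anything like this is an assumption.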
Work item tracking
How I did it
How to verify it
On a single-ASIC pizza box:
On a chassis:
1. Let the database service on the LC fail the first time. determine-reboot-cause and process-reboot-cause will fail to start due to the dependency failure.
2. Start database-chassis on the Supervisor. The database service on the LC should now start successfully.
3. Verify that determine-reboot-cause and process-reboot-cause also start.
4. Verify the `show reboot-cause history` output.
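The service check in the chassis steps could be scripted roughly as follows. This is a sketch under assumptions: it assumes the units are named `determine-reboot-cause.service` and `process-reboot-cause.service`, and it relies on `systemctl is-active`, which may report a oneshot unit as inactive once it has finished, so treat the result as a starting point rather than a definitive check. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed unit names, derived from the service names mentioned in this PR.
REBOOT_CAUSE_UNITS = (
    "determine-reboot-cause.service",
    "process-reboot-cause.service",
)


def systemctl_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active` exits with 0 for the unit."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def units_not_active(
    units: Iterable[str],
    is_active: Callable[[str], bool] = systemctl_is_active,
) -> List[str]:
    """Return the units that the given check reports as not active."""
    return [unit for unit in units if not is_active(unit)]
```

Calling `units_not_active(REBOOT_CAUSE_UNITS)` on the LC after the database service recovers should return an empty list if both services came up; the `is_active` parameter exists only so the check can be exercised without systemd.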
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog