schedulers - implement automatic reboot #8651

Open
mapellidario opened this issue Aug 26, 2024 · 3 comments

@mapellidario
Member

intro

We want to automatically reboot all crab schedulers at least once a year.

prerequisite

We should figure out how to be nice to HammerCloud on schedd restart if we want to automatically reboot our schedds: #7410

implementation

We had a brainstorming session in Udine, and so far the simplest approach we could think of is:

  • Move the list of enabled schedds that TW should use from GitLab to Puppet.
  • Every schedd in the list should have a new parameter alongside the current "enabled: 0/1": the desired date of the schedd reboot. Make sure that these dates do not overlap! If you want to be fancy, you could also have a list of dates, to accommodate multiple reboots per year.
  • TW reads the configuration and, if "(current time - reboot time) < 1week", stops using that schedd.
  • Add a daily cronjob to the schedd. If "(current time - reboot time) < 2d" (using 2 days with a daily cronjob should avoid negative results, which may be a problem in a bash script), then reboot the schedd. The reboot procedure should be:
    • hold all the running dagmans, specifying "schedd reboot" as the hold reason
    • condor_off
    • reboot
    • (condor_on should be automatic)
    • add a systemctl service unit, systemctl timer, hourly cronjob, or whatever system you like, that releases all the held dagmans that match the "schedd reboot" hold reason
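
The date checks in the list above could be sketched roughly as follows. This is only an illustration, not an agreed design: the configuration layout, the schedd names and the function names are all made up, and it assumes that both windows are meant to open before the planned reboot, i.e. that the comparison is on "reboot time - current time" and that a negative value means the reboot date has already passed.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical layout of the per-schedd configuration (names are invented).
SCHEDDS = {
    "schedd-a.example": {"enabled": 1, "reboots": [datetime(2025, 3, 3)]},
    "schedd-b.example": {
        "enabled": 1,
        "reboots": [datetime(2025, 6, 2), datetime(2025, 12, 1)],
    },
}

TW_DRAIN_WINDOW = timedelta(weeks=1)  # TW stops using the schedd
REBOOT_WINDOW = timedelta(days=2)     # the daily cronjob triggers the reboot


def in_window(now, reboot_time, window):
    """True when reboot_time is not in the past and less than `window` away."""
    remaining = reboot_time - now
    return timedelta(0) <= remaining < window


def overlapping_reboots(schedds, min_gap=TW_DRAIN_WINDOW):
    """Return pairs of reboots on different schedds that are closer than min_gap."""
    entries = [(name, d) for name, cfg in schedds.items() for d in cfg["reboots"]]
    return [
        (a, b)
        for a, b in combinations(entries, 2)
        if a[0] != b[0] and abs(a[1] - b[1]) < min_gap
    ]


def usable_schedds(schedds, now):
    """Schedds that TW may still submit to: enabled and not about to reboot."""
    return sorted(
        name
        for name, cfg in schedds.items()
        if cfg["enabled"]
        and not any(in_window(now, d, TW_DRAIN_WINDOW) for d in cfg["reboots"])
    )
```

With this reading, `usable_schedds(SCHEDDS, datetime(2025, 2, 27))` leaves out `schedd-a.example`, and `in_window(now, reboot_time, REBOOT_WINDOW)` is the check the daily cronjob would run. Since a 2-day window spans more than one daily run, the cronjob would probably also need to remember that the reboot already happened.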
@belforte
Member

Spam/phishing/trojan... is coming here too :-( Is there something that we can do?

@belforte
Member

I used the GitHub "report" option, but it would be better if we could prevent this.

@belforte
Member

@aspiringmind-code @novicecpp @mapellidario not so bad. This is from GH.

Our review of the account(s) and/or content named in your report has concluded. We have determined that one or more violations of GitHub’s Terms of Service have occurred and have taken appropriate action in response.
