-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup hosts marked as missing on startup #2240
Conversation
SingularityService/src/test/java/com/hubspot/singularity/scheduler/SingularityMachinesTest.java
Outdated
Show resolved
Hide resolved
PR looks like it will fix those. There is one addition we can make to the cleanup as well. A little context first:
So, I think we ca, in addition, clean up anything in InactiveAgentManager's list that is 1) being deleted by the checks you already have above or 2) older than some amount of time (likely longer than the AgentManager threshold, more like a few days). This will serve to also clean up the large Inactive Agents list you see as the very bottom table in the Singularity UI agents page |
checkDeadSlaves(); | ||
checkInactiveSlaves(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally missed this in earlier PRs. Can you do a slaves -> agents rename as you go through this PR please. Really thought I had caught most of those already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May do more of this in a separate PR; upon renaming tests, the order that the tests run changes and causes some more test failures likely due to interference between tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me, assuming a host that's missing on startup and cleaned up can be safely reintroduced later on.
SingularityService/src/test/java/com/hubspot/singularity/scheduler/SingularityMachinesTest.java
Outdated
Show resolved
Hide resolved
inactiveAgentManager.cleanInactiveAgentsList( | ||
System.currentTimeMillis() - | ||
TimeUnit.HOURS.toMillis(configuration.getCleanInactiveHostListEveryHours()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up anything in InactiveAgentManager's list that is 1) being deleted by the checks you already have above or 2) older than some amount of time (likely longer than the AgentManager threshold, more like a few days)
@ssalinas should this be addressing point 2 or are they different mechanisms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this looks like it should be doing the 'anything old than X hours' part already
🚢 |
This PR modifies the
SingularityAgentReconciliationPoller
to delete hosts that are marked as missing on startup. Previously, only hosts marked as dead were deleted by the poller, causing inactive hosts marked as missing on startup to still appear in Singularity.Hosts that are dead or missing on startup are deleted from both the
AgentManager
andInactiveAgentManager
lists to cleanup inactive hosts in the Singularity UI.https://git.hubteam.com/HubSpot/PaaS-Run/issues/1525