-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The zoo is under new management (zk cleanup) #1922
Conversation
We weren't using the correct code paths to hit the leader cache for slave + rack data, causing lots of unneeded zk calls for those. Fixed in this PR now |
/fixes #1823 |
SingularityService/src/main/java/com/hubspot/singularity/data/AbstractMachineManager.java
Show resolved
Hide resolved
if (leaderCache.active()) { | ||
return leaderCache.getRack(rackName); | ||
return leaderCache.getRack(rackId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using both rackName
(in SingularityLeaderCache
) and rackId
. We should pick one for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consolidating on rackId 👍
List<String> currentActive = curatorFramework.getChildren().forPath(ACTIVE_STATUSES_ROOT); | ||
for (String taskIdString : currentActive) { | ||
SingularityTaskId taskId = SingularityTaskId.valueOf(taskIdString); | ||
String oldPath = ZKPaths.makePath(ACTIVE_STATUSES_ROOT, taskIdString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're running this code based on whether ACTIVE_TASKS_ROOT
exists in ZK, but we're never actually reading from that path during this migration. Just double-checking, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We remove it as the last step of the migration which is why it's my initial check. There isn't actually any data in those nodes, they are only an index we call getChildren on. The /tasks/statuses is where the actual data is and what I am namespacing. However, realizing that if this died half way though, some of the children on the next run would be request ids instead. Going to add a fix so this will be able to run on a half-migrated dataset as well
@@ -153,6 +153,9 @@ public void subscribed(Subscribed subscribed) { | |||
heartbeatIntervalSeconds = Optional.of(advertisedHeartbeatIntervalSeconds); | |||
} | |||
|
|||
// Should be called before activation of leader cache or cache could be left empty | |||
startup.checkMigrations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect that this wouldn't have run at startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only runs when we take leadership, not as a Managed class. Moved it here since we need to migrate before filling leader cache, since leader cache is using the newer taskManager at this point
} | ||
|
||
saveResult(Optional.of(response.getStatusCode()), responseBody, Optional.<String> absent(), Optional.<Throwable>absent()); | ||
public void setHealthcehckUri(String healthcheckUri) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spelling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 fixed
@@ -35,7 +35,7 @@ | |||
<basepom.release.profiles>oss-release</basepom.release.profiles> | |||
<dep.classmate.version>1.3.1</dep.classmate.version> | |||
<dep.commons-lang3.version>3.7</dep.commons-lang3.version> | |||
<dep.curator.version>2.12.0</dep.curator.version> | |||
<dep.curator.version>4.2.0</dep.curator.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet glory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@@ -203,20 +222,33 @@ private String getHistoryUpdatePath(SingularityMachineStateHistoryUpdate history | |||
return ZKPaths.makePath(getHistoryPath(historyUpdate.getObjectId()), historyChildPath); | |||
} | |||
|
|||
public void clearOldHistory(String machineId) { | |||
List<SingularityMachineStateHistoryUpdate> histories = getHistory(machineId); | |||
histories.sort(Comparator.comparingLong(SingularityMachineStateHistoryUpdate::getTimestamp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to reverse this sort, or the skip()
below will keep the oldest n
entries instead of the newest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reversing the sort. Though I'm realizing we don't even really expose this data anywhere... Maybe don't even need it
🚢 |
Be much nicer to zookeeper by avoiding possible large payloads, specifically:
Still TODO on this PR: