-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort By Counters or Lists during GameServerAllocation 2716 #3091
Sort By Counters or Lists during GameServerAllocation 2716 #3091
Conversation
Build Succeeded 👏 Build Id: 08c46189-2374-438e-afaa-35de0518b5da The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
pkg/gameserverallocations/find.go
Outdated
@@ -40,6 +40,7 @@ func findGameServerForAllocation(gsa *allocationv1.GameServerAllocation, list [] | |||
var loop func(list []*agonesv1.GameServer, f func(i int, gs *agonesv1.GameServer)) | |||
|
|||
// packed is forward looping, distributed is random looping | |||
// TODO: Discuss if we change findGameServerForAllocation so all strategies use forward looping for the the pre-sorted list for each strategy once FeatureCountsAndLists gate is graduated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the new version, I think we can throw most of this complexity away, and do simple search and filter -- we don't need this fancy looping construct anymore, since it will be pre-sorted / pre-randomised.
(Also can then also do a duplicated function as well for counts and lists)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we have three paths A) Sort by Allocated and then loop through list. B) Sort by Allocated, then randomly shuffle list, and then loop through randomized list. C) Sort by CountsAndLists and then loop through list.
Where/when should I be randomizing? Should I make a 4th path D) Sort by CountsAndLists, then randomly shuffle list, and then loop through randomized list? Or are you suggesting a different set of paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is this - for this part of the code -- the list should already be appropriately sorted when it is passed to it. So it should be a for .. range
operation.
Then for each strategy:
Distributed
- Randomise first (so that will give us a random distribution within each sorted Allocated/Ready section of the list we have)
- Sort by:
- Allocated
- Count/List as appropriate
Question on the above: Does K8s List already give us a generally random order of GameServers, such that we don't need to bother shuffling it? I'm actually not sure! May be worth checking.
Packed
Sort by:
- Allocated First
- Packed Strategy (basically what we have now)
- Tie break with counts and lists
I hope that makes things clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why randomize before sorting? I thought we had the entire list of Ready/Allocated game servers here, and not a subset?
If we're keeping our current definitions of Packed and Distributed, then it seems to make more sense to more or less keep the flow as-is:
- Sort by Allocated
Tie break sort by number of Ready servers on the Node
Tie break sort by Counts and Lists
Tie break sort by Game Server Name <- Separate discussion on this as it currently tie breaks on NodeName - If Distributed, randomize list
- Loop through list to find matching Game Servers (original sorted list for Packed, randomized list for Distributed)
A possible alternative would to first check for matches before sorting the Game Servers, and only sort for the Packed case:
- Filter List of Game Servers for Matches
- Return unsorted list of Matches for Distributed (not completely random, could be randomized before returning)
- Sort List (same as above) and return sorted list of Matches for Packed
Another possible alternative would to first check for matches before sorting the Game Servers, and then sort for both cases then randomize for distributed:
- Filter List of Game Servers for Matches
- Sort List (same as above)
Return sorted list of Matches for Packed
Randomize then return list of Matches for Distributed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit to do sorting before matching is that we can return fast when we find the first match. If we do matching first, we have to go though all the GameServers to find out the ones that match, then perform sorting for them, and then return the first one. Though Mathematically speaking there is no difference on the time complexity (they both are O(nlogn) for sorting and O(n) for matching).
I kind of agree on Mark, the sorting/randomization should happen in ListSortedGameServers
. This gives us a central place to maintain different sorting logic (random is also a kind of sorting) for different scheduling strategy, and use the findGameServerForAllocation
purely to pick the first GameServer that match the selectors from the sorted GameServer list. What I'm not quite convinced is whether we should apply the sorting based on Allocated and Count/List after the randomize for distributed strategy. Right now, for distributed strategy, it looks like the although we perform the sorting based on Allocated and player tracking in ListSortedGameServers
, we shuffle the list eventually in findGameServerForAllocation
. I.e. it's a totally randomize without any sorting. So instead of "Randomize first -> Sort by: Allocated and Count/List as appropriate", maybe just a randomize in ListSortedGameServers
is good enough for distributed strategy. This also fit our definition of Distributed strategy better. ("Distributed" is aimed at static Kubernetes clusters, wherein we want to distribute resources across the entire cluster)
Another thought may outside of the scope. Can we be more smart for Distributed strategy? For example, Instead of using randomize to achieve distributed, can we sort on the GameServer count per node and pick the one node with least GameServers on it? I know a lot of other complexity will involve for the other allocation criteria, but just want to call out this basic thoughts which can make the distributed allocation deterministic. Distributed should not equal to Random, and Random sounds like a different placement/allocation strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above being said, I think the current findGameServerForAllocation
totally works fine. We can leave all the refactoring outside this PR.
f2f7fd7
to
ca0fce5
Compare
Build Succeeded 👏 Build Id: 09c52bde-1174-4ff8-8fb3-fd750e99cda5 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
}) | ||
} | ||
|
||
if !(p.Order == Ascending || p.Order == Descending || p.Order == "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming Order will be defaulted, so checking for "" is probably not something we need to do? (or want to if we have defaulting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done any defaulting. Where in the code should I do defaulting for this?
f267cb1
to
8418791
Compare
Build Failed 😱 Build Id: 5b55855c-a9ec-464c-818a-f783ef3a664a To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 08c0fe60-14a7-4e24-95fd-9bd91c156e3b To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
8418791
to
e3a669f
Compare
Build Succeeded 👏 Build Id: 8f81d742-e8e7-46be-80aa-64408fcc408a The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
e3a669f
to
924c6f1
Compare
Build Failed 😱 Build Id: 2b31ade1-b986-4ffc-a90c-1076463eeb31 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
924c6f1
to
8895853
Compare
// finally sort lexicographically, so we have a stable order | ||
return gs1.Status.NodeName < gs2.Status.NodeName | ||
return gs1.GetObjectMeta().GetName() < gs2.GetObjectMeta().GetName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markmandel I changed this to the ObjectMeta.Name instead of the Status.NodeName because the NodeName is not unique, so this didn't create a stable order. (I discovered this during test TestAllocationCacheListSortedGameServers as gs2 and gs4 are the same when comparing by NodeName.)
This change assumes that both 1) all Game Servers have ObjectMeta.Name, and 2) All ObjectMeta.Name are unique. Are these correct assumptions? If those assumptions are not correct, then this change would also not create a stable order. Would it be better to use the ObjectMeta.UID? Or would it be better to leave things as-is? Or should we leave it out for the sake of not making changes that are not behind a feature gate?
Build Succeeded 👏 Build Id: fd8bdc9a-849d-423b-857d-eebda4397fc9 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
8895853
to
7ada61e
Compare
4864d08
to
ac45472
Compare
Build Failed 😱 Build Id: ccc80805-9b8d-48e1-aa47-8c916fa1dadf To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
ac45472
to
6ca5667
Compare
Build Failed 😱 Build Id: 7157a452-c1e2-4d0b-9bc8-5068ad327887 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 48802fd3-b1aa-475d-8698-0d958111f929 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add unit test coverage for the new validate logic added in gameserverallocation.go
?
// finally sort lexicographically, so we have a stable order | ||
return gs1.Status.NodeName < gs2.Status.NodeName | ||
return gs1.GetObjectMeta().GetName() < gs2.GetObjectMeta().GetName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assumptions are correct. https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata
pkg/gameserverallocations/find.go
Outdated
@@ -40,6 +40,7 @@ func findGameServerForAllocation(gsa *allocationv1.GameServerAllocation, list [] | |||
var loop func(list []*agonesv1.GameServer, f func(i int, gs *agonesv1.GameServer)) | |||
|
|||
// packed is forward looping, distributed is random looping | |||
// TODO: Discuss if we change findGameServerForAllocation so all strategies use forward looping for the the pre-sorted list for each strategy once FeatureCountsAndLists gate is graduated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit to do sorting before matching is that we can return fast when we find the first match. If we do matching first, we have to go though all the GameServers to find out the ones that match, then perform sorting for them, and then return the first one. Though Mathematically speaking there is no difference on the time complexity (they both are O(nlogn) for sorting and O(n) for matching).
I kind of agree on Mark, the sorting/randomization should happen in ListSortedGameServers
. This gives us a central place to maintain different sorting logic (random is also a kind of sorting) for different scheduling strategy, and use the findGameServerForAllocation
purely to pick the first GameServer that match the selectors from the sorted GameServer list. What I'm not quite convinced is whether we should apply the sorting based on Allocated and Count/List after the randomize for distributed strategy. Right now, for distributed strategy, it looks like the although we perform the sorting based on Allocated and player tracking in ListSortedGameServers
, we shuffle the list eventually in findGameServerForAllocation
. I.e. it's a totally randomize without any sorting. So instead of "Randomize first -> Sort by: Allocated and Count/List as appropriate", maybe just a randomize in ListSortedGameServers
is good enough for distributed strategy. This also fit our definition of Distributed strategy better. ("Distributed" is aimed at static Kubernetes clusters, wherein we want to distribute resources across the entire cluster)
Another thought may outside of the scope. Can we be more smart for Distributed strategy? For example, Instead of using randomize to achieve distributed, can we sort on the GameServer count per node and pick the one node with least GameServers on it? I know a lot of other complexity will involve for the other allocation criteria, but just want to call out this basic thoughts which can make the distributed allocation deterministic. Distributed should not equal to Random, and Random sounds like a different placement/allocation strategy.
6ca5667
to
e8d646b
Compare
Build Succeeded 👏 Build Id: 381d855b-36ed-4196-8c03-ee407227bcae The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: cea29962-d9b3-40b6-ad40-a43196ea4e1f To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 930ad8cc-c190-42ef-8d4d-930430cc2f6d The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good set of initial changes 👍🏻 LGTM!
// finally sort lexicographically, so we have a stable order | ||
return gs1.Status.NodeName < gs2.Status.NodeName | ||
return gs1.GetObjectMeta().GetName() < gs2.GetObjectMeta().GetName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those assumptions are correct - so this sounds like a good change 👍🏻
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gongmax, igooch, markmandel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / Why we need it:
Which issue(s) this PR fixes:
Working on #2716
Special notes for your reviewer: