Add podMetadata to Allocation API #2975
Comments
What does "not work properly" mean to you here?
The reason we never did this in the Agones code is because of the possibility for race conditions. It's entirely possible that a node shutdown could occur around the same time as an allocation -- so this seems very risky to me.
Can you expand on this? Why is this more useful from a telemetry perspective?
Additionally, our sessions are deleted when the game ends.
🤔 I am genuinely wondering if what you are seeing is a symptom of #2974 (which just got caught and fixed), and your solution was to allow the CA to re-pack GameServer pods?
I am wondering how you can ever escape the race condition? Even if you get the latest node information directly from the k8s api (not through an informer, which would put extra load on the control plane, which would affect throughput and scaling speed), you could still have:
Unless you have a fancy way of implementing this I'm not thinking of? @zmerlynn (who is on leave atm, so may take some time to respond), has been doing a lot of work on managing eviction, maybe you have thoughts?
I'm assuming you mean each
🤔 Is there another way you could expose this information up to Datadog? Since the game server binary already has the label information, could it append it to the metrics service somehow? I say this because we have to be cognisant of how much load we put on the control plane, since that directly affects scaling time and allocation throughput. Copying metadata from the GameServer to the Pod just for metrics seems like extra load we should try to avoid if there is another way for you to expose that data to metric collection agents.
You are right. I raised the issue after looking at the document and code that my predecessor had written. The document also mentioned avoiding tainted nodes "as much as possible" to prevent race conditions.
The reason I created this FR is actually not the safe-to-evict issue, but the telemetry issue. As I mentioned before, a lot of metadata is determined at allocation time for the game that I am servicing.
The solution I proposed was to add podMetadata to the Allocation API. The field would be optional, so in most cases there should be no problem.
I'm not as familiar with the datadog api as you are, but, just looking through their docs: Could you instead write a sidecar/small controller for https://docs.datadoghq.com/api/latest/tags/#update-host-tags that looks at GameServer state through the Agones SDK and updates your Pod's tags that way? I'm just trying to avoid control plane churn. It sounds like a good third party project though.
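To make the sidecar idea a bit more concrete, here is a minimal sketch in Go of the part that turns GameServer labels into `key:value` tags and hands them to a metrics backend. Everything named here is an assumption for illustration: `tagUpdater`, `onGameServerChange`, `printUpdater`, the host name, and the label values are hypothetical, and no real Datadog or Agones SDK call is shown. A real sidecar would call something like `onGameServerChange` from whatever callback it uses to observe GameServer updates.

```go
package main

import (
	"fmt"
	"sort"
)

// tagsFromLabels converts label key/value pairs into "key:value" strings.
// The result is sorted so the same labels always produce the same slice.
func tagsFromLabels(labels map[string]string) []string {
	tags := make([]string, 0, len(labels))
	for k, v := range labels {
		tags = append(tags, k+":"+v)
	}
	sort.Strings(tags)
	return tags
}

// tagUpdater is a hypothetical interface standing in for the client that
// sends tags to the metrics backend. It is not a real Datadog or Agones type.
type tagUpdater interface {
	UpdateTags(host string, tags []string) error
}

// onGameServerChange is what a sidecar could run each time it observes a
// new GameServer state. It never touches the Kubernetes API.
func onGameServerChange(u tagUpdater, host string, labels map[string]string) error {
	return u.UpdateTags(host, tagsFromLabels(labels))
}

// printUpdater is a stand-in implementation that only prints the tags.
type printUpdater struct{}

func (printUpdater) UpdateTags(host string, tags []string) error {
	fmt.Println(host, tags)
	return nil
}

func main() {
	// Hypothetical labels set on the GameServer at allocation time.
	labels := map[string]string{"map": "desert", "match-type": "ranked"}
	if err := onGameServerChange(printUpdater{}, "gs-pod-1", labels); err != nil {
		fmt.Println("update failed:", err)
	}
}
```

The point of this shape is that the tags flow from the game server side straight to the metrics backend, so the Pod object is never written and the control plane sees no extra load.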
Long term, resetting the With higher permitted
I applied the change from #2974 that you mentioned and tested it. The results showed no difference in scaling performance from the control (blue is control, yellow is experimental). Regarding the pod metadata, I will try adding the logic that accesses the K8S API elsewhere, as you suggested. Thank you.
Is your feature request related to a problem? Please describe.
I encountered two issues while using Agones that could be resolved with a new feature.
Firstly, in my game, a very large number of sessions are used, and setting the safe-to-evict flag to false in the pod specification causes the cluster autoscaler (CA) to not work properly, resulting in wasted costs. However, setting safe-to-evict to true is also not a viable solution, as session durations vary from 20 minutes to 3 hours after allocation.
So we modified the Agones allocator to set the safe-to-evict annotation to false at the moment of allocation, which means we now maintain a fork of the official Agones code.
Secondly, in my game, various information such as the map and match type is determined at the moment of allocation and not at the creation of the GameServer. I know that it is currently possible to modify the metadata of the GameServer. However, modifying the metadata of the Pod is currently not possible, and this metadata is more useful than the GameServer's metadata from a telemetry perspective.
Describe the solution you'd like
I propose adding a feature to Agones that allows the Pod's metadata to be modified at allocation time.
This can be achieved by adding the ability to modify the labels and annotations of Pods through the Allocation API.
For example: add podMetadata to the Allocation API.
With this feature, the safe-to-evict annotation can be changed dynamically, and the data points determined at allocation time can be reflected in the Pod's metadata.
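As a rough illustration of the behaviour being requested, the sketch below merges a set of allocation-time labels and annotations into a Pod's existing metadata. The `PodMetadata` type and `applyAtAllocation` function are hypothetical and do not exist in Agones; they only model what an optional `podMetadata` field could do. The annotation key is the one the cluster autoscaler reads for safe-to-evict, and the label values are made up.

```go
package main

import "fmt"

// safeToEvict is the annotation key read by the cluster autoscaler.
const safeToEvict = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// PodMetadata is a hypothetical shape for the proposed podMetadata field.
// It is not part of the current Agones Allocation API.
type PodMetadata struct {
	Labels      map[string]string
	Annotations map[string]string
}

// merge returns a new map holding base with every key in patch added or
// overwritten. Neither input map is modified.
func merge(base, patch map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(patch))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range patch {
		out[k] = v
	}
	return out
}

// applyAtAllocation models what the allocator would do with an optional
// podMetadata patch: overlay it on the Pod's current labels and annotations.
func applyAtAllocation(pod, patch PodMetadata) PodMetadata {
	return PodMetadata{
		Labels:      merge(pod.Labels, patch.Labels),
		Annotations: merge(pod.Annotations, patch.Annotations),
	}
}

func main() {
	// Pod as created with the GameServer: evictable until allocated.
	pod := PodMetadata{
		Labels:      map[string]string{"app": "game"},
		Annotations: map[string]string{safeToEvict: "true"},
	}
	// Values only known at allocation time (illustrative).
	patch := PodMetadata{
		Labels:      map[string]string{"map": "desert", "match-type": "ranked"},
		Annotations: map[string]string{safeToEvict: "false"},
	}
	updated := applyAtAllocation(pod, patch)
	fmt.Println(updated.Labels)
	fmt.Println(updated.Annotations)
}
```

This only shows the merge itself. It does not address the race between allocation and node shutdown raised in the comments, and each such patch would still be an extra write against the Kubernetes API.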
Describe alternatives you've considered
An alternative solution could be to add a syncGameServerMetadataWithPods option to the GameServer resource, with the corresponding logic in the controller.
With this option, metadata changes made to the GameServer at allocation time would be reflected in the Pod, achieving the same effect.
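One way such a sync could limit the control plane load discussed in the comments is to write to the Pod only when something actually differs. The following is a minimal sketch of that comparison, assuming plain string maps for labels. `changedKeys` is a hypothetical helper, not Agones controller code, and it deliberately ignores key removal.

```go
package main

import "fmt"

// changedKeys returns the entries of desired that are missing from current
// or have a different value there. Keys that exist only in current are left
// alone, so this sketch never removes metadata from the Pod.
func changedKeys(current, desired map[string]string) map[string]string {
	diff := map[string]string{}
	for k, v := range desired {
		if cur, ok := current[k]; !ok || cur != v {
			diff[k] = v
		}
	}
	return diff
}

func main() {
	// Illustrative values: Pod labels before allocation, GameServer labels after.
	podLabels := map[string]string{"app": "game", "map": "none"}
	gsLabels := map[string]string{"app": "game", "map": "desert", "match-type": "ranked"}

	diff := changedKeys(podLabels, gsLabels)
	if len(diff) == 0 {
		fmt.Println("pod already in sync, no update needed")
		return
	}
	fmt.Println("labels to write to pod:", diff)
}
```

An empty result means the controller can skip the Pod update entirely, which keeps the extra API traffic proportional to real metadata changes.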
Additional context
I have considered developing this feature myself.