Set default resource limits for ES/Kibana/APM #1454
I would be in favor of option 1.
I don't think that, if we expand the quickstart example, we would also need to remove the default requests (or limits). Personally I'm okay with both.

As far as I know, the only performance downside of adding a default limit is that the pod will not be able to use >2Gi for the cache. But that's a relatively small downside and I am okay with it. Another downside: other resources that use pod templates (e.g. Deployments, StatefulSets) just create pods and don't set defaults, so you end up with the regular pod defaults. This is a bit of a departure from "normal" behavior for types that create pods. That said, we know a little more about our use case here than those more generic types do, so I think it is justified.

But just to spell out the upside of a default limit: for expanding the quickstart example, I think starting people closer to a "production" configuration outweighs the extra verbosity. If people want to change it in the future, they don't need to come back to our documentation to look up how to set resource requests; they can just edit the values that are already there.
The main concern I have with adding defaults behind the scenes is that it could break the user experience when there are LimitRanges applied in the cluster (I must confess that maybe I'm more sensitive to that because I used them to manage some projects at my previous shop). Tbh I think that applying an empty `limits` through the `podTemplate`:

```yaml
podTemplate:
  spec:
    containers:
    - name: elasticsearch
      resources:
        limits: {}
```

I don't think that expanding the quickstart example is really that bad, and I agree with @anyasabo that it's closer to a "production" configuration; it is a "quick start" and a "good start" 😄

What should be improved imho is that the user should not have to manage the memory settings in 2 different places (sketched below). So I would vote for 2, but in the meantime check if the Docker image could be improved to automatically adjust the JVM settings depending on the control groups applied to the container.
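Just to illustrate the cgroup idea: modern JVMs can size the heap from the container's memory limit via `-XX:MaxRAMPercentage` (container support is on by default in recent JDKs). A minimal, purely illustrative sketch, assuming this were wired through `ES_JAVA_OPTS` rather than baked into the image:

```yaml
podTemplate:
  spec:
    containers:
    - name: elasticsearch
      env:
      - name: ES_JAVA_OPTS
        # Let the JVM derive the heap from the cgroup memory limit
        # (50% of 2Gi here, i.e. ~1g heap) instead of hard-coding -Xms/-Xmx.
        value: "-XX:MaxRAMPercentage=50.0"
      resources:
        limits:
          memory: 2Gi
```

Note this only removes the duplication if nothing else (e.g. an explicit `-Xmx` in the image's jvm.options) pins the heap; it is a sketch of the idea, not a tested configuration.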
I'm wondering if it could be worth adding an optional extra field to the ES CRD (see the sketch at the end of this comment).
Behaviour:
Benefits:
Concerns:
Note this does not prevent us from moving forward with Option 2 above as a first step.
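For illustration only, here is a hypothetical sketch of what such an extra field could look like; the field name (`memory`) and the derivation rule are assumptions made for the example, not the actual proposal text:

```yaml
# Hypothetical: a single memory knob per node set, from which the operator
# would derive both the container requests/limits and the JVM heap
# (e.g. heap = half of the container memory).
nodes:
- nodeCount: 3
  memory: 4Gi   # operator would set requests/limits to 4Gi and heap to 2g
```

This keeps the recommended heap/memory ratio encoded in the operator, while the podTemplate remains available for users who need full control.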
Hey, I'm kind of late to the game here, but wanted to voice my opinion on this. While I understand we need a good solution here, I don't think I like option 2 that we've decided on, because:
Maybe we should have an ECK production checklist similar to the one ES has? It's also possible that I'm oversensitive about how the config looks and that it doesn't matter/is obvious to someone who is already familiar with ES :) I really like the suggestion @sebgl made. It's clean, incorporates the logic behind the recommended ratios, and still allows being more precise/verbose using the pod template.
Trying to summarise the discussion and the arguments. Two (and a half) solutions were suggested:
1. set default memory requests and limits
2a. remove all defaulting and encourage users to set resource requests and limits and JVM options themselves
2b. introduce a new attribute "memory" (in addition to removing the defaults)
Thoughts
tl;dr
+1 for now (and let's maybe discuss our approach to defaults)
Thoughts:
If the failure scenarios for these defaults on k8s (memory limit 2GB) are obvious kinds of failures, such as a pod failing to schedule, then I think it makes sense. Having no memory limit by default seems dangerous and is asking for a PoC to start failing unpredictably due to pod evictions, especially with scenarios like #1716.

Each time I've tried to abstract things in Kubernetes, it feels like doing it wrong. Kubernetes YAML is typically painfully verbose, even when using templating tools like Helm. I've experimented with Jsonnet to help abstract things ("memory 2g" would set the memory request+limit to 2g and the JVM heap to half that), and that helps quite a bit, but I feel there's little we can do automatically in a Kubernetes resource definition. When I look at the Helm chart, there's a vast array of configurations such that abstraction isn't even really possible with all the templating value choices it supports. While I'd love to set "cpu X memory Y" and compute the rest, I don't know if that really aligns with how users expect Kubernetes operators to behave (I have no experience there). From Helm charts, I find abstractions like this are uncommon, which is frustrating for me as a user since I kind of revel in such abstractions and want to build them when they don't exist.

The Jsonnet experiment created a cluster with the default replica count (StatefulSet, replica count 1) and also included an abstraction for disk (which created a volumeClaimTemplate as appropriate). As much as I want/need abstractions (to minimize mistakes), it's possible a Kubernetes resource is the wrong level to attempt them, given there are so many different Kubernetes environments with weird, wild, and unforeseen configurations under which abstractions will fail.

My current leaning is, therefore, to set defaults (1GB JVM heap, 2GB memory request+limit, some appropriate CPU if needed, etc.).
Doesn't setting no defaults, but adding requests/limits to the quickstart, accomplish the same goal with fewer downsides (e.g. having to look up in the docs how to set it rather than starting from an example you can edit right there, us having to maintain defaulting code, the interaction with LimitRanges)?
That's true, but it's also the case for any native resources you spin up. That said, we have knowledge of what our application requires (which more generic types such as Deployments do not).
We had some more discussions offline and the decision is:
#1810 applies the default 2Gi memory limit for Elasticsearch. However, since we don't apply any CPU requirements, the pod still ends up in the Burstable QoS class, which means it will be evicted before pods in the Guaranteed QoS class.
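For reference, a minimal sketch (values purely illustrative) of what it would take to land in the Guaranteed QoS class: in Kubernetes, every container in the pod must have CPU and memory requests equal to its limits:

```yaml
podTemplate:
  spec:
    containers:
    - name: elasticsearch
      resources:
        requests:
          cpu: "1"       # requests must equal limits for Guaranteed QoS
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi
```

Without the CPU request/limit pair, the pod stays Burstable even with matching memory values.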
Currently, we set a default memory request on the Elasticsearch container, but no memory limit.
With no default limits, the Elasticsearch pod can be evicted from a k8s host if the host is out of memory. For stateful workloads such as Elasticsearch, this can be pretty bad.
We discussed 2 approaches to solve this:
1. Set default memory requests AND default memory limits
Just enforce 2GB memory (requests and limits) for the ES pod. We already do this for requests, so that's just also adding it for limits.
2. Remove any default and encourage the user to set one
We would remove our default memory request and ask the user to provide one, probably changing our quickstart example from the current minimal manifest to one that sets explicit memory requests/limits and JVM heap (see the sketch below).
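As a rough illustration only (the field names follow the usual podTemplate override and the values are assumptions, not the exact snippet from the issue), the expanded quickstart could look something like:

```yaml
nodes:
- nodeCount: 1
  podTemplate:
    spec:
      containers:
      - name: elasticsearch
        env:
        - name: ES_JAVA_OPTS
          value: "-Xms1g -Xmx1g"   # heap at roughly half the container memory
        resources:
          requests:
            memory: 2Gi
          limits:
            memory: 2Gi
```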