Add a default memory/CPU limit for operand #872
@objectiser, @pavolloffay, @rubenvp8510, what do you think?
Sounds like a good idea to me. Would be good to run performance/scaling tests though.
@jpkrohling if the user limits memory for the collector to <= 128MiB, this will lead to an OOM exception, right?
Right, if the container's limit is set to the same size as the max queue size, then it might eventually lead to OOM. Ideally, we could set the container's limit to 256 MiB and the max queue size to 128 MiB.
@jpkrohling I thought the dynamic queue size enhancement would avoid such situations?
The dynamic queue will calculate the number of items that fit into the queue based on the observed span size. The amount of memory to allocate is still up to the user, so, if the user sets 128 MiB for the queue alone and 128 MiB for the container, then it will lack space for the other parts. In earlier iterations of the PR, we did have logic that calculated the memory size based on the host's memory, but we decided to remove it, as it's too much magic.
@jpkrohling I think we can get the memory limit from … Is it a good idea to limit the container's memory if it's not defined in the CR?
I think that's the main question. If we do specify a default limit, like 256 MiB, then we can set the queue size in memory as well and add the autoscaler. Without a default limit, we can't know which value to set for the queue size, and there's no way for the pod autoscaler to scale once "90% of the memory" is used up. If our default isn't suitable for a given instance, it can be explicitly set in the CR for that instance.
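For illustration, here's a minimal sketch of how those values could be set explicitly in a Jaeger CR, assuming the operator's `resources` field and the collector's `collector.queue-size-memory` option (value in MiB); the instance name and the exact numbers are placeholders echoing the figures discussed above:

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger                  # placeholder name
spec:
  collector:
    options:
      collector:
        queue-size-memory: 128     # dynamic queue budget in MiB, half of the container limit
    resources:
      requests:
        memory: 128Mi
        cpu: 200m
      limits:
        memory: 256Mi              # suggested default limit from this discussion
        cpu: 200m
```

The idea being that whatever default the operator picks would only apply when the CR leaves these fields empty.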
Q: If there are no default resource values specified in the CR, how much memory/CPU is assigned? Q: Why can't autoscaling be used without specifying the default resources?
That's something to discuss. I suggested 256MiB for memory and 200m CPU, but it's still open so far.
It's not that it requires a default, it's more that it requires a value. When no value is set in the CR, we need to have a default to be used instead.
I mean, what is the current behavior when resources are not defined in the Jaeger CR? How many resources are allocated to Jaeger?
Then I do not understand how autoscaling works. If we expose a metric showing the amount of allocated memory, then we might also get a metric that shows the pod's total memory. Can't the autoscaler work with these two?
I don't think there's a default Kubernetes limit in that case. If the pod is consuming more memory than the node can allocate, it will just crash with an OOM. When a limit is set, the pod won't be scheduled for low-memory nodes, which is why it's always a good idea to set the minimum memory requirements for every workload.
If a limit isn't set, there's nothing like a "pod's total memory".
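To make the distinction concrete, here is a generic (hypothetical) container spec sketch showing how requests and limits play different roles; none of this is generated by the operator, it's just plain Kubernetes semantics:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: collector-example          # hypothetical example pod
spec:
  containers:
    - name: jaeger-collector
      image: jaegertracing/jaeger-collector:latest
      resources:
        requests:
          memory: 128Mi            # considered by the scheduler when placing the pod
          cpu: 200m
        limits:
          memory: 256Mi            # exceeding this gets the container OOM-killed
          cpu: "1"
```

Without the `limits` block, the container can keep growing until the node itself runs out of memory.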
Thanks @jpkrohling, this makes it clear to me. I thought that k8s would allocate a default limit if nothing is provided.
@jkandasa do you have some free cycles to run some experiments? When preparing a demo for HPA, I found that the optimal configuration for 10 tracegen clients was a collector with 128Mi requested, with a limit of 512Mi. The queue-size-memory was 64Mi (half of the requested memory). This would make the collector scale up to 10 replicas and still process data. Ideally, we would test this scenario (or a variant) with @jkandasa's performance test framework, and check how many spans/second we can achieve without constantly dropping spans.
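A rough sketch of that demo setup expressed as a Jaeger CR, assuming autoscaling-related fields such as `autoscale` and `maxReplicas` exist in the operator version being used (field names and availability may differ):

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: hpa-demo                   # placeholder name
spec:
  collector:
    autoscale: true                # assumption: enables the HPA for the collector
    maxReplicas: 10
    options:
      collector:
        queue-size-memory: 64      # MiB, as in the demo described above
    resources:
      requests:
        memory: 128Mi
      limits:
        memory: 512Mi
```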
@jpkrohling I do not have any lab machines to test this. For now, I have only OCP in the OpenStack environment.
Is 512Mi the maximum resource limit?
Yes, if we specify that 512Mi is the upper boundary, Kubernetes will kill the pod once this limit is reached.
Now that the dynamic queue size feature was merged into Jaeger, we could think about adding a memory limit to the Collector's deployment. This way, we could enable the autoscale feature by default, while having a sensible default.
I think we could start small, with 128MiB: with tracegen, it would result in a queue size of 428810, as its average span size is 313. A sample Quarkus application, with a JAX-RS server span + JAX-RS client span + an explicit span with one tag, calling a second application following the same pattern, yields a similar average (305), with a queue size adjusted to 440058.
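As a back-of-the-envelope check, those queue sizes follow from dividing the memory budget by the observed average span size: 128 MiB is 134,217,728 bytes, so 134,217,728 / 313 ≈ 428,810 entries for tracegen, and 134,217,728 / 305 ≈ 440,058 entries for the Quarkus sample.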