Added TransactionalGatherer with Cached implementation (+benchmark) #929
Conversation
I guess the "done" callback could work in principle. I have no better idea so far. (And that we now have to add yet another part (the transactional gathering) to the whole register/collect/gather contraption provides even more evidence that a rewrite/redesign will be beneficial… But yeah, this might be a way to introduce it into the current state of the package without breaking change.) Having said that, the code doesn't really seem to make use of the feature. The mutex of the Talking about the type rawCollector interface {
Collect() []*dto.MetricFamily
} And here is the type Gatherer interface {
Gather() ([]*dto.MetricFamily, error)
} So perhaps some things can be simplified. Look at the existing What's making me a bit dizzy is that it is hashing again without collision detection (there is a bunch of that in the existing I think the "single flight" problem can be solved with an I'm very tempted to help out with more detailed suggestions, but, once more, lacking enough time… So my apologies for just uttering a few unsorted thoughts. |
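For orientation, here is a minimal sketch of how a done-callback-style gatherer could sit between the two interfaces quoted above. The transactionalGatherer and rawTGatherer names and signatures are assumptions for illustration, not the final API of this PR:

```go
package cache

import (
	"sync"

	dto "github.com/prometheus/client_model/go"
)

// rawCollector produces already-assembled MetricFamilies, as quoted above.
type rawCollector interface {
	Collect() []*dto.MetricFamily
}

// transactionalGatherer is a hypothetical Gatherer variant: callers must
// invoke done() once they have finished reading the returned families, so
// the implementation may safely reuse or mutate them afterwards.
type transactionalGatherer interface {
	Gather() (families []*dto.MetricFamily, done func(), err error)
}

// rawTGatherer adapts a rawCollector to transactionalGatherer by holding a
// lock until the caller signals via done() that it is finished.
type rawTGatherer struct {
	mtx sync.Mutex
	c   rawCollector
}

func (g *rawTGatherer) Gather() ([]*dto.MetricFamily, func(), error) {
	g.mtx.Lock() // Released only through the returned done callback.
	return g.c.Collect(), g.mtx.Unlock, nil
}
```

The point of the done callback is exactly the one discussed above: the gatherer can keep its internal state locked, and therefore reusable, until the caller has finished serializing the families.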
Thanks!
Yeah, I might have missed the extra mutex lock in CachedCollector. Thanks for the pointers; I am back on this this week, so I will ping you when I update it. Especially the simplifications you mentioned are promising. I was thinking a bit after the implementation, and I am worried about whether the cache logic should be part of client_golang or of the pieces that need it (e.g. cadvisor). Plus, it looks like in the wild we have two kinds of cache updates:
On the …
Simplified API, added tests (branch updated from 9d5935e to 6e95a3d).
Found a race, changing a bit.
Simplified the abstraction, yet added the possibility to use this cache for both the event-based and the session-based approach. cc @dgrisonnet @beorn7 @kakkoyun PTAL (:
I have just skimmed. I'll continue to review it.
Let's put this on hold until we merge #974.
I hope I can review this in the not-too-distant future. My queue is just long, and it often gets preempted by urgent tasks.
Commit messages (each signed off by Bartlomiej Plotka <bwplotka@gmail.com>):

- update.
- Attempt 2
- Added blocking registry, with raw collector and transactional handler.
- Added fast path to normal (empty) registry to save 8 allocs and 3.5kB per Gather.
- Simplified API, added tests.
- Fix.
- Simplified implementation.
- Added benchmark.
- Optimized.
This reverts commit 2fcaf51. Optimization was not worth it:

```
benchstat v1.txt v2.txt
name                                                           old time/op    new time/op    delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  2.64µs ± 0%    4.05µs ± 0%    ~  (p=1.000 n=1+1)
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    701ms ± 0%     358ms ± 0%     ~  (p=1.000 n=1+1)
CachedTGatherer_Update/Gather-12                               535µs ± 0%     703934µs ± 0%  ~  (p=1.000 n=1+1)

name                                                           old alloc/op   new alloc/op   delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  208B ± 0%      208B ± 0%      ~  (all equal)
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    40.2MB ± 0%    41.1MB ± 0%    ~  (p=1.000 n=1+1)
CachedTGatherer_Update/Gather-12                               48.6kB ± 0%    84.3kB ± 0%    ~  (p=1.000 n=1+1)

name                                                           old allocs/op  new allocs/op  delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  3.00 ± 0%      3.00 ± 0%      ~  (all equal)
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    6.00 ± 0%      4003.00 ± 0%   ~  (p=1.000 n=1+1)
CachedTGatherer_Update/Gather-12                               1.00k ± 0%     2.01k ± 0%     ~  (p=1.000 n=1+1)
```
OK, I think I have finished optimizing this. I preferred memory efficiency over a slight delay in updating and gathering. There is some room to improve it further, but it would add more complexity to the code. Current results (a million metrics over 1k families):
I think it's more than good enough for cadvisor/kubelet usage, and potentially for anyone else using this cache.
```
benchstat -delta-test=none v6.txt v9.txt
name                                                           old time/op    new time/op    delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  13.1ms ± 0%    0.0ms ± 0%     -99.81%
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    309ms ± 0%     282ms ± 0%     -8.77%
CachedTGatherer_Update/Gather-12                               422ms ± 0%     0ms ± 0%       -99.95%

name                                                           old alloc/op   new alloc/op   delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  208B ± 0%      208B ± 0%      0.00%
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    2.47kB ± 0%    1.67kB ± 0%    -32.56%
CachedTGatherer_Update/Gather-12                               52.8kB ± 0%    24.6kB ± 0%    -53.34%

name                                                           old allocs/op  new allocs/op  delta
CachedTGatherer_Update/Update_of_one_element_without_reset-12  3.00 ± 0%      3.00 ± 0%      0.00%
CachedTGatherer_Update/Update_of_all_elements_with_reset-12    0.00           0.00           0.00%
CachedTGatherer_Update/Gather-12                               1.00k ± 0%     0.00k ± 0%     -99.60%
```
Managed to optimize further; pretty good result:
While those results are great, I found some pain points in cadvisor's usage of it. Apparently, different kinds of metrics can be returned depending on the HTTP parameters. That makes things difficult for any caching logic that only has a "memory" of the last state. The solution we could go with is to use two different caches (and live with allocating both), since we are lucky that only two options exist in cadvisor, AFAIK (see the sketch after this comment). That is unfortunately not very sustainable, plus we keep memory for two caches. Probably better than it was with … There is one thought on how we could improve this situation:
All of this makes me think that such a cache, for now, belongs to the user, not to the client_golang library. We should probably add … Usage can be seen here: google/cadvisor#2974 cc @dgrisonnet @beorn7 @kakkoyun NOTE: I am off sick for 2 weeks from today, so I might need to park this. Anyone else is welcome to continue this!
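To make the two-cache idea above concrete, here is a rough sketch of routing each scrape to one of two independently cached gatherers, using only the existing promhttp API. The handler wiring and the query parameter name are invented for illustration:

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// newSwitchingHandler keeps two independently updated caches, one per
// supported option, and routes each scrape to the matching one.
func newSwitchingHandler(defaultCache, dockerCache prometheus.Gatherer) http.Handler {
	defaultHandler := promhttp.HandlerFor(defaultCache, promhttp.HandlerOpts{})
	dockerHandler := promhttp.HandlerFor(dockerCache, promhttp.HandlerOpts{})

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// "docker_only" is an invented query parameter standing in for
		// whatever option actually changes the returned metrics.
		if r.URL.Query().Get("docker_only") == "true" {
			dockerHandler.ServeHTTP(w, r)
			return
		}
		defaultHandler.ServeHTTP(w, r)
	})
}
```

As noted above, this keeps memory for both caches; the filtering alternative discussed next avoids that at the cost of some per-scrape CPU.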
I think that the issue of having to maintain two caches is a problem for cadvisor to handle, as it really depends on their use case. If there is really a lot of difference between the two versions of the metrics, then it makes sense to have two separate caches. But, as far as I can tell, this Docker option allows filtering the metrics that are returned. So this could perhaps be improved by hooking into the Gather() method and filtering the results. Although that approach might come with a slight increase in CPU usage, it should be reasonable.
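A rough sketch of that filtering alternative: wrap Gather() and drop the families the request did not ask for. The keep predicate is an invented placeholder for whatever the HTTP parameters select:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// filteredGatherer wraps a prometheus.Gatherer and filters its output per
// request instead of maintaining a second cache. The keep predicate would
// be derived from the HTTP parameters of the scrape.
type filteredGatherer struct {
	g    prometheus.Gatherer
	keep func(*dto.MetricFamily) bool
}

func (f filteredGatherer) Gather() ([]*dto.MetricFamily, error) {
	mfs, err := f.g.Gather()
	if err != nil {
		return nil, err
	}
	// Copy into a fresh slice so the wrapped gatherer's data stays intact.
	out := make([]*dto.MetricFamily, 0, len(mfs))
	for _, mf := range mfs {
		if f.keep(mf) {
			out = append(out, mf)
		}
	}
	return out, nil
}
```

The extra pass over all families is the slight CPU cost mentioned above, in exchange for not holding a second cache in memory.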
I personally think otherwise. For most use cases the cache will never change: it will always have the same format, and the way to access it won't change. There are only very few cases where users might want a more optimized cache (e.g. kube-state-metrics). That said, I think it is worth making the cache pluggable by defining an interface based on how it is accessed by the Gatherers, so that any project could bring its own cache if it wanted to, while it would still be beneficial to the majority of users to have a cache implementation in client_golang that they could easily reuse.
I think that would be a great addition for registries that strictly work with const metrics, since their updates may be event-based, but it will not work well with collectors. Taking that into account, an approach could be to use a queue to update the cache from the events (see the sketch after this comment). From the consumer perspective:
In a goroutine started beforehand:
That approach would require a new type of registry whose sole role would be to gather data from the cache, a queue to which users could push const metrics, and a goroutine responsible for popping metrics from the queue and syncing them into the cache.
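A minimal sketch of that queue-based flow, with all names invented for illustration (this is not an API from this PR):

```go
package metrics

import (
	dto "github.com/prometheus/client_model/go"
)

// metricCache is an invented stand-in for whatever cache the gathering-only
// registry would read from; Set stores or replaces one metric family.
type metricCache interface {
	Set(mf *dto.MetricFamily)
}

// startCacheSync starts the background goroutine described above: it pops
// const-metric updates from the queue and syncs them into the cache until
// the queue channel is closed.
func startCacheSync(queue <-chan *dto.MetricFamily, c metricCache) {
	go func() {
		for mf := range queue {
			c.Set(mf)
		}
	}()
}
```

From the consumer perspective, producers would then only ever push const-metric updates onto the channel (queue <- mf), while scrapes read the cache through the gathering-only registry.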
Makes sense 👍🏽
That's my point. This PR makes it pluggable in the sense that anyone can implement client_golang/prometheus/cache/cache.go, line 52 in 961f8ca.
Sorry, I don't get this queue idea; I'm not sure what its benefits are. I created an API that has both the collector-based and the event-based approach (based on …
From a user perspective, it sounds complex to interact with this new API if there is no default cache implementation.
I missed that; I thought the implementation wasn't event-based yet. I'll take another look at your PR.
Decided to close this for now. I will add another PR with just the transactional interface. The cache was efficient, but the API causes extra work (those Insert structs) and errors like reusing the same arrays. Let's try to come up with something better in cadvisor first, and only then port it here.
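For readers unfamiliar with the slice-reuse problem mentioned here, a tiny illustration; the insert type below is a stand-in, not the PR's exact Insert struct:

```go
package main

import "fmt"

// insert is a stand-in for an Insert-like struct that carries label values
// in a slice.
type insert struct {
	labelValues []string
}

func main() {
	vals := []string{"a"}

	// Reusing the same backing array for every insert means later mutations
	// retroactively change earlier inserts, because they all alias vals.
	i1 := insert{labelValues: vals}
	vals[0] = "b"
	i2 := insert{labelValues: vals}

	fmt.Println(i1.labelValues[0], i2.labelValues[0]) // Prints "b b", not "a b".
}
```

If the cache keeps references into caller-owned slices, reusing the same backing array across updates can silently change earlier entries, which is presumably the kind of error referred to above.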
Lots of extra structs here, so feedback is welcome.
Things still to do here: