[kafka_consumer] Add kafka support for monitor_unlisted_consumer_groups
#2730
Conversation
    - another_zookeeper:2181
  # zk_iteration_ival: 1 # how many seconds between ZK consumer offset
  #                        collections. If kafka consumer offsets disabled
  #                        this has no effect.
I removed this param as it will be natively supported at the instance level in v6 of the agent. Since it unnecessarily complicated the code + user experience when configuring the check, I proactively removed it. It is backwards compatible as it will simply be ignored if specified.
Got this finished up and working on our clusters. Let's see what Travis has to say...
Travis appears to be passing with Kafka clusters >= 0.10, and failing otherwise. This is due to the upstream `kafka-python` behavior mentioned in the description: the `KafkaAdminClient` bombs out when used against brokers older than 0.10.
Thanks so much for all this great work 💯 🥇
Thanks for giving it a look-see. While I value tests, the bad news is that I will not have time to add them anytime soon. Some projects have come up at work that will be sucking up all available time the next few months, and since this is working great on our clusters, I can't spend any more work time on it right now. And my personal coding time is probably best spent fixing things upstream in `kafka-python`. No objections from me if you have the time/inclination to add tests yourself. Otherwise, when I eventually get the time I will add them, but it could be a while.
Fair prioritization :) Let's leave this open for the time being.
This adds support for dynamically discovering/fetching metrics for unlisted consumer groups that store their offsets in Kafka. Previously, `monitor_unlisted_consumer_groups` only worked for consumer offsets stored in Zookeeper.

There are a number of edge cases to adding this feature; see the full decision tree in `conf.yaml.example` to understand how the check decides which offsets to fetch from which source.

In order to accomplish this, I had to switch from `kafka-python.KafkaClient` to the new [`kafka-python.KafkaAdminClient`](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaAdminClient.html). In addition to allowing new functionality, this also resulted in removing a lot of code that was now superfluous.

Note: I suspect this will not work on clusters < 0.10.0 due to some bits of code in upstream `kafka-python` bombing out if the `KafkaAdminClient` is used against older Kafka clusters. So that may prevent merging unless upstream is patched to allow that.
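For illustration only, here is a minimal sketch (not the check's actual code) of what `KafkaAdminClient`-based group discovery looks like; the bootstrap address is a placeholder:

```python
# Minimal sketch of discovering consumer groups and their committed offsets
# with kafka-python's KafkaAdminClient (not the actual check code).
from kafka import KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # placeholder address

# list_consumer_groups() returns (group_id, protocol_type) tuples for every
# group the brokers know about, so no groups need to be listed in the config.
for group_id, _protocol in admin.list_consumer_groups():
    # Returns {TopicPartition: OffsetAndMetadata} with the group's committed offsets.
    offsets = admin.list_consumer_group_offsets(group_id)
    for tp, meta in offsets.items():
        print(group_id, tp.topic, tp.partition, meta.offset)
```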
This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.
@jeffwidman just a quick ping re: this PR.
I am unlikely to be able to spend more time on this anytime soon, as it works great in our environment. You guys might want to take a second look at this though, as it resolves several complex/possibly buggy bits in the current version of the check related to how group coordinators are identified. I'm surprised none of your other customers are requesting this functionality.
Hi @jeffwidman. We updated the tests a little on origin/master to increase confidence in these changes (sorry it took us so long). Basically we made sure we were asserting all metrics (#3468). Would you be able to resolve the conflicts? Then we should be able to merge :)
@gzussa I'm happy to fix conflicts, but the problem will be that the `kafka-python` API used here will break on brokers < 0.10. The actual Kafka protocol calls still work on older brokers; it's just a version check within `kafka-python`'s `KafkaAdminClient` that throws an error.

The current version of the check used in the Datadog Agent doesn't use the `KafkaAdminClient`, so it's a big win to switch to relying on the logic already built into it. But in order to do so, either that upstream library needs patching to not throw the error for brokers < 0.10, or this check needs to drop support for clusters < 0.10...

It'd actually simplify the check greatly if you were willing to drop support for brokers < 0.10 (or even simpler if you dropped support for < 0.10.2), because in addition to the broker communication being simpler, you could also simply drop support for zookeeper-based offsets, as those were deprecated in the Kafka ecosystem years ago.

It'd be useful if you knew how many of your customers were using the old versions to know whether it's safe to do that or not... my suspicion is your product manager won't be on board with dropping support for old versions, in which case it's better to patch `kafka-python` upstream.
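As a rough illustration of that constraint, a hedged sketch of gating on it at construction time (simplified; a real check would route to its legacy code path instead of returning `None`):

```python
# Hedged sketch: KafkaAdminClient negotiates API versions at construction time
# and refuses to talk to brokers older than 0.10.0, raising
# IncompatibleBrokerVersion. A check that still wants to support old brokers
# has to catch that and fall back to its legacy code path.
from kafka import KafkaAdminClient
from kafka.errors import IncompatibleBrokerVersion


def make_offset_client(bootstrap_servers):
    try:
        return KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    except IncompatibleBrokerVersion:
        # Brokers are pre-0.10: signal the caller to use the legacy
        # KafkaClient / zookeeper-based implementation instead.
        return None
```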
Thanks for all these details. That is very useful! Maybe another approach would be to use both libraries behind an abstract factory pattern. A good example of this can be found in the …
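A rough sketch of that abstract-factory idea follows, with hypothetical class names; only the shape of the pattern is the point, not the real check's structure:

```python
# Hypothetical sketch of the abstract-factory idea: choose a client
# implementation per instance, hiding the old vs. new kafka-python APIs
# behind one interface. Names and the version key are illustrative only.
from abc import ABC, abstractmethod


class OffsetClient(ABC):
    @abstractmethod
    def get_consumer_offsets(self):
        """Return {(consumer_group, topic, partition): offset}."""


class AdminClientOffsets(OffsetClient):
    """New path: KafkaAdminClient, brokers >= 0.10.2, offsets stored in Kafka."""
    def get_consumer_offsets(self):
        return {}  # would call KafkaAdminClient.list_consumer_group_offsets()


class LegacyOffsets(OffsetClient):
    """Legacy path: old KafkaClient plus zookeeper-stored offsets."""
    def get_consumer_offsets(self):
        return {}  # would keep the existing hand-rolled coordinator lookups


def offset_client_factory(instance):
    # 'kafka_client_api_version' is a stand-in for however the check decides
    # which implementation to use (explicit config or a probed broker version).
    if instance.get("kafka_client_api_version", (0, 10, 2)) >= (0, 10, 2):
        return AdminClientOffsets()
    return LegacyOffsets()
```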
This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.
Closing in favor of #3957, which offers the same functionality but has better error handling, cleaner separation between the legacy and new code paths, and also uses callbacks to do all the I/O in parallel for much faster querying.
…_unlisted_consumer_groups` (#3957)

* Add Kafka support for monitor_unlisted_consumer_groups

This is a major refactor of the `kafka_consumer` check with a new code path that will exist alongside the legacy code path. The new code path:

1. Eliminates legacy cruft:
   * drops support for fetching consumer offsets from zookeeper
   * drops support for brokers < 0.10.2, which do not support fetching all known offsets for a consumer group
   * migrates from the hand-rolled group-coordinator lookup code, retries, etc. to relying on `kafka-python`'s `KafkaAdminClient` to handle all that under the covers
2. Adds support for `monitor_unlisted_consumer_groups` to fetch consumer offsets from Kafka
3. Refactors from sync to async using callbacks for improved performance, especially with larger clusters and more consumer groups.

To clarify: fetching consumer offsets from older brokers and from zookeeper is still supported, it just uses the legacy code path instead of this new code path.

This has been under discussion in various forms for over two years, see #423 and #2730. The main blocker was coming up with a clean way to support legacy features like fetching from Zookeeper / older brokers while simultaneously migrating to `kafka-python`'s new `KafkaAdminClient`, which only supports brokers >= `0.10.0`. After several rounds of discussion, the decision was to move the legacy code into a separate file and start from scratch with a new implementation of the check which would be the primary one for new features moving forward.

* 1.4.7 was released

* Update agent_requirements.in

* for now..
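To illustrate the async/callback portion of that refactor, a heavily simplified sketch; `send_offset_fetch` is a hypothetical helper standing in for however the check issues an OffsetFetch request, assumed to return a `kafka-python`-style future, and `client` is assumed to expose a `poll()` method:

```python
# Simplified sketch of fanning out offset requests and parsing responses via
# callbacks so all broker I/O happens in parallel. `client` and
# `send_offset_fetch` are hypothetical stand-ins for the check's internals.
def fetch_all_offsets(client, consumer_groups, send_offset_fetch):
    results = {}

    def on_response(group_id, response):
        # Runs when that group's OffsetFetch response arrives.
        results[group_id] = response

    futures = []
    for group_id in consumer_groups:
        future = send_offset_fetch(group_id)        # request goes out immediately
        future.add_callback(on_response, group_id)  # parse whenever it lands
        futures.append(future)

    # Every request is now in flight at once; poll until all have resolved.
    while not all(f.is_done for f in futures):
        client.poll(timeout_ms=100)
    return results
```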