Don't execute otel collector if configuration is "noop" #33680
Comments
This would be a nice improvement in more resource-restricted environments. Beyond this being an initial state, I also think it would be nice if the OpAMP server could send an empty config (e.g. an empty config map) to stop the collector until it receives another config. |
Currently, bootstrapping operates as described below (excerpt from the documentation):

Bootstrapping

To obtain the remote configuration from the OpAMP Backend, the Supervisor must send an AgentDescription to the Backend. Initially, the Supervisor doesn't have this information because the AgentDescription becomes available only after the Collector process is started and the AgentDescription is sent from the opamp extension to the Supervisor. However, it's impossible to start the Collector without a configuration.

To address this issue, the Supervisor starts the Collector with a "noop" configuration that doesn't collect any data but allows the opamp extension to start. The "noop" configuration consists of a single pipeline with an OTLP receiver listening on a random port, a debug exporter, and the opamp extension. The purpose of this "noop" configuration is to ensure that the Collector starts and the opamp extension communicates with the Supervisor.

Once the initial Collector launch is successful and the Supervisor receives the remote configuration, the Supervisor restarts the Collector with the new configuration. The new configuration is also cached by the Supervisor in a local file. This caching means subsequent restarts no longer need to use the "noop" configuration. It also allows the Supervisor to start the Collector without waiting for the OpAMP Backend to provide the remote configuration, mitigating any OpAMP Backend unavailability.

I don't understand why the AgentDescription needs to be managed specifically by the Collector, requiring the Collectors to start in order to connect the Supervisor to an OpAMP Backend. This dependency seems to have only disadvantages, and it will be a particular limitation if the Supervisor has to manage many Collectors simultaneously (see issue #33682). Why is the Collector considered the agent in a setup where the Supervisor is used?
It would make more sense for the Supervisor to be the OpAMP agent, with any connected Collector being transparent to the OpAMP Backend. Collectors should represent "any" host system as part of a subsystem registered via the Supervisor. In IoT systems, the Supervisor would act as a hub: a gateway through which all connected Collectors reach an OpAMP Backend that they cannot connect to directly for various reasons. |
The assumption is that users want to manage their Collector, not their Supervisor. The Supervisor is just the means to do it. The OpAMP server needs to know which Collector it is managing so that, for example, it can supply the right configuration. Knowing which Collector it is requires receiving an AgentDescription that correctly describes the Collector (e.g. the Collector's version number). The Supervisor does not have this knowledge and uses the bootstrapping process to get that information from the Collector. |
The supervisor must always be aware of the presence of a collector. However, certain registration details that are set initially and maintained persistently, like the agent ID, should not change, because the supervisor manages the collector. Bootstrapping can be done without any prerequisites besides the supervisor. This means the collector is downloaded and installed the first time, and the supervisor learns about its capabilities (processors, extensions, receivers, exporters, etc.) through descriptive metadata (e.g., the ocb build.yaml). Thus, the supervisor doesn't need to execute the collector to understand its characteristics. Alternatively, if the collector is already installed (managed by a third-party update) and the "opamp update feature" is off, descriptive metadata is used; it is available without execution but requires maintenance or a persisted state file. This metadata is established after the initial setup, similar to the agent ID. Somebody could even create this file by hand and thereby skip running the collector once just so that it can describe itself. The metadata could also be delivered by the OpAMP backend together with the executable download. The supervisor should cache this metadata for each agent. Permanent execution of collectors is not mandatory; instead, the supervisor prepares the essential groundwork and can start the collector when necessary, optionally based on configuration changes (e.g., when cfg != noop). This approach also supports future scenarios where one supervisor may manage multiple collectors, such as the new profiling eBPF client donated by Elastic. |
The implementation of the Supervisor currently follows this design. What you are describing appears to be a different design. If you would like to propose an alternate design please post a complete design document so that it can be considered by Supervisor maintainers. (Please note: I do not know if the alternate design will be considered and whether it will be accepted, it may be worth attending a Collector SIG to gauge the interest first). |
The idea of "design changes" only came up because I have no idea how else to implement "do not run the collector until a config change arrives that is != noop". Do you? |
I think the idea is we keep the bootstrapping logic to get the agent description (this is a very quick, less than a second run of the collector on startup of the supervisor), then we would simply not start the long-running collector process if we don't have a config. Does that make sense? |
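The flow suggested in this comment could be sketched roughly as follows. This is only an illustration, not the Supervisor's actual code: `Action`, `WaitForConfig`, `StartCollector`, and `nextAction` are hypothetical names, and treating a whitespace-only config as "no config" is an assumption made for the example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Action is what a hypothetical supervisor does once bootstrapping has finished.
type Action string

const (
	WaitForConfig  Action = "wait-for-config" // do not start the long-running collector
	StartCollector Action = "start-collector" // start the collector with the given config
)

// nextAction decides what to do after the short bootstrapping run.
// gotDescription reports whether the AgentDescription was received;
// config is the cached or remote config, which may be nil or empty.
func nextAction(gotDescription bool, config []byte) (Action, error) {
	if !gotDescription {
		return "", errors.New("bootstrapping did not produce an AgentDescription")
	}
	// Assumption for this sketch: a missing or blank config means "nothing to run".
	if len(bytes.TrimSpace(config)) == 0 {
		return WaitForConfig, nil
	}
	return StartCollector, nil
}

func main() {
	for _, cfg := range [][]byte{nil, []byte("receivers: {}\n")} {
		action, err := nextAction(true, cfg)
		fmt.Println(action, err)
	}
}
```

The point of the sketch is that bootstrapping and the decision to run the long-lived process are separate steps, so the first can stay unchanged while the second depends on whether a config exists.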
According to @evan-bradley Bootstrapping:
related #32554 |
Bootstrapping does not require any connection to an outside OpAMP server. During bootstrapping the collector connects to an OpAMP server that is internal to the supervisor; the communication is only between the collector and the supervisor. Bootstrapping is also not meant to generate an agent ID (the supervisor actually generates the UUID). Its purpose is to obtain the AgentDescription message, which contains metadata about the agent (e.g. the "name" of the agent, the version of the agent) that the supervisor doesn't necessarily know without somehow executing the collector. Bootstrapping is only concerned with getting this AgentDescription message, so once the message is received, the supervisor can (and currently does) stop the collector. Edit to add: |
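As a rough illustration of the bootstrapping step described above (start the collector briefly, wait for its AgentDescription, then stop it), here is a minimal sketch. It does not use the real Supervisor or opamp-go APIs: `AgentDescription` is a simplified stand-in for the OpAMP message, and `startFunc`, `bootstrap`, `fakeCollector`, and `bootstrapTimeout` are hypothetical names. The internal OpAMP server is replaced by a plain channel.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// AgentDescription is a simplified, hypothetical stand-in for the OpAMP message.
type AgentDescription struct {
	Name    string
	Version string
}

// startFunc launches a short-lived collector that reports its description on
// the channel and returns a function that stops the process again.
type startFunc func(report chan<- AgentDescription) (stop func())

// bootstrapTimeout is an arbitrary value chosen for this example.
const bootstrapTimeout = time.Second

// bootstrap starts the collector, waits for its AgentDescription, and always
// stops the collector before returning.
func bootstrap(start startFunc, timeout time.Duration) (AgentDescription, error) {
	report := make(chan AgentDescription, 1) // buffered so a late report never blocks
	stop := start(report)
	defer stop()

	select {
	case desc := <-report:
		return desc, nil
	case <-time.After(timeout):
		return AgentDescription{}, errors.New("timed out waiting for AgentDescription")
	}
}

// fakeCollector simulates a collector that reports a made-up description.
func fakeCollector(stopped *bool) startFunc {
	return func(report chan<- AgentDescription) func() {
		go func() {
			report <- AgentDescription{Name: "otelcol", Version: "0.0.0-example"}
		}()
		return func() { *stopped = true }
	}
}

func main() {
	stopped := false
	desc, err := bootstrap(fakeCollector(&stopped), bootstrapTimeout)
	fmt.Println(desc.Name, desc.Version, err, stopped)
}
```

Note that nothing in this step depends on an external OpAMP server or on a remote config, which matches the point made in the comment.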
Thanks for the clarification. |
Has it already been decided by which criteria the collector is kept running or is terminated until another config arrives that is != noop and not empty? Alternative A: Alternative B: the condition of the rules below is true
AND
AND extension: [] --
|
I personally like the empty config map solution. To me it seems natural to expect that having no config to run implies not running anything. |
…ded (#35430)
**Description:** If an empty config map is received, the supervisor does not run the agent.
~The current logic here works fine, but I'm considering adding an option to only do this if the user opts into it. I'm not sure if there's a reason why a user might want to run the collector with the noop config though (maybe for the agent's self-telemetry?)~
I've thought about it some more, and I don't think we need a config option here. If users want the collector to use a noop config, they can send a basic noop config.
I think we should also implement #32598 (closed as stale, we'll want to re-open this or open a new issue for it), which would allow users to configure a backup config to use when no config is provided by the server, if they would like.
**Link to tracking Issue:** Closes #33680
**Testing:**
e2e test added
Manually tested with a modified OpAMP server to send an empty config map
**Documentation:** Update spec where it seemed applicable to call out this behavior.
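The behavior this PR describes (an empty config map means the agent is not run) might look roughly like the sketch below. It is not the code from the PR: `ConfigFile` and `ConfigMap` are simplified stand-ins for the OpAMP config messages, `Supervisor` and `OnRemoteConfig` are hypothetical, and "empty" is assumed here to mean a map with no entries.

```go
package main

import "fmt"

// ConfigFile and ConfigMap are simplified, hypothetical stand-ins for the
// OpAMP config file and config map messages.
type ConfigFile struct{ Body []byte }
type ConfigMap map[string]ConfigFile

// Supervisor tracks only whether the collector process should be running.
type Supervisor struct {
	running bool
	active  ConfigMap
}

// OnRemoteConfig applies a remote config: an empty map stops the collector,
// anything else (re)starts it with the new config.
func (s *Supervisor) OnRemoteConfig(m ConfigMap) {
	if len(m) == 0 { // assumption: "empty" means the map has no entries
		s.running = false
		s.active = nil
		return
	}
	s.active = m
	s.running = true
}

func main() {
	s := &Supervisor{}
	s.OnRemoteConfig(ConfigMap{"": {Body: []byte("receivers: {}\n")}})
	fmt.Println("running after real config:", s.running)
	s.OnRemoteConfig(ConfigMap{})
	fmt.Println("running after empty config map:", s.running)
}
```

Under this reading, a user who still wants an idle collector process would send an explicit noop config, which is non-empty and therefore starts the collector.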
Component(s)
cmd/opampsupervisor
Is your feature request related to a problem? Please describe.
Reduce the overall runtime footprint in large fleets: by default the supervisor should "wait and listen for commands", and as long as it is not operational (not sending telemetry) it should not execute the collector.
Describe the solution you'd like
Imagine a scenario where the supervisor is installed as a basic part of a host (container or device), broadcasting DNS and searching for an OpAMP Backend until connected.
There is no local "non-default" config for the collector setup, just the default "noop" config, which would not send any telemetry other than the health of the collector.
The supervisor just waits to get connected to the OpAMP Backend and afterwards waits for a remote configuration update for the collector.
To reduce overhead until the collector receives a "job", the supervisor shall not execute the collector at all. Only when a config is sent that overwrites the noop default shall execution (as a daemon) be started.
Describe alternatives you've considered
No response
Additional context
No response