-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start order of extensions is not guaranteed? #8732
Comments
I'd be happy to work on this one. |
I'm not sure this is a bug. I think this was implemented intentionally. All extensions that we have in core and contrib do not depend on each other. This assumption is used in some discussions around config grouping/merging, e.g.: open-telemetry/opentelemetry-collector-contrib#18509 and #8394. I think we need a strong argument to introduce this behavior. @yurishkuro do you have an extension that depends on another extension? Can you please let us know why it's needed in your use case? |
The use case is explained here: https://docs.google.com/document/d/1s4_6VgAS7qAVp6iEm5KYvpiGw3h2Ja5T5HpKo29iv00/view#heading=h.1ded8prqa1xm flowchart LR
E[Exporter] -->|Get storage| S(StorageExtension)
Q[Jaeger-Query extension] -->|Get storage| S(StorageExtension)
I believe this may also be required by the Encoding extensions (#6272), which is a similar use case. The alternative to declaration order would be allowing to express dependencies explicitly and then building a dependency graph to control the initialization order. |
@djaglowski You mentioned service graph in open-telemetry/opentelemetry-collector-contrib#18509 (comment), is there a way to declare dependencies between extensions into that graph? |
The encoding extensions don't do much with their lifecycle, since the plan is just to cast them to marshaler/unmarshaler types. I'm more worried that our current documentation is making a promise it is not keeping. I don't really see the harm in being deterministic in the start/shutdown order, it might pay off to have the pprof extension load before others for example. |
it's not about their lifecycle, but about other components that depend on them, some of which may also be extensions. |
I agree. Current non-determinism leads to difficult to troubleshoot bugs - it took me a while to realize that this was the root cause of the failures I encountered. |
Not currently. The relationships between pipeline components are very rigidly defined, but the way extensions are used by pipeline components is more of a convention. e.g. the component defines a parameter called I agree extensions should be started deterministically. Since the current behavior is non-deterministic, users already must implicitly accept that extensions may start in any order, including the order we would choose deterministically. Would it be sufficient to start extensions in order, then start pipeline components as we do now? Stopping would be the opposite - pipeline components, then extensions in reverse order. |
That would work for me, and what the open PR is doing. |
If users would have to point the jaeger_query extension and the exporter to the same storage extension, maybe it can be just one extension (jaeger_query or any other name) implementing both serving the query path and the storage interface that exporters can use. The storage part of the extension can be configurable to use any storage under the hood and provide it to the exporter via the storage interface. I believe, that way, we would avoid users misconfiguring exporter and the jaeger_query extension to use different storages. WDYT?
I'm just worried that by introducing this behavior, we limit possible options for grouping/extending configs which are currently being discussed in different issues. Also, if we accept this change and decide to integrate the extensions into the pipeline graph later, it'll be a breaking change for the users expecting the order of initialization according to the config definition. And given that that order is currently not applicable to receivers and exporters, this change doesn't bring any consistency. |
That doesn't work well. First, traditional
What about processors, isn't their order enforced? |
I was talking only about embedding storage extension interface into the jaeger_query extension. It doesn't require bringing UI, right? They still should be different Go modules, but do they need to be exposed to the end users as different components?
It is, but as part of the pipeline graph. Extensions are not part of the graph. |
What I mean is - isn't the order of processors also determined by their declaration order in the config? afaik there is no other way to declare dependencies between processors. Frankly, having a graph-based mechanism of ordering would definitely be more preferable for extensions than config-based. When constructing an extension I would specify that it depends on another extension and the runtime would make sure they are started in the correct order (I think it's fine to create them in any order). |
I just wanted to confirm that exposing the extensions as different components in the configuration interface is useful for Jaeger end-users. If it is, I'm ok with merging the PR. Sooner or later, we eventually might run into other extensions with interdependencies. |
**Description**: Enforce order of start and shutdown of extensions according to their internally declared dependencies **Link to tracking Issue**: Resolves #8732 **Motivation**: This is an alternative approach to #8733 which uses declaration order in the config to start extensions. That approach (a) enforces order when it's not always necessary to enforce, and (b) exposes unnecessary complexity to the user by making them responsible for the order. This PR instead derives the desired order of extensions based on the dependencies they declare by implementing a `DependentExtension` interface. That means that extensions that must depend on others can expose this interface and be guaranteed to start after their dependencies, while other extensions can be started in arbitrary order (same as happens today because of iterating over a map). The extensions that have dependencies have two options to expose them: 1. if the dependency is always static (e.g. `jaeger_query` extension depending on `jaeger_storage` as in the OP), the extension can express this statically as well, by returning a predefined ID of the dependent extension 2. in cases where dependencies are dynamic, the extension can read the names of the dependencies from its configuration. The 2nd scenario is illustrated by the following configuration. Here each complex extension knows that it needs dependencies that implement `storage` and `encoding` interfaces (both existing APIs in collector & contrib), but does not know statically which instances of those, the actual names are supplied by the user in the configuration. ```yaml extensions: complex_extension_1: storage: filestorage encoding: otlpencoding complex_extension_2: storage: dbstorage encoding: jsonencoding filestorage: ... dbstorage: ... otlpencoding: jsonencoding: ``` **Changes**: * Introduce `DependentExtension` optional interface * Change `Extensions` constructor to derive the required order using a directed graph (similar to pipelines) * Inherited from #8733 - use new ordered list of IDs to start/stop/notify extensions in the desired order (previously a map was used to iterate over, which resulted in random order). * Tests **Testing**: Unit tests --------- Signed-off-by: Yuri Shkuro <github@ysh.us> Co-authored-by: Antoine Toulme <antoine@lunar-ocean.com>
Describe the bug
Maybe I am misreading the situation, but this code iterates over map when starting extensions, so it cannot guarantee in which order they are started:
opentelemetry-collector/service/extensions/extensions.go
Line 34 in e97ceca
Even though the documentation implies that the order is define by the declaration order:
opentelemetry-collector/docs/service-extensions.md
Lines 63 to 66 in e97ceca
I observed some spontaneous failures in my experiments with JaegerV2 which declares extensions with an order expectation:
extensions: [jaeger_storage, jaeger_query]
but the logs show differently:Steps to reproduce
It's random behavior, hard to reproduce reliably.
What did you expect to see?
Extensions should be starting in the order defined in the config.
What did you see instead?
Seeing different order.
What version did you use?
Version:
v0.87.0
What config did you use?
https://github.com/jaegertracing/jaeger/blob/28520b31471b36d9e949de4ffaa016c2640a3be7/cmd/jaeger-v2/internal/all-in-one.yaml#L1
Environment
OS: MacOS, Linux
Compiler(if manually compiled): go 1.20.x
The text was updated successfully, but these errors were encountered: