-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[component] Move component status reporting public API to new componentstatus module #10730
Changes from all commits
356a4a5
9654aa1
4132d05
1c35eb7
e40d606
270752d
5a17c56
a5366a6
e0c4f98
fb98b95
6251a9f
b94469a
e99f56d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Use this changelog template to create an entry for release notes. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: new_component | ||
|
||
# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver) | ||
component: componentstatus | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: Adds new componentstatus module that will soon replace status content in component. | ||
|
||
# One or more tracking issues or pull requests related to the change | ||
issues: [10730] | ||
|
||
# (Optional) One or more lines of additional information to render under the primary note. | ||
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: | ||
|
||
# Optional: The change log or logs in which this entry should be included. | ||
# e.g. '[user]' or '[user, api]' | ||
# Include 'user' if the change is relevant to end users. | ||
# Include 'api' if there is a change to a library API. | ||
# Default: '[user]' | ||
change_logs: [api] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
include ../../Makefile.Common |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
module go.opentelemetry.io/collector/component/componentstatus | ||
|
||
go 1.21.0 | ||
|
||
require ( | ||
github.com/stretchr/testify v1.9.0 | ||
go.opentelemetry.io/collector/component v0.106.1 | ||
) | ||
|
||
require ( | ||
github.com/davecgh/go-spew v1.1.1 // indirect | ||
github.com/gogo/protobuf v1.3.2 // indirect | ||
github.com/pmezard/go-difflib v1.0.0 // indirect | ||
go.opentelemetry.io/collector/config/configtelemetry v0.106.1 // indirect | ||
go.opentelemetry.io/collector/pdata v1.12.0 // indirect | ||
go.opentelemetry.io/otel v1.28.0 // indirect | ||
go.opentelemetry.io/otel/metric v1.28.0 // indirect | ||
go.opentelemetry.io/otel/trace v1.28.0 // indirect | ||
go.uber.org/multierr v1.11.0 // indirect | ||
go.uber.org/zap v1.27.0 // indirect | ||
golang.org/x/net v0.26.0 // indirect | ||
golang.org/x/sys v0.21.0 // indirect | ||
golang.org/x/text v0.16.0 // indirect | ||
google.golang.org/genproto/googleapis/rpc v0.0.0-20240528184218-531527333157 // indirect | ||
google.golang.org/grpc v1.65.0 // indirect | ||
google.golang.org/protobuf v1.34.2 // indirect | ||
gopkg.in/yaml.v3 v3.0.1 // indirect | ||
) | ||
|
||
replace go.opentelemetry.io/collector/config/configtelemetry => ../../config/configtelemetry | ||
|
||
replace go.opentelemetry.io/collector/component => ../ | ||
|
||
replace go.opentelemetry.io/collector/pdata => ../../pdata |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package componentstatus // import "go.opentelemetry.io/collector/component/componentstatus" | ||
|
||
import "go.opentelemetry.io/collector/component" | ||
|
||
// InstanceID uniquely identifies a component instance | ||
// | ||
// TODO: consider moving this struct to a new package/module like `extension/statuswatcher` | ||
// https://github.com/open-telemetry/opentelemetry-collector/issues/10764 | ||
type InstanceID struct { | ||
ID component.ID | ||
Kind component.Kind | ||
PipelineIDs map[component.ID]struct{} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
// Package componentstatus is an experimental module that defines how components should | ||
// report health statues, how collector hosts should facilitate component status reporting, | ||
// and how extensions should watch for new component statuses. | ||
// | ||
// This package is currently under development and is exempt from the Collector SIG's | ||
// breaking change policy. | ||
package componentstatus // import "go.opentelemetry.io/collector/component/componentstatus" | ||
|
||
import ( | ||
"time" | ||
) | ||
|
||
// Watcher is an extra interface for Extension hosted by the OpenTelemetry | ||
// Collector that is to be implemented by extensions interested in changes to component | ||
// status. | ||
// | ||
// TODO: consider moving this interface to a new package/module like `extension/statuswatcher` | ||
// https://github.com/open-telemetry/opentelemetry-collector/issues/10764 | ||
type Watcher interface { | ||
// ComponentStatusChanged notifies about a change in the source component status. | ||
// Extensions that implement this interface must be ready that the ComponentStatusChanged | ||
// may be called before, after or concurrently with calls to Component.Start() and Component.Shutdown(). | ||
// The function may be called concurrently with itself. | ||
ComponentStatusChanged(source *InstanceID, event *Event) | ||
} | ||
|
||
type Status int32 | ||
|
||
// Enumeration of possible component statuses | ||
const ( | ||
// StatusNone indicates absence of component status. | ||
StatusNone Status = iota | ||
// StatusStarting indicates the component is starting. | ||
StatusStarting | ||
// StatusOK indicates the component is running without issues. | ||
StatusOK | ||
// StatusRecoverableError indicates that the component has experienced a transient error and may recover. | ||
StatusRecoverableError | ||
// StatusPermanentError indicates that the component has detected a condition at runtime that will need human intervention to fix. The collector will continue to run in a degraded mode. | ||
StatusPermanentError | ||
// StatusFatalError indicates that the collector has experienced a fatal runtime error and will shut down. | ||
StatusFatalError | ||
// StatusStopping indicates that the component is in the process of shutting down. | ||
StatusStopping | ||
// StatusStopped indicates that the component has completed shutdown. | ||
StatusStopped | ||
) | ||
|
||
// String returns a string representation of a Status | ||
func (s Status) String() string { | ||
switch s { | ||
case StatusStarting: | ||
return "StatusStarting" | ||
case StatusOK: | ||
return "StatusOK" | ||
case StatusRecoverableError: | ||
return "StatusRecoverableError" | ||
case StatusPermanentError: | ||
return "StatusPermanentError" | ||
case StatusFatalError: | ||
return "StatusFatalError" | ||
case StatusStopping: | ||
return "StatusStopping" | ||
case StatusStopped: | ||
return "StatusStopped" | ||
} | ||
return "StatusNone" | ||
} | ||
|
||
// Event contains a status and timestamp, and can contain an error | ||
type Event struct { | ||
status Status | ||
err error | ||
// TODO: consider if a timestamp is necessary in the default Event struct or is needed only for the healthcheckv2 extension | ||
// https://github.com/open-telemetry/opentelemetry-collector/issues/10763 | ||
timestamp time.Time | ||
} | ||
|
||
// Status returns the Status (enum) associated with the Event | ||
func (ev *Event) Status() Status { | ||
return ev.status | ||
} | ||
|
||
// Err returns the error associated with the Event. | ||
func (ev *Event) Err() error { | ||
return ev.err | ||
} | ||
|
||
// Timestamp returns the timestamp associated with the Event | ||
func (ev *Event) Timestamp() time.Time { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Timestamp seem to be use-less, because it is always "now", which means consumers (watchers) can also read if they need it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is important for the healthcheck extension, which also needs to synthesize events. When synthesizing events the timestamp needs to be settable. We used to do this in core, but have removed it, at least temporarily. I was proposing we add an option to set the timestamp in a followup: #10730 (comment). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about, if this is needed for that extension, we can have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, until we discuss the next PR, I would rather remove this for the moment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I prefer to keep this PR a 1-to-1 move of stuff related to component status reporting into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Then update the title to "move code" instead of adding a new module which to me means we can do a better job. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a timestamp will be useful for other status watchers and worth considering for the generated status event. Otherwise, if there are multiple watchers that care about the timestamp, each one will have to wrap the event in a custom struct and generate timestamps independently. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When in doubt leave it out. I am not convinced yet, and until we have 2-3 use-cases we should not add it, since adding later is backwards compatible, but currently we pay the cost of reading the time as well as the maintenance cost. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Followup issue created: #10763 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will still suggest the timestamp has value. I wrote the following on the issue:
|
||
return ev.timestamp | ||
} | ||
|
||
// NewEvent creates and returns a Event with the specified status and sets the timestamp | ||
// time.Now(). To set an error on the event for an error status use one of the dedicated | ||
// constructors (e.g. NewRecoverableErrorEvent, NewPermanentErrorEvent, NewFatalErrorEvent) | ||
func NewEvent(status Status) *Event { | ||
TylerHelmuth marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return &Event{ | ||
status: status, | ||
timestamp: time.Now(), | ||
} | ||
} | ||
|
||
// NewRecoverableErrorEvent wraps a transient error | ||
// passed as argument as a Event with a status StatusRecoverableError | ||
// and a timestamp set to time.Now(). | ||
func NewRecoverableErrorEvent(err error) *Event { | ||
ev := NewEvent(StatusRecoverableError) | ||
ev.err = err | ||
return ev | ||
} | ||
|
||
// NewPermanentErrorEvent wraps an error requiring human intervention to fix | ||
// passed as argument as a Event with a status StatusPermanentError | ||
// and a timestamp set to time.Now(). | ||
func NewPermanentErrorEvent(err error) *Event { | ||
ev := NewEvent(StatusPermanentError) | ||
ev.err = err | ||
return ev | ||
} | ||
|
||
// NewFatalErrorEvent wraps the fatal runtime error passed as argument as a Event | ||
// with a status StatusFatalError and a timestamp set to time.Now(). | ||
func NewFatalErrorEvent(err error) *Event { | ||
ev := NewEvent(StatusFatalError) | ||
ev.err = err | ||
return ev | ||
} | ||
|
||
// StatusIsError returns true for error statuses (e.g. StatusRecoverableError, | ||
// StatusPermanentError, or StatusFatalError) | ||
func StatusIsError(status Status) bool { | ||
return status == StatusRecoverableError || | ||
status == StatusPermanentError || | ||
status == StatusFatalError | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this + the
InstanceId
to the "extension/statuswatcher` or something like that to be consistent with storage and other type of extensions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're probably correct, but I'd prefer to encompass all public component status reporting features behind
componentstatus
for now while it is experimental and subject to lots of breaking API discussions. Once we feel better about its API we could move it intoextension/storage
as a final location. Likely other aspects ofcomponentstatus
will move around in the future as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not suggesting to
extension/storage
but toextension/statuswatcher
:)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about why we do this then. Is this to just remove unstable code (that is ok)? But still you should at least document all these things as todos (file issues) and then we can do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing unstable code from
component
andextension
is the goal. Issue created: #10764