Subscription Control Proposal #5080
travis-minke-sap
started this conversation in
Ideas
Replies: 3 comments 3 replies
-
OK... after some side conversations in Slack it seems that...
...and so we can consider this discussion... "closed" ; )
-
The next attempt is to implement replay only in eventing-kafka, while still trying to find an agreeable solution to the larger issue of how to expose dynamic / ephemeral control APIs in Knative / Kubernetes...
-
Knative Eventing - Subscription Control Proposal
The following proposal seeks to enhance the existing Knative Eventing Subscription API with optional configuration allowing Channel implementations to provide basic playback control such as Pause & Replay.
Due to their backing technologies, not all Channel implementations will be able to support these optional capabilities. The intent is for the API to let implementations that can support them do so, without requiring the same of other implementations. While it is expected that all Subscriptions could support the "Pause" capability, most might not be able to support the "Replay" feature.
Problem Statement
The following use cases are driving the need for the proposed enhancements...
Planned Subscriber Downtime: Subscribers often need to incur downtime for basic maintenance and upgrades. Subscribers would like to manage this downtime in a controlled pro-active manner without incurring message loss. Deleting and re-creating a Subscription is not adequate as the subscriber will lose its location in the event-stream and thus miss events. Currently, the way to manage this is to set a retry strategy where the retry time exceeds the expected downtime, or to send events to the deadLetterSink and manually process them via custom logic, neither of which is great.
Unplanned Subscriber Downtime: Subscribers might also experience unplanned downtime due to bugs, hardware failures, event content incompatibility, etc. Upon correction of the failure's cause, the subscriber would like the ability to resend (replay) the missed events. They might want to do this in parallel with the processing of new/current events, or they might instead wish to back up and start again at the original failure point.
Variable Starting Location: Independent of any downtime, a "new" subscriber might want to start receiving events starting from some prior point in time. Alternatively, it might want to just start receiving new events. Currently, there is no control over this, and the user is subject to the Channel implementation's whims as to the point in time at which to start sending events. Usually this means receiving "new" events only. A recent example of this can be seen in Issue #420 and associated PR #428.
Proposal
The Subscription API changes being proposed are detailed in the following sections...
Features
The following new capabilities are being proposed...
Pause: The ability to pro-actively "pause" (disable, inactivate, etc) and "resume" Subscriptions will allow administrators to manage planned-for subscriber downtime without suffering any event loss.
Replay: The ability to "replay" old events provides a clean recovery for a subscriber which experienced an unexpected downtime. This replay might take several forms including the following...
Initial Positioning: The "Replay" capability described above inherently includes the ability to create a new Subscription with a specific starting position.
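To make the three features concrete, here is a toy Go model of a single Subscription's cursor over a retained, append-only event log. It is only an illustration under stated assumptions: the subscription type and its methods are hypothetical, positions are plain slice indexes, and nothing here reflects how any real Channel stores or delivers events.

```go
package main

import "fmt"

// subscription is a hypothetical, in-memory model of one Subscription's
// position in an append-only event log. It is not Knative code.
type subscription struct {
	next   int  // index of the next event to deliver
	paused bool // true while delivery is paused
}

// newSubscription models "Initial Positioning": the caller chooses where
// in the log delivery starts.
func newSubscription(start int) *subscription {
	return &subscription{next: start}
}

// pause and resume model the "Pause" feature.
func (s *subscription) pause()  { s.paused = true }
func (s *subscription) resume() { s.paused = false }

// replayFrom models "Replay" by moving the cursor to an earlier position.
func (s *subscription) replayFrom(pos int) { s.next = pos }

// deliver returns the events the subscriber would receive now and advances
// the cursor. While paused it returns nothing and the cursor does not move,
// so no events are skipped.
func (s *subscription) deliver(log []string) []string {
	if s.paused || s.next >= len(log) {
		return nil
	}
	out := log[s.next:]
	s.next = len(log)
	return out
}

func main() {
	log := []string{"e0", "e1"}
	sub := newSubscription(0)
	fmt.Println(sub.deliver(log)) // events available at subscription time

	sub.pause()
	log = append(log, "e2", "e3")
	fmt.Println(sub.deliver(log)) // nothing is delivered while paused

	sub.resume()
	fmt.Println(sub.deliver(log)) // events that arrived during the pause

	sub.replayFrom(1)
	fmt.Println(sub.deliver(log)) // everything from position 1 onward, again
}
```

The point of the model is that pausing holds the cursor in place rather than discarding it, which is what deleting and re-creating a Subscription cannot offer.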
Requirements
The proposed capabilities should be provided at the Subscription level to ensure fine-grained control.
Not all Channel implementations will be able to provide these capabilities, so all API changes should be marked as MAY or OPTIONAL leaving the decision up to Channel implementers.
Channel implementations would ideally document their ability to support these new features.
New configuration values should be open and generic enough to allow Channel implementations freedom as to their use. For example, a KafkaChannel might expose the ability to specify timestamps or offsets for positional indicators, while another channel might require a different value altogether.
New configuration should be included in the Channel.spec.subscribers data, which is populated by the Subscription Controller, to facilitate implementation of these new features.
The Subscription Status field should be expanded to indicate the "enabled" state, thus impacting the overall Subscription READY state.
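As an illustration of the "open and generic" requirement, a Channel implementation could treat a position as an opaque string and interpret it however suits its backing technology. The Go sketch below assumes one hypothetical convention (a non-negative integer is an offset, an RFC 3339 string is a timestamp); the position type and parsePosition function are invented for this example, and a real KafkaChannel might choose something different.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// position is a hypothetical, channel-specific interpretation of an opaque
// position string supplied by the user.
type position struct {
	IsOffset  bool      // true if the value was parsed as an offset
	Offset    int64     // valid when IsOffset is true
	Timestamp time.Time // valid when IsOffset is false
}

// parsePosition tries the offset form first, then the timestamp form, and
// rejects anything else. The accepted formats are assumptions for this sketch.
func parsePosition(raw string) (position, error) {
	if n, err := strconv.ParseInt(raw, 10, 64); err == nil {
		if n < 0 {
			return position{}, fmt.Errorf("negative offset %d", n)
		}
		return position{IsOffset: true, Offset: n}, nil
	}
	if ts, err := time.Parse(time.RFC3339, raw); err == nil {
		return position{Timestamp: ts}, nil
	}
	return position{}, fmt.Errorf("unrecognized position %q", raw)
}

func main() {
	for _, raw := range []string{"42", "2021-03-01T12:00:00Z", "yesterday"} {
		pos, err := parsePosition(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> offset=%v value=%d time=%s\n",
			raw, pos.IsOffset, pos.Offset, pos.Timestamp.Format(time.RFC3339))
	}
}
```

Because the API itself would carry only the opaque value, a channel that cannot interpret it can reject or ignore it without any change to the Subscription schema.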
Design
The following design is an initial rough-draft of how the above could be achieved. The exact field names or structures are open for improvement...
DeliverySpec Fields
The existing DeliverySpec duck type exists to control the actual "delivery" of events, and seems like the logical place to add new control fields. If, however, it is undesirable to include OPTIONAL fields in the DeliverySpec, we could instead create a new ControlSpec structure to separate the two. The following new fields should allow the desired features to be controlled.
The Enabled field could be renamed as "Paused" or "Active" or whatever term is preferred.
The StartPosition field is "temporal" in that it will be removed by the Channel Controller once it has "started" from that location. This is admittedly a bit odd, in that the user provides some config that is automatically removed, but it is necessary to prevent the Subscription from starting again at that location on subsequent restarts. Implementations would likely want to make use of the new control-protocol for coordinating the new positions between Controllers and Dispatchers.
Should the StartPosition and EndPosition fields be []byte instead of string?
Adding the StartPosition field to an existing Subscription should move its position in the event stream back to the specified location.
The EndPosition field is not temporal, and can block further event processing. If a user were to remove it then the Subscription would resume processing newer events.
Status Tracking
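As a point of reference, the DeliverySpec additions discussed above, and the way an Enabled value could feed a status condition, might be sketched in Go as follows. Everything here is an assumption made for illustration: the ControlFields, Condition, enabledCondition, and ready names are hypothetical, the field types are one possible choice among those the proposal leaves open, and none of this is the actual Knative API.

```go
package main

import "fmt"

// ControlFields is a hypothetical stand-in for the fields the proposal
// would add to DeliverySpec (or to a separate ControlSpec).
type ControlFields struct {
	// Enabled pauses delivery when false. Nil means "not specified",
	// which this sketch treats as enabled.
	Enabled *bool `json:"enabled,omitempty"`
	// StartPosition is an opaque, channel-specific position. The Channel
	// Controller would remove it once delivery has started from there.
	StartPosition *string `json:"startPosition,omitempty"`
	// EndPosition is an opaque, channel-specific position at which
	// delivery stops until the field is removed.
	EndPosition *string `json:"endPosition,omitempty"`
}

// Condition mimics the general shape of a Kubernetes-style status condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// enabledCondition derives a hypothetical "Enabled" condition from the spec.
func enabledCondition(c ControlFields) Condition {
	if c.Enabled != nil && !*c.Enabled {
		return Condition{Type: "Enabled", Status: "False", Reason: "Paused"}
	}
	return Condition{Type: "Enabled", Status: "True"}
}

// ready rolls conditions up: the result is true only if every one is "True".
func ready(conds ...Condition) bool {
	for _, c := range conds {
		if c.Status != "True" {
			return false
		}
	}
	return true
}

func main() {
	paused := false
	cond := enabledCondition(ControlFields{Enabled: &paused})
	fmt.Println(cond.Type, cond.Status, cond.Reason, ready(cond))
}
```

Using pointer fields keeps every addition optional, so a Subscription that omits them behaves exactly as it does today.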
The proposed Enabled field above would be used (if present) to populate a new Subscription Status Condition which would impact the rolled-up value of the overall Ready status for the Subscription. The corresponding SubscriptionStatus lifecycle methods would be expanded to include a MarkSubscriptionEnabled() function which the Subscription Controller would make use of according to the current implementation.
Annotation History
In the case where a Subscription exists for a long time (months / years), and might have undergone several "pause" and "replay" cycles, it would be helpful for users of the system to be able to see a record of that history. A lightweight way to achieve this would be for Channels supporting these features to maintain this record as a custom annotation similar to the following...
Alternatives
The following alternatives were considered but rejected due to inherent disadvantages. These ideas could be debated further or enhanced if they warrant it. (Any other suggestions?)
New Eventing CRD: Instead of modifying the Subscription spec directly, a new Custom Resource (e.g. "SubscriptionControl") could be created to isolate this capability. This is very heavy-weight though, potentially requiring separate Controllers, and coordination with the actual Subscription custom resources out of band from the normal handling of the DeliverySpec configuration.
Custom Annotations: If there is no Knative Eventing level API specified way to achieve the pausing and replay of events, then Channel implementations requiring this functionality would be left to create their own custom set of Kubernetes annotations. This would also perturb the lifecycle as it would likely require watching Subscriptions directly instead of relying on the Channel.spec.subscribers block.
Non YAML API: Alternatively a Channel implementation could expose HTTP endpoints as an interface for these "control" operations. This API could be one-off per Channel or standardized in Knative, but this introduces a new control paradigm which might not be desirable?
Resources