Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add property/mutable state updating events #196

Merged
merged 6 commits into from
Jul 19, 2022

Conversation

Sushisource
Copy link
Member

What changed?
Added new events that we want to use to make changes to mutable state

Also introduces a new top-level field to history event which indicates that the SDK may ignore an event. This is immediately useful for the ActivityPropertiesModifiedEvent, and also gives us a non-breaking upgrade path for any such events in the future which will function with SDKs that do not know how to interpret the new event, but do know it is safe to ignore.

Why?
Support new features

How did you test it?
:|

Potential risks
SDKs who are not updated to include these new protos and the corresponding logic will break if they encounter histories containing these events.

@Sushisource Sushisource requested a review from yiminc July 6, 2022 23:56
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all LGTM.
I'd add the workflowservice APIs to see if we can embed these new events or reuse the definitions somehow.

// If set, update the workflow run timeout to this value.
google.protobuf.Duration new_workflow_run_timeout = 3 [(gogoproto.stdduration) = true];
// If set, update the workflow execution timeout to this value.
google.protobuf.Duration new_workflow_execution_timeout = 4 [(gogoproto.stdduration) = true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I set this back to "unlimited"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 would suffice. I'll add it to the comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value is 0, does it mean set the execution timeout to unlimited or don't change the timeout?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 means unlimited. Don't change would mean the field is unset entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, for all of these we need a way to differentiate between unlimited and unset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no way to differentiate unset from 0 (the default) in go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.. maybe because it’s duration it’s nilable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, duration is nilable but sooner or later we will face same problem with int or string. There are these guys but they are inconvenient to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will handler of this event look like? Will there be bunch of if for nil/zero value for every field? Is it scalable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what other way you're going to do it? Even if Go had proper option types or whatever... you'd still want to check all of them.

This strikes me as not a big deal. For future int/string type things where you need to be able to un-set it, we can easily just make a new message to make it nillable.

// information in any way that the SDK need be concerned with. If an SDK encounters an event
// type which it does not understand, it must error unless this is true. If it is true, it's
// acceptable for the event type and/or attributes to be uninterpretable.
bool sdk_may_ignore = 300;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the future purpose for this value and why we should get it in here, can you confirm it has no value yet? Or are we planning on setting this to true for the properties modified event attributes?

Also, do we want to specify how an SDK reacts to half understanding of the event? Meaning, what if it understands the current modified properties but not the later ones? So if you added new_heartbeat_timeout to ActivityPropertiesModifiedEventAttributes an an event came across with both that and new_retry_policy, should the SDK still handle the one field it understands? (this can be SDK-specific behavior, but worth discussing)

Finally, can we name this something else besides using "sdk" in the name? Maybe bool nonessential?

Copy link
Member Author

@Sushisource Sushisource Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not nonessential to everything - the field is indeed SDK specific. It seems important to include that in the name.

It already applies to ActivityPropertiesModifiedEventAttributes with the one existing field. SDK does not care about that when processing workflow history, as activity info comes from the activity tasks. Same would apply to heartbeat timeout.

To generally address the question, yes, this would certainly be false if any individual field is non-ignorable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the field is indeed SDK specific

Maybe "worker_may_ignore" or something like that? "sdk" just feels icky to put into history. Obviously just a naming thing.

To generally address the question, yes, this would certainly be false if any individual field is non-ignorable.

Specifically, is it ok if the SDK only knows about some of the modified properties and applies those? Or should the SDK apply none if it doesn't recognize them all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to worker_may_ignore.

Specifically, is it ok if the SDK only knows about some of the modified properties and applies those? Or should the SDK apply none if it doesn't recognize them all?

Good point, this will break if we add new fields to an event the SDK recognizes unless the SDK's proto library supports unkown fields.

https://developers.google.com/protocol-buffers/docs/proto3#unknowns
tokio-rs/prost#2

Copy link
Member Author

@Sushisource Sushisource Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W.R.T this behavior, things are no different than any other thing anywhere else in the proto APIs right now. I think the SDK should attempt to apply everything as normal as long as it can interpret the event, even if maybe there are new fields it doesn't understand.

Getting an unspecified event type, or a oneof variant it does not understand, OTOH, should cause it to fail.

I renamed the field

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's true that we don't check for unknown fields today, and probably shouldn't, I think an update-properties event may be a special case where we should. But I don't feel that strongly about it.

// If set, update the workflow run timeout to this value.
google.protobuf.Duration new_workflow_run_timeout = 3 [(gogoproto.stdduration) = true];
// If set, update the workflow execution timeout to this value.
google.protobuf.Duration new_workflow_execution_timeout = 4 [(gogoproto.stdduration) = true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value is 0, does it mean set the execution timeout to unlimited or don't change the timeout?

@@ -634,6 +634,26 @@ message ChildWorkflowExecutionTerminatedEventAttributes {
int64 started_event_id = 5;
}

message WorkflowPropertiesModifiedEventAttributes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may also need new cron schedule and new workflow memo here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add these as required, for cron schedule I'm not sure if we're going to keep this feature or deprecate it now that we have scheduled workflows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add these as required

What do you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant as we get requests for those but @yycptt already mentions that there are plans to support updating a memo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about cron schedule though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added memo.

Copy link
Member

@cretz cretz Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there is also a want for a command for updating memo from inside the workflow akin to how search attribute upsert works today. I figure a general purpose "update workflow properties" command with just that value for now makes sense. We can do it in a separate PR, but I am just hoping it is generic and we don't make a memo-specific command.

Copy link
Member

@alexshtin alexshtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate new_ prefix. I think it is obvious that we write new values to the history, not old ones.

// If set, update the workflow run timeout to this value.
google.protobuf.Duration new_workflow_run_timeout = 3 [(gogoproto.stdduration) = true];
// If set, update the workflow execution timeout to this value.
google.protobuf.Duration new_workflow_execution_timeout = 4 [(gogoproto.stdduration) = true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, duration is nilable but sooner or later we will face same problem with int or string. There are these guys but they are inconvenient to use.

// If set, update the workflow run timeout to this value.
google.protobuf.Duration new_workflow_run_timeout = 3 [(gogoproto.stdduration) = true];
// If set, update the workflow execution timeout to this value.
google.protobuf.Duration new_workflow_execution_timeout = 4 [(gogoproto.stdduration) = true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will handler of this event look like? Will there be bunch of if for nil/zero value for every field? Is it scalable?

@Sushisource
Copy link
Member Author

I hate new_ prefix. I think it is obvious that we write new values to the history, not old ones.

Does anyone else have an opinion here? I don't care that much, but I feel it is slightly more clear, and the extra characters aren't exactly hurting anyone.

@cretz
Copy link
Member

cretz commented Jul 15, 2022

I hate new_ prefix.

Does anyone else have an opinion here?

I'm fine with the new_ prefix. It makes it clear that it's unpopulated if there is nothing new (as opposed to having all fields populated even if they didn't change). It also disambiguates it from future additions that may not have an "old" version of the field.

But I'm fine without the new_ prefix too.

@rodrigozhou
Copy link
Contributor

In the replay, there's a check that verifies events with the corresponding command if the event is triggered by a command. It also checks if the attributes values matches.

Since I'm gonna add a command and re-use the event being added here, this event could be coming from either the API or the command. So, we need a way to differentiate them.

Here are a couple of thoughts:
a) Add a fields in the event attributes to indicate where it comes from.
b) Create an event to indicate the next event comes from API (I saw this in the code for another use case)

Thoughts?

@cretz
Copy link
Member

cretz commented Jul 15, 2022

In the replay, there's a check that verifies events with the corresponding command if the event is triggered by a command.

@rodrigozhou - I think we can just have the "check" in each SDK just make sure that if they made a command the event is present, but otherwise ignore the event. The only concern I can see is what if there is a command-based event and an API-based event in the same task, how will you know which to verify against the command? Since this event is unique in this way, I would support an "initiator" enum on the event to say whether it was done in or out of workflow (or bool).

@bergundy
Copy link
Member

Great point @rodrigozhou, we should add an initiator attribute as @cretz suggested.
Would be good to have the record anyways

@Sushisource
Copy link
Member Author

@rodrigozhou et all. Good point about the initiator field, but let's add that when it's needed for sending the command from SDK. No direct need to add it to this PR.

@Sushisource Sushisource force-pushed the add-property-updating-events branch from 6123c46 to 7de3e44 Compare July 19, 2022 20:33
@Sushisource Sushisource requested review from a team as code owners July 19, 2022 20:33
@Sushisource Sushisource enabled auto-merge (squash) July 19, 2022 22:11
@Sushisource Sushisource merged commit f24b135 into master Jul 19, 2022
@Sushisource Sushisource deleted the add-property-updating-events branch July 19, 2022 22:12
@bergundy
Copy link
Member

@rodrigozhou et all. Good point about the initiator field, but let's add that when it's needed for sending the command from SDK. No direct need to add it to this PR.

I would add this now rather than later, otherwise SDK versions that don't take this field into account would assume the initiator is a workflow and would throw a non-determinism error.

@Sushisource
Copy link
Member Author

@rodrigozhou After some discussion I think the command should have it's own event. It's icky for us to re-use the same event for the command b/c it leads to some poor type design where impossible states are representable. They might be a bit duplicative but that's totally fine.

bergundy added a commit that referenced this pull request Jul 19, 2022
bergundy added a commit that referenced this pull request Jul 19, 2022
bergundy added a commit that referenced this pull request Jul 21, 2022
bergundy added a commit that referenced this pull request Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants