Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending BatchActuateStreamResponse #103

Merged
merged 5 commits into from
Nov 21, 2024

Conversation

erikbosch
Copy link
Contributor

@erikbosch erikbosch commented Nov 14, 2024

This PR is based on analysis of OpenProviderStream at https://github.com/eclipse-kuksa/kuksa-databroker/blob/main/proto/kuksa/val/v2/val.proto#L184. One identified gap is if the provider by some reason does not like the actuate request received from Databroker. As of today the only counter measure it can do is to close the stream.

This PR propose to extend BatchActuateStreamResponse so that the provider can send error messages back to databroker. It can send as many BatchActuateStreamResponse as it likes.

It reuses

enum ErrorCode {
  OK                = 0;
  INVALID_ARGUMENT  = 1;
  NOT_FOUND         = 2;
  PERMISSION_DENIED = 3;
}

A hypothetical use case is the the provider by some reason has a different accepted value range for a signal. Then it could return INVALID_ARGUMENT . Databroker will as of today just print this info on debug level, but it could help for troubleshooting. The provider may in addition decide to close the stream, if it thinks the error is so severe that it do not want to talk to Databroker any longer. A hypothetical example could be if the Provider assumes that the Databroker is compromised or has a misconfiguration.

@erikbosch erikbosch marked this pull request as draft November 14, 2024 11:06
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 18 lines in your changes missing coverage. Please review.

Project coverage is 59.32%. Comparing base (fbbab49) to head (3693a94).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
databroker/src/grpc/kuksa_val_v2/val.rs 0.00% 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   59.35%   59.32%   -0.03%     
==========================================
  Files          33       33              
  Lines       16048    16063      +15     
==========================================
+ Hits         9525     9530       +5     
- Misses       6523     6533      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@erikbosch erikbosch marked this pull request as ready for review November 14, 2024 11:19
@erikbosch erikbosch marked this pull request as draft November 14, 2024 11:28
@erikbosch erikbosch marked this pull request as ready for review November 14, 2024 13:50
if let Some(signal_id) = batch_actuate_stream_response.signal_id {
match signal_id.signal {
Some(proto::signal_id::Signal::Path(path)) => {
msg = msg + ", path: " + &path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to use format!() here to create the debug msg messages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean having something like
msg =format! ("{}, path: {}" msg, &path);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. exactly that

@@ -280,7 +282,12 @@ message BatchActuateStreamRequest {
repeated ActuateRequest actuate_requests = 1;
}

// Message that can be used by provider to indicate that an actuation request was not accepted.
// It is optional for the provider to use this message, and for performance reasons it is recommended
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really disagree with what you said here, just to qualify it a bit, I think it won't add a big overhead if the provider returns an OK response in case of success, since kuksa_client actuate is an unary call (it is not supposed to be used with a high frequency as publish datapoint stream). If we add one day actuate_stream to kuksa_client, it should not definitely return an OK response up success to reduce stream overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, returning feedback to the originator of the actuate call seems way better than introducing any form of optimization here. But it should probably be made a lot more clear what the expected provider behaviour is here.

Copy link
Contributor Author

@erikbosch erikbosch Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problems changing it so that it is expected or even required that a result is returned. But if so, do we want say something more on the meaning of OK and when it should be sent. Like does it mean:

I acknowledge reception and did not find any problem with the request

or

I confirm that the request has been executed!

Or something in between? Or leave it to the provider? It could have an impact if we have the ambition to return a response to the requester - like how long do we want to wait for a response?

rafaeling
rafaeling previously approved these changes Nov 19, 2024
Copy link
Contributor

@rafaeling rafaeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me apart from just one comment added.
I am also wondering, is it possible that a provider has a different accepted value range for a signal than databroker? Provider should build its value ranges for signal according metadata received from databroker right?

message BatchActuateStreamResponse {
SignalID signal_id = 1;
Error error = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a architectural question, but do we really want a single Error enum that will contain every error possible for all functions in the whole service?

Wouldn't it be better to have specific errors for different functions, i.e. an ActuationError that only contains the errors possible when trying to actuate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the enum I used matched quite well. At least I can find use-cases for all of them

  • OK - Good to have a default value that indicates no error
  • INVALID_ARGUMENT - value is received is out of expected range
  • NOT_FOUND - signal id is not known by provider
  • PERMISSION_DENIED - signal is known by provider, but it does not accept that it is changes. A possible example could be that the provider has not registered itself for the signal id, so it does not expect to get it.

Off course one can argue that everything here is INVALID_ARGUMENT, so we do not need the rest. But to me the set seems quite reasonable.

https://github.com/eclipse-kuksa/kuksa-databroker/blob/main/proto/kuksa/val/v2/types.proto#L64

message Error {
  ErrorCode code = 1;
  string message = 2;
}

enum ErrorCode {
  OK                = 0;
  INVALID_ARGUMENT  = 1;
  NOT_FOUND         = 2;
  PERMISSION_DENIED = 3;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this could be further refined in the future I think to use ErrorCode if it meets current needs is sufficient for now.

}
}
msg = msg + ", error code: " + &error.code.to_string() + ", error message: " + &error.message;
debug!(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is just a first step in implementing this. But shouldn't the error coming from the provider be forwarded to the originator of the actuate(...) call, i.e. the "consumer"?

Copy link
Contributor Author

@erikbosch erikbosch Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I looked mostly at the Provider rpc.

But I assume we need quite a lot refactoring if we are to return results to the "consumer", some aspects I think of:

  • If Actuate / BatchActuate are to return actual results, the Databroker will need to wait until the providers has replied (for BatchActuate there could be multiple, right?)
  • That means the providers MUST send a response also on OK. Possibly combined with a timeout in databroker so that Actuation is considered unsuccessful if Databroker does not get a response within X seconds.
  • Then we maybe also need some form of the request-id/statefulness, so we know which request the error belong to. At least theoretically there might be multiple consumers trying to change the same signal, right? And it can happen partially in parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are discussing here was postponed if I remember correctly as we were not sure if we wanted to put databroker to wait until it receives ACK of receipt and forwards it to the consumer.
So in my opinion, this PR is enough for now if databroker can output some debug actuate information.

Comment on lines 629 to 632
// - Databroker sends BatchActuateStreamRequest -> Provider may return one or more BatchActuateStreamResponse upon error,
// but should for performance reasons send nothing upon success.
// It is up to the provider to decide if the stream shall be closed,
// as of today Databroker will not react on the received error message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should anyone close the stream?

Copy link
Contributor Author

@erikbosch erikbosch Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as a reasonable behavior if you as a provider get a lot of junk from the Databroker, like if you get a lot of BatchActuateStreamRequest where the signal id does not match the IDs provided in the ProvideActuationRequest you have sent, or the values does not match the value limitations the Provider have internally. I think it is up to the Provider to decide how many errors it will tolerate before it gets pissed off and closes the stream.

Like if a provider has received 1000 requests and all of them have been faulty, why should it keep the stream open

@@ -280,7 +282,12 @@ message BatchActuateStreamRequest {
repeated ActuateRequest actuate_requests = 1;
}

// Message that can be used by provider to indicate that an actuation request was not accepted.
// It is optional for the provider to use this message, and for performance reasons it is recommended
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, returning feedback to the originator of the actuate call seems way better than introducing any form of optimization here. But it should probably be made a lot more clear what the expected provider behaviour is here.

@erikbosch
Copy link
Contributor Author

Looks good to me apart from just one comment added. I am also wondering, is it possible that a provider has a different accepted value range for a signal than databroker? Provider should build its value ranges for signal according metadata received from databroker right?

Not necessarily as I see it. Taking the Kuksa CAN Provider as example, it's main source of truth is a Vspec file that is loaded as configuration. It expects the databroker to have a matching vspec, at least for the files it has in common, it does not read metadata from Databroker. As a provider typically needs business logic to convert from VSS value to some vehicle network, that conversion mechanism is likely hard-coded or based on some config that is defined before a connection to Databroker is established. In the best of worlds the vspec files used by Velocitas Apps, Databroker and Providers should match, but there is no guarantee for that.

There is also the theoretical possibility of a malfunctioning or compromised Databroker that does not perform the checks it should or sends updates to the "wrong" provider.

So as summary - in system that is correctly configured and without bugs there should never be any reason for a provider to reject a request - at least as long as we only reject if signal name or value is incorrect.

Copy link
Contributor

@rafaeling rafaeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and enough to me for now, I agree with @argerus that errors should be propagated to consumer, but we might need some more time to refine that concept better.

@erikbosch erikbosch dismissed argerus’s stale review November 21, 2024 09:40

After discussion in team

@erikbosch erikbosch merged commit 6763a66 into eclipse-kuksa:main Nov 21, 2024
23 checks passed
@erikbosch erikbosch deleted the erik_err branch November 21, 2024 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants