Add protobuf datacodec #688
Conversation
This adds the ability to send and receive protobuf encoded data within the envelope. Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
Force-pushed from d2ab35b to e75fcfa
```go
if err := proto.Unmarshal(e.Data(), anymsg); err != nil {
	if e.DataSchema() == "" {
		return nil, fmt.Errorf("cannot encode direct protobuf message without dataschema. set dataschema to the appropriate protobuf type like type.googleapis.com/package.v1.Type or make sure you are using the appropriate data content type %s", ContentTypeProtobuf)
	}
	anymsg.TypeUrl = e.DataSchema()
	anymsg.Value = e.Data()
}
```
I want to highlight this section because it demonstrates the big question I have for this PR. There's a circular relationship between the codec and the format defined in the protobuf spec that is difficult to resolve without either changing the spec or introducing non-specified implementation details.

When the protobuf format and encoding are used together, we must use the `ProtoData` field of the protobuf `CloudEvent` type to store the data. I believe the intention of the protobuf specification is to match the behavior of JSON+JSON, which embeds the data as a native type within the envelope rather than as a JSON encoded string within a JSON envelope. In the case of protobuf, though, the only generic type available is `Any`, which contains two fields: a type URI and a protobuf encoded message.
In order to create an `Any` we must know the source type URI. The `Encode` implementation in this draft takes a source protobuf message and generates an `Any` from it such that the type URI is correct. However, this means the `Encode` and `Decode` methods practically only work correctly when used with the protobuf format. If used with the JSON format, `Encode` will incorrectly produce an encoded `Any` rather than an encoded form of the original message. Only users of this Go SDK would handle this case.
The problem I've highlighted here, though, is that there is not enough information within the `Event` during formatting to construct the `Any` required to comply with the spec. The special behavior of the format should be entirely encapsulated here, but the type erasure that happens during the datacodec `Encode` leaves us without a type URI. The same type URI could be set as the dataschema value, but this value is not required by either the primary CloudEvent spec or the protobuf extension.
Some options I've considered that I'd like feedback on:
1. Only convert from source data to `Any` in the format when the protobuf content type is used. Use the dataschema value, including no value, as the type URI in the `Any`. Rely on the non-type-URI-validating decoding feature of `Any` to prevent malformed or missing type URIs from blocking decoding back to a valid protobuf message. This option technically results in an invalid `Any`, but this would be hidden behind the format abstraction and never surfaced to users. The risk is that other languages' protobuf implementations may not offer the option to bypass invalid or missing type URIs.

2. Change the protobuf specification to require the `dataschema` value and match it to the type URI that should be included in an `Any`. However, the type URI of a protobuf message is most often not a valid URI as required by the CloudEvent specification but a URI-reference, because it is missing the required URI scheme. This means we would need to modify the actual type URI to something CloudEvent compliant in order to use that field. Additionally, the API for getting a message type URI is not a top-level feature but an artifact of the protobuf reflection interface in Go (ex: `string(msg.ProtoReflect().Descriptor().FullName())`). I'm not certain how other language implementations of protobuf expose this.

3. Artificially restrict the protobuf datacodec to only be used with the protobuf format. This would hide the issues without addressing them. However, this option also seems to require that we leak through the format/datacodec abstraction and add a special case somewhere in the main SDK that identifies and bails on this condition. Otherwise, it's not clear how we could impose that restriction only through the scope of this plugin.
I think Option 1 is the better of the batch because it best hides the implementation detail of using an `Any` and allows the various CloudEvent SDKs to manage the complexity of handling a technically invalid `Any` due to the potentially missing or malformed type URI. That puts the least burden on users of the SDK and works within the constraints of the current spec. That being said, I'm open to hearing more options or what folks think of the options I've listed.
Hey folks, checking in. Any preference on how to go or would folks rather I choose?
This refactors the original protobuf datacodec so that it is generally useful as a codec within other formats. Previously, the codec wrapped and unwrapped every value in an Any container because this is required when the protobuf codec and format are used together. Now, the format handles wrapping the protobuf encoded binary data in the Any type. Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
I went ahead and implemented Option 1, which was to move the `Any` handling into the protobuf format. Whenever folks get a chance to review, I believe this is ready to go. When it's merged it will close #661.
Checking in again. I'm still interested in getting this line of work closed out. Let me know if there are any comments or questions.
Does this PR also address leveraging a schema registry, like the one that comes in the Kafka ecosystem, in order to validate the protobuf data payload?
@ezbercih No. It's limited to only implementing the schema agnostic data encoding. There's nothing here that prevents some kind of integration with a schema registry in the future. This PR is limited to completing https://github.com/cloudevents/spec/blob/master/protobuf-format.md.
APPROVE thanks! |
* Add protobuf datacodec

  This adds the ability to send and receive protobuf encoded data within the envelope. Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

* Move pb.Any handling to protobuf format

  This refactors the original protobuf datacodec so that it is generally useful as a codec within other formats. Previously, the codec wrapped and unwrapped every value in an Any container because this is required when the protobuf codec and format are used together. Now, the format handles wrapping the protobuf encoded binary data in the Any type. Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

* Add init to register protobuf codec

  Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

Co-authored-by: kconwayinvision <58523435+kconwayinvision@users.noreply.github.com>