Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protobuf datacodec #688

Merged
merged 3 commits into from
Aug 13, 2021
Merged

Conversation

kconwayinvision
Copy link
Contributor

This adds the ability to send and receive protobuf encoded data within
the envelope.

This adds the ability to send and receive protobuf encoded data within
the envelope.

Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
Comment on lines 109 to 115
if err := proto.Unmarshal(e.Data(), anymsg); err != nil {
if e.DataSchema() == "" {
return nil, fmt.Errorf("cannot encode direct protobuf message without dataschema. set dataschema to the appropriate protobuf type like type.googleapis.com/packge.v1.Type or make sure you are using the appropriate data content type %s", ContentTypeProtobuf)
}
anymsg.TypeUrl = e.DataSchema()
anymsg.Value = e.Data()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to highlight this section because it demonstrates the big question I have for this PR. There's a circular relationship between the codec and the format defined in the protobuf spec that is difficult to resolve without either changing the spec or introducing non-specified implementations details.

When both protobuf format and encoding are used together then we must use the ProtoData field of the protobuf CloudEvent type to store the data. I believe the intention of the protobuf specification is to match the behavior of JSON+JSON which embeds the data as a native type within the envelope rather than a JSON ecoded string within a JSON envelope. In the case of protobuf, though, the only generic type available is Any which contains two fields: a type URI and a protobuf encoded message.

In order to create an Any we must know the source type URI. The Encode implementation in this draft takes a source protobuf message and generates an Any from it such that the type URI is correct. However, this means the Encode and Decode methods practically only work correctly when used with the protobuf format. If used with the JSON format then Encode will incorrectly produce an encoded Any rather than an encoded form of the original message. Only users of this Go SDK would handle this case.

The problem I've highlighted here, though, is that there is not enough information within the Event during formatting to construct the Any required to comply with the spec. The special behavior of the format should be entirely encapsulated here but the type erasure that happens during the datacodec Encode leaves us without a type URI. The same type URI could be set as the dataschema value but this value is not required by either the primary CloudEvent spec or the protobuf extension.

Some options I've considered that I'd like feedback on:

  1. Only convert from source data to Any in the format when protobuf content type is used. Use the dataschema value, including no value, as the type URI in the Any. Rely on the use of the non-type URI validating decoding feature of Any to prevent malformed or missing type URIs from blocking decoding back to a valid protobuf message. This option technically results in an invalid Any but this would be hidden behind the format abstraction and never surfaced to users. The risk is that other languages protobuf implementations may not offer the option to bypass invalid or missing type URIs.

  2. Change the protobuf specification to require the dataschema value and match it to the type URI that should be included in an Any. However, the type URI of a protobuf message is most often not a valid URI as required by the CloudEvent specification but a URI-reference because it is missing the required URI scheme. This means we would need to modify the actual type URI to something CloudEvent compliant in order to use that field. Additionally, the API for getting a message type URI is not a top-level feature but an artifact of the protobuf reflection interface in Go (ex: string(msg.ProtoReflect().Descriptor().FullName())). I'm not certain how other language implementation of protobuf expose this.

  3. Artificially restrict the protobuf datacodec to only be used with the protobuf format. This would hide the issues without addressing them. However, this option also seems to require that we leak through the format/datacodec abstraction and add a special case somewhere in the main SDK that identifies and bails on this condition. Otherwise, it's not clear how we could impose that restriction only through the scope of this plugin.

I think Option 1 is the better of the batch because it best hides the implementation detail of using an Any and allows the various CloudEvent SDKs to manage the complexity of handling a technically invalid Any due to the potentially missing or malformed type URI. That puts the least burden on users of the SDK and works within the constraints of the current spec. That being said, I'm open to hearing more options or what folks think of the options I've listed.

@kconwayinvision
Copy link
Contributor Author

Hey folks, checking in. Any preference on how to go or would folks rather I choose?

This refactors the original protobuf datacodec so that it is generally
useful as a codec within other formats. Previously, the coded wrapped
and unwrapped every value in an Any container because this is required
when the protobuf codec and format are used together. Now, the format
handles wrapping the protobuf encoded binary data in the Any type.

Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>
@kconwayinvision
Copy link
Contributor Author

I went ahead an implemented Option 1 which was to move the Any type wrapping into the format. This ensures that the datacodec implementation is generally useful and can be used in conjunction with non-protobuf formats for the envelope.

Whenever folks get a chance to review, I believe this is ready to go. When it's merged it will close #661.

@kconwayinvision
Copy link
Contributor Author

Checking in again. I'm still interested in getting this line of work closed out. Let me know if there are any comments or questions.

@ezbercih
Copy link

does this PR also address leveraging a schema registry like the one that comes in Kafka ecosystem in order to validate the the protobuf data payload?

@kconwayinvision
Copy link
Contributor Author

@ezbercih No. It's limited to only implementing the schema agnostic data encoding.

There's nothing here that prevents some kind of integration with a schema registry in the future. This PR is limited to completing https://github.com/cloudevents/spec/blob/master/protobuf-format.md.

@n3wscott
Copy link
Member

APPROVE thanks!

@n3wscott n3wscott merged commit 5972875 into cloudevents:main Aug 13, 2021
n3wscott pushed a commit that referenced this pull request Aug 13, 2021
* Add protobuf datacodec

This adds the ability to send and receive protobuf encoded data within
the envelope.

Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

* Move pb.Any handling to protobuf format

This refactors the original protobuf datacodec so that it is generally
useful as a codec within other formats. Previously, the coded wrapped
and unwrapped every value in an Any container because this is required
when the protobuf codec and format are used together. Now, the format
handles wrapping the protobuf encoded binary data in the Any type.

Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

* Add init to register protobuf codec

Signed-off-by: Kevin Conway <kevinconway@invisionapp.com>

Co-authored-by: kconwayinvision <58523435+kconwayinvision@users.noreply.github.com>
@kconwayinvision kconwayinvision deleted the protobuf-datacodec branch August 17, 2021 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants