You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This conversation tracks the progress of coming up with a plan to validate protobuf definitions.
What are we proposing?
One of the ideas that we have discussed in the meeting yesterday was the possibility of doing some validation of protobuf before it is published to the registry. There are two validation categories that we could do:
As far as I understand, there are both api and lib packages in buffrs. For the lib packages, we could validate that they do not expose any service or rpc blocks. This just prevents you from publishing incorrectly.
We can enforce semantic versioning. This stipulates that version numbers must be of the shape 1.2.3, where the three numbers in order are the major, minor and patch numbers.
The rules are that whenever you release a new version, you must incremement the:
major version when you make incompatible API changes
minor version when you add functionality in a backward compatible manner
patch version when you make backward compatible bug fixes
Validating this means that we do not let you push a new buffrs package with a breaking change unless you also increment the major version, and we do not let you add functionality unless you increment the minor version.
Why is this important?
With buffrs, we can actually build a very powerful primitive for our services to communicate. It allows us to reuse type definitions for our RPC calls and easily update them. But in order for this to be something that can be deployed on a large scale, it needs to have some properties, such as stability guarantees.
Let's consider an example. If we have a library called physics that includes types such as Temperature and Coordinates, these are very useful primitive types that may be used in many services, either directly or indirectly.
Let's assume that this physics package exists in multiple versions: physics 3.1.0, physics 3.1.1, physics 3.1.2. What I believe we want to avoid is having to pin dependencies to a specific version. Because when we do that, what inevitably happens is that in one service, we have multiple versions of a package (serviceA 0.1.0 depends on physics 3.1.0, but it also depends on flightpath 0.2.5 which depends on physics 3.1.2). Now, we have not really made our lives much easier because now our service has several, different Temperature types that need to be converted between.
Instead, we would like to be able to express dependencies such as physics 3.*. By doing that, we can essentially communicate that we are okay with using any updates of this dependency, so long as there are no breaking changes.
But this is only possible or useful if we can rely on one important property: we need to know for sure that updates to the physics package do not introduce breaking changes. Because if they do, then we have no bricked serviceA, serviceB and serviceC, as they all depend on physics.
How can we parse Protobuf?
We should be able to use protobuf-parse to parse the protobuf files and determine breakage. It has a parser implemented in Rust, but it also allows using protoc as a parser (which might be useful in case we run into compatibility issues). It can spit out a FileDescriptorProto per protobuf file. That looks like this:
/// Describes a complete .proto file.#[derive(PartialEq,Clone,Default,Debug)]pubstructFileDescriptorProto{pubname:Option<String>,pubpackage:Option<String>,pubdependency:Vec<String>,pubpublic_dependency:Vec<i32>,pubweak_dependency:Vec<i32>,pubmessage_type:Vec<DescriptorProto>,pubenum_type:Vec<EnumDescriptorProto>,pubservice:Vec<ServiceDescriptorProto>,pubextension:Vec<FieldDescriptorProto>,puboptions:MessageField<FileOptions>,pubsource_code_info:MessageField<SourceCodeInfo>,pubsyntax:Option<String>,pubspecial_fields:SpecialFields,}
Some interesting properties:
We can extract the package name from this. This is quite useful, as it allows us to mandate that you use a package name (in the protobuf file) that aligns with the name of the buffrs package.
We get all the parsed definitions, letting us easily check if something was added/removed.
The parser does not care if things that are included do not exist, which is good for us because we don't really care about the dependencies at this point (or maybe we do?)
How can we determine breakage?
Determine previous relevant version
Figure out that kind of semver bump this is
Diff the current and the previous version to figure out what the differences are
Determine if semver bump aligns with code differences.
Determine version to compare to
Get previous version. This is not as trivial as it sounds, because the versions might not be published in order. For example, you might already be at version 4.5.12, but you want to publish a bugfix for 3.4.19. So the way this works is very simple
Get a list of all published versions: [1.0.0, 1.0.1, 2.0.4, 2.0.5, 2.1.0, 2.1.1, 3.0.0, 3.0.1, 3.0.2].
Exclude every version that is higher than the version that is supposed to be published. For example, if I am trying to publish version 2.0.6, the list becomes: [1.0.0, 1.0.1, 2.0.4, 2.0.5].
Get the last version of that resulting list.
Semver bump classification
Determine if this is a major, minor or patch bump, by comparing the two version numbers. For example:
3.5.12 to 3.5.13 is a patch bump.
3.5.12 to 3.6.0 is a minor bump.
3.5.12 to 4.0.0 is a major bump.
Compare two versions
Next we need to compare the two protobuf definitions to determine what the breakage level is in there.
Remove message, service or rpc: requires major bump.
Add new message, service or rpc: requires minor bump.
Rename or renumber message fields: major bump.
Add message fields, if optional: minor bump.
Add message fields, if required: major bump.
Add comments: patch bump.
It looks to me that the simplest way to accomplish this would be to:
Populate the results of the parsing into custom structs (we need this because having the data in a slightly different shape is more optimal for this comparison, and we are unable to derive custom properties on the external types).
Use a crate such as structdiff or diff to programmatically get a diff type that shows us all of the differences.
Run a simple decider over that set of differences which classifies if this is a patch, minor or major bump.
Acceptance criteria
If the semver bump aligns with the code differences, accept it. Otherwise, print helpful message stating that it was rejected, and why it was rejected.
Game plan
The question is where this validation should take place and how it should work. A good place to validate is always the registry, but looking at cargo for example, they also do some validation client-side because it allows you to do a local dry run.
I think a good approach is to write a validation crate that exposes the necessary primitives. This crate can then be used anywhere this validation needs to be performed. This way, the functionality is also decoupled from buffrs and can be easily tested.
Open questions
Do we enforce that the protobuf package name matches the buffrs package name?
What do we do with yanked releases?
What do we do with alpha releases?
Should we support a force override, which disables the semver check?
Should we, similar to the Rust project, allow beaking changes in versions below 1.0.0?
component::cliEverything related to the buffrs clicomponent::registryEverything related to the buffrs registrycomplexity::highIssues or ideas that are highly complex. require discussion and may affect backwards compatibilitytype::featureShipping or drafting a new feature for this producttype::ideaRough idea or proposal for buffrspriority::mediumThis is not urgent, but we should do this.
1 participant
Converted from issue
This discussion was converted from issue #105 on October 10, 2023 13:19.
Heading
Bold
Italic
Quote
Code
Link
Numbered list
Unordered list
Task list
Attach files
Mention
Reference
Menu
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
This conversation tracks the progress of coming up with a plan to validate protobuf definitions.
What are we proposing?
One of the ideas that we have discussed in the meeting yesterday was the possibility of doing some validation of protobuf before it is published to the registry. There are two validation categories that we could do:
As far as I understand, there are both api and lib packages in buffrs. For the lib packages, we could validate that they do not expose any
service
orrpc
blocks. This just prevents you from publishing incorrectly.We can enforce semantic versioning. This stipulates that version numbers must be of the shape
1.2.3
, where the three numbers in order are themajor
,minor
andpatch
numbers.The rules are that whenever you release a new version, you must incremement the:
Validating this means that we do not let you push a new buffrs package with a breaking change unless you also increment the major version, and we do not let you add functionality unless you increment the
minor
version.Why is this important?
With buffrs, we can actually build a very powerful primitive for our services to communicate. It allows us to reuse type definitions for our RPC calls and easily update them. But in order for this to be something that can be deployed on a large scale, it needs to have some properties, such as stability guarantees.
Let's consider an example. If we have a library called
physics
that includes types such asTemperature
andCoordinates
, these are very useful primitive types that may be used in many services, either directly or indirectly.Let's assume that this physics package exists in multiple versions:
physics 3.1.0
,physics 3.1.1
,physics 3.1.2
. What I believe we want to avoid is having to pin dependencies to a specific version. Because when we do that, what inevitably happens is that in one service, we have multiple versions of a package (serviceA 0.1.0
depends onphysics 3.1.0
, but it also depends onflightpath 0.2.5
which depends onphysics 3.1.2
). Now, we have not really made our lives much easier because now our service has several, differentTemperature
types that need to be converted between.Instead, we would like to be able to express dependencies such as
physics 3.*
. By doing that, we can essentially communicate that we are okay with using any updates of this dependency, so long as there are no breaking changes.But this is only possible or useful if we can rely on one important property: we need to know for sure that updates to the
physics
package do not introduce breaking changes. Because if they do, then we have no brickedserviceA
,serviceB
andserviceC
, as they all depend on physics.How can we parse Protobuf?
We should be able to use protobuf-parse to parse the protobuf files and determine breakage. It has a parser implemented in Rust, but it also allows using
protoc
as a parser (which might be useful in case we run into compatibility issues). It can spit out a FileDescriptorProto per protobuf file. That looks like this:Some interesting properties:
included
do not exist, which is good for us because we don't really care about the dependencies at this point (or maybe we do?)How can we determine breakage?
Determine version to compare to
Get previous version. This is not as trivial as it sounds, because the versions might not be published in order. For example, you might already be at version
4.5.12
, but you want to publish a bugfix for3.4.19
. So the way this works is very simple1.0.0
,1.0.1
,2.0.4
,2.0.5
,2.1.0
,2.1.1
,3.0.0
,3.0.1
,3.0.2
].2.0.6
, the list becomes: [1.0.0
,1.0.1
,2.0.4
,2.0.5
].Semver bump classification
Determine if this is a
major
,minor
orpatch
bump, by comparing the two version numbers. For example:3.5.12
to3.5.13
is a patch bump.3.5.12
to3.6.0
is a minor bump.3.5.12
to4.0.0
is a major bump.Compare two versions
Next we need to compare the two protobuf definitions to determine what the breakage level is in there.
message
,service
orrpc
: requiresmajor
bump.message
,service
orrpc
: requiresminor
bump.message
fields:major
bump.message
fields, if optional:minor
bump.message
fields, if required:major
bump.patch
bump.It looks to me that the simplest way to accomplish this would be to:
Acceptance criteria
If the semver bump aligns with the code differences, accept it. Otherwise, print helpful message stating that it was rejected, and why it was rejected.
Game plan
The question is where this validation should take place and how it should work. A good place to validate is always the registry, but looking at
cargo
for example, they also do some validation client-side because it allows you to do a local dry run.I think a good approach is to write a
validation
crate that exposes the necessary primitives. This crate can then be used anywhere this validation needs to be performed. This way, the functionality is also decoupled from buffrs and can be easily tested.Open questions
1.0.0
?Beta Was this translation helpful? Give feedback.
All reactions