Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove generated code from repository #79

Closed
Oberon00 opened this issue Dec 2, 2019 · 36 comments · Fixed by #173
Closed

Remove generated code from repository #79

Oberon00 opened this issue Dec 2, 2019 · 36 comments · Fixed by #173

Comments

@Oberon00
Copy link
Member

Oberon00 commented Dec 2, 2019

The generated go code should not be checked in into the repository. See also #77 (comment)

Usually you should generate the files only during the build and maybe publish an artifact with them but never check them into source control (the same way you don't check in binary executables). I see #74 stated reasons to generate the code in this repository, and I think having the build files here is fine. But having the generated code here seems wrong.

@Oberon00
Copy link
Member Author

Oberon00 commented Dec 11, 2019

@tigrannajaryan What were the reasons to include the generated code (as opposed to just the build files required to generate it) anyway?

@tigrannajaryan
Copy link
Member

@Oberon00 So that the code can be consumed by other repos that depend on it. This was the approach used by OpenCensus proto and I went with it.

I am not opposed to removing generated code from this repo provided that there is a clear way to consume the proto from dependent projects.

@Oberon00
Copy link
Member Author

Oberon00 commented Dec 11, 2019

Coming from more of a Java background, I would have expected the build in the repository to publish some artifacts to some external repository instead of committing the generated files to version control. Then other repositories would depend on that by referencing the generated artifact.

@Oberon00
Copy link
Member Author

Yet another possibility would be to invoke the build script from the client projects. If it's more than a single line of client build script to do so, tune the proto build so that it becomes just a single line 😄

@jmacd
Copy link
Contributor

jmacd commented Dec 11, 2019 via email

@tigrannajaryan
Copy link
Member

Docker is also good option. We use this https://hub.docker.com/r/znly/protoc for our internal builds.

tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Feb 5, 2020
Go files are stale. They need to be regenerated any time .proto files
are changed. This needs to happen until open-telemetry#79
is implemented.
tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Feb 5, 2020
Go files are stale. They need to be regenerated any time .proto files
are changed. I executed `make gen-go`.

This needs to happen until open-telemetry#79
is implemented.
yurishkuro pushed a commit that referenced this issue Feb 6, 2020
Go files are stale. They need to be regenerated any time .proto files
are changed. I executed `make gen-go`.

This needs to happen until #79
is implemented.
@tigrannajaryan
Copy link
Member

@open-telemetry/specs-approvers does anybody know how to generate Go ProtoBuf code correctly from a different repo?

Generated Go files import each other and import statements include full URL as specified in go_package option in .proto files. This is currently set to this repository's URL. The net result is that the generated files cannot be compiled because they end up importing non-existing packages.

What is the right way to do this? I was not able to find a protoc option to override import paths. Is one expected to manually alter the import paths after generation (or go_package option before generation)?

@ghost

This comment has been minimized.

@nilebox
Copy link
Member

nilebox commented Feb 27, 2020

Coming from more of a Java background, I would have expected the build in the repository to publish some artifacts to some external repository instead of committing the generated files to version control.

In Go, source code is the artifact, and usually generated code is committed in the "library" repos like this one.

If generated code was only used within this repo, then it could have been generated as part of the build. Third parties shouldn't need to generate code they don't own.

Also, committing generated code makes it easier to review side effects in diff after running gen-go.

@Oberon00
Copy link
Member Author

However, this repository is not only used by Go and there are real, practical drawbacks to having the generated code here, namely that every change requires re-running make gen-go, which means:

  • I can't use the GitHub online editor.
  • I need to set up all the dependencies for the generator even if I'm not interested in it.

A straightforward way to solve this is to make an opentelemetry-proto-go repository which contains the generated code. You could add opentelemetry-proto as as git submodule or use other means to acquire it there. You could move the go-specific build scripts there or leave them here.

@yurishkuro
Copy link
Member

Generated code implies framework versions, for different languages, which is clearly not the scope of this IDL repo. It belongs to the repos that are actually using that code, such as collector or SDKs (and they can easily have different framework versions between themselves).

@nilebox
Copy link
Member

nilebox commented Feb 28, 2020

Currently opentelemetry-collector doesn't have a dependency on opentelemetry-go, but it would still be nice to generate code only once and commit it so that end users don't have to generate it themselves. If both repos will store generated code independently, it may add confusion for the end users on which one is the source of truth. We will also have to keep generation scripts in two places in sync to make sure they are compatible with each other (i.e. using correct flags in protoc commands etc).

I see two reasonable options:

  1. We may move generated code to opentelemetry-go and make opentelemetry-collector dependent on it, unless there is a strong reason against that.
  2. Language specific repos opentemetry-proto-go, opentelemetry-proto-java etc will effectively just split this repo into multiple repos, so that might be preferred since it won't change anything for the end users.

@yurishkuro
Copy link
Member

Currently opentelemetry-collector doesn't have a dependency on opentelemetry-go, but it would still be nice to generate code only once and commit it so that end users don't have to generate it themselves.

Who are the end users and why do they care about proto-generated code? Most users will use SDK and collector as is, the generated code is a hidden implementation detail of those.

If both repos will store generated code independently, it may add confusion for the end users on which one is the source of truth.

That code is internal to each repo (and can be placed under /internal dir).

@nilebox
Copy link
Member

nilebox commented Feb 29, 2020

Most users will use SDK and collector as is, the generated code is a hidden implementation detail of those.

If it's hidden, that's probably fine. But currently proto-generated types are exposed by collector.

See for example OTLPTraceData https://github.com/open-telemetry/opentelemetry-collector/blob/3115b835d0289b5a7fef8b87db6bb838d57ed48a/consumer/consumerdata/consumerdata.go#L44

Currently, to implement a custom exporter for collector, you deal with TraceData and MetricsData types, which use OpenCensus types. They are going to be replaced with OTLP types generated in this repo, if I understand correctly.

So if I want to implement an exporter for a custom backend, I will have to generate these types?

If the plan is to only use these types internally and not use them for extensibility, that may not be an issue.

@tigrannajaryan can probably answer this.

@yurishkuro
Copy link
Member

yurishkuro commented Feb 29, 2020

even if they are not hidden, I don't think there's a lot of chance of confusion. If one is writing a custom exporter for collector, they use proto types from collector repo.

@nilebox
Copy link
Member

nilebox commented Feb 29, 2020

I personally don't see any benefits in generating types for the same protos in more than one place from the Go ecosystem perspective.

If this repo is not appropriate for storing Go packages, let's decide on the right one. Both opentelemetry-go and opentelemetry-proto-go seem fine to me. Separate opentelemetry-proto-go is probably better for using git submodules and validating that generated types are in sync with protos in CI (see #112).

This issue is driven by non-Go concerns, and the current state seems to be working fine for both Go SDK and collector. Duplicate scripts do bring extra cost of keeping them in sync and generated files up-to-date though.


As a side note, committing generated code is officially recommended for Go: https://blog.golang.org/generate

Just keep in mind that it is for package authors, not clients, if only for the reason that the program it invokes might not be available on the target machine. Also, if the containing package is intended for import by go get, once the file is generated (and tested!) it must be checked into the source code repository to be available to clients.

@yurishkuro
Copy link
Member

I 100% agree on committing generated code. Just in the right place.

@nilebox
Copy link
Member

nilebox commented Mar 1, 2020

FYI I have successfully set up a separate repository for generating Go files: https://github.com/nilebox/opentelemetry-proto-go-experimental

Minor caveats:

Overall these are very minor tweaks, so we should be good to go with this option if we agree to move Go files out of this repo.

If someone with permissions can create the opentelemetry-proto-go repo, I can work on the extraction.


NOTE: This will require changing base Go package, so it will affect imports in opentelemetry-go and opentelemetry-collector repos. This should be straightforward once the new repo has all generated Go files in place.

@tigrannajaryan
Copy link
Member

We can move the generated to code to a separate repo if that's the consensus. From Go files' importer's perspective it makes no difference, which repo to import from and as a person who works on the proto frequently I don't care one way or another since I have the tooling installed anyway. I understand it may prevent others who are not so much involved with the proto work to make contributions if they don't have the tooling installed locally.

We should still keep the CI in this repo that performs code generation to ensure we don't break *.proto files accidentally.

One downside of moving Go files elsewhere is that any change to proto requires a change here, then in that other repo, then elsewhere that needs to use Go files (e.g. Collector), but it is not a big deal since hopefully it will be rare once we are past the active phase (we already are I think).

@tigrannajaryan
Copy link
Member

Most users will use SDK and collector as is, the generated code is a hidden implementation detail of those.

If it's hidden, that's probably fine. But currently proto-generated types are exposed by collector.

See for example OTLPTraceData https://github.com/open-telemetry/opentelemetry-collector/blob/3115b835d0289b5a7fef8b87db6bb838d57ed48a/consumer/consumerdata/consumerdata.go#L44

Currently, to implement a custom exporter for collector, you deal with TraceData and MetricsData types, which use OpenCensus types. They are going to be replaced with OTLP types generated in this repo, if I understand correctly.

So if I want to implement an exporter for a custom backend, I will have to generate these types?

If the plan is to only use these types internally and not use them for extensibility, that may not be an issue.

@tigrannajaryan can probably answer this.

This is currently work-in-progress, but most likely we will not expose OTLP types in Collector directly. See open-telemetry/opentelemetry-collector#584

@nilebox
Copy link
Member

nilebox commented Mar 2, 2020

We should still keep the CI in this repo that performs code generation to ensure we don't break *.proto files accidentally.

FWIW gen-go is not executed in CI currently:

ci: gen-java gen-swagger

I've tried to change that in #112, but if we plan to extract Go to the new repo, I would move Go-specific CI part there as well. We can also run go build in the separate repo, which would guarantee that changes are valid and safe (e.g. imports are correct).

Agree with the rest of the comment about benefits and downsides.

@tigrannajaryan
Copy link
Member

I think it is useful to merge #112 since it is an improvement over current state.

This issue may take a bit time to resolve, no need to wait for resolution.

@bogdandrutu
Copy link
Member

@tigrannajaryan can we do what we discuss and try to generate the protos inside the collector repo?

@nilebox
Copy link
Member

nilebox commented Jun 3, 2020

The Collector doesn't use generated Go types from this repo anymore: open-telemetry/opentelemetry-collector#1037

So we may consider doing the same in https://github.com/open-telemetry/opentelemetry-go and remove generated types from this repo.
This is going to be a breaking change though (package renaming).

@MrAlias
Copy link
Contributor

MrAlias commented Jun 11, 2020

The Go SIG is a fan of the approach outlined in #79 (comment)

@open-telemetry/technical-committee can you help setting up a repository for this as described in that comment?

@yurishkuro
Copy link
Member

why separate repo instead of opentelemetry-go?

@MrAlias
Copy link
Contributor

MrAlias commented Jun 11, 2020

why separate repo instead of opentelemetry-go?

The consistency of a centralized location acting as a source of truth would ensure compatibility across all uses. This includes avoiding any differences in compilation and versioning.

It would also avoid the unnecessary tight coupling of two different projects. As pointed out above, in Go projects the common approach to dependencies is to import the dependency instead of bundling into a single code-base.

For what it's worth, we are fine with it staying here as well. It sounds like that is causing a burden to developers wanting to avoid running make prior to a commit though.

Has the option to only run the Go generation on a release been explored?

@yurishkuro
Copy link
Member

The consistency of a centralized location acting as a source of truth would ensure compatibility across all uses.

Proto IDL is the necessary compatibility layer. Different libs can compile IDL into classes all they want, it does not break compatibility. Different libs can also use different code generators (e.g. gogoproto in Go has like 5 different variants). Generated code creates unique versioning issues that are much better handled within projects that depend on that code.

I feel like we're bikeshedding a simple decision: since opentelemetry-go needs to generate code from proto, it should just do it in its repo, under /internal/ package so that it's not unintentionally exposed as an API. If some other project/repo also needs to generate code, they can do it again.

@SergeyKanzhelev
Copy link
Member

@MrAlias
Copy link
Contributor

MrAlias commented Jun 22, 2020

Here's and example of the exact type of error I was talking about avoiding: open-telemetry/opentelemetry-go#845

I don't appreciate my point being trivialized by saying we are just bikeshedding. I take this very seriously as it has to do with the stability of the OpenTelemetry project, something I took responsibility for when I assumed the role of a maintainer for the Go project.

It seems like this solution is being steamrolled in. I'm at a loss as to how to discuss a solution to a problem when this solution was prescribed from the start and differing opinions are not being listened to.

@yurishkuro
Copy link
Member

I don't mean any offense.

open-telemetry/opentelemetry-go#845 is a versioning issue where Go SDK and Go Collector are built with different versions of the proto file. It would break no matter what, even if there was a shared repo with generated code, because SDK and collector are on different release schedules and the proto changes were incompatible.

The Technical Committee discussed this topic today and our recommendation remains the same: remove gen code from this repo and re-gen it in the repos that need it (NB: this is the approach the Jaeger SDKs used for several years). @bogdandrutu volunteered to create a Docker image (similar to https://github.com/jaegertracing/docker-protobuf, a wrapper around protoc, with the necessary plugins) that can be used for generating code in different languages.

@nilebox
Copy link
Member

nilebox commented Jun 23, 2020

FWIW there is no v0.4.0 opentelemetry-proto release matching the latest Collector release.
I created an issue #161 to address that.

Ideally, there should be an OTLP compatibility matrix between SDKs and Collector releases.

@MrAlias
Copy link
Contributor

MrAlias commented Jun 23, 2020

open-telemetry/opentelemetry-go#845 is a versioning issue where Go SDK and Go Collector are built with different versions of the proto file. It would break no matter what, even if there was a shared repo with generated code, because SDK and collector are on different release schedules and the proto changes were incompatible.

as @nilebox pointed out:

there is no v0.4.0 opentelemetry-proto release matching the latest Collector release.

This was built from unreleased code. Additionally, the error in the linked ticket is about the concept of Temporality, a message type still under active debate and likely to change.

This is something @bogdandrutu assured people wouldn't be an issue when it was decided to split up #137 as it would all be gated by a release of this centralized and universal source of truth.

The Technical Committee discussed this topic today and our recommendation remains the same: remove gen code from this repo and re-gen it in the repos that need it (NB: this is the approach the Jaeger SDKs used for several years). @bogdandrutu volunteered to create a Docker image (similar to https://github.com/jaegertracing/docker-protobuf, a wrapper around protoc, with the necessary plugins) that can be used for generating code in different languages.

Understood. I'm disappointing this form of decision making was used and other opinions were not a part of the conversation (especially representation from one of the known repositories this is going to effect), but understand that alternatives are no longer being explored.

@tigrannajaryan
Copy link
Member

@MrAlias BTW, related to your comment please see here #161 (comment)

I agree with you that we need to be stricter and more careful with the releases and protocol changes.

I am not sure it necessarily means the protocol needs to be generated in one place. I think @yurishkuro is right that IDL (the ProtoBuf definitions) is equaly strong guarantee of compatibility, provided that we are using the same compilers from the IDL (or compilers that generate fully compatible code from the same IDL - like canonical Protoc and Gogoproto compilers do.

Perhaps I am missing something, but I think this allows you to achieve the goals that you stated.

@yurishkuro
Copy link
Member

provided that we are using the same compilers from the IDL

To play devil's advocate, even that is not a requirement - if someone so desires then can write completely custom serialization of protobuf and not use code-gen at all (I believe Zipkin's native SDK Brave does exactly that and avoids any framework dependencies). The compatibility is in the protocol, which is defined by the IDL files. Code-gen is how one may use it, but there's nothing canonical about generated code. E.g. gogoproto supports like 5 different code-gen flavors, all resulting in different classes but still compatible on the wire.

@evantorrie
Copy link

evantorrie commented Jul 16, 2020

When merged, open-telemetry/opentelemetry-go#942 will allow this issue to be resolved via a removal of the generated code. However, it may be wise to wait until a new release of go.opentelemetry.io/otel is cut which explicitly removes the go.mod dependency on the generated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants