Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: unused (yet) dependencies added by go get should not be indirect #68593

Closed
fmeum opened this issue Jul 25, 2024 · 13 comments
Closed
Milestone

Comments

@fmeum
Copy link

fmeum commented Jul 25, 2024

Proposal Details

(In the following, $package stands for some package in a Go module $module at version $version.)

When a developer runs go get $package@$version to add a new dependency to their module, the result is that the module's go.mod file is updated with require directives for $module@$version as well as the transitive dependencies of the given package, with all of these requires being marked with // indirect.
While this seems appropriate for the transitive dependencies, the outcome for the require $module $version is not what the user intended:

  • If the user starts to use $package in their code, they will have introduced a direct dependency on $module@$version without this being reflected in go.mod.
  • If the user doesn't end up using $package (or any other package in $module), the require directives for $module@$version and all its new transitive dependencies are unnecessary.

In either case, a command such as go mod tidy or go get -t ./... is necessary to get the go.mod file into a consistent state. In fact, the go.mod file after running go get $package@$version is, due to the // indirect comment on the new require directive for $module@$version, usually not in a state that could ever be produced by go mod tidy.

This is problematic for external tooling that relies on the information in go.mod to determine the set of dependencies for a module in a lightweight way (compared to the more heavyweight and over-approximating go list -m all).

gopls explicitly works around this behavior by manually running go mod edit -require $module@$version to ensure that the require directive for $module@$version is not marked as // indirect.

Proposal

When a user runs go get $package@$version (or any other package pattern), the require directive for $module@$version should not be marked as // indirect. Since adding a usage of the package to the code and running go mod tidy would have the same effect, this change should not break any existing workflows or tooling.
A more conservative approach would be to gate this behavior behind a flag (say -r for "require").

Context

While I believe that this issue is generic to all users of go get, especially those that aren't also using gopls, there is a specific use case that motivated me to write it up.

The rules_go Bazel ruleset uses Bazel's new dependency management system Bzlmod to directly translate the information in a project's go.mod and go.sum files into Bazel dependencies, with each Go module corresponding to a Bazel repository.

Since external dependencies are subject to "strict visibility", each Bazel module needs to declare all its direct dependencies. For this reason, the // indirect comment in go.mod is used as a signal to rules_go that a dependency is not direct, avoiding build breakages caused by changes to transitive dependencies of updated direct dependencies. Due to the particular behavior of go get, this makes it very difficult for users to add new dependencies to their Bazel-managed Go project without also adding a usage in code and running go mod tidy, which can be slow in large monorepos.

If go get didn't mark the explicitly requested modules as // indirect, this setup would "just work".

@fmeum fmeum added the Proposal label Jul 25, 2024
@gopherbot gopherbot added this to the Proposal milestone Jul 25, 2024
@seankhliao seankhliao changed the title proposal: cmd/go/internal/modget: Effect of go get on go.mod should match user intent proposal: cmd/go: unused (yet) dependencies added by go get should not be indirect Jul 26, 2024
@seankhliao seankhliao added the GoCommand cmd/go label Jul 26, 2024
@earthboundkid
Copy link
Contributor

I've seen projects on Github where the dependencies are incorrectly marked as indirect, presumably for this reason.

@matloob
Copy link
Contributor

matloob commented Jul 30, 2024

I'm not sure what all the implications of doing this are. You're right that since the // indirect comments are informative and don't affect the build that we don't have to worry about that. But I'm wondering if anyone depends on the behavior that if an import doesn't yet exist and go get is run for that module, that the module is marked indirect.

I would like to see if we can first try to find an alternate solution to the bzlmod the right set of direct dependencies. For example, if it's too expensive to run go mod tidy, running go list with "-mod=mod" will update the "indirect" fields when a module importing the direct dependency is loaded. Maybe we can build a solution using that?

I'm curious to know how long go mod tidy takes for the large monorepos compared to the time that it takes for Bazel to process the go.mod files for its dependency system.

@fmeum
Copy link
Author

fmeum commented Aug 1, 2024

I'm not sure what all the implications of doing this are. You're right that since the // indirect comments are informative and don't affect the build that we don't have to worry about that. But I'm wondering if anyone depends on the behavior that if an import doesn't yet exist and go get is run for that module, that the module is marked indirect.

A tool that is based on this assumption would seem very brittle: Since go get doesn't emit the modified go.mod file directly to stdout, the file may have already been modified between running go get and the tool reading the resulting go.mod file, for example by IDE auto-save functionality or other tooling triggered by a file watcher.

I nonetheless agree that we should take the potential implications on backwards compatibility seriously. If we otherwise arrive at the conclusion that this behavior would be beneficial to users in general (not just users of rules_go), could gating this behind GOEXPERIMENT or GODEBUG be a way to roll this out safely?

I would like to see if we can first try to find an alternate solution to the bzlmod the right set of direct dependencies. For example, if it's too expensive to run go mod tidy, running go list with "-mod=mod" will update the "indirect" fields when a module importing the direct dependency is loaded. Maybe we can build a solution using that?

Could you elaborate on the go list -mod=mod trick? I didn't manage to find documentation on it and just running it didn't change my go.mod.

I'm curious to know how long go mod tidy takes for the large monorepos compared to the time that it takes for Bazel to process the go.mod files for its dependency system.

In a large monorepo with >2,000 Go module deps, the logic that translates go.mod and go.sum into Bazel repos runs in ~1s (in fact, due to a Bazel limitation, it's currently 2-3x slower than it could be with some optimizations applied). go mod tidy takes ~20s in an anecdotally average case (mix of cached modules and network access to get new modules).

@matloob
Copy link
Contributor

matloob commented Aug 1, 2024

I'm not sure what all the implications of doing this are. You're right that since the // indirect comments are informative and don't affect the build that we don't have to worry about that. But I'm wondering if anyone depends on the behavior that if an import doesn't yet exist and go get is run for that module, that the module is marked indirect.

A tool that is based on this assumption would seem very brittle: Since go get doesn't emit the modified go.mod file directly to stdout, the file may have already been modified between running go get and the tool reading the resulting go.mod file, for example by IDE auto-save functionality or other tooling triggered by a file watcher.

I nonetheless agree that we should take the potential implications on backwards compatibility seriously. If we otherwise arrive at the conclusion that this behavior would be beneficial to users in general (not just users of rules_go), could gating this behind GOEXPERIMENT or GODEBUG be a way to roll this out safely?

I think we would use those if we wanted to experiment with the behavior change without committing to it. But I think what we'd end up doing would be to keep the old behavior for modules declaring, say, go version 1.23 or older, and only change the behavior for 1.24 or later. I think we'd ultimately get the same level of safety, but any people or tools depending on the old behavior would have to adjust their expectations for (say) 1.24 or later modules.

I would like to see if we can first try to find an alternate solution to the bzlmod the right set of direct dependencies. For example, if it's too expensive to run go mod tidy, running go list with "-mod=mod" will update the "indirect" fields when a module importing the direct dependency is loaded. Maybe we can build a solution using that?

Could you elaborate on the go list -mod=mod trick? I didn't manage to find documentation on it and just running it didn't change my go.mod.

The go command usually runs in the -mod=readonly mode. That means that most go commands (with the exceptions of tidy and get which exist to modify go.mod files) do not modify the go.mod file when running, if dependencies are out of date or the // indirect comments are wrong. -mod=mod was the old default mode for the go command up through Go 1.15. It will add missing dependencies, and it will update indirect annotations to direct if it runs across a direct import from the work module into a directly required module. (it does not "downgrade" direct annotations to indirect– those have to be done using go mod tidy) But if it doesn't run across a direct import while loading it won't do anything. So ideally you'd pass in the package importing the new module dependency.

Take the following directory as an example (it's in txtar format: use golang.org/x/exp/cmd/txtar --extract to write the files to disk so you can try it out)

-- foo.go --
package foo

import "golang.org/x/tools/go/packages"

var _ = packages.Load
-- go.mod --
module example.com/foo

go 1.23rc2

require (
	golang.org/x/tools v0.23.0 // indirect
	golang.org/x/mod v0.19.0 // indirect
	golang.org/x/sync v0.7.0 // indirect
)
-- go.sum --
golang.org/x/mod v0.19.0 h1:fEdghXQSo20giMthA7cd28ZC+jts4amQ3YMXiP5oMQ8=
golang.org/x/mod v0.19.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/tools v0.23.0 h1:SGsXPZ+2l4JsgaCKkx+FQ9YZ5XEtA1GZYuoDjenLjvg=
golang.org/x/tools v0.23.0/go.mod h1:pnu6ufv6vQkll6szChhK3C3L/ruaIv5eBeztNG8wtsI=

If you run go list -mod=mod example.com/foo then when loading the example.com/foo package, it will see the direct dependency from the example.com/foo package into the golang.org/x/tools module and promote it to a "direct" dependency and write out the updated go.mod file.

So go list -mod=mod ./... in the root of the module will update all the relevant indirect annotations to direct. This could still be slow in a large monorepo if it's running on a lot of packages but if you can determine which packages have changed and only list those packages it will be much faster.

I'm curious to know how long go mod tidy takes for the large monorepos compared to the time that it takes for Bazel to process the go.mod files for its dependency system.

In a large monorepo with >2,000 Go module deps, the logic that translates go.mod and go.sum into Bazel repos runs in ~1s (in fact, due to a Bazel limitation, it's currently 2-3x slower than it could be with some optimizations applied). go mod tidy takes ~20s in an anecdotally average case (mix of cached modules and network access to get new modules).

I see. I can understand the need for something faster. I'd like to see first if we can find a solution without changing the behavior of go get with with indirect comments, but I do want to help find a solution.

@fmeum
Copy link
Author

fmeum commented Aug 2, 2024

So go list -mod=mod ./... in the root of the module will update all the relevant indirect annotations to direct. This could still be slow in a large monorepo if it's running on a lot of packages but if you can determine which packages have changed and only list those packages it will be much faster.

Thanks for the explanation and the example! This could be a fast replacement for go mod tidy, but it unfortunately doesn't quite match the way the go_deps extension operates: Bazel (module) extensions are rerun from scratch when Bazel detects any changes to their inputs (in this case go.mod and go.sum) and are usually expected to be fully deterministic. As a result, they should neither read source files nor would they be able to reason about what exactly changed since they were last run.

rules_go offers a Bazel target (it's really just an executable) that wraps the go tool of the Bazel-configured hermetic Go SDK. Users can run bazel run @rules_go//go -- <args>, which can be shortened to go with a setup using direnv. We really want this binary to be a transparent wrapper around the Go SDK's go binary, but we can of course inject additional logic. We currently use this to automatically run bazel mod tidy, which updates Bazel files in a way controlled by extensions such as go_deps, when we see that any of the go.mod files in the project changed as a result of running this command.

We could possibly use this for a workaround that is similar to what gopls does: When the user runs go get through @rules_go//go, we could postprocess the go.mod by removing the // indirect comment on every module that matches any of the package pattern args to go get. But from a cursory glance at golang.org/x/mod, I couldn't find functions that would make it easy to build this functionality outside go get.

Downsides would be that our "transparent wrapper around go" would no longer be transparent. It would also only help users that go through this target or enforce its usage by setting up a go alias for it.

@rsc
Copy link
Contributor

rsc commented Aug 7, 2024

They are indirect because there are no references to them yet. This is an accurate recording of the facts.

If you want them to not be indirect, add the imports first and then use 'go get' with no arguments to scan the current directory's source code and resolve its imports. Then it will see that the imports are used and mark them direct. Or else run 'go mod tidy' after writing your code. Both of those will also end up with an accurate recording of the facts.

This is not something we should change.

@earthboundkid
Copy link
Contributor

They are indirect because there are no references to them yet. This is an accurate recording of the facts.

I think a better description of the state of the world than the import being "indirect" is that it was "manual". It's not a normal indirect import where A imports B. It's C was added manually but isn't needed (yet).

@fmeum
Copy link
Author

fmeum commented Aug 7, 2024

They are indirect because there are no references to them yet. This is an accurate recording of the facts.

If we follow this line of argument to the end, wouldn't that mean that go get example.com/foo@v1.2.3 should not add any requirement to go.mod? If no references to the module exist in code, it's not really an indirect dependency either, it's just not a dependency of the module at all (yet).

If you want them to not be indirect, add the imports first and then use 'go get' with no arguments to scan the current directory's source code and resolve its imports. Then it will see that the imports are used and mark them direct. Or else run 'go mod tidy' after writing your code. Both of those will also end up with an accurate recording of the facts.

Adding just the imports is tricky since IDEs (both VSCode with gopls and GoLand) tend to remove imports that aren't used in a file on save (or explicit format action). So this would require adding both an import and a usage before running go get or go mod tidy. At that point the flow would be:

  1. Run go get example.com/foo@v1.2.3 to download the module into the cache.
  2. Add an import and a usage (IDE autocompletion can usually add the import automatically now that the usage matches something in the module cache).
  3. Run go get or go mod tidy to update go.mod.

Compared to that, it would be more convenient to just type out the import and let the gopls "get package" quick fix update go.mod. But that quick fix simply runs go mod edit -require example.com/foo@v1.2.3 and then go get example.com/foo@v1.2.3, which is the behavior described by this proposal.

Which leads to my most important question: Why should a user interactively run the go get example.com/foo@v1.2.3 form of go get at all if it leaves the project in a state that isn't fully accurate without 1) code changes and 2) a follow-up go command that syncs the state of the code and the go.mod file?

I think a better description of the state of the world than the import being "indirect" is that it was "manual". It's not a normal indirect import where A imports B. It's C was added manually but isn't needed (yet).

I like this point of view. If adding the new dep as a direct require seems inappropriate, perhaps it could be marked as // manual instead? This would serve the same purpose: Allow tooling to distinguish between deps that project code may (currently or in the near future) reference from those that are necessary to build the project, but should not be directly referenced by code in the current module.

@rsc
Copy link
Contributor

rsc commented Aug 14, 2024

I still don't think we should make changes here. Manually added dependencies can be considered indirect, meaning they are not directly used by the source code.

If you want to specify a specific version, you can still add the import where you plan to use it and then run go get module@version to add the specific version you want. (Or run go get to add the latest version of whatever is needed by your source files.)

@rsc
Copy link
Contributor

rsc commented Aug 29, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Sep 4, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Sep 11, 2024

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Sep 11, 2024
fmeum added a commit to bazelbuild/rules_go that referenced this issue Sep 15, 2024
**What type of PR is this?**

Feature

**What does this PR do? Why is it needed?**

When running `go get example.com/foo@v1.2.3`, the module
`example.com/foo` will be added to `go.mod` with an `// indirect`
comment, just like its transitive dependencies. This results in
`go_deps` not being able to distinguish this explicitly requested new
dep from other indirect deps that shouldn't be imported by the root
module. Running `go mod tidy` or `go get` without arguments is required
to have the comment removed after adding a reference to the new module
in code.

This change makes it so that `bazel run @rules_go//go get
example.com/foo@v1.2.3` marks the requested module as a direct
dependency. This realizes golang/go#68593 in
our custom wrapper, thus making this command the only one needed to add
a new module dependency and have it work with a subsequent Bazel build
(assuming that also updates the BUILD files with Gazelle). This [matches
the behavior of `gopls` when adding a new
dependency](https://github.com/golang/tools/blob/ec1a81bfec7cc6563474b7aa160db30df9bfce6b/gopls/internal/server/command.go#L805-L809).

---------

Co-authored-by: Son Luong Ngoc <sluongng@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Declined
Development

No branches or pull requests

7 participants