Drake doesn't build with Bazel 5.3 #17763

Closed
ggould-tri opened this issue Aug 23, 2022 · 18 comments
Labels
component: build system Bazel, CMake, dependencies, memory checkers, linters priority: medium type: bug

Comments

@ggould-tri
Contributor

ggould-tri commented Aug 23, 2022

What happened?

Building Drake with Bazel 5.3 fails during lcmtypes generation, due to lcm-gen not finding libdrake_lcm.so.
This appears to be due to the path parsing changes introduced in bazelbuild/bazel#16008.
Notably, there does not seem to be any correct value for tools/workspace/lcm/package.BUILD.bazel:113.

This does not affect mainline Drake CI on Ubuntu, which uses a pinned bazel version. However, it does affect the Mac Monterey and drake-external-examples builds, which use the bazel apt site's default bazel package.

Version

No response

What operating system are you using?

Ubuntu 20.04

What installation option are you using?

compiled from source code using Bazel

Relevant log output

bazel-out/host/bin/external/lcm/lcm-gen: error while loading shared libraries: libdrake_lcm.so: cannot open shared object file: No such file or directory

@ggould-tri added the type: bug, configuration: bazel, and component: build system (Bazel, CMake, dependencies, memory checkers, linters) labels on Aug 23, 2022
@jwnimmer-tri
Collaborator

Also note that stl2obj (also called during build time) fails to load the vtk dynamic libraries, almost certainly for the same reason as lcm-gen.

@jwnimmer-tri
Collaborator

Note that this is an acute problem for macOS Homebrew users, since a brew upgrade will give them bazel 5.3 and everything will be broken.

@jwnimmer-tri
Collaborator

jwnimmer-tri commented Aug 23, 2022

See bazelbuild/bazel#16008 (comment) for probable cause.

See also bazelbuild/bazel#16153 for the pending fix (maybe).

@jwnimmer-tri
Collaborator

Some other necessary follow-up, for the record:

  • The drake-ci report of BAZEL_VERSION = ... is lying now (as of the bazelisk PR [setup] Use bazelisk on macOS #17764). It's incorrectly querying the version (probably using the wrong cwd).

@jwnimmer-tri
Collaborator

The remaining action here is to upgrade to the Bazel 5.3.1 patch release, once upstream releases the fix. For the moment, we'll remain pinned to versions strictly less than 5.3.0.
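
A "strictly less than 5.3.0" constraint can also be checked locally before building. The following is only a sketch: the helper names `version_lt` and `bazel_version_is_safe` are hypothetical and not part of Drake, and the comparison assumes a `sort` that supports `-V` (version sort).

```shell
# Succeeds when version $1 sorts strictly before version $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds when the given Bazel version predates the 5.3.0 regression.
bazel_version_is_safe() {
  version_lt "$1" "5.3.0"
}
```

One possible use is `bazel_version_is_safe "$(bazel --version | awk '{print $2}')" || echo "bazel is 5.3.0 or newer"`, which relies on `bazel --version` printing a line of the form `bazel 5.1.0`.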

@RussTedrake
Contributor

This is biting some people (it actually bit me on one machine, too). The bazel issue linked above has been closed. What is the remaining action for Drake?

@jwnimmer-tri
Collaborator

jwnimmer-tri commented Sep 15, 2022

For macOS, I assume? Drake's macOS install_prereqs installs bazelisk, and we have a .bazeliskrc that pins USE_BAZEL_VERSION=5.1.0, so this should not be breaking anyone's builds anymore. (CI is passing.) Can you provide repro instructions for the failure?

The remaining action is to remove the 5.1.0 pin once Bazel 5.3.1 ships.
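
To confirm which version the pin requests, the value can be read straight out of `.bazeliskrc`. This is a minimal sketch that assumes the file sits at the workspace root and contains a `USE_BAZEL_VERSION=...` line as described above; the helper name `pinned_bazel_version` is hypothetical and not part of Drake.

```shell
# Print the Bazel version pinned in WORKSPACE/.bazeliskrc (default: the
# current directory). Prints nothing if the file has no USE_BAZEL_VERSION line.
pinned_bazel_version() {
  local workspace="${1:-.}"
  sed -n 's/^USE_BAZEL_VERSION=//p' "$workspace/.bazeliskrc"
}
```

Comparing the result against the output of `bazel --version` shows whether the `bazel` on the PATH is honoring the pin.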

@ggould-tri
Contributor Author

The necessary bazel fix was adopted into the bazel 5.3.1 release branch, but 5.3.1 has not been released yet because of one remaining release-blocking bug.

@RussTedrake
Contributor

I've confirmed that re-running drake's install_prereqs restored bazel to 5.1.0. The failure was due to a brew upgrade done outside of install_prereqs.

@jwnimmer-tri
Collaborator

Hmm. Maybe upgrading bazel re-points /usr/local/bin/bazel at bazel instead of bazelisk? That's annoying. I suppose we should advise uninstalling bazel in that case? But that might be more annoying in the long term.

@RussTedrake
Contributor

I'll test tonight to make sure that brew upgrade is actually the correct and minimal reproduction; that is my understanding, but I will confirm.

@RussTedrake
Contributor

RussTedrake commented Sep 16, 2022

Maybe it's OK after all.

% setup/mac/install_prereqs.sh
...
Using bazelisk
...
% bazel --version
2022/09/16 05:16:41 Downloading https://releases.bazel.build/5.1.0/release/bazel-5.1.0-darwin-arm64...
2022/09/16 05:16:41 Skipping basic authentication for releases.bazel.build because no credentials found in /Users/russt/.netrc
bazel 5.1.0
% which bazel
/opt/homebrew/bin/bazel
% brew upgrade
==> Upgrading 11 outdated packages:
...
bazel 5.2.0 -> 5.3.0
...
==> Upgrading bazel
  5.2.0 -> 5.3.0

Error: Cannot install bazel because conflicting formulae are installed.
  bazelisk: because Bazelisk replaces the bazel binary

Please `brew unlink bazelisk` before continuing.
% bazel --version
bazel 5.1.0

I suspect that the machine where brew upgrade caused problems had not run install_prereqs since you switched to bazelisk.

@rpoyner-tri
Contributor

Ubuntu only: I've noticed that bazel 5.3.1 caused the bazel test --config kcov workflow to break. Reverting to bazel 5.1 gets me back to a working state.

Presentation:

  • Run the desired test targets with --config kcov. It apparently succeeds.
  • Do kcov-tool merge and examine the output. The report is just a skeleton with "NaN%" coverage.

An important symptom is that the locations deep in the bazel-testlogs tree where the data is supposed to land contain broken symlinks to the longer paths under ~/.cache/bazel.

I'll see what I can do about at least reporting errors when the problem happens.

@jwnimmer-tri
Collaborator

Interesting timing! I was just checking that 5.3.1 fixes the shared libraries problem, in order to push this upgrade over the finish line. All of the normal build & test stuff I've tested manually (+ macOS) seems to pass OK now, so the kcov problem is the only thing holding us back from 5.3.1.

@rpoyner-tri
Contributor

rpoyner-tri commented Sep 28, 2022

While I continue to investigate, here's a quick test for broken output data. It exploits the fact that find -L will only report files as being symlink type if the symlink is broken. A happy output tree should produce no output from this command; a sad output tree will report the names of broken symlinks.

$ find -L bazel-testlogs -name kcov |xargs -i find -L '{}' -type l
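
The same check can be wrapped in a function that returns a failing status, which makes it usable from a script. This is a sketch only: the name `check_kcov_outputs` is hypothetical, and it assumes, as the one-liner does, that the coverage data lands in directories named `kcov` under the test log tree.

```shell
# Print every broken symlink found beneath any "kcov" directory under ROOT
# (default: bazel-testlogs). Returns 0 when the tree is clean and 1 when at
# least one broken link exists.
check_kcov_outputs() {
  local root="${1:-bazel-testlogs}"
  local broken
  # With -L, find follows symlinks, so "-type l" matches only links whose
  # target is missing. -prune avoids scanning a kcov directory twice.
  broken="$(find -L "$root" -name kcov -prune -exec find -L {} -type l \;)"
  if [ -n "$broken" ]; then
    printf 'broken kcov output links:\n%s\n' "$broken" >&2
    return 1
  fi
  return 0
}
```

Run after the tests and before the merge step, it turns the silent "NaN%" report into an explicit failure.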

@rpoyner-tri
Contributor

It looks to me like bazel 5.3.1 is zipping the kcov output data into test.outputs/outputs.zip, and deleting the unzipped data. So the data is not gone, it's just hiding.

I'll look at providing a patch.
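
As a stopgap, the hidden data can be unpacked back into place by hand. The sketch below is not necessarily what the eventual patch does; it assumes the `unzip` command is installed and that each archive sits at `test.outputs/outputs.zip` as described above. The function name `unpack_test_outputs` is hypothetical.

```shell
# Extract every test.outputs/outputs.zip under ROOT (default: bazel-testlogs)
# into the directory that contains it.
unpack_test_outputs() {
  local root="${1:-bazel-testlogs}"
  local archive
  # -L follows symlinks while searching the test log tree.
  find -L "$root" -type f -path '*/test.outputs/outputs.zip' |
    while IFS= read -r archive; do
      # -q: quiet, -o: overwrite existing files, -d: destination directory.
      unzip -q -o "$archive" -d "$(dirname "$archive")"
    done
}
```

The loop reads one path per line, so it will mishandle paths that contain newlines.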

@rpoyner-tri
Contributor

#17992

@jwnimmer-tri
Collaborator

Bazel fixed the regression as of 5.3.1. Drake is upgraded to 5.3.1 as of #18004.

4 participants