
Find out how many users want lazy downloading of outputs, but can't use a daemon or FUSE / NFS #13669

Closed
philwo opened this issue Jul 13, 2021 · 6 comments
Labels: P1 (I'll work on this now; assignee required) · team-Remote-Exec (Issues and PRs for the Execution (Remote) team) · type: feature request


@philwo (Member) commented Jul 13, 2021

Dear Bazel users,

I need to collect your feedback to judge which variants of lazy downloading of output files Bazel should implement.

  1. One proposal is to implement lazy downloading of outputs via an external daemon process that presents the full output tree as a virtual filesystem, mounted via FUSE or a localhost loopback NFS. Your bazel-out folder would not actually exist on disk; instead it would be a FUSE / NFS mount that shows all files and downloads each one lazily when you access it.

  2. We're also considering extending Bazel's Builds-without-the-Bytes (--remote_download_outputs=minimal) feature with a command-line interface and API to support browsing and downloading of artifacts after a build.
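As a rough illustration of the behaviour proposal 1 describes (not of FUSE or NFS themselves), the Python sketch below lists every output from metadata alone and downloads a file's bytes only on first read. The LazyOutputTree class and its fetch callback are hypothetical stand-ins, assuming that a path-to-digest mapping is available once the build finishes.

```python
from typing import Callable, Dict, List


class LazyOutputTree:
    """Toy model of option 1: all outputs are visible up front, bytes arrive on first read.

    Assumption: `metadata` maps each output path to a content digest known after the
    build, and `fetch` is a hypothetical callback that downloads a blob by digest.
    """

    def __init__(self, metadata: Dict[str, str], fetch: Callable[[str], bytes]):
        self._metadata = metadata
        self._fetch = fetch
        self._local: Dict[str, bytes] = {}  # Outputs already materialized locally.

    def listdir(self) -> List[str]:
        # Listing only needs metadata, so nothing is downloaded here.
        return sorted(self._metadata)

    def read(self, path: str) -> bytes:
        # Download on first access, then serve every later read from the local copy.
        if path not in self._local:
            self._local[path] = self._fetch(self._metadata[path])
        return self._local[path]
```

A real daemon would serve these operations through the kernel's filesystem interface; the sketch only shows why browsing the tree is cheap while reading a file pays for its download once.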

We currently don't know whether we should implement just the first or both of the proposals. We don't want to add unnecessary complexity to Bazel and its code, but we also want to offer lazy downloading to all users who need it.

Basically, I'm trying to find out how many users we have who

  • ✅ use remote execution,
  • ✅ need to access outputs of their build that they can't specify explicitly before the build,
  • ❌ can't / don't want to use a separate daemon and FUSE / NFS mounts.

Thoughts? :)

philwo self-assigned this Jul 13, 2021
philwo added the labels P1, team-Remote-Exec, and type: feature request Jul 13, 2021
@philwo (Member, Author) commented Jul 13, 2021

FYI @brentleyjones, since I noticed you recently commented on a related PR.

@EricBurnett commented:

We speculated on the need for (2) when BWtB was being conceived, but I have yet to come across a motivating use case or a user who would actually want it. Most BWtB usage I see where some outputs are needed is easily handled by using --remote_download_outputs=toplevel instead and naming the targets of interest on the command line.

So my 2c is that (2) is reasonable in the abstract, but it doesn't seem particularly important to users in practice (at least the ones I've talked to), so it doesn't seem worth prioritizing. (1) would be much more valuable, though, and I think it would get a lot of usage if it existed. I'd vote to focus on just that.

@gravypod commented Jul 13, 2021

This is probably out of scope but:

@philwo, is this hypothetical external process that presents a fake bazel-out filesystem something the Bazel team would implement? If so, quite a few areas could benefit from an API for accessing that content. Speaking just for rules_docker: if your CI system runs bazel build //:some_image and some other binary could then talk to the service behind this storage layer, call fopen("remote-output/builds/<build identifier>/some_image.tar"), and push the result to a registry, I can see a lot of value being added.

The value would compound if we could embed metadata in this content to establish a chain of trust, for example: this bazel build command ran on a specific git SHA, so we are certain the binaries came from that source code.

@evantorrie commented:

I prefer option 1: lazy loading via a FUSE-style daemon.

@brentleyjones (Contributor) commented:

We use remote execution. Currently we work around the limitation by combining --remote_download_outputs=toplevel with an aspect that pulls in certain artifacts deeper in the tree than the top-level targets, but this performs poorly with large outputs, so we want one of these two solutions. I know option 1 refers to the RemoteOutputService that Ed is proposing, and while I like the idea, when I tested it with macFUSE the performance was worse than simply downloading the outputs. I also feel that conditional output downloading shouldn't require setting up an external service and making filesystem changes, so I would really like option 2 to be implemented.

@coeuvre (Member) commented Sep 7, 2023

Builds without the Bytes (BwoB) is now enabled by default at Bazel@HEAD, and the --experimental_remote_download_regex flag lets users specify which additional outputs to download. We will also support FUSE, as described in #12823. Closing.
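To make regex-driven selective downloading concrete, here is a rough Python model of which outputs a client would fetch under minimal downloading. It assumes each pattern passed via --experimental_remote_download_regex is searched anywhere in an output's path and that supplying several patterns fetches the union of their matches; the paths and the outputs_to_download helper are hypothetical, and this is not how Bazel implements the flag.

```python
import re
from typing import List

# Hypothetical output paths from a remote build; not taken from a real workspace.
OUTPUTS = [
    "bazel-out/k8-fastbuild/bin/app/server",
    "bazel-out/k8-fastbuild/bin/app/server.runfiles_manifest",
    "bazel-out/k8-fastbuild/testlogs/app/server_test/test.log",
    "bazel-out/k8-fastbuild/bin/lib/libutil.a",
]


def outputs_to_download(paths: List[str], patterns: List[str]) -> List[str]:
    """Return the outputs that would be fetched; everything else stays remote.

    Assumption: an output is fetched if any pattern matches somewhere in its path.
    """
    compiled = [re.compile(p) for p in patterns]
    return [path for path in paths if any(rx.search(path) for rx in compiled)]


if __name__ == "__main__":
    # Roughly analogous to asking for test logs and static libraries only.
    for path in outputs_to_download(OUTPUTS, [r"/testlogs/", r"\.a$"]):
        print(path)
```

With no patterns, nothing beyond the minimal set is fetched, which matches the intent of BwoB's default mode.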

coeuvre closed this as completed Sep 7, 2023