
Relax input tree requirements #207

Closed

Conversation

@moroten (Contributor) commented Aug 9, 2021

To reduce the workload of remote execution clients, the requirement to construct Merkle trees is removed. Instead, the client can choose any structure for its input file tree that it would like. A client might, for example, choose to replicate its internal dependency graph.

The downside is that different clients, or even different versions of the same client, might not get a cache hit on effectively the same action. However, sharing the cache between different clients is not a realistic use case, and clients are most likely stable in their serialization between versions.

Fixes #141.

This first draft basically duplicates the file tree messages, i.e. DirectoryNode gets a copy called RelaxedDirectoryNode. Another option is to reuse the existing file tree messages (Directory, FileNode etc.) but let name represent a "relaxed path" instead.
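(Illustration only, not code from this PR: a sketch of why a relaxed layout trades away cache hits. Two clients can describe the same set of input files with differently shaped trees, and a digest over the serialized tree then differs even though the content is identical. The dict-based encoding and the file digests below are made up.)

import hashlib
import json

def tree_digest(tree) -> str:
    """Stand-in for hashing a serialized input tree message."""
    return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

# The same three input files, described with two different (both valid) layouts.
canonical_merkle_style = {
    "base.h": "digest-A",
    "foo": {"impl.cpp": "digest-B", "private": {"inc.h": "digest-C"}},
}
relaxed_per_dependency = {
    "foo/impl.cpp": "digest-B",        # one entry per dependency,
    "foo/private/inc.h": "digest-C",   # mirroring the client's own graph
    "base.h": "digest-A",
}

# Identical content, different serialization => different action cache keys.
print(tree_digest(canonical_merkle_style) == tree_digest(relaxed_per_dependency))  # False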

The @google-cla bot added the "cla: yes" label (Pull requests whose authors are covered by a CLA with Google) on Aug 9, 2021
@moroten force-pushed the relax-input-tree-requirement branch from a70b2c6 to ff9cba7 on August 9, 2021 at 07:54
Comment on lines +905 to +909
// The paths MUST use `/` as the path component separator, even if the environment
// where the Action is executed uses other separators. If the path starts with
// `/` it is considered to be referring to the root of the file tree; otherwise
// the path is relative to the parent directory. The path MUST NOT contain any
// `.` or `..` components. An empty string refers to the same directory.
@moroten (Contributor, Author) commented:

Given that the RelaxedFileTree contains a base path, one can require all child paths to be relative to that base path.
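(For illustration only, a minimal sketch of how a client or server might resolve paths under the relaxed rules quoted above; the function is hypothetical and not part of the proposal.)

def resolve_relaxed_path(parent: str, path: str) -> str:
    """Resolve `path` against `parent` under the proposed relaxed-path rules."""
    if any(component in (".", "..") for component in path.split("/")):
        raise ValueError("'.' and '..' components are not allowed: %r" % path)
    if path == "":
        return parent                # an empty string refers to the same directory
    if path.startswith("/"):
        return path.lstrip("/")      # absolute: relative to the root of the file tree
    if parent == "":
        return path
    return parent + "/" + path       # otherwise relative to the parent directory

print(resolve_relaxed_path("foo", "private/inc.h"))  # foo/private/inc.h
print(resolve_relaxed_path("foo", "/base.h"))        # base.h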

@juergbi (Contributor) commented Aug 9, 2021

At least in the long term (v3), I'm in favor of relaxing the input tree requirements. However, my preferred approach would be to modify DirectoryNode, replacing the digest with a oneof offering a choice between a Digest and an embedded Directory (as suggested in #140 (comment)). That seems less invasive in the proto definition and probably also in client and server code, especially as this is for v2, where I'm skeptical about large changes. Or am I missing an issue or disadvantage with my suggestion?

@juergbi (Contributor) commented Aug 9, 2021

Any thoughts on the output side? If we add support for a flexible directory tree, I think we should fix the asymmetry between inputs and outputs and also replace tree_digest in OutputDirectory with a oneof: either allow a RelaxedFileTree digest or, following the approach of my previous comment, a Directory digest. This would allow moving away from Tree messages, addressing #170.

This may need a flag in the Command to let the client opt in.

@EdSchouten (Collaborator) commented Aug 9, 2021

Call me crazy, but I've never understood this discussion. Nobody seems interested in presenting in actual detail what the problem is. And also very importantly: nobody has even described why this cannot be fixed by optimizing the client.

For example, I can imagine that Bazel has a very hard time with setups like the following. Assume there are a million files named "%d/%d" % (i, j), where i and j are both between 0 and 999. Now let's place them in a thousand filegroups, where each filegroup contains exactly one file from every directory:

N = 1000

[
    filegroup(
        name = "fg%d" % j,
        srcs = [
            "%d/%d" % (i, j)
            for i in range(N)
        ],
    )
    for j in range(N)
]

If we then merge these filegroups all over the place, things get complicated. Computing Merkle trees for that is simply hard. But that's not how any normal source tree works, right? In reality, dependency graphs roughly mirror the file system layout. Because they do, you rarely need to construct full Merkle trees. You can often get away with merging existing ones together, and those tend not to have much nested overlap, which makes merging cheap.

My observation has always been that the slowness is not caused by using a Merkle tree per se. It's because the Merkle tree generation is currently just bolted on as part of Bazel's REv2 client code. For every action that gets run, Bazel starts from a virtually clean slate when generating the Merkle tree. This is just silly, as actions often share large parts of the file system. If Bazel's core data structures could somehow memoize (abridged versions of) REv2 Merkle trees, then the process would become a lot more light-weight than what it currently is.
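(Illustrative sketch of that idea, not Bazel code: if the digest of each Directory message is cached, keyed by some stable identity of the underlying subtree, actions that share subtrees only pay for them once. The key and the encoding are made up.)

import hashlib

_digest_cache = {}  # subtree identity -> digest of its Directory message

def directory_digest(subtree_key, files):
    """Return a memoized digest for the directory listing `files` (name -> file digest)."""
    if subtree_key not in _digest_cache:
        serialized = "".join(
            "%s:%s;" % (name, file_digest) for name, file_digest in sorted(files.items())
        )
        _digest_cache[subtree_key] = hashlib.sha256(serialized.encode()).hexdigest()
    return _digest_cache[subtree_key]

# Two actions sharing the same subtree reuse the cached digest.
common = {"impl.cpp": "digest-B", "inc.h": "digest-C"}
directory_digest("//foo:srcs", common)
directory_digest("//foo:srcs", common)  # cache hit, no rehashing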

I think we should first go back and write a report on what the problem actually is. What is the (amortized) algorithmic complexity of Bazel's Merkle tree computation right now? What is optimal, given the existing protocol? What is the expected improvement, if any, of having this change applied?

@moroten (Contributor, Author) commented Aug 9, 2021

> Call me crazy, but I've never understood this discussion.

I appreciate that you are honest. Thank you!

> My observation has always been that the slowness is not caused by using a Merkle tree per se. It's because the Merkle tree generation is currently just bolted on as part of Bazel's REv2 client code. For every action that gets run, Bazel starts from a virtually clean slate when generating the Merkle tree. This is just silly, as actions often share large parts of the file system. If Bazel's core data structures could somehow memoize (abridged versions of) REv2 Merkle trees, then the process would become a lot more light-weight than what it currently is.

> I think we should first go back and write a report on what the problem actually is. What is the (amortized) algorithmic complexity of Bazel's Merkle tree computation right now? What is optimal, given the existing protocol? What is the expected improvement, if any, of having this change applied?

The profiling I've done in bazelbuild/bazel#10875 points to two specific functions in Bazel that list all files and create the Merkle tree. The most expensive operation there is the O(n*log(n)) sort of the list of all input files. My focus has been to remove that altogether.
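(For context, a rough sketch of that kind of per-action construction, not the actual Bazel code: every input is flattened into one list, sorted, and then folded into nested directory messages; the global sort is the O(n*log(n)) step.)

def build_tree_from_inputs(inputs):
    """Build a nested dict of directories from a flat {path: digest} map."""
    root = {}
    for path, file_digest in sorted(inputs.items()):  # the O(n*log(n)) global sort
        *parents, name = path.split("/")
        node = root
        for directory in parents:
            node = node.setdefault(directory, {})
        node[name] = file_digest
    return root

build_tree_from_inputs({
    "foo/impl.cpp": "digest-B",
    "foo/private/inc.h": "digest-C",
    "base.h": "digest-A",
})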

My idea has been to create the Directory message only once per path when looking at a single action. Memoizing that is impossible, as the dependency graph might look like /foo/impl.cpp -> /foo/private/inc.h -> /base.h, in which case you need to know all files before constructing the Merkle tree. Your comment that merging is cheap makes sense. Each part of the dependency chain will then get a new Merkle tree root, but merging three times should still be cheaper than the existing O(n*log(n)) sort. I will give that a try in Bazel.
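(A sketch of the kind of merge meant here, illustrative only, using nested dicts instead of real Directory messages: each dependency contributes a small tree, and the action's input tree is the union of those trees; only the directories along the merged paths need new digests afterwards.)

def merge_trees(dst, src):
    """Recursively merge the directory tree `src` into `dst`."""
    for name, node in src.items():
        if isinstance(node, dict):
            merge_trees(dst.setdefault(name, {}), node)
        else:
            dst[name] = node  # a file entry: name -> digest
    return dst

# One small tree per step of the /foo/impl.cpp -> /foo/private/inc.h -> /base.h chain.
action_inputs = {}
merge_trees(action_inputs, {"foo": {"impl.cpp": "digest-B"}})
merge_trees(action_inputs, {"foo": {"private": {"inc.h": "digest-C"}}})
merge_trees(action_inputs, {"base.h": "digest-A"})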

Assuming 20 files per folder and 200 folders, that gives 4000 files with up to 40 merges needed. Then n*log2(n) ≈ 4000*12 = 48,000 is significantly larger than 40*20 = 800, so such an implementation should be fast enough if the constant factor is small.
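(The same back-of-the-envelope numbers, written out:)

import math

n = 200 * 20                         # 200 folders * 20 files = 4000 inputs
sort_work = n * math.log2(n)         # ~4000 * 12 ≈ 48,000 for the global sort
merge_work = 40 * 20                 # up to 40 merges of ~20 entries each = 800
print(round(sort_work), merge_work)  # 47863 800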

Measurements show that we save 30% of Bazel's execution time when removing the current Merkle tree code (measured on our production CI over two days). Let's see what I end up with when merging Merkle trees.

@sstriker added this to the Remote Execution API v3 milestone on Aug 10, 2021
@moroten (Contributor, Author) commented Aug 19, 2021

I've created bazelbuild/bazel#13879, which implements Merkle tree memoization and merging in Bazel. One example with 3000 inputs cut the calculation time from 78 ms to 0.7 ms. With the fast alias hash I tried before, I got down to 0.1 ms. Bazel has additional overhead, in the range of 10 ms for my example action, so 1 ms versus 0.1 ms doesn't really make any difference.

Now, let me do some measurements on this patch for a few days before I post real world results.

@EdSchouten (Collaborator) commented:
Wow! That's great news, @moroten. Thanks a lot for working on this.

@moroten (Contributor, Author) commented Sep 14, 2021

Closing this pull request, as bazelbuild/bazel#13879 solves the problem. Further discussion about closing #141 can be held there.

@moroten closed this on Sep 14, 2021
Linked issue: #141 "V3 idea: Loosen the restriction of action input as Merkle tree"