-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unification constraints to NodeProperties
.
#221
base: main
Are you sure you want to change the base?
Conversation
A unification constraint specifies whether servers are allowed to unify specific filesystem nodes, for example using symlinks or hardlinks.
ALLOW_REDIRECTS = 0; | ||
|
||
// The server MAY use a filesystem alias, such as a hard link or bind | ||
// mount, to substitute for this node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard links and bind mounts are absolutely not the same thing. As far as I know, bind mounts each have their own st_dev, meaning that files essentially don't alias. They each get their own identity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Linux, files within bind mounts have the same st_dev
and st_ino
as the original.
$ uname -sr
Linux 5.14.0-1004-oem
$ mkdir srcdir dstdir
$ sudo mount -o bind srcdir dstdir
$ echo 'testing' > srcdir/test.txt
$ stat srcdir/test.txt
File: srcdir/test.txt
Size: 8 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 24259448 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ john) Gid: ( 1000/ john)
[...]
$ stat dstdir/test.txt
File: dstdir/test.txt
Size: 8 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 24259448 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ john) Gid: ( 1000/ john)
[...]
$ python3 -c 'import os; print(os.stat("srcdir/test.txt"))'
os.stat_result(st_mode=33204, st_ino=24259448, st_dev=64769, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8, st_atime=1650338444, st_mtime=1650338444, st_ctime=1650338444)
$ python3 -c 'import os; print(os.stat("dstdir/test.txt"))'
os.stat_result(st_mode=33204, st_ino=24259448, st_dev=64769, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8, st_atime=1650338444, st_mtime=1650338444, st_ctime=1650338444)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a specific use-case for wanting this to be node-level, rather than a hint or constraint specified at the level of the entire action? On one side, I'm trying to think of what build tools are going to have the information and desire necessary to provide this file-by-file, and on the other side, I'm unsure how often you'd actually need to disable an optimization only for a specific subset of files. My gut feeling here is that if we started by doing action-level constraints instead of file-level, that'd be both easier to adopt and cover the majority of use-cases. Though we could follow on with file-level still later, if there's a specific argument for it.
The examples I'm aware of are python interpreters and clang compilers. For clang, every input to the action is presumably to be read by clang (or is part of clang), so I'd expect the "no hardlink" constraint on every input file; for python, I guess only the python files themselves need to have the "no symlink" constraint applied and e.g. data files to be read at runtime could theoretically be aliased to each other, but if we imagine an action with enough data files that it's actively worth aliasing them to each other, maybe it'd be better for the build tool to specify that intentionally? (And if we're not forbidding hardlinks, servers can still do that under the hood if they have the capability.)
@@ -842,6 +842,58 @@ message NodeProperties { | |||
|
|||
// The UNIX file mode, e.g., 0755. | |||
google.protobuf.UInt32Value unix_mode = 3; | |||
|
|||
// Constrains server-side unification of identical filesystem nodes. | |||
NodeUnificationConstraint.Value unification_constraint = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Food for thought: repeated OptimizationConstraint
? I'm not sure the field needs to be super specific that it's about NodeUnification, or could be expanded to cover other things later.
NodeUnificationConstraint.Value unification_constraint = 4; | ||
} | ||
|
||
// A `NodeUnificationConstraint` specifies whether servers are allowed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about flipping this? Instead of mixing positive ("ALLOW_X") and negatives ("FORBID_X"), only stick to negatives? If we have the default be that the server is generally free to optimize, then this field would be for adding one or more restrictions against what the server is allowed to do.
My first worry is that having clients opt-in to optimizations seems like it'll lead to clients missing viable optimizations commonly, and a pile of "allow_X" optimizations on every single node from the rest. That also seems like it may be more future-proof: if we add a new optimization field, does every existing client not explicitly allowing it effectively stop being eligible for that optimization? Whereas we'd only start adding new ways to forbid optimizations if/when they were found to matter for a few cases. (If they only ever apply in a few cases, I guess I could see this, or just specifying the recommendation to do something in a slightly different way.)
A unification constraint specifies whether servers are allowed to unify specific filesystem nodes, for example using symlinks or hardlinks.
Compared to proposals #216 (cc @EricBurnett) and #217 (cc @EdSchouten), this design would let the client specify for each node whether it can be de-duplicated (1) with a symlink, (2) with a hardlink, or (3) not at all.
Use cases:
runfiles
?) to be marked non-symlinkable bazel#10299)#pragma once
implementation (https://lists.llvm.org/pipermail/cfe-dev/2021-June/068431.html)(hopefully I summarized the second use case correctly from the monthly meeting's discussion)