Commit b999376 ("nsenter: cloned_binary: remove bindfd logic
entirely") removed the read-only bind-mount logic from our cloned binary
code because it wasn't really safe: a container with CAP_SYS_ADMIN could
clear the MS_RDONLY bit and get write access to /proc/self/exe (even
with user namespaces this could've been an issue, because it's not clear
whether the flags are locked).
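
For context, the remount in question is just the ordinary mount(2)
bind-remount. A minimal sketch, where the helper name and whatever path
is passed to it are hypothetical and not taken from runc:

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Illustrative only (hypothetical helper): ask the kernel to update the
 * per-mount flags of an existing bind mount. Because MS_RDONLY is not in
 * the flag set, a successful call clears the read-only bit. Needs
 * CAP_SYS_ADMIN; returns 0 on success, -1 with errno set otherwise.
 */
static int clear_bind_readonly(const char *mountpoint)
{
	return mount(NULL, mountpoint, NULL, MS_REMOUNT | MS_BIND, NULL);
}
```

Inside a user namespace the kernel may refuse this with EPERM if the
flag is locked, which is exactly the part that was not clear to us.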

However, copying a binary does seem to have a minor performance impact.
The only way to have no performance impact would be for the kernel to
block these write attempts, but barring that we could try to reduce the
overhead by coming up with a mount that cannot have its read-only bits
cleared.

The "simplest" solution is to create a temporary overlayfs using
fsopen(2) which uses the directory containing the runc binary as a
lowerdir, ensuring that the container cannot access the underlying
file -- and we don't have to do any copies.

fsopen(2) is not free (mount namespace cloning is usually expensive), so
one might expect the difference to be marginal. However, some basic
performance testing seems to indicate that the memfd copy takes ~60%
longer than doing it this way (22.6 ms vs 13.9 ms), and that this
approach has effectively no overhead even when compared to just using
/proc/self/exe directly:

  % hyperfine --warmup 50 \
  >   "./runc-noclone run -b bundle ctr" \
  >   "./runc-overlayfs run -b bundle ctr" \
  >   "./runc-memfd run -b bundle ctr"

  Benchmark 1: ./runc-noclone run -b bundle ctr
    Time (mean ± σ):      13.7 ms ±   0.9 ms    [User: 6.0 ms, System: 10.9 ms]
    Range (min … max):    11.3 ms …  16.1 ms    184 runs

  Benchmark 2: ./runc-overlayfs run -b bundle ctr
    Time (mean ± σ):      13.9 ms ±   0.9 ms    [User: 6.2 ms, System: 10.8 ms]
    Range (min … max):    11.8 ms …  16.0 ms    180 runs

  Benchmark 3: ./runc-memfd run -b bundle ctr
    Time (mean ± σ):      22.6 ms ±   1.3 ms    [User: 5.7 ms, System: 20.7 ms]
    Range (min … max):    19.9 ms …  26.5 ms    114 runs

  Summary
    ./runc-noclone run -b bundle ctr ran
      1.01 ± 0.09 times faster than ./runc-overlayfs run -b bundle ctr
      1.65 ± 0.15 times faster than ./runc-memfd run -b bundle ctr

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>