-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to make sandbox work in Docker with its default seccomp filter #2719
Conversation
c286afc
to
f9668ed
Compare
@@ -700,6 +700,20 @@ password <replaceable>my-password</replaceable> | |||
</varlistentry> | |||
|
|||
|
|||
<varlistentry xml:id="conf-sandbox-use-pivot_root"> | |||
<term><literal>sandbox-use-pivot_root</literal></term> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this kind of option. Is there any way we can detect whether pivot_root
works / we're in a container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can detect specific, but it depends on how the container is implemented. See this answer + comment for example. Also it seems that we'd be checking the wrong thing. The failure seems more likely caused by mount namespaces rather than cgroups, or it can be caused by a security module.
What we can do instead is make the pivot_root
feature degrade gracefully, but then it may go unnoticed when it needs fixing.
Why does pivot_root not work in Docker? The sandbox works fine inside at least some other container technologies. |
@catern It seems to work fine in NixOS containers, but that's because those reuse the host nix-daemon, so the sandbox isn't actually containerized. I did run the sandbox in LXC without the pivot_root problem, but that did suffer from the Looking at the |
It seems to me that graceful degradation is the right direction for this PR. |
What do you mean with "trust that it keeps working"? Maybe it's easiest to just fall back if However, the commit message of 5451b8d suggests that if we don't use |
I meant that if for whatever future reason our implementation breaks, we may not notice the security downgrade because we have a fallback in place. So either we remove the option and take a security risk, or we leave the option in so the user can decide for themselves whether deploying in a container is a good idea or not. (For future reference, this solution came from our internal needs and we were not planning to use this in a multi-tenant way) |
I wasn't thinking of NixOS containers, but rather other container systems I've developed myself, within which the Nix sandbox works fine. But if the Nix sandbox also works fine in LXC without the pivot_root problem, perhaps the issue is with Docker or your configuration of Docker, not with Nix. You should find out concretely what the issue is with Docker first. Also: I'm not sure this /proc change is correct. Won't that give us a /proc instance for the parent pid namespace, not the child pid namespace we created for the builder? The pids found in /proc will be wrong (though /proc/self will be right). /proc doesn't magically handle this AFAIK, you have to mount a new instance for new pid namespaces. |
Does it even make sense to use the sandbox in a container? Doesn't the container already provide enough isolation? |
Containers are not a great security device, particularly if you have to open them up to do something like this. So that's not why we're using a container. We're using the container as a method of deployment and for practical isolation.
Our use case requires the purity but not security. |
The issue is not that /proc is unsandboxed. The issue is that an incorrectly mounted /proc will cause bugs in the software you run. Anything that gets a PID (from, say, getpid or fork) and looks it up in /proc will break. |
Thanks @catern, I didn't think this may be an issue. I'll have to do some more research. One way to make procfs mounting work is to unmount all the bind mounts on |
Maybe we can have a new, weaker sandbox mode (rather than a special boolean flag) that doesn't use PID and user namespaces. That way we don't need to mount a new /proc in the sandbox. |
This blog helped me understand the procfs problem: https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/#what-about-procfs The author talks about efforts to change the Linux kernel such that
Alternatively, @jessfraz's blog talks about her adding support in Docker and Kubernetes to leave the procfs mount alone (as opposed to hiding/mounting over various paths):
@roberth, I suspect you'll want to use one of those options in Docker/Kubernetes to side step the procfs issues. |
Note that the aforementioned support in Docker hasn't been surfaced through the CLI yet: docker/cli#1347 |
@cstrahan Thanks for the pointers. We seem to have a working solution now, based on the following
That's good enough for our use case, but it will be interesting to see how these permissions can be reduced in the future. Here's our arion service module that makes it work. I haven't decided yet whether such a module should be part of arion or a separate repo.
|
Regarding the |
This is required for running the sandbox in a privileged container.
f9668ed
to
da0cee2
Compare
I marked this as stale due to inactivity. → More info |
I haven't need this for some time now. |
Your diff helped me to install nixos from a rescue environment from a cloud provider that doesn't support pivot_root. Worked like a charm. Probably super rare situation. |
MNT_LOCKED
)