sandboxed containers enhancement proposal #677
/test markdownlint

Closing this one as we were able to unblock #366.
Overall seems sane to me!
> ### KataContainers Operator development
>
> The goal is to develop a kata operator that can be used to manage the entire lifecycle of kata components in an OpenShift cluster. We will have a controller that watches for a kata **custom resource (CR)** and a daemon-set that acts upon changes to the CR and does the required work on the worker nodes (install, uninstall, update, ...).
>
> Via the CR it will be possible to select a subset of worker nodes.
Why is this, out of curiosity? Why not e.g. enable the feature on all available nodes but use a regular `nodeSelector` for steering?

Is it because it fully takes over the crio backend, and hence all containers on that node are kata? That seems unfortunate if true.
Hey @cgwalters!

> Why is this, out of curiosity?

The default behaviour is to deploy on all available worker nodes. However, it's been decided to leave the option for the admin to select the nodes they want, if they don't want all the worker nodes using Kata Containers.

> Why not e.g. enable the feature on all available nodes but use regular nodeSelector for steering?

That would simplify our code, indeed. Shall we consider removing the option from the operator?

> Is it because it fully takes over the crio backend and hence all containers on that node are kata? That seems unfortunate if true.

No, that's not the case at all. A node capable of running Kata Containers should not, and will not, lose its ability to run "vanilla" containers.

@cgwalters, have I addressed all your concerns?
> The default behaviour is to deploy on all available worker nodes.

Sure, seems reasonable. In a compact cluster the control plane nodes are workers too, so this will apply there.

> That would simplify our code, indeed. Shall we consider removing the option from the operator?

I don't have a really strong opinion on this, but if having admins steer the pods with regular selectors works, then why not use that? Less API surface for you to maintain, it feels more native to admins, etc.
There is scheduling support for runtime classes that we can make use of. The enhancement should mention that.
https://v1-16.docs.kubernetes.io/docs/concepts/containers/runtime-class/#
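As a hedged sketch of what that scheduling support looks like: the `kata` handler name comes from the proposal itself, but the node label below is a hypothetical example, not something defined in this enhancement.

```yaml
# RuntimeClass with scheduling support (node.k8s.io/v1beta1 as of the
# v1.16 docs linked above). The nodeSelector label is an assumption.
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata          # must match the crio runtime handler name
scheduling:
  nodeSelector:
    node-role.kubernetes.io/kata: ""   # hypothetical label
---
# A workload opts in per pod via runtimeClassName; the scheduler then
# only places it on nodes matching the RuntimeClass nodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: kata-example
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```

With this in place, steering happens through the RuntimeClass itself rather than extra API surface in the operator.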
> It should be noted that Kata Containers differ from Kubevirt:
>
> - Kubevirt aims to run **VM images** on OpenShift, providing the look and feel of a legacy virtual machine (a CRD defining a Virtual Machine in Kubernetes)
> - Kata Containers aim to run **container images** on OpenShift in an isolated manner, using virtualization tools (a RuntimeClass resource is used to choose Kata only for specific workloads which require kernel-level isolation)
Semi random but there's actually a 3rd class of thing in here, which is "containers that use virt" - what we do with https://github.com/coreos/coreos-assembler/ overlaps a lot with kata; we want to control qemu, not be inside qemu.
That's interesting, and I wonder whether the minimalist QEMU we plan to use for the extensions could also benefit CoreOS Assembler.

Do you mind if we switch this conversation to Slack at some point soon? We could sync on the usages and then, maybe, come up with a full document on how we plan to use QEMU as part of RHCOS, and hopefully agree on a minimal QEMU that could meet both your (CoreOS Assembler) and our (Kata Containers) needs.
Note that the qemu we use to build (and test) RHCOS (and FCOS) is included in the coreos-assembler container; it has nothing to do with the host, which is the opposite of kata. It's closer to KubeVirt in a way except this usage is special in that we're not trying to run production VMs that e.g. expose networking. We don't require any privileges, just KVM available.
I have a blurb on this here https://hackmd.io/8YCMjP_JTzu-vp7PrOjT0g that I need to stick somewhere.
We don't need a minimal qemu really - our container image is already enormous anyways (2.5G+).
What may be good to share though is e.g. Go bindings for interacting with qemu - I touched on that in coreos/coreos-assembler#1336 (comment)
> The Operator will:
>
> - Create a crio drop-in config file via machine config objects;
> - Automatically select the payload image that contains the correct version of the kata binaries for the given version of OpenShift;
In some private discussion it turns out there's a lot more detail behind this bit that we'll want to elaborate on. From what I can see I lean towards having the MCO install this extra content as an extension (RPM) too.
This also relates to coreos/fedora-coreos-tracker#354 etc.
@cgwalters, I've rewritten this PR almost entirely, to cover what we currently do, what we could do, etc. Please, whenever you have some cycles, take a look at the changes.

Also, please help me loop in everyone needed to decide whether we could go for having the Kata Containers RPMs also as part of the RHCOS extensions (even with those being ~100MB, installed size).
In what started as an unrelated conversation with @cgwalters (https://coreos.slack.com/archives/CK1AE4ZCK/p1614974860056400), we agreed that more discussions have to happen, mainly with regard to how Kata Containers will be delivered to OpenShift customers (as in, whether it'll be part of the Extensions, whether it'll be binaries copied by the Kata Operator, etc.). Based on that, this PR will be updated accordingly, probably by the beginning of next week.
/hold
> - Create a crio drop-in config file via machine config objects;
> - Automatically select the payload image that contains the correct version of the kata binaries for the given version of OpenShift;
> - Configure a `RuntimeClass` called `kata` which can be used to deploy workloads that use the kata runtime;
> - Create a machine-config object to enable the `qemu-kiwi` RHCOS extension.
I'd say instead have a single MC object, `50-kata-containers-worker` or so, that does both this and the crio config, to make it more atomic. Otherwise there's a race where the MCO might try to roll out just the crio change but not the extension, etc.
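A hedged sketch of such a combined MachineConfig: the extension name `qemu-kiwi` and the object name come from this conversation, but the drop-in path, file mode, and contents are assumptions for illustration.

```yaml
# One atomic MachineConfig carrying both the RHCOS extension and the
# crio drop-in, so the MCO rolls them out together.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-kata-containers-worker
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  extensions:
    - qemu-kiwi                       # RHCOS extension from the proposal
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/crio/crio.conf.d/50-kata.conf   # assumed path
          mode: 420                   # decimal for 0644
          contents:
            # base64-encoded crio drop-in contents (truncated here)
            source: data:text/plain;charset=utf-8;base64,W2NyaW8u...
```

Because both changes live in one object, a node either gets the extension and the crio config together or neither, avoiding the partial-rollout race described above.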
> #### KataContainers Operator Goals
>
> - Create an API which supports installation of the Kata Runtime on all or selected worker nodes
> - Configure CRI-O to use the Kata Runtime on those worker nodes
Is there more to this than just the MC?
@cgwalters, I don't fully understand the question. "Configure CRI-O to use Kata Runtime on those worker nodes" is just a drop-in file, performed by: https://github.com/openshift/sandboxed-containers-operator/blob/fbbc74e21e2fc4f3d0874dd201513a801a1a333c/controllers/openshift_controller.go#L281-L362

As I'm not sure whether this is what you asked, would you mind rephrasing the question, with more details if possible, so I can try to answer?

(By the way, sorry if I end up missing some details; the person who mostly worked on the operator side is on paternity leave and I'm taking this over from him.)
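As an aside, when such a drop-in travels inside a MachineConfig, it becomes an Ignition `storage.files` entry whose contents are a base64 data URL. A minimal, hypothetical Go sketch of that encoding step (not code from the operator linked above):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// ignitionDataURL encodes file contents in the base64 data URL form
// used by Ignition "storage.files" entries inside a MachineConfig.
func ignitionDataURL(contents string) string {
	return "data:text/plain;charset=utf-8;base64," +
		base64.StdEncoding.EncodeToString([]byte(contents))
}

func main() {
	// Hypothetical crio drop-in contents; the real operator renders
	// its own configuration before embedding it.
	dropIn := "[crio.runtime.runtimes.kata]\n" +
		"runtime_path = \"/usr/bin/containerd-shim-kata-v2\"\n"
	fmt.Println(ignitionDataURL(dropIn))
}
```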
Oh, wait, I'm wrong; I was thinking only about the RPM side. "Configure CRI-O to use Kata Runtime" also contains:

- applying RBAC policies;
- defining the CRDs;
- creating the "kata" runtime class.
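For context, a crio drop-in registering a kata handler typically looks like the following. The path and `runtime_path` are assumptions for illustration, not values taken from the operator.

```toml
# /etc/crio/crio.conf.d/50-kata.conf (hypothetical path)
# Registers a "kata" runtime handler alongside the default runc, so
# only pods with runtimeClassName: kata are run through it.
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
```

The handler name here is what the `RuntimeClass` `handler` field must match for the "kata" runtime class mentioned above.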
> - Configure CRI-O to use the Kata Runtime on those worker nodes
> - Install the runtimeClass on the cluster, as well as the required components for the runtime to be controlled by the orchestration layer
> - Update the Kata runtime
> - Uninstall the Kata Runtime and reconfigure CRI-O to not use it
If done via extensions, this would all be handled by MCO reconciliation. See also https://blog.verbum.org/2020/08/22/immutable-%E2%86%92-reprovisionable-anti-hysteresis/
I think more broadly, if everything goes via MCO functionality then e.g. node changes will be fully coordinated. Having more host daemonsets etc. can create synchronization issues.
When you say "everything goes via MCO functionality", do you mean just forget the operator and teach the MCO to deal with Kata Containers' needs entirely?

While I think that's a really good idea, I don't think it's doable in the 4.8 timeframe.
Ah no, I just mean that the operator installs machineconfigs, rather than reaching out to each node individually. This is a pattern followed by, for example, the cluster-node-tuning-operator.
Oh, that we do. :-)
```go
TotalNodesCount int `json:"totalNodesCount"`

// CompletedNodesCount is the number of nodes that have successfully completed the given operation
CompletedNodesCount int `json:"completedNodesCount"`
```
So if we fully use the MCO, then all this stuff is covered by the MCO objects, specifically the machineconfigpool.

/cc @zanetworker, @bpradipt, @jensfr for visibility
@bpradipt, would you mind taking a careful look at the bits describing the operator? That's one part of the project I barely touched, and I'm not confident the text matches reality.

@zanetworker, I will need your help to ensure this aligns with your vision.

@mrunalp, I really would love your input, mainly as things have changed a lot from the time the original PR was created to what we currently have.
@fidencio I'd love to use this feature to isolate less trustworthy / higher-risk (e.g. Internet-facing) workloads; do you think this is worth adding? For these workloads Firecracker would be great; could this be added in the future? I presume NetworkPolicy would be applied on the host, outside of the VM (i.e. providing a higher level of assurance)?
I think this is covered by:

Or would you prefer to have it explicitly stated there?

Right now we prefer to focus on the VMM we have a huge team working on and maintaining, which is QEMU. I don't think this proposal is the right place to discuss this, but maybe https://github.com/openshift/sandboxed-containers-operator/ could be. I'd suggest opening an issue there; then we can start the discussion and move to another (more appropriate?) place if needed. Does this sound reasonable?

Please open us an issue and we can discuss it there. :-)
@mrunalp, the last requests have been addressed. Thanks for the review.
Signed-off-by: Ariel Adam <aadam@redhat.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval. This pull request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here. The pull request process is described here.
This supersedes #366, as it provides fixes for the lint.

The reason a new PR has been opened is that Jens Freimann, the person dealing with #366, recently became a father and is out on paternity leave.