Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for CNI-configured VM network interfaces. #257

Merged
merged 1 commit into from
Sep 24, 2019

Conversation

sipsma
Copy link
Contributor

@sipsma sipsma commented Sep 12, 2019

Signed-off-by: Erik Sipsma sipsma@amazon.com

Closes #14

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

examples/taskworkflow.go Outdated Show resolved Hide resolved
internal/netTestutils.go Outdated Show resolved Hide resolved
runtime/cni_integ_test.go Outdated Show resolved Hide resolved
docs/getting-started.md Outdated Show resolved Hide resolved
@@ -80,11 +80,14 @@ cd ~
# overlay
# * firecracker-containerd, an alternative containerd binary that includes the
# firecracker VM lifecycle plugin and API
# * tc-redirect-tap and other CNI dependencies that enable VMs to start with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double-check: did you run through the quickstart guide on a real instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run all the commands, but only on my dev instance inside a container. I'll try on a fresh instance with the exact setup/commands here too to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm glad I did this (tried just running it on a fresh i3.metal w/ Debian AMI) because it revealed a bunch of issues, some pre-existing, some new:

  1. The pre-existing line exec newgrp docker appears to just end the script (I guess since it's exec'ing to a different process?).
    • I got rid of this line and use sg to run with the docker group when it's needed, let me know if there's a better way.
  2. I forgot during my previous runs that I was using my fixed ptp plugin that didn't have the DNS issue I found.
    • I have the PR out to fix that now, but for now I added a note to the getting-started guide that DNS resolution may not work until that is fixed in ptp.
    • I will create an issue to follow up on updating the getting-started guide if my ptp fix is not merged by the time this commit is merged.
  3. I had not tested with the image-builder rootfs (just the default one), which revealed the need for adding an empty /etc/hosts file to the image-builder rootfs (though Noah coincidentally just added that to his unrelated PR)
    • I added that to this PR too
  4. The PTP plugin seems to fail to add iptables rules that allow the VM outbound internet access, though the VM is capable of pinging the IPs on the host itself. I highly suspect this is due to the fact that I ran this on the host (previously just inside a container) where there are all sorts of iptables rules added automatically by docker.
    • I added a note to the getting-started guide that while the demo setup should give the VM access to host IPs, it may or may not grant the VM outbound internet access depending on the details of the user's host network.

I don't think I can do it in this PR, but I think we should probably add an example test that just runs the quickstart script so we don't get into this situation again. I'll add an issue for following up on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-existing line exec newgrp docker appears to just end the script (I guess since it's exec'ing to a different process?).

Are you running this as a script or as an interactive session? When exec newgrp docker is run in an interactive session, it spawns a new shell that you can continue to use for commands.

I think sg is fine too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note to the getting-started guide that while the demo setup should give the VM access to host IPs, it may or may not grant the VM outbound internet access depending on the details of the user's host network.

Is this something we can fix for the quickstart since we're specifying exactly what should be run on the host?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is just a demo a users would inevitably need to customize for their own network/use-case anyways [...]

It is, but it's also meant to be as much of a working setup as we can make it. If it's not something that you think you can address right now, can you open an issue to track it?

Copy link
Contributor Author

@sipsma sipsma Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it's probably just impossible for this to universally work; there's always going to be some particular network setup on some host that's going to result in the VM not having outbound access. But I agree we should try to minimize those cases as much as possible; we should just also leave the disclaimer in too I think.

I just tried out adding the CNI firewall plugin. I think it actually uncovered a bug in firewall because at first I was not able to use it due to the fact that its delete method doesn't conform to the CNI spec; it returns an error when there was no previous result (which the CNI spec explicitly says it shouldn't do) which causes us to hit the error on this line.

We can probably fix that issue upstream, but in the meantime I tried with a forked go-sdk that just logs that error instead of returning it. Then, using firewall actually does allow the VM to get outbound access as expected even on the more complicated network present on the i3.metal host.

Given all of the above, I'll follow up by:

  1. Trying to address the issue with the firewall plugin upstream
  2. Adding an issue to include the firewall plugin in this demo-setup once it has been fixed.
  3. Opening an issue in the Go SDK to add an optional "Force" parameter or something that allows users to ignore errors returned by delete methods (obviously it will default to false), since apparently it's not that uncommon to find plugins that don't perfectly conform to the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it's probably just impossible for this to universally work; there's always going to be some particular network setup on some host that's going to result in the VM not having outbound access.

Right, I'm not saying that the quickstart needs to account for every situation, but rather that it accounts for itself. Assuming someone follows the quickstart guide exactly, it should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's why I updated the getting-started guide to specify that the expected end result is that the host IPs are accessible and that the outside internet may be accessible but might not depending on the details of the host network. I've added a similar note to the quickstart guide too now.

Once this PR is merged I'll create an issue for updating to use the firewall plugin, then once I fix the plugin upstream we can reduce the number of cases where outbound internet access doesn't work (but will still need to leave the qualification anyways).

I also created an issue to run the quickstart as a test case, which we should do no matter what: #263

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue for the firewall plugin here: #265

internal/netTestutils.go Outdated Show resolved Hide resolved
proto/types.proto Show resolved Hide resolved
proto/types.proto Outdated Show resolved Hide resolved
runtime/cni_integ_test.go Outdated Show resolved Hide resolved
runtime/cni_integ_test.go Show resolved Hide resolved
internal/netTestutils.go Outdated Show resolved Hide resolved
internal/netTestutils.go Outdated Show resolved Hide resolved
internal/netTestutils.go Outdated Show resolved Hide resolved
@sipsma sipsma force-pushed the cnipr branch 2 times, most recently from ca09e85 to f1ce526 Compare September 19, 2019 01:40
docs/networking.md Outdated Show resolved Hide resolved
runtime/cni_integ_test.go Show resolved Hide resolved
tools/docker/Dockerfile Show resolved Hide resolved
Signed-off-by: Erik Sipsma <sipsma@amazon.com>
@sipsma sipsma merged commit 5a77bc8 into firecracker-microvm:master Sep 24, 2019
@sipsma sipsma deleted the cnipr branch October 8, 2019 18:38
fangn2 pushed a commit to fangn2/firecracker-containerd that referenced this pull request Mar 23, 2023
…ependabot/go_modules/github.com/go-openapi/runtime-0.19.21

Bump github.com/go-openapi/runtime from 0.19.20 to 0.19.21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement networking
3 participants