Support publishing ports on specific network interfaces #14425

Open
PhrozenByte opened this issue May 30, 2022 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. network Networking related issue or feature pasta pasta(1) bugs or features

Comments

@PhrozenByte
Contributor

PhrozenByte commented May 30, 2022

/kind feature

Description

Right now we can publish a container’s port, or range of ports, to the host using podman create … --publish [ip:][hostPort:]containerPort … (et al.). Although we can bind to a single IP address, we can't limit the binding to a single interface (using SO_BINDTODEVICE, see socket(7)). I suggest adding an option to support this, e.g. --publish [ip:][hostPort:]containerPort[@interface].

Reasoning

Binding to an interface limits the listener's scope to a single network interface, which can be quite important in VLAN setups. This is different from listening on a single IP address, because the same address can, in fact, be configured on multiple network interfaces at the same time.

However, personally I have another use case in mind: making better use of 0.0.0.0 and ::. If I try publishing port 53 of a container running a DNS server, it will fail with a "bind: address already in use" error. This is expected behaviour, because Podman's dnsname plugin will start a Dnsmasq instance listening on 0.0.0.0:53, limited to the virtual network interface. Thus we can't bind to 0.0.0.0 on all network interfaces, simply because the address is indeed partially in use by Podman's Dnsmasq. However, this doesn't have to be the case: if we could tell Podman to limit binding to a single interface using SO_BINDTODEVICE, we could indeed publish the container's port to 0.0.0.0:53, just limited to another device, e.g. enp1s0. The option could be used like podman create … --publish 53:53@enp1s0 ….
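For illustration, here is a minimal Python sketch of what such a binding looks like at the socket level. It is not Podman code: the helper names device_optval and listen_on_device are made up for this example, and it assumes Linux (unprivileged use of SO_BINDTODEVICE additionally depends on the kernel version).

```python
import socket

IFNAMSIZ = 16  # Linux limit for interface names, including the trailing NUL
# Fall back to the Linux value if this Python build does not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def device_optval(ifname: str) -> bytes:
    """Encode an interface name as the NUL-terminated value SO_BINDTODEVICE expects."""
    raw = ifname.encode("ascii")
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError(f"invalid interface name: {ifname!r}")
    return raw + b"\0"


def listen_on_device(port: int, ifname: str, addr: str = "0.0.0.0") -> socket.socket:
    """Return a TCP listener on addr:port that only sees traffic arriving via ifname."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the device before bind(), while the socket is still unbound.
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, device_optval(ifname))
        sock.bind((addr, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

Under these assumptions, listen_on_device(53, "enp1s0") would be the socket-level counterpart of the proposed --publish 53:53@enp1s0.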

Side note

Isn't the documentation lacking info about limiting the protocol with --publish ip:hostPort:containerPort/protocol, e.g. 80:80/tcp? edit: Fixed in #14451

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 30, 2022
@PhrozenByte PhrozenByte changed the title Support publishing ports on specific devices Support publishing ports on specific network interfaces May 30, 2022
@Luap99 Luap99 added the network Networking related issue or feature label May 31, 2022
@Luap99
Member

Luap99 commented May 31, 2022

I agree that this would be useful. I'm not sure about the syntax, though: why would this be the last option after the container port? What about [ip[@interface]:][hostPort:]containerPort[/protocol]?

That said, implementing something like this is very complicated. There are at least 6 different places that have to handle port forwarding:

  1. netavark (iptables)
  2. CNI (iptables)
  3. The libpod port binding (when 1 or 2 is used)
  4. slirp4netns port forwarder
  5. rootlessport forwarder
  6. gvproxy (for podman machine; also has to work on macOS)

These are just the current ones; there are likely more in the future:

  • netavark (firewalld)
  • netavark (nftables)
  • pasta

Some of the projects are not maintained by us, so it would be more difficult to get support for this. Or maybe they already support it?

@mheon
Member

mheon commented May 31, 2022

I don't know if this is really possible for root Podman's network stack - there's really no firewall equivalent for that sockopt, so we'd basically be doing what we normally do, except looking up the IPs of the interface and using them instead of user-specified IPs.

@Luap99
Member

Luap99 commented May 31, 2022

iptables has the -i option to match on a specific input interface. AFAIK this should do the trick.
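As a rough illustration of the -i idea, the following Python sketch assembles the argument list for a DNAT rule restricted to one input interface. It is not how netavark or CNI actually build their rules; the function name dnat_rule and the container address 10.88.0.2 are made up for this example.

```python
from typing import List


def dnat_rule(iface: str, proto: str, host_port: int,
              container_ip: str, container_port: int) -> List[str]:
    """Build an iptables command that forwards host_port only for packets entering via iface."""
    return [
        "iptables", "-t", "nat", "-A", "PREROUTING",
        # -i matches the interface a packet arrived on; it is only valid in
        # chains that see incoming packets (PREROUTING, INPUT, FORWARD).
        "-i", iface,
        "-p", proto, "--dport", str(host_port),
        "-j", "DNAT", "--to-destination", f"{container_ip}:{container_port}",
    ]


# Print the command instead of running it, since applying it needs root.
print(" ".join(dnat_rule("enp1s0", "tcp", 53, "10.88.0.2", 53)))
```

Note that a rule in PREROUTING does not cover connections originating on the host itself, so this is only part of what a full implementation would need.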

@github-actions

github-actions bot commented Jul 2, 2022

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 2, 2022

@mheon @Luap99 any movement on this issue?

@mheon
Member

mheon commented Jul 5, 2022

No, no work has been done as of yet

@github-actions

github-actions bot commented Aug 5, 2022

A friendly reminder that this issue had no activity for 30 days.

@github-actions

github-actions bot commented Sep 5, 2022

A friendly reminder that this issue had no activity for 30 days.

@sbrivio-rh
Collaborator

By the way, I just submitted a patch for passt/pasta implementing this using % as separator.

Rationale: RFC 4007, section 11, introduces % as delimiter between address and zone identifier in the context of IPv6 scoped address architecture (which has some analogies to what's going on here). This delimiter is commonly used in the representation of interface specifications for IPv6 addresses, see also the examples in section 11.3, so it looks like the most natural choice for IPv4 addresses as well.
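A small Python sketch of the address%zone convention, applied uniformly to IPv4 and IPv6 as suggested above. The helper name split_zone is hypothetical, and allowing an empty address part is an assumption of this sketch, not something RFC 4007 defines.

```python
import ipaddress
from typing import Tuple


def split_zone(text: str) -> Tuple[str, str]:
    """Split 'address%zone' into (address, zone); either part may be empty."""
    addr, _, zone = text.partition("%")
    if addr:
        # Raises ValueError if the address part is neither valid IPv4 nor IPv6.
        ipaddress.ip_address(addr)
    return addr, zone
```

For example, split_zone("192.0.2.1%eth0") yields the address 192.0.2.1 and the zone eth0, in the same way it would for fe80::1%eth0.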

I actually plan to send a pull request for Podman integration of pasta soon (I'm still working on test scripts), based on the existing out-of-tree patch, now that some distribution packages are available. However, I wouldn't yet include the possibility to specify this through Podman's own port forwarding configuration infrastructure, given that the handling of this feature for other networking modes isn't obvious at all.

It would still be possible to configure this by passing opaque options directly to pasta.

kdrag0n pushed a commit to kdrag0n/passt-virtcontainer that referenced this issue Nov 9, 2022
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).

Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.

Reported use case: running local services that use ports we want
to have externally forwarded:
  containers/podman#14425

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
@eriksjolund
Contributor

eriksjolund commented Mar 26, 2023

A side-note:

An alternative to using the command-line option --publish is to use socket activation.

It's possible to use the systemd directive BindToDevice and combine it with socket activation of containers to publish a container port on a specific network interface. I verified that it works both for systemd system services (running as root) and for systemd user services (running as your regular user on the host).
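To show why the interface restriction carries over to the service, here is a minimal Python sketch of the receiving side of systemd's socket activation protocol. The function name inherited_sockets is made up; the LISTEN_PID and LISTEN_FDS variables and the starting descriptor 3 are part of the protocol. The service never calls bind() itself, so whatever the .socket unit configured (ListenStream, BindToDevice) is already applied to the socket it receives.

```python
import os
import socket
from typing import List

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_sockets() -> List[socket.socket]:
    """Return the sockets passed in by systemd, or [] if not socket-activated."""
    # systemd sets LISTEN_PID to the PID the descriptors are meant for.
    if os.environ.get("LISTEN_PID") != str(os.getpid()):
        return []
    count = int(os.environ.get("LISTEN_FDS", "0"))
    # socket.socket(fileno=...) detects family and type from the descriptor.
    return [socket.socket(fileno=fd)
            for fd in range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count)]
```

A server would then call accept() on the returned sockets instead of creating its own listener.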

If the container image does not support socket activation, it's possible to create a workaround by using an additional container running /usr/lib/systemd/systemd-socket-proxyd.

The container image was built from this Containerfile:

FROM docker.io/library/fedora
RUN dnf -y update && \
      dnf -y install systemd && \
      dnf clean all

Here is a sketch:

$ cd ~/.config/systemd/user/
$ grep ListenStream systemdsocketproxyd.socket 
ListenStream=0.0.0.0:3000
$ grep BindToDevice systemdsocketproxyd.socket
BindToDevice=enp0s2
$ grep systemd-socket-proxyd systemdsocketproxyd.service
	--network net1 localhost/systemdsocketproxyd /usr/lib/systemd/systemd-socket-proxyd web:80
$ grep net1 lighttpd.service 
	--network net1 \
$ grep name lighttpd.service 
	--name web \
$ curl $myip:3000
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8"/>
        <title>Hello world</title>
    </head>
</html>
$ curl -v 127.0.0.1:3000
*   Trying 127.0.0.1:3000...
* connect to 127.0.0.1 port 3000 failed: Connection refused
* Failed to connect to 127.0.0.1 port 3000 after 11 ms:  Couldn't connect to server
* Closing connection 0
 curl: (7) Failed to connect to 127.0.0.1 port 3000 after 11 ms: Couldn't connect to server
$

($myip is the host IP of the computer)

I plan to write some more detailed step-by-step instructions.

@sbrivio-rh
Collaborator

Erik, by the way:

An alternative to using the command-line option --publish is to use socket activation.

It's possible to use the systemd directive BindToDevice and combine it with socket activation of containers to publish a container port on a specific network interface. I verified that it works both for systemd system services (running as root) and for systemd user services (running as your regular user on the host).

While this looks elegant and is probably a better fit for some use cases, you can do the same with pasta. If you specify an interface separated by %, pasta uses SO_BINDTODEVICE to bind listening sockets. Kernel support is detected at runtime; it requires Linux kernel 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users").

Example from man page:

                     -t 192.0.2.1%eth0/22
                            Forward local port 22, bound to 192.0.2.1 and interface eth0, to port 22

and this is also checked in tests.

This is only available via opaque options at the moment (--net=pasta:-t,192.0.2.1%eth0/22,...), because other back-ends don't support this, so I'm not sure yet what's the best way forward to make this available in Podman's options directly.
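As an illustration of the opaque-option route, this Python sketch assembles the forwarding specification and the corresponding --net=pasta: string. The helper names are made up; the formats are the ones quoted in this thread (192.0.2.1%eth0/22 from the man page, the interface-only %eth0/2022 form added later in the thread, and a bare port as in -t 5222).

```python
def pasta_forward_spec(ports: str, addr: str = "", iface: str = "") -> str:
    """Build the argument of pasta's -t/-u option: [addr][%iface]/ports, or just ports."""
    bind = addr + (f"%{iface}" if iface else "")
    return f"{bind}/{ports}" if bind else ports


def podman_net_option(spec: str, flag: str = "-t") -> str:
    """Wrap one forwarding spec as an opaque option for Podman's --net=pasta."""
    return f"--net=pasta:{flag},{spec}"


print(podman_net_option(pasta_forward_spec("22", addr="192.0.2.1", iface="eth0")))
```

This only builds strings; whether a given pasta version accepts the interface-only form depends on the release, as noted further down in the thread.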

@Luap99
Member

Luap99 commented Mar 27, 2023

This is only available via opaque options at the moment (--net=pasta:-t,192.0.2.1%eth0/22,...), because other back-ends don't support this, so I'm not sure yet what's the best way forward to make this available in Podman's options directly.

Well, we can add support to the --publish syntax, but we would need to agree on such syntax first.
If we follow the ip%interface convention, then we would need to parse it like this:
[[ip][%interface]:][hostPort:]containerPort, assuming you are allowed to specify only the interface without an IP.
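A minimal Python sketch of a parser for this grammar, to make the field order concrete. It is not Podman's CreatePortBindings: the names PortSpec and parse_publish are hypothetical, and the optional /protocol suffix, the tcp default, and the bracketed IPv6 form with the interface outside the brackets are assumptions of this sketch.

```python
from typing import NamedTuple


class PortSpec(NamedTuple):
    host_ip: str
    host_interface: str
    host_port: str
    container_port: str
    protocol: str


def _check_port(value: str, allow_empty: bool) -> None:
    """Accept a single port or a range such as '8080-8090'."""
    if not value and allow_empty:
        return
    if not all(part.isdigit() for part in value.split("-")):
        raise ValueError(f"invalid port: {value!r}")


def parse_publish(spec: str) -> PortSpec:
    """Parse [[ip][%interface]:][hostPort:]containerPort[/protocol]."""
    body, _, proto = spec.partition("/")
    # Split from the right: the container port is always the last field,
    # so colons inside a bracketed IPv6 address stay within the host part.
    rest, _, container_port = body.rpartition(":")
    host, _, host_port = rest.rpartition(":")
    _check_port(container_port, allow_empty=False)
    _check_port(host_port, allow_empty=True)
    ip, _, iface = host.partition("%")
    return PortSpec(ip.strip("[]"), iface, host_port, container_port, proto or "tcp")
```

With this sketch, %enp1s0:53:53/udp parses to an empty host IP, the interface enp1s0, and port 53 on both sides.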

As for other network backends we can just throw an error in podman that it is not supported.

@sbrivio-rh
Collaborator

Well, we can add support to the --publish syntax, but we would need to agree on such syntax first. If we follow the ip%interface convention, then we would need to parse it like this: [[ip][%interface]:][hostPort:]containerPort

Yes, it makes sense to me, just:

assuming you are allowed to specify only the interface without an ip.

I didn't think of this use case until now. pasta takes ::%eth0/n:m or 0.0.0.0%eth0/n:m (with n and m being port numbers), but not both for the same n, and yes, :: there means IPv6-only. Let me fix that.

As for other network backends we can just throw an error in podman that it is not supported.

If you show me how/where or if you have some time to implement this, I can do the rest.

@Luap99
Copy link
Member

Luap99 commented Mar 27, 2023

First add a HostInterface field to https://github.com/containers/common/blob/c8d98ebb660e64bfc5479d0dd105b52f20ea58f8/libnetwork/types/network.go#L249

Then add it to the parsing logic here:

func CreatePortBindings(ports []string) ([]types.PortMapping, error) {

And lastly, make sure to throw an error if netmode != pasta and hostIp is not empty. Do it somewhere in pkg/specgen, in the Validate() function for both container and pod.
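The validation step could look roughly like the following Python sketch. It is only a model of the logic, not Podman's Go code: the PortMapping class here is a simplified stand-in, host_interface is the hypothetical new field, and the sketch assumes the check is meant to apply to that interface field rather than to the host IP.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PortMapping:
    container_port: int
    host_port: int = 0
    host_ip: str = ""
    host_interface: str = ""  # hypothetical new field from step one


def validate_port_mappings(netmode: str, mappings: Iterable[PortMapping]) -> None:
    """Reject an interface binding for any network mode that cannot honour it."""
    for m in mappings:
        if m.host_interface and netmode != "pasta":
            raise ValueError(
                f"binding to interface {m.host_interface!r} is only "
                f"supported with pasta, not {netmode!r}"
            )
```

Failing early like this keeps the other back-ends unchanged while the feature is only available through pasta.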

@sbrivio-rh
Collaborator

sbrivio-rh commented Mar 30, 2023

-[t|u] %INTERFACE/PORTS support added in passt version 2023_03_29.b10b983.

AkihiroSuda pushed a commit to AkihiroSuda/passt-mirror that referenced this issue Apr 4, 2023
Somebody might want to bind listening sockets to a specific
interface, but not a specific address, and there isn't really a
reason to prevent that. For example:

  -t %eth0/2022

Alternatively, we support options such as -t 0.0.0.0%eth0/2022 and
-t ::%eth0/2022, but not together, for the same port.

Enable this kind of syntax and add examples to the man page.

Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: containers/podman#14425 (comment)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
@miwagner1

What about binding a port/s from one container service to another container’s localhost kinda like a p2p link using Unix sockets? I imagine it should be much faster (pasta might be almost ready for this).

@sbrivio-rh
Collaborator

What about binding a port/s from one container service to another container’s localhost kinda like a p2p link using Unix sockets? I imagine it should be much faster (pasta might be almost ready for this).

This is a bit unrelated to the ticket at hand, but let me answer here for simplicity.

I suppose the connections where you're looking for the megabazillion bits per second are TCP. You can indeed transfer data (not frames, not packets, just payload) between a TCP and a stream-oriented UNIX domain socket, for example with socat. You can also splice(2) it from/to a pipe and from there from/to another socket, and have a relatively low overhead (no copies, just syscall overhead). However, this implies having three sockets (two TCP and one UNIX domain socket), which is one too many.

But you could also splice data (again, using a pipe) between two TCP sockets. This is what pasta already does between the container and its parent namespace: look at that orange appendage at the top left of the diagram. This is the implementation.

It's not implemented between two arbitrary namespaces, because pasta at the moment takes care of a single network namespace or container, but you can already do this:

  • from one terminal:
$ ./pasta --config-net -t 5222
# iperf3 -s -p 5222
  • from another terminal:
$ ./pasta --config-net -T 5222
# iperf3 -c 127.0.0.1 -p 5222 -l 1M -Z
Connecting to host 127.0.0.1, port 5222
[  5] local 127.0.0.1 port 52028 connected to 127.0.0.1 port 5222
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  7.52 GBytes  64.6 Gbits/sec   38   4.18 MBytes       
[  5]   1.00-2.00   sec  7.41 GBytes  63.7 Gbits/sec   28   4.18 MBytes       
[  5]   2.00-3.00   sec  7.43 GBytes  63.9 Gbits/sec    0   4.18 MBytes       
[  5]   3.00-4.00   sec  7.38 GBytes  63.4 Gbits/sec    7   4.18 MBytes       
[  5]   4.00-5.00   sec  7.33 GBytes  62.9 Gbits/sec   13   4.18 MBytes       
[  5]   5.00-6.00   sec  7.31 GBytes  62.7 Gbits/sec    0   4.18 MBytes       
[  5]   6.00-7.00   sec  7.25 GBytes  62.3 Gbits/sec   24   4.18 MBytes       
[  5]   7.00-8.00   sec  7.26 GBytes  62.4 Gbits/sec   14   4.18 MBytes       
[  5]   8.00-9.00   sec  7.24 GBytes  62.2 Gbits/sec    0   4.18 MBytes       
[  5]   9.00-10.00  sec  7.28 GBytes  62.6 Gbits/sec   18   4.18 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  73.4 GBytes  63.1 Gbits/sec  142             sender
[  5]   0.00-10.07  sec  73.4 GBytes  62.6 Gbits/sec                  receiver

iperf Done.

(you can pass equivalent options for --net=pasta:... in Podman).

Here we splice between four TCP sockets (first namespace to pasta, pasta to loopback in init namespace, loopback to pasta in init namespace, pasta to second namespace).

If you want to splice between two sockets instead, you can probably write a separate tool doing that, but don't expect a big increase in throughput -- mind that the kernel copied exactly zero bytes for iperf3 above. However, sure, we have double syscall overhead compared to two sockets.

Another alternative would be to let pasta take care of multiple namespaces, and then support a port forwarding specification where you don't just specify ports, addresses, interfaces, but also the target namespace. It requires a bit of rework, because at the moment the pasta implementation only supports the notion of "the detached namespace" as opposed to "wherever we run".

If you want to work on this I'll be more than happy to give you pointers and assistance, let me know.

@dgibson dgibson added pasta pasta(1) bugs or features kind/feature Categorizes issue or PR as related to a new feature. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Aug 25, 2023
8 participants