LINUX: multiple IP addresses on QETH device override routing information on host system #230

Closed
fbi-ranger opened this issue Aug 2, 2019 · 20 comments
Assignees
Labels
L Linux only issue, such as with tuntap networking that doesn't occur on Windows. (*WON'T FIX*) The requested change was rejected or the described behavior is by design.

Comments

@fbi-ranger

fbi-ranger commented Aug 2, 2019

Hercules version 4.2.0.0-SDL-g0b231b39-modified (4.2.0.0)
Host OS openSUSE 15.0
Guest OS z/VM 6.4
z/VM Guest z/OS 2.3

 

z/VM VSWITCH:

DEFINE VSWITCH VMINTRA1 TYPE QDIO RDEV EC00 CONNECT IP
MODIFY VSWITCH VMINTRA1 GRANT TCPIP

 

z/OS network interfaces:

1.) LCS (F00-F01) IP: 10.0.0.192/24 (= same subnet as all other systems on the private LAN)

Hercules config:

    F02.2 LCS -n /dev/net/tun 10.0.0.192  

in z/VM:

    DEDICATE 0F00 0F02 
    DEDICATE 0F01 0F03

in z/OS TCPIP PROFILE:

    DEVICE   LCS1   LCS       F00    AUTORESTART 
    LINK     LCS1   ETHERNet  0      LCS1 
    HOME 10.0.0.192     LCS1 

2.) OSD (EC00-EC02) IP: 10.10.0.192/24 (different subnet)

Hercules config:

    EC00.3 QETH CHPID EC IPADDR 10.10.0.190 NETMASK 255.255.255.0

in z/VM:

    NICDEF EC00 TYPE QDIO DEVICES 3 LAN SYSTEM VMINTRA1 CHPID EC 

in z/OS TCPIP PROFILE:

    DEVICE   PORT3      MPCIPA 
    LINK ETH2 IPAQENET  PORT3 
    HOME 10.10.0.192    ETH2 

in z/OS VTAM:

    OSATRL1 VBUILD TYPE=TRL 
    OSATRL3E TRLE LNCTL=MPC,READ=(EC00),WRITE=(EC01),DATAPATH=(EC02),      X
            PORTNAME=PORT3,                                                X
            MPCLEVEL=QDIO 

 

z/OS 2.3 also registers the LCS address 10.0.0.192 on the QETH device, and this changes the routing on the Linux host. The subnet 10.0.0.0/24 (the subnet of the host) gets assigned to tun1, which is the interface for the QETH VSWITCH, and the route for 10.10.0.0/24 via tun1 is replaced, so 10.10.0.0/24 addresses no longer work. I have to fix the routing manually by adding the 10.10.0.0/24 dev tun1 route again.

In the Hercules system log it looks like

10:07:27 HHC00942I CTC: lcs device tap0 using mac 16:79:7C:01:BB:62
10:07:28 HHC03805I 0:EC02 QETH: tun1: Register guest IP address 10.10.0.192
10:09:14 HHC03805I 0:EC02 QETH: tun1: Register guest IP address 10.0.0.192

You can see that there is a small time gap before the 10.0.0.192 address is registered on the QETH device.

Routing information at the time z/VM has started: the tun1 device is the interface for QETH EC00-EC02:

LINUX host:

$ ip route
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.190
10.10.0.0/24 dev tun1 proto kernel scope link src 10.10.0.190

At this moment the z/OS guest is not running, and neither the LCS nor the QETH address can be pinged. This works as expected.

Now, when the z/OS TCPIP stack is started and both devices (LCS and QETH) are activated by TCPIP, this happens on the host side:

LINUX host:

$ ip route
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.190
10.10.0.0/24 dev tun1 proto kernel scope link src 10.10.0.190
10.0.0.192 dev tap0 scope link  <-- This is added by Hercules for LCS

On Hercules:

13:17:00 HHC00942I CTC: lcs device tap0 using mac 16:79:7C:01:BB:62
13:17:01 HHC03805I 0:EC02 QETH: tun1: Register guest IP address 10.10.0.192

LINUX host:

$ ip route
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.190
10.10.0.0/24 dev tun1 proto kernel scope link src 10.10.0.190
10.0.0.192 dev tap0 scope link

In the meantime the QETH address is pingable. Works as expected.

Then on Hercules:

13:17:00 HHC03805I 0:EC02 QETH: tun1: Register guest IP address 10.0.0.192

LINUX host:

$ ip route 
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.190
10.10.0.0/24 dev tun1 proto kernel scope link src 10.10.0.190  <-- Deleted and
10.0.0.0/24 dev tun1 proto kernel scope link src 10.10.0.190   <-- replaced by this
10.0.0.192 dev tap0 scope link

 
In this situation 10.10.0.192 is no longer reachable, because traffic for 10.10.0.0/24 now goes to the default route.
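
The effect can be illustrated with a minimal sketch of a longest-prefix-match lookup over the two routing tables shown above. This is only a model: the `lookup` helper is hypothetical, and real kernel route selection also considers metrics and other attributes that are ignored here.

```python
import ipaddress

def lookup(routes, destination):
    """Return (prefix, device) of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    # Longest prefix wins; on a tie this simplified model keeps the first match.
    net, dev = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), dev

# Host routing table before z/OS registers 10.0.0.192 on the QETH device.
before = [
    ("0.0.0.0/0", "eth0"),      # default via 10.0.0.1
    ("10.0.0.0/24", "eth0"),
    ("10.10.0.0/24", "tun1"),
    ("10.0.0.192/32", "tap0"),  # host route added for LCS
]

# Host routing table afterwards: 10.10.0.0/24 is gone, 10.0.0.0/24 dev tun1 appears.
after = [
    ("0.0.0.0/0", "eth0"),
    ("10.0.0.0/24", "eth0"),
    ("10.0.0.0/24", "tun1"),
    ("10.0.0.192/32", "tap0"),
]

print(lookup(before, "10.10.0.192"))  # ('10.10.0.0/24', 'tun1')
print(lookup(after, "10.10.0.192"))   # ('0.0.0.0/0', 'eth0')
```

With the second table, nothing more specific than the default route matches 10.10.0.192, so packets leave via eth0 instead of tun1.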

This is the problem.

I think the 10.10.0.0/24 dev tun1 route should NOT be deleted; the 10.0.0.0/24 route should be ADDED alongside it.

@ivan-w Ivan,

It is NOT the case that those interfaces have the SAME IP addresses. Each has its own subnet 10.0.0.0/24 LCS , 10.10.0.0/24 QETH. Only the last octet is the same.

@fbi-ranger
Author

fbi-ranger commented Aug 2, 2019

In case the QETH device is used as an uplink port to a VSWITCH, is the IP address at the QETH statement in the Hercules configuration file really needed? What is the need of defining this address? Because it is a tunX device???

@mcisho
Contributor

mcisho commented Aug 2, 2019

It would be nice to see the Hercules config statements that define the LCS and the QETH devices. It would also be useful if you could issue ip addr as well as ip route commands at each step.

@fbi-ranger
Author

fbi-ranger commented Aug 2, 2019

The Hercules config statements and all ip route results are shown above.

@fbi-ranger
Author

fbi-ranger commented Aug 2, 2019

$ ip address
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:e0:81:b5:14:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.190/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e0:81ff:feb5:14d1/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:e0:81:b5:14:d0 brd ff:ff:ff:ff:ff:ff
67: tap0:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 7a:67:2f:7e:e0:71 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7867:2fff:fe7e:e071/64 scope link
       valid_lft forever preferred_lft forever
68: tun1:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 500
    link/none
    inet 10.10.0.190 peer 10.0.0.192/24 scope global tun1
       valid_lft forever preferred_lft forever
    inet6 fe80::cfab:d32c:b5c:27f/64 scope link flags 800
       valid_lft forever preferred_lft forever

@fbi-ranger
Author

fbi-ranger commented Aug 5, 2019

On rare occasions the following messages are issued by Hercules and z/VM:

HHC00942I CTC: lcs device tap0 using mac E6:A0:C0:44:88:AB
15:46:35 HCPSWU2833E Error 'E00A'X adding IP address 10.0.0.192 for VSWITCH SYSTEM VMINTRA1.
15:46:35 HCPSWU2833E IP address is already in use on the LAN.

This is NOT happening all the time. I couldn't find the reason why these messages are issued yet.

@mcisho
Contributor

mcisho commented Aug 5, 2019

The LCS is taken down, but its IP address is not being unregistered with QETH. When the LCS is brought up again (the HHC00942I message), its IP address is being registered with the QETH, BUT ...

... because it is already registered, QETH has returned IPA_RC_IP_ADDR_ALREADY_USED (0xE00A), prompting the z/VM HCPSWU messages.
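
As a rough sketch of that sequence (not the actual Hercules code): the `AddressRegistry` class and the `IPA_RC_SUCCESS` value of 0 are assumptions made for illustration; only the 0xE00A return code comes from the messages above.

```python
IPA_RC_SUCCESS = 0x0000               # assumed value for "no error"
IPA_RC_IP_ADDR_ALREADY_USED = 0xE00A  # return code quoted in this thread

class AddressRegistry:
    """Hypothetical stand-in for the guest IP addresses registered on a QETH device."""

    def __init__(self):
        self._addresses = set()

    def register(self, address):
        # A second registration of the same address is rejected.
        if address in self._addresses:
            return IPA_RC_IP_ADDR_ALREADY_USED
        self._addresses.add(address)
        return IPA_RC_SUCCESS

    def unregister(self, address):
        self._addresses.discard(address)

registry = AddressRegistry()
first = registry.register("10.0.0.192")   # LCS address registered on QETH
# The LCS is taken down here, but nothing calls unregister("10.0.0.192").
second = registry.register("10.0.0.192")  # LCS comes back up
print(hex(first), hex(second))
```

If `unregister` were called when the LCS goes down, the second registration would succeed in this model.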

@mcisho
Contributor

mcisho commented Aug 5, 2019

Regarding the ip address output you provided: Was that after the z/VM VSWITCH was up? Or after the z/OS QETH was up? Or after the z/OS LCS was up?

Is this route problem reproducible? Does it happen all of the time, or only occasionally? QETH itself does nothing with routes. QETH simply says use this IP address with this interface and the Linux kernel sets up the appropriate route table entries.

This is why I asked for both the ip address and ip route output at each stage: To try and understand what might be confusing the Linux kernel.

(Unfortunately I don't have openSUSE or z/VM 6.4 or z/OS 2.3, and I run Hercules with no privileges, so can't reproduce any of your environment.)

@fbi-ranger
Author

fbi-ranger commented Aug 5, 2019

The addresses of the interfaces do not change. Therefore I added the addresses when all interfaces are active.

All the addresses are unregistered when the QETH stops and all routes are removed from the host as well.

LCS starts and the MAC address of it is shown in the log. When QETH starts, the configured address of the IP stack is pingable.

Once z/OS discovers it has another interface, it registers that address on QETH as well, and at the same moment the routing on the host is overwritten: the network 10.0.0.0/24 (the subnet of the LCS and the rest of the private LAN) is put on tun1, replacing the route to 10.10.0.0/24 of QETH.

@mcisho
Contributor

mcisho commented Aug 5, 2019

I'm sure your description of what happens is what happens, but we need detailed information to try and understand what happens. For example, your ip address output contains:

inet 10.10.0.190 peer 10.0.0.192/24

Which of the three steps (VSWITCH start? z/OS QETH start? Or z/OS LCS start?) sets the prefix length? The z/OS QETH start should have set the peer address to 10.10.0.192. Did it?

Would you please provide the full ip address and ip route output after each of the three steps, so that we can see what effect each step is having on the tun interface? Thanks.

@Fish-Git Fish-Git added (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it. L Linux only issue, such as with tuntap networking that doesn't occur on Windows. Researching... The issue is being looked into or additional information is being gathered/located. labels Aug 5, 2019
@fbi-ranger
Author

Attached are all the ip route and ip address results from the various stages.

IP ROUTE-IP ADDRESS.pdf

@mcisho
Contributor

mcisho commented Aug 10, 2019

Rule one of the Linux kernel is that if an entity tells it to do something stupid, the entity knows what it is doing. In this case the entity (i.e. Hercules) is telling the kernel to use an IP address from subnet B with a tun interface defined on subnet A. The fact that it messes up networking? Refer to rule one!

I was never keen on adding VIPA support to QETH because I could foresee that what is now happening would happen. *nix tun interfaces should only ever have two IP addresses, the local (i.e. *nix host) and the peer (i.e. the Hercules guest), which should both be in the same subnet. If the peer wishes to use additional IP addresses, *nix host routing should be used to route the additional address via the peer address.

z/OS Comms Server considering all home addresses to be VIPA is, for Hercules, a complete pain in the arse. In the real world it doesn't matter, everyone out there using real z iron is only using OSA. (Though I don't think it's only a z/OS problem, I have noticed that on my z/VM 6.1 system the default gateway IP address (which is on a CTCI interface) is being registered with the layer3 VSWITCH. I have no idea why!)

There are various long(er) term solutions:

  1. Don't support VIPA on *nix.
  2. Only support same subnet VIPA on *nix.
  3. Write some code to set up routing for non-same subnet VIPA on *nix.
  4. Recommend *nix users to run Hercules non-privileged and use pre-configured interfaces. While, IMHO, this is by far the best solution, it's sadly unrealistic.
  5. Something else I haven't thought of.
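
To make options 2 and 3 more concrete, here is a minimal sketch of the check they imply: decide whether a registered guest address lies in the subnet configured on the tun interface and, if not, build a host route via the peer. Both helper names are hypothetical and nothing like this exists in Hercules today. In the reporter's setup 10.0.0.192 already has a host route via tap0 from the LCS, so the generated command is illustrative only.

```python
import ipaddress

def needs_extra_route(local_address, netmask, guest_address):
    """True when guest_address lies outside the subnet configured for the tun interface."""
    subnet = ipaddress.ip_network(f"{local_address}/{netmask}", strict=False)
    return ipaddress.ip_address(guest_address) not in subnet

def host_route_command(guest_address, peer_address, device):
    """Build an iproute2 command that routes one extra guest address via the peer."""
    return f"ip route add {guest_address}/32 via {peer_address} dev {device}"

# Values taken from the QETH statement and the registered addresses in this issue.
for guest in ("10.10.0.192", "10.0.0.192"):
    if needs_extra_route("10.10.0.190", "255.255.255.0", guest):
        print(host_route_command(guest, "10.10.0.192", "tun1"))
    else:
        print(f"{guest} is in the tun subnet, no extra route needed")
```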

In the short term I'm afraid the only solution is to either:

  1. Remove the LCS or the IPAQENET device/link/home from your TCP/IP stack, depending on which is the least important to you.
  2. Set your z/OS system up as cinet, and run multiple TCP/IP stacks, one with the LCS and one with the IPAQENET.

@Peter-J-Jansen
Collaborator

Would it not be easier to simulate OSA layer 3 devices if the Hercules implementation would use a *nix TAP device for it ? I don't know what caused the decision to use a TAP device only for layer 2, and a TUN device for layer 3, as I joined the Hercules community too late to be aware of the original reasoning for this. I'd be interested to learn about the reasons for that.

For a Windows host, I presume the Fish solution may effectively already be making no real TUN/TAP difference for supporting OSA layer 2 or 3. (Whilst of course handling the different L2/L3 packet header formats correctly.)

How does zPDT support L2 vs L3, or does it perhaps have the same limitations as Hercules ?

Cheers,

Peter

@mcisho
Contributor

mcisho commented Aug 11, 2019

Would it not be easier to simulate OSA layer 3 devices if the Hercules implementation would use a *nix TAP device for it ?

IMHO, yes, it would. QETH would have to wrap the layer3 IP packets in Ethernet frames, and would have to handle IPv4 ARP/RARP and IPv6 Neighbor Discovery, but it's all doable.
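
A minimal sketch of the wrapping step only, assuming a plain Ethernet II header with no VLAN tag. The `wrap_in_ethernet` helper and the MAC addresses are made up for illustration; ARP/RARP and Neighbor Discovery handling are not covered.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD

def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def wrap_in_ethernet(ip_packet, src_mac, dst_mac):
    """Prefix a layer 3 packet with a 14-byte Ethernet II header."""
    version = ip_packet[0] >> 4  # the IP version is the high nibble of the first byte
    if version == 4:
        ethertype = ETHERTYPE_IPV4
    elif version == 6:
        ethertype = ETHERTYPE_IPV6
    else:
        raise ValueError(f"not an IP packet (version nibble {version})")
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return header + ip_packet

# A dummy 20-byte IPv4 header (version 4, IHL 5, everything else zero).
packet = bytes([0x45]) + bytes(19)
frame = wrap_in_ethernet(packet, "02:00:00:00:00:01", "02:00:00:00:00:02")
print(len(frame), frame[12:14].hex())
```

The destination MAC is exactly what ARP or Neighbor Discovery would have to supply, which is why those protocols come up as extra work.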

I don't know what caused the decision to use a TAP device only for layer 2, and a TUN device for layer 3, as I joined the Hercules community too late to be aware of the original reasoning for this. I'd be interested to learn about the reasons for that.

When I became involved the decision had already been made. I've always assumed that Jan simply thought 'layer2 = Ethernet = tap, layer3 = IP = tun'. I'm sure that at the time he had more important matters to worry about, like actually getting QETH to work at all.

For a Windows host, I presume the Fish solution may effectively already be making no real TUN/TAP difference for supporting OSA layer 2 or 3. (Whilst of course handling the different L2/L3 packet header formats correctly.)

Yes, presumably. I've always assumed WIN-CTCI deals with Ethernet frames from/to the host, but I've never felt the need to read winpcap documentation to find out.

How does zPDT support L2 vs L3, or does it perhaps have the same limitations as Hercules ?

I don't know how it's done, I've never had access to a zPDT. I wouldn't have thought zPDT uses tuntap to get traffic to/from the hosts network interface(s), but...

@Peter-J-Jansen
Collaborator

Peter-J-Jansen commented Aug 12, 2019

Thanks Ian for confirming my thoughts.

So we know what we could work on when we get bored this coming winter …

Cheers,
Peter

@fbi-ranger
Author

fbi-ranger commented Aug 12, 2019

  1. Recommend *nix users to run Hercules non-privileged and use pre-configured interfaces. While, IMHO, this is by far the best solution, it's sadly unrealistic.

I do run Hercules non-privileged nevertheless it changes the IP configuration.

Isn't it possible that the additional route is added by Hercules when it is encountered that there is another subnet address registered on the interface?

Regarding zPDT:

zPDT networking is quite complex. Below is a snapshot from command 'find_io' which shows the network configuration in zPDT and the corresponding definitions in the configuration file of zPDT.

zPDT can connect directly to an Ethernet adapter (see F0 and F1 below). A side effect of this implementation is that the host IP address cannot be reached from z/OS when using the same ethX adapter. Therefore it is recommended to use a separate ethX adapter for zPDT. In my case I use eth1 (F1) for the connections of the systems on zPDT. I use F00-F02 as the uplink port for vswitch VMINTR01. All z/VM guests then connect to this vswitch.

The /dev/tap0 - /dev/tap7 are used for PTP connections. With this connection the host can be reached. IP addresses are always of type 10.1.x.1 where x is the number of /dev/tap interface.

FIND_IO on "ibmsys1" zPDT:

         Interface         Current          MAC                IPv4              IPv6           
 Path    Name              State            Address            Address           Address        
  F0     eth0              UP, RUNNING      52:54:00:82:e6:1e  172.27.144.190    fe80::5054:ff:fe82:e61e%eth0  
  F1     eth1              UP, RUNNING      52:54:00:41:0a:d5  *                 *               

  A0     tap0              DOWN             02:a0:a0:a0:a0:a0  *                 *               
  A1     tap1              DOWN             02:a1:a1:a1:a1:a1  *                 *               
  A2     tap2              DOWN             02:a2:a2:a2:a2:a2  *                 *               
  A3     tap3              UP, RUNNING      a2:22:04:a1:dd:38  10.1.4.1          fe80::a022:4ff:fea1:dd38%tap3  
  A4     tap4              UP, RUNNING      ea:22:a5:1b:c2:a3  10.1.5.1          fe80::e822:a5ff:fe1b:c2a3%tap4  
  A5     tap5              UP, RUNNING      ea:1e:ba:98:db:27  10.1.6.1          fe80::e81e:baff:fe98:db27%tap5  
  A6     tap6              DOWN             02:a6:a6:a6:a6:a6  *                 *               
  A7     tap7              DOWN             02:a7:a7:a7:a7:a7  *                 *               
                                                             
         Interface                         Current Settings                                     
 Path    Name              RxChkSum      TSO     GSO     GRO     LRO    RX VLAN       MTU**     
------   ----------------  ---------------- -----------------  ----------------  -------------- 
  F0     eth0                On*         On*     On*     On*     Off      Off         1500 
  F1     eth1                Off         Off     Off     Off     Off      On*         1500 

  A3     tap3                Off         Off     On*     On*     Off      Off         1500 
  A4     tap4                Off         Off     On*     On*     Off      Off         1500 
  A5     tap5                Off         Off     On*     On*     Off      Off         1500 
   
 *  Enabling these functions may lead to poor zPdt Performance, 
    please refer to your zPdt documentation for details.  
   
 ** To Enable Jumbo Frame Support, this MTU value and the MTU value for the 
    Host Operating System must be set to > 1500. 
   
 End of FIND_IO 

In the zPDT configuration it is defined like this:

[manager]
name	awsosa		0c00 --path=A4 --pathtype=OSD --tunnel_intf=y
device	0C00	osa	osa	
device	0C01	osa	osa	
device	0C02	osa	osa	
device	0C03	osa	osa	
device	0C04	osa	osa	
device	0C05	osa	osa	
[manager]
name	awsosa		0f00 --path=F1 --pathtype=OSD --interface=eth1
device	0F00	osa	osa	
device	0F01	osa	osa
device	0F02	osa	osa
device	0F03	osa	osa
device	0F04	osa	osa
device	0F05	osa	osa
device	0F06	osa	osa	
device	0F07	osa	osa
device	0F08	osa	osa
device	0F09	osa	osa

On z/VM:

 query vswitch vmintr01
VSWITCH SYSTEM VMINTR01   Type: QDIO    Connected: 5    Maxconn: INFINITE
  PERSISTENT  RESTRICTED    NONROUTER                 Accounting: OFF
  USERBASED LOCAL
  VLAN Unaware
  MAC address: 02-00-00-00-00-02    MAC Protection: OFF
  IPTimeout: 5         QueueStorage: 8
  Isolation Status: OFF        VEPA Status: OFF
 Uplink Port:
  State: Ready
  PMTUD setting: EXTERNAL   PMTUD value: 8992     Trace Pages: 8
  RDEV: 0F00.P00 VDEV: 0603 Controller: DTCVSW1  ACTIVE
Ready; T=0.01/0.01 13:04:26
 query f00-f02
OSA  0F00 ATTACHED TO DTCVSW1  0603 DEVTYPE OSA         CHPID F1 OSD
OSA  0F01 ATTACHED TO DTCVSW1  0604 DEVTYPE OSA         CHPID F1 OSD
OSA  0F02 ATTACHED TO DTCVSW1  0605 DEVTYPE OSA         CHPID F1 OSD
Ready; T=0.01/0.01 13:04:37

@mcisho
Contributor

mcisho commented Aug 12, 2019

I do run Hercules non-privileged nevertheless it changes the IP configuration.

The network configuration is done by module hercifc, which has elevated privileges; otherwise it would not be able to change the configuration.

Isn't it possible that the additional route is added by Hercules when it is encountered that there is another subnet address registered on the interface?

No, it isn't. QETH (or more specifically hercifc at QETH's request) only issues ioctl SIOCSIFDSTADDR requests to add the addresses registered on the interface. QETH does not issue routing related ioctl requests. It is the kernel itself that manipulates the routing table as a result of the ioctl SIOCSIFDSTADDR request(s). LCS is the only Hercules interface type that issues routing related ioctl requests.
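
A small sketch of why changing the destination (peer) address alone can move the route. It assumes the kernel derives the connected route from the peer address and the prefix length, which is consistent with the ip address and ip route output earlier in this thread but is a simplification of what the kernel does. The `connected_route` helper is hypothetical.

```python
import ipaddress

def connected_route(local_address, peer_with_prefix, device):
    """Model the 'proto kernel' route implied by a point-to-point address."""
    peer = ipaddress.ip_interface(peer_with_prefix)
    return f"{peer.network} dev {device} proto kernel scope link src {local_address}"

# Peer set when z/OS registers 10.10.0.192 on the QETH device.
print(connected_route("10.10.0.190", "10.10.0.192/24", "tun1"))
# Peer overwritten when z/OS later registers the LCS address 10.0.0.192.
print(connected_route("10.10.0.190", "10.0.0.192/24", "tun1"))
```

The two printed lines correspond to the 10.10.0.0/24 and 10.0.0.0/24 tun1 routes the reporter saw before and after the second registration.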

Hmm, looks like zPDT is using tap bridged to a host ethernet interface?

@fbi-ranger
Author

fbi-ranger commented Aug 12, 2019

I don't know really. There is no additional /dev/tap device showing up on an ip link or ip address command. It is a sort of miracle how this is done!   :-)

The implementation of zPDT acts really as an OSA with VIPA support and so on.

When LCS can do routing related ioctl requests why not the QETH driver (or probably hercifc??)
when it is in layer 3 mode? Can hercifc not be extended in such a way? Is this too complex?

@mcisho
Contributor

mcisho commented Aug 13, 2019

It is a sort of miracle how this is done!   :-)

Indeed!

When LCS can do routing related ioctl requests why not the QETH driver ...

QETH could do routing related ioctl requests, the code exists already in hercifc for use by LCS.

@fbi-ranger
Author

Then perhaps that code could be enabled for QETH as well? LCS registers a host route, while QETH ends up with a network route.

@mcisho
Contributor

mcisho commented Aug 22, 2019

I don't intend to do anything about this problem.

As I said earlier there are a couple of solutions to avoid this problem, either:

  1. Remove the LCS or the IPAQENET device/link/home from your TCP/IP stack, depending on which is the least important to you.

  2. Set your z/OS system up as cinet, and run multiple TCP/IP stacks, one with the LCS and one with the IPAQENET.

@mcisho mcisho closed this as completed Aug 22, 2019
@mcisho mcisho removed (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it. Researching... The issue is being looked into or additional information is being gathered/located. labels Aug 22, 2019
@Fish-Git Fish-Git added the (*WON'T FIX*) The requested change was rejected or the described behavior is by design. label Jan 23, 2020
4 participants