cloud-init hard codes MTU configuration at initial deploy time #3793

Open
ubuntu-server-builder opened this issue May 12, 2023 · 28 comments
Labels
incomplete Action required by submitter launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder

This bug was originally filed in Launchpad as LP: #1899487

Launchpad details
affected_projects = ['neutron', 'netplan.io (Ubuntu)']
assignee = None
assignee_name = None
date_closed = None
date_created = 2020-10-12T13:37:40.543454+00:00
date_fix_committed = 2021-11-23T15:20:25.220656+00:00
date_fix_released = 2021-11-23T15:20:25.220656+00:00
id = 1899487
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1899487
milestone = None
owner = fnordahl
owner_name = Frode Nordahl
private = False
status = incomplete
submitter = fnordahl
submitter_name = Frode Nordahl
tags = []
duplicates = []

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-12T13:37:40.543454+00:00

When using the OpenStack cloud provider, cloud-init writes out /etc/netplan/50-cloud-init.yaml at initial instance boot and does not update it on subsequent boots of the instance.

The OpenStack metadata service provides information about MTU for the network [0] and cloud-init takes this value and writes it into the netplan configuration [1].

A side effect of configuring the MTU through netplan is that the systemd-networkd [Link] section [2] gets the MTUBytes value filled and this in turn makes systemd-networkd ignore the MTU value provided by DHCP [3][4].

During the lifetime of a cloud, events occur that force an operator to reduce the MTU available to instances attached to its overlay networks. This may happen because of a software-imposed change of tunnel type (GRE -> VXLAN, VXLAN -> GENEVE) or a change of topology or encapsulation in the physical network equipment.

To maximize performance these clouds have configured their instances to use the maximum available MTU without leaving any headroom to account for such changes and the only way to move forward is to reduce the available MTU on the instances. We are facing a concrete challenge with this now where we have users wanting to migrate from VXLAN tunnels to GENEVE tunnels with 38 byte header size.
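The two MTU values that appear in this report (8950 in the netplan file, 8942 in the metadata) are consistent with the back-of-the-envelope arithmetic below. This is only a sketch: the 9000-byte underlay MTU, the 30-byte VXLAN encapsulation figure and the 20-byte outer IPv4 header are assumptions that are not stated in this report; only the 38-byte GENEVE header size is taken from the text above.

```python
# Rough overlay MTU arithmetic. Every constant except GENEVE_HEADER is an
# assumption made for illustration, not a value stated in this report.
UNDERLAY_MTU = 9000   # assumed MTU of the physical network
IPV4_HEADER = 20      # assumed outer IPv4 header without options
VXLAN_ENCAP = 30      # assumed: inner Ethernet 14 + UDP 8 + VXLAN 8
GENEVE_HEADER = 38    # header size quoted in this report


def overlay_mtu(underlay_mtu: int, encap_overhead: int,
                ip_header: int = IPV4_HEADER) -> int:
    """MTU left for the instance after tunnel and outer IP overhead."""
    return underlay_mtu - encap_overhead - ip_header


vxlan_mtu = overlay_mtu(UNDERLAY_MTU, VXLAN_ENCAP)
geneve_mtu = overlay_mtu(UNDERLAY_MTU, GENEVE_HEADER)
print(vxlan_mtu, geneve_mtu, vxlan_mtu - geneve_mtu)
```

Under these assumptions the migration costs each instance 8 bytes of MTU, which an instance pinned to the old value cannot absorb.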

0: # curl http://169.254.169.254/openstack/2018-08-27/network_data.json
{"links": [{"id": "tapa035fb68-01", "vif_id": "a035fb68-010c-42e3-8da7-ea3c36a0d607", "type": "ovs", "mtu": 8942, "ethernet_mac_address": "fa:16:3e:31:26:f7"}], "networks": [{"id": "network0", "type": "ipv4_dhcp", "link": "tapa035fb68-01", "network_id": "b4ef84c0-1235-48a8-aaf7-03fab7ef5367"}], "services": []}

1: # cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens2:
            dhcp4: true
            match:
                macaddress: fa:16:3e:31:26:f7
            mtu: 8950
            set-name: ens2

2: # cat /run/systemd/network/10-netplan-ens2.link
[Match]
MACAddress=fa:16:3e:31:26:f7

[Link]
Name=ens2
WakeOnLan=off
MTUBytes=8950

3: # cat /run/systemd/network/10-netplan-ens2.network
[Match]
MACAddress=fa:16:3e:31:26:f7
Name=ens2

[Link]
MTUBytes=8950

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

4: Oct 12 13:30:18 canary-3 systemd-networkd[24084]: /run/systemd/network/10-netplan-ens2.network: MTUBytes= in [Link] section and UseMTU= in [DHCP] section are set. Disabling UseMTU=.
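The conflict that systemd-networkd logs in [4] can be detected in a generated .network file with a few lines of Python. This is a minimal sketch, assuming the unit can be read as INI text; `mtu_conflict` is a hypothetical helper name and is not part of cloud-init, netplan or systemd.

```python
import configparser


def mtu_conflict(network_unit_text: str) -> bool:
    """Return True when a static MTUBytes= coexists with DHCP UseMTU=true."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep the case of systemd keys as written
    parser.read_string(network_unit_text)
    static_mtu = parser.get("Link", "MTUBytes", fallback=None)
    use_mtu = parser.get("DHCP", "UseMTU", fallback="false")
    return static_mtu is not None and use_mtu.strip().lower() in (
        "true", "yes", "1", "on"
    )


# Contents of 10-netplan-ens2.network as shown in [3].
UNIT = """\
[Match]
MACAddress=fa:16:3e:31:26:f7
Name=ens2

[Link]
MTUBytes=8950

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true
"""

print(mtu_conflict(UNIT))
```

For the unit shown in [3] the helper reports a conflict; with the MTUBytes= line removed it does not.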

@ubuntu-server-builder ubuntu-server-builder added incomplete Action required by submitter launchpad Migrated from Launchpad labels May 12, 2023
@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-12T13:37:40.543454+00:00

Launchpad attachments: cloud-init.tar.gz

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-12T16:17:38.198439+00:00

" # curl http://169.254.169.254/openstack/2018-08-27/network_data.json
{"links": [{"id": "tapa035fb68-01", "vif_id": "a035fb68-010c-42e3-8da7-ea3c36a0d607", "type": "ovs", "mtu": 8942, "ethernet_mac_address": "fa:16:3e:31:26:f7"}], "networks": [{"id": "network0", "type": "ipv4_dhcp", "link": "tapa035fb68-01", "network_id": "b4ef84c0-1235-48a8-aaf7-03fab7ef5367"}], "services": []}"

How is cloud-init to know from this network-config.json that DHCP will provide an MTU value? How does it know that it should ignore the provided MTU? If DHCP is providing MTU, should network-config.json then not provide the MTU value?

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-12T18:53:00.349038+00:00

That is an excellent question. I see that the example provided in the Nova documentation [0] provides null for the MTU. There is also Nova bug 1746323 about the lack of actual API documentation for the OpenStack format metadata, so I guess they could expect no less than diverging implementations consuming it in the wild.

However, I also see that the reporting of null MTU was fixed [1] on the back of Nova bug 1576713 a few years back so that it now provides an actual MTU regardless of network addressing type.

The OpenStack format metadata does provide a separate field that distinguishes between the various types of dynamic and static configuration [2] and I see that cloud-init already makes use of it [3].

So I would suggest that whenever OpenStack alludes to dynamic configuration being in play, cloud-init should not write the MTU value into the on-disk configuration but let it be configured by the dynamic network configuration protocol.

What do you think?

0: https://docs.openstack.org/nova/latest/user/metadata.html#openstack-format-metadata
1: https://review.opendev.org/#/c/316395/
2: https://github.com/openstack/nova/blob/261de76104ca67bed3ea6cdbcaaab0e44030f1e2/nova/virt/netutils.py#L282-L309
3:

if network['type'] == 'ipv4_dhcp':
    subnet.update({'type': 'dhcp4'})
elif network['type'] == 'ipv6_dhcp':
    subnet.update({'type': 'dhcp6'})
elif network['type'] in ['ipv6_slaac', 'ipv6_dhcpv6-stateless',
                         'ipv6_dhcpv6-stateful']:
    subnet.update({'type': network['type']})
elif network['type'] in ['ipv4', 'ipv6']:
    subnet.update({
        'type': 'static',
        'address': network.get('ip_address'),
    })
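The suggestion in this comment could be sketched roughly as follows. This is a hypothetical illustration only, not cloud-init code: `drop_mtu_for_dynamic_links` and `DYNAMIC_TYPES` are made-up names, and the set of "dynamic" types is simply copied from the excerpt in [3].

```python
import copy

# Network types treated as dynamically configured, copied from excerpt [3].
DYNAMIC_TYPES = {
    "ipv4_dhcp", "ipv6_dhcp", "ipv6_slaac",
    "ipv6_dhcpv6-stateless", "ipv6_dhcpv6-stateful",
}


def drop_mtu_for_dynamic_links(network_data: dict) -> dict:
    """Hypothetical helper: return a copy of network_data.json content with
    'mtu' removed from links whose networks are all dynamically configured."""
    result = copy.deepcopy(network_data)
    types_by_link = {}
    for net in result.get("networks", []):
        types_by_link.setdefault(net.get("link"), set()).add(net.get("type"))
    for link in result.get("links", []):
        types = types_by_link.get(link.get("id"), set())
        if types and types <= DYNAMIC_TYPES:
            link.pop("mtu", None)  # leave the MTU to DHCP / SLAAC
    return result


# Trimmed version of the metadata from the original report.
NETWORK_DATA = {
    "links": [{"id": "tapa035fb68-01", "type": "ovs", "mtu": 8942,
               "ethernet_mac_address": "fa:16:3e:31:26:f7"}],
    "networks": [{"id": "network0", "type": "ipv4_dhcp",
                  "link": "tapa035fb68-01"}],
    "services": [],
}

filtered = drop_mtu_for_dynamic_links(NETWORK_DATA)
print(filtered["links"][0])
```

A link attached to a static `ipv4` or `ipv6` network keeps its MTU, so only dynamically configured links would defer to the value advertised on the wire.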

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-12T19:52:31.962412+00:00

> So I would suggest that whenever OpenStack alludes to dynamic configuration
> being in play cloud-init should not write the MTU value into the on-disk
> configuration but let it be configured by dynamic network configuration
> protocol.
>
> What do you think?

I would argue the opposite. The existing behavior is that the MTU provided
by the network-data.json is the source of truth. Cloud-init itself cannot
determine whether the DHCP service provides MTU or not, nor whether the MTU
provided in the DHCP response is in fact the desired MTU. Further, for
cloud-init now to start ignoring MTU if DHCP is present would break existing
users whose DHCP server either does not provide MTU or provides an incorrect
value.

If the network-data.json MTU value is null, then I think all of this works the
way you want, correct? Looking at the bug which fixed this always-null value,
should it not be enhanced to only set this MTU value if the DHCP service on
the network is not providing an MTU?

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-12T21:29:06.069378+00:00

On the flip side the presence of the MTU key in the OpenStack metadata cannot be used as an indicator for intent from either the system or the user that the DHCP server should not be providing the MTU either.

Looking at the commit that changed the behaviour in OpenStack the intent of the original code was to always provide the MTU value in the metadata regardless of network type, the fact that it showed as null was a bug.

Up until 2017 the default for the OpenStack controlled DHCP server was to always provide an MTU. In 2017 the ability to control this behaviour was removed and from that point onward it always provides an MTU.

The user has no way of influencing the contents of the OpenStack network metadata, apart from downgrading to a 5 year old version.

I don't see an easy way of overriding cloud-init's default behaviour by adding additional configuration through vendor data either.

Perhaps adding a cloud-init config stanza for how the OpenStack source driver should interpret the presence of MTU in the network metadata could be a path to retain compatibility with anyone relying on the current behaviour and at the same time providing a way forward for everyone else?

Meanwhile instances are configured to obtain an address dynamically but stuck with a static value for MTU forever, and not being able to adjust to changes being made to the environment without manual intervention to individual instances.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-12T22:31:53.948163+00:00

> On the flip side the presence of the MTU key in the OpenStack
> metadata cannot be used as an indicator for intent from either the
> system or the user that the DHCP server should not be providing the
> MTU either.
>
> Looking at the commit that changed the behaviour in OpenStack the
> intent of the original code was to always provide the MTU value in
> the metadata regardless of network type, the fact that it showed as
> null was a bug.
>
> Up until 2017 the default for the OpenStack controlled DHCP server
> was to always provide an MTU. In 2017 the ability to control this
> behaviour was removed and from that point onward it always provides
> an MTU.
>
> The user has no way of influencing the contents of the OpenStack
> network metadata, apart from downgrading to a 5 year old version.

Before we continue suggesting that cloud-init should somehow guess
what OpenStack meant to do with network configuration it provided to
cloud-init I'd like to make sure we discuss what we're suggesting.

If we can guess that an OpenStack which sent an MTU really didn't
mean for cloud-init to use this MTU because the DHCP server might
also send an MTU, can we not guess that the IP address it sent us
wasn't the correct one and instead we should add .1 to the lowest
octet?

I'm being pedantic here to make a point. OpenStack is the "Oracle"
here; just like MAAS, or EC2 or Azure. The network configuration it
provides to the guest is meant to be taken as it is provided.

If the configuration is sub-optimal should not the cloud itself
resolve this?

Has there been any attempt to ask in OpenStack upstream why the MTU
plumbing/control was removed? If a specific MTU is needed for a
network in OpenStack, should that not be configured in OpenStack such
that the MTU value is either statically provided in the
network-data.json sent or update the DHCP server running on that
network to provide the correct value?

> I don't see an easy way of overriding cloud-init's default behaviour
> by adding additional configuration through vendor data either.

This is correct. Network-config cannot be part of user-data or
vendor-data; the network needs to be configured prior to cloud-init
fetching/reading this data (which may be URLs to remote cloud-config).

> Perhaps adding a cloud-init config stanza for how the OpenStack
> source driver should interpret the presence of MTU in the network
> metadata could be a path to retain compatibility with anyone relying
> on the current behaviour and at the same time providing a way
> forward for everyone else?
>
> Meanwhile instances are configured to obtain an address dynamically
> but stuck with a static value for MTU forever, and not being able to
> adjust to changes being made to the environment without manual
> intervention to individual instances.

If the network-data.json provided by the metadata service is not
sufficient and you cannot change this service, you can provide a
network configuration in a ConfigDrive.

@ubuntu-server-builder

Launchpad user Richard Harding(rharding) wrote on 2020-10-13T20:15:58.800649+00:00

Frode, can you explain the OpenStack operator path here to me? I'm not familiar with how these adjustments are practically made.

You mention "an operator to reduce the MTU available to instances" and "To maximize performance these clouds have configured their instances to use the maximum available MTU without leaving any headroom" but then also say that the MTU controls in OpenStack have been removed?

I'd like to understand what knobs a cloud operator has available and then look at how to identify the "source of truth". So far I understand one is coming from the cloud itself, but I'm not sure how; another potential source of truth is a DHCP value provided to the instance. I assume that DHCP knob is in the DHCP server config and not done through a more centralized OpenStack knob that assures common behavior among DHCP and non-DHCP instances?

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-13T21:19:28.565241+00:00

Richard,

The "MTU controls in OpenStack have been removed" part pertains to the removal of the operator-facing configuration option that controlled whether the OpenStack DHCP server provides information about the network MTU in its response to clients. Removal here means that it is permanently ON: you can expect an OpenStack cloud to always provide information about MTU in DHCP packets from the time the option was removed. This was introduced in OpenStack Ocata back in 2017 [4]. Prior to this the default value of advertise_mtu was true. I mentioned this to paint a picture of what to expect from OpenStack clouds out there in the wild with regard to whether or not the DHCP server provides information about MTU.

The act of reducing the MTU on networks is done through the OpenStack Neutron API and/or through migration tools [5]. Reducing the MTU of a network construct in OpenStack results in a reduced MTU for the involved router interfaces as well as the associated DHCP server configuration.

There also exist levers to inject configuration into the OpenStack DHCP server to prepare for a migration which is what we use in our recommended migration path [6], you can view the functional test code here [7].

The functional test code does not include a step for reducing MTU and waiting X hours for a DHCP renewal which is why this issue was missed, instead it injects the reduced MTU config prior to launching a first test instance [8].

4: openstack/neutron@832240a
5: https://docs.openstack.org/neutron/latest/ovn/migration.html
6: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b37/715132/4/check/build-openstack-deploy-guide/b37f49a/docs/app-ovn.html#migration-from-neutron-ml2-ovs-to-ml2-ovn
7: https://github.com/openstack-charmers/zaza-openstack-tests/blob/d4deb0478a0540cc61c39bae80e35e4335571554/zaza/openstack/charm_tests/ovn/tests.py#L143-L412
8: https://github.com/openstack/charm-neutron-openvswitch/blob/bc97a66b87d33e43ab5b842b000b49a75b1c5797/tests/tests.yaml#L39

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-13T22:34:19.473833+00:00

The CI artifact referenced in [6] in the previous comment was removed, the source can be viewed here: https://review.opendev.org/#/c/715132/4/deploy-guide/source/app-ovn.rst

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-14T06:57:19.597570+00:00

Comment #8 made me think about why this works in our functional test, so I went to investigate that. In our functional test we use Bionic images, and sure enough the netplan.yaml [9] written there does NOT include the MTU! The OpenStack Metadata source remains the same [10].

9: ubuntu@banana-1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
ubuntu@banana-1:~$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        ens2:
            dhcp4: true
            match:
                macaddress: fa:16:3e:a3:34:78
            set-name: ens2
    version: 2

10: $ curl http://169.254.169.254/openstack/2018-08-27/network_data.json
{"links": [{"id": "tapc352887e-0f", "vif_id": "c352887e-0fff-481b-af47-7df9f7c2ff05", "type": "ovs", "mtu": 8950, "ethernet_mac_address": "fa:16:3e:a3:34:78"}], "networks": [{"id": "network0", "type": "ipv4_dhcp", "link": "tapc352887e-0f", "network_id": "f8123ceb-e29d-4f4a-b200-6fb3bf3984ba"}], "services": []}

@ubuntu-server-builder

Launchpad user Richard Harding(rharding) wrote on 2020-10-14T12:12:46.824356+00:00

> In our functional test we use Bionic images, and sure enough the netplan.yaml [9]
> written there does _NOT_ include the MTU! The OpenStack Metadata source remains
> the same [10].

Hmm, however it should be the same version of cloud-init in either Bionic or Focal tests. Is this a netplan change?

That's one side of the coin. In working to understand the "best path forward" I just want to make sure I'm following.

The controls in the OpenStack DHCP service are purely an "on/off" switch (advertise_mtu) about advertising MTU in the DHCP network details and not the control for what that setting is? The reason I want to make sure is that I'm always leery when there are multiple possible sources of truth that have to be sorted out: a user can bite themselves by changing an MTU value in one place but also in another, and which do you listen to/respect?

The actual value for the MTU is a Neutron setting and, in theory, should be the same whether it comes from the DHCP network data or from the provided network_data.json information?

In this case the pain point is that existing instances won't process the DHCP change of MTU properly because cloud-init has written out the netplan.yaml and even though DHCP comes in with a new setting cloud-init isn't triggered in any fashion to update its understanding of the world and write out a new compatible netplan.

The final nail in this coffin is that the setting cloud-init is setting overrides the value for MTU that comes in via DHCP.

Do I have that right? If so, a couple of questions then.

  1. It seems it would be worthwhile to see if there is some method for refreshing cloud-init's details on the network. Ideally, changing the network data in Neutron would be able to trigger an update on instances, though that's a tricky can of worms that could lead to broken networking on existing hosts if it goes wrong. It smells a bit like the work we want to complete next cycle around allowing hotplug of a new nic/device and being able to help drive networking config for it...just without the whole new device thing heh.

  2. Should the setting that cloud-init writes be an overwriting value in netplan? Could this be a bug that netplan is not allowing the dhcp details to be respected over what cloud-init started with?

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-14T22:56:29.074979+00:00

> Hmm, however it should be the same version of cloud-init in either Bionic or Focal tests. Is this a netplan change?

In the bionic image I have:
$ dpkg -l | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~18.04.1 all Init scripts for cloud instances
ii cloud-initramfs-copymods 0.40ubuntu1.1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.40ubuntu1.1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction for various backends

In the focal image I have:
$ dpkg -l | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~20.04.1 all initialization and customization tool for cloud instances
ii cloud-initramfs-copymods 0.45ubuntu1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.45ubuntu1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction for various backends

I guess it's time for me to ask a question: is it cloud-init that renders /etc/netplan/50-cloud-init.yaml? If so where does netplan fit in when the difference is how that file is rendered and not how it is interpreted. As you can see in #10 the mtu statement is not in the file on bionic, while it is on focal.

Since versions appear to be the same my guess would be that there is some internal modelling of how bionic vs. focal should be configured?

> The controls in the OpenStack DHCP service are purely an "on/off" switch (advertise_mtu) about advertising MTU in the DHCP network details and not the control for what that setting is? The reason I want to make sure is that I'm always leery when there are multiple possible sources of truth that have to be sorted out: a user can bite themselves by changing an MTU value in one place but also in another, and which do you listen to/respect?

Previously you had the advertise_mtu "on/off" switch, which was removed in OpenStack Ocata (leaving it permanently "on"). In addition to that, the operator of the cloud can inject DHCP configuration into the DHCP servers. The end user consuming the cloud also has control over their virtual network MTUs through the OpenStack Neutron API.

> The actual value for the MTU is a Neutron setting and, in theory, should be the same whether it comes from the DHCP network data or from the provided network_data.json information?

The value for the MTU is a per-virtual-network setting which is exposed to the end user of the cloud. And yes, the setting set on the virtual network should be the one exposed in the network_data.json. But remember that the operator of the cloud has the power to inject options directly into the DHCP server, which could mean the DHCP server advertises a different MTU than the user has chosen for their network.

If the operator of the cloud has chosen to do so, it is most likely for a very good operational reason. If the end user or operator intends to configure instances with DHCP, DHCP should be the authoritative source of truth.

> The final nail in this coffin is that the setting cloud-init is setting overrides the value for MTU that comes in via DHCP.
>
> Do I have that right? If so, a couple of questions then.

Yes.

> 1. It seems it would be worthwhile to see if there is some method for refreshing cloud-init's details on the network. Ideally, changing the network data in Neutron would be able to trigger an update on instances, though that's a tricky can of worms that could lead to broken networking on existing hosts if it goes wrong. It smells a bit like the work we want to complete next cycle around allowing hotplug of a new nic/device and being able to help drive networking config for it...just without the whole new device thing heh.

This does indeed sound interesting. A cloud operator may not have any access to or control over the instances end users run on their cloud, so having levers to control network configuration in such instances for maintenance/migration purposes would be valuable. The alternative is forklift and endless nagging of end users to do manual intervention, plus the support load that comes afterwards when everything breaks because they did not pay attention to the operator's requests in time.

> 2. Should the setting that cloud-init writes be an overwriting value in netplan? Could this be a bug that netplan is not allowing the dhcp details to be respected over what cloud-init started with?

I think when the operator and/or end user intends to use network auto configuration (be that DHCP or IPv6 SLAAC), that should be the authoritative source of truth for the instance. Any other path risks turning a whole estate of instances the cloud operator does not necessarily have access to into door stops whenever the network configuration changes.

My conclusion so far is that Bionic guests behave correctly as detailed in #10, and Focal guests behave incorrectly as detailed in the original bug description.

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-15T05:54:41.120814+00:00

I can't let go of why the same version of cloud-init renders different config with the same data source.

So a question: are there other data sources in use that we have not yet examined?

Information from the hypervisor configuration trickling through the virtio drivers or something like that?

I will compare the qemu configuration for the two instances and see if there are any differences.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-15T15:21:50.730790+00:00

> I guess it's time for me to ask a question: is it cloud-init that
> renders /etc/netplan/50-cloud-init.yaml? If so where does netplan
> fit in when the difference is how that file is rendered and not
> how it is interpreted. As you can see in #10 the mtu statement is
> not in the file on bionic, while it is on focal.
>
> Since versions appear to be the same my guess would be that there
> is some internal modelling of how bionic vs. focal should be
> configured?

There isn't an internal model; cloud-init SRU's master back to
previous releases. For OpenStack, cloud-init on Xenial does not
render network-data.json by default as that's a behavior change
added in Bionic and newer.

The pipeline looks like:

cloudinit (fetch network-data.json from OpenStack)
 `-> cloudinit (convert network-data.json to network-config-v1)
  `-> cloudinit (converts network-config-v1 -> netplan on Ubuntu)
   `-> cloudinit (calls netplan generate -> systemd-networkd files)
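The first three stages of this pipeline can be illustrated with a heavily simplified sketch. The functions `to_v1` and `to_netplan` below are hypothetical stand-ins and not cloud-init's actual converters or API; they handle only the single-NIC DHCP case seen in this thread and exist to show that an `mtu` present in network_data.json is carried through every stage.

```python
def to_v1(network_data: dict, names_by_mac: dict) -> list:
    """Simplified stand-in for network_data.json -> network-config-v1."""
    type_map = {"ipv4_dhcp": "dhcp4", "ipv6_dhcp": "dhcp6"}
    config = []
    for link in network_data["links"]:
        mac = link["ethernet_mac_address"]
        subnets = [
            {"type": type_map.get(net["type"], net["type"])}
            for net in network_data["networks"]
            if net["link"] == link["id"]
        ]
        entry = {
            "type": "physical",
            "name": names_by_mac.get(mac, link["id"]),
            "mac_address": mac,
            "subnets": subnets,
        }
        if link.get("mtu") is not None:
            entry["mtu"] = link["mtu"]  # the MTU is copied verbatim
        config.append(entry)
    return config


def to_netplan(v1_config: list) -> dict:
    """Simplified stand-in for network-config-v1 -> netplan v2."""
    ethernets = {}
    for iface in v1_config:
        eth = {
            "match": {"macaddress": iface["mac_address"]},
            "set-name": iface["name"],
        }
        for subnet in iface["subnets"]:
            if subnet["type"] in ("dhcp4", "dhcp6"):
                eth[subnet["type"]] = True
        if "mtu" in iface:
            eth["mtu"] = iface["mtu"]  # and ends up as a static netplan mtu
        ethernets[iface["name"]] = eth
    return {"network": {"version": 2, "ethernets": ethernets}}


# Trimmed version of the metadata quoted in [10].
DATA = {
    "links": [{"id": "tapc352887e-0f", "type": "ovs", "mtu": 8950,
               "ethernet_mac_address": "fa:16:3e:a3:34:78"}],
    "networks": [{"id": "network0", "type": "ipv4_dhcp",
                  "link": "tapc352887e-0f"}],
    "services": [],
}

v1 = to_v1(DATA, {"fa:16:3e:a3:34:78": "ens4"})
netplan = to_netplan(v1)
print(netplan)
```

The last stage (netplan generate) is what turns the static `mtu` key into MTUBytes= in the systemd-networkd files, which is where the DHCP-provided MTU gets overridden.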

> I can't let go of why the same version of cloud-init renders
> different config with the same data source.

Are you sure the 'mtu' value was present in the network-data.json
at the time cloud-init fetched it vs. when you curl later?
Check the cloud-init.log file; you should see the network
config after it was converted from network-data.json somewhere
in the log.

Given your JSON from [10], bionic and focal render this the same.

BIONIC

% lxc launch ubuntu-daily:bionic b1
% lxc exec b1 bash
root@b1:~# lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04
root@b1:~# dpkg --list | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~18.04.1 all Init scripts for cloud instances
ii cloud-initramfs-copymods 0.40ubuntu1.1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.40ubuntu1.1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction for various backends
root@b1:~# cat /etc/cloud/build.info
build_name: server
serial: 20201014
root@b1:~# cat network-data.json
{"links": [{"id": "tapc352887e-0f", "vif_id": "c352887e-0fff-481b-af47-7df9f7c2ff05", "type": "ovs", "mtu": 8950, "ethernet_mac_address": "fa:16:3e:a3:34:78"}], "networks": [{"id": "network0", "type": "ipv4_dhcp", "link": "tapc352887e-0f", "network_id": "f8123ceb-e29d-4f4a-b200-6fb3bf3984ba"}], "services": []}
root@b1:~# cloud-init devel net-convert --network-data network-data.json -k network_data.json -m "ens4,fa:16:3e:a3:34:78" -d test -D ubuntu --debug -O netplan
2020-10-15 15:09:17,796 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2020-10-15 15:09:17,797 - util.py[DEBUG]: Read 14 bytes from /proc/uptime
2020-10-15 15:09:17,797 - util.py[DEBUG]: Reading from /sys/class/net/eth0/addr_assign_type (quiet=False)
2020-10-15 15:09:17,797 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/addr_assign_type
2020-10-15 15:09:17,797 - util.py[DEBUG]: Reading from /sys/class/net/eth0/uevent (quiet=False)
2020-10-15 15:09:17,797 - util.py[DEBUG]: Read 26 bytes from /sys/class/net/eth0/uevent
2020-10-15 15:09:17,797 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False)
2020-10-15 15:09:17,797 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/eth0/address
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/eth0/device/device (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/lo/addr_assign_type (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/lo/addr_assign_type
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/lo/uevent (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Read 23 bytes from /sys/class/net/lo/uevent
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/lo/device/device (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/eth0/type (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/type
2020-10-15 15:09:17,798 - util.py[DEBUG]: Reading from /sys/class/net/lo/type (quiet=False)
2020-10-15 15:09:17,798 - util.py[DEBUG]: Read 4 bytes from /sys/class/net/lo/type

Internal State
--- !!python/object:cloudinit.net.network_state.NetworkState
_has_default_route: null
_network_state:
    config:
    -   mac_address: fa:16:3e:a3:34:78
        mtu: 8950
        name: ens4
        subnets:
        -   type: dhcp4
        type: physical
    dns:
        nameservers: []
        search: []
    interfaces:
        ens4:
            accept-ra: null
            address: null
            gateway: null
            inet: inet
            mac_address: fa:16:3e:a3:34:78
            mode: manual
            mtu: 8950
            name: ens4
            subnets:
            -   routes: []
                type: dhcp4
            type: physical
    routes: []
    use_ipv6: false
_version: 1
use_ipv6: false
...

Read input format 'network_data.json' from 'network-data.json'.
Wrote output format 'netplan' to 'test/'

2020-10-15 15:09:17,805 - util.py[DEBUG]: Writing to /root/test/etc/netplan/50-cloud-init.yaml - wb: [644] 503 bytes
2020-10-15 15:09:17,806 - netplan.py[DEBUG]: netplan generate postcmd disabled
2020-10-15 15:09:17,806 - netplan.py[DEBUG]: netplan net_setup_link postcmd disabled
2020-10-15 15:09:17,808 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2020-10-15 15:09:17,808 - util.py[DEBUG]: Read 14 bytes from /proc/uptime
2020-10-15 15:09:17,808 - util.py[DEBUG]: cloud-init mode 'net-convert' took 0.011 seconds (0.01)
root@b1:~# cat test/etc/netplan/50-cloud-init.yaml

# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens4:
            dhcp4: true
            match:
                macaddress: fa:16:3e:a3:34:78
            mtu: 8950
            set-name: ens4

FOCAL

% lxc launch ubuntu-daily:focal f1
% lxc exec f1 bash
root@f1:~# lsb_release -rd
Description: Ubuntu 20.04.1 LTS
Release: 20.04
root@f1:~# dpkg --list | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~20.04.1 all initialization and customization tool for cloud instances
ii cloud-initramfs-copymods 0.45ubuntu1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.45ubuntu1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction for various backends
root@f1:~# cat /etc/cloud/build.info
build_name: server
serial: 20201014
root@f1:~# cat network-data.json
{"links": [{"id": "tapc352887e-0f", "vif_id": "c352887e-0fff-481b-af47-7df9f7c2ff05", "type": "ovs", "mtu": 8950, "ethernet_mac_address": "fa:16:3e:a3:34:78"}], "networks": [{"id": "network0", "type": "ipv4_dhcp", "link": "tapc352887e-0f", "network_id": "f8123ceb-e29d-4f4a-b200-6fb3bf3984ba"}], "services": []}
root@f1:~# cloud-init devel net-convert --network-data network-data.json -k network_data.json -m "ens4,fa:16:3e:a3:34:78" -d test -D ubuntu --debug -O netplan
2020-10-15 15:12:57,900 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2020-10-15 15:12:57,901 - util.py[DEBUG]: Read 13 bytes from /proc/uptime
2020-10-15 15:12:57,901 - util.py[DEBUG]: Reading from /sys/class/net/eth0/addr_assign_type (quiet=False)
2020-10-15 15:12:57,901 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/addr_assign_type
2020-10-15 15:12:57,901 - util.py[DEBUG]: Reading from /sys/class/net/eth0/uevent (quiet=False)
2020-10-15 15:12:57,901 - util.py[DEBUG]: Read 26 bytes from /sys/class/net/eth0/uevent
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/eth0/address
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/eth0/device/device (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/lo/addr_assign_type (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/lo/addr_assign_type
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/lo/uevent (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Read 23 bytes from /sys/class/net/lo/uevent
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/lo/device/device (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/eth0/type (quiet=False)
2020-10-15 15:12:57,902 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/type
2020-10-15 15:12:57,902 - util.py[DEBUG]: Reading from /sys/class/net/lo/type (quiet=False)
2020-10-15 15:12:57,903 - util.py[DEBUG]: Read 4 bytes from /sys/class/net/lo/type

Internal State
--- !!python/object:cloudinit.net.network_state.NetworkState
_has_default_route: null
_network_state:
    config:
    -   mac_address: fa:16:3e:a3:34:78
        mtu: 8950
        name: ens4
        subnets:
        -   type: dhcp4
        type: physical
    dns:
        nameservers: []
        search: []
    interfaces:
        ens4:
            accept-ra: null
            address: null
            gateway: null
            inet: inet
            mac_address: fa:16:3e:a3:34:78
            mode: manual
            mtu: 8950
            name: ens4
            subnets:
            -   routes: []
                type: dhcp4
            type: physical
    routes: []
    use_ipv6: false
_version: 1
use_ipv6: false
...

Read input format 'network_data.json' from 'network-data.json'.
Wrote output format 'netplan' to 'test/'

2020-10-15 15:12:57,910 - util.py[DEBUG]: Writing to /root/test/etc/netplan/50-cloud-init.yaml - wb: [644] 503 bytes
2020-10-15 15:12:57,910 - netplan.py[DEBUG]: netplan generate postcmd disabled
2020-10-15 15:12:57,910 - netplan.py[DEBUG]: netplan net_setup_link postcmd disabled
2020-10-15 15:12:57,910 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2020-10-15 15:12:57,911 - util.py[DEBUG]: Read 13 bytes from /proc/uptime
2020-10-15 15:12:57,911 - util.py[DEBUG]: cloud-init mode 'net-convert' took 0.011 seconds (0.01)
root@f1:~# cat test/etc/netplan/50-cloud-init.yaml

# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens4:
            dhcp4: true
            match:
                macaddress: fa:16:3e:a3:34:78
            mtu: 8950
            set-name: ens4

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-15T15:33:25.148769+00:00

Adding some excerpts showing cloud-init log differences between Focal and Bionic; there appears to be quite a bit of difference in how the config is handled.

I also went back and compared if there were other differences like image properties or similar stuff that could find their way into qemu config, but found none.

If you say cloud-init should be equal on Bionic and Focal, where does this information come from apart from the cloud metadata (which also is equal for both instances)?

Focal:

grep fa:16:3e:d6:d0:91 /var/log/cloud-init.log

2020-10-15 14:48:15,982 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'mtu': 8942, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:d6:d0:91', 'name': 'ens2'}]}
2020-10-15 14:48:15,991 - __init__.py[DEBUG]: no work necessary for renaming of [['fa:16:3e:d6:d0:91', 'ens2', 'virtio_net', '0x0001']]
2020-10-15 14:48:15,991 - stages.py[INFO]: Applying network configuration from ds bringup=False: {'version': 1, 'config': [{'type': 'physical', 'mtu': 8942, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:d6:d0:91', 'name': 'ens2'}]}
2020-10-15 14:48:18,268 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'mtu': 8942, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'fa:16:3e:d6:d0:91', 'name': 'ens2'}]}
2020-10-15 14:48:18,277 - __init__.py[DEBUG]: no work necessary for renaming of [['fa:16:3e:d6:d0:91', 'ens2', 'virtio_net', '0x0001']]

Bionic:
$ grep fa:16:3e:f7:fc:c4 /var/log/cloud-init.log
2020-10-14 19:08:10,670 - stages.py[DEBUG]: applying net config names for {'ethernets': {'ens2': {'dhcp4': True, 'set-name': 'ens2', 'match': {'macaddress': 'fa:16:3e:f7:fc:c4'}}}, 'version': 2}
2020-10-14 19:08:10,679 - __init__.py[DEBUG]: no work necessary for renaming of [['fa:16:3e:f7:fc:c4', 'ens2', 'virtio_net', '0x0001']]
2020-10-14 19:08:10,680 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'ethernets': {'ens2': {'dhcp4': True, 'set-name': 'ens2', 'match': {'macaddress': 'fa:16:3e:f7:fc:c4'}}}, 'version': 2}
{'type': 'physical', 'name': 'ens2', 'mac_address': 'fa:16:3e:f7:fc:c4', 'match': {'macaddress': 'fa:16:3e:f7:fc:c4'}, 'subnets': [{'type': 'dhcp4'}]}
{'ens2': {'dhcp4': True, 'set-name': 'ens2', 'match': {'macaddress': 'fa:16:3e:f7:fc:c4'}}}
2020-10-14 19:08:12,659 - stages.py[DEBUG]: applying net config names for {'ethernets': {'ens2': {'dhcp4': True, 'set-name': 'ens2', 'match': {'macaddress': 'fa:16:3e:f7:fc:c4'}}}, 'version': 2}
2020-10-14 19:08:12,666 - __init__.py[DEBUG]: no work necessary for renaming of [['fa:16:3e:f7:fc:c4', 'ens2', 'virtio_net', '0x0001']]

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2020-10-15T15:40:18.927217+00:00

Ryan, thanks for those pointers, will check. I also see in #15 that Bionic uses Fallback while Focal uses an actual ds, don't know why though.

@ubuntu-server-builder

Launchpad user Ryan Harper(raharper) wrote on 2020-10-15T15:57:52+00:00

On Thu, Oct 15, 2020 at 10:45 AM Frode Nordahl 1899487@bugs.launchpad.net
wrote:

Ryan, thanks for those pointers, will check. I also see in #15 that
Bionic uses Fallback while Focal uses an actual ds, don't know why
though.

Bah, I keep forgetting that Bionic does NOT read the OpenStack metadata service by default.

cd1de5f

That landed right after 18.04 was released.

@ubuntu-server-builder

Launchpad user Launchpad Janitor(janitor) wrote on 2020-12-15T04:17:23.068572+00:00

[Expired for cloud-init because there has been no activity for 60 days.]

@ubuntu-server-builder

Launchpad user Launchpad Janitor(janitor) wrote on 2020-12-15T04:17:24.489964+00:00

[Expired for netplan.io (Ubuntu) because there has been no activity for 60 days.]

@ubuntu-server-builder

Launchpad user Alexander Balderson(asbalderson) wrote on 2021-11-05T20:00:07.919765+00:00

I'm going to re-open this bug after working through an OpenStack networking migration from OVS to OVN.

I ran into an issue where the MTU set in netplan caused all my instances to be lost after the migration. While this can, and should, be documented, I also think we should take every opportunity to reduce the places where instances could be lost.

@ubuntu-server-builder

Launchpad user Brett Holman(holmanb) wrote on 2021-11-09T18:30:52.024484+00:00

Hi Alexander,

Thanks for reopening. Sorry to hear about the instance loss.

We have a short term fix, but I think we need to request a fix in Openstack for network_data.json as well. Currently cloud-init passes a configured MTU value to the renderers (netplan/systemd-networkd/etc), which in turn treat a configured MTU as overriding DHCP MTU options. There are reasons one might want to do this so I don't think we want to try to change this behavior for all of cloud-init.

You can force cloud-init to configure the network on every boot[1]; the downside is that this will increase subsequent boot times.

To semantically match other datasources, OpenStack would only expose MTU settings if it intended for the MTU to override DHCP's MTU option. This would mean that the MTU would only get configured if a link's network type was not DHCP (change here[2], I think). Could you open a ticket with OpenStack for this? If that would break other use cases, this behavior could possibly be worked around in the cloud-init OpenStack datasource by ignoring the MTU metadata field, but I would want to see if OpenStack can fix this first before resorting to that.

[1] Override:

# apply network config on every boot
updates:
  network:
    when: ['boot']

[2] https://github.com/openstack/nova/blob/261de76104ca67bed3ea6cdbcaaab0e44030f1e2/nova/virt/netutils.py#L266

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2021-11-10T10:34:52.922632+00:00

Adding upstream OpenStack Nova to the bug to get their perspective on why the OpenStack datasource is exposing the MTU when it knows that the network should be configured using DHCP (ref. Brett's question in #21).

@ubuntu-server-builder

Launchpad user Frode Nordahl(fnordahl) wrote on 2021-12-13T09:07:13.152329+00:00

To add to Brett's comment in #21, the proposed workaround is only effective if you know about this situation beforehand, i.e. if you enabled the override through vendor data before deploying any Focal or newer instances.

Another possible workaround is to perform a "cold migration". When Nova stops/starts an instance, the domain XML on the hypervisor is re-created.

We can take advantage of this behavior: if an instance is stopped and started after a network MTU has been lowered, the new domain XML will have the new MTU, which makes the libvirt driver enforce the MTU in the instance.

The instance configuration will still make systemd-networkd attempt to set the hard coded MTU but it will not be allowed to:

ubuntu@u:~$ grep mtu /etc/netplan/50-cloud-init.yaml
            mtu: 1442
ubuntu@u:~$ zgrep MTU /var/log/syslog.*
/var/log/syslog.2.gz:Dec 10 13:54:19 u systemd-networkd[287]: /run/systemd/network/10-netplan-ens2.network: MTUBytes= in [Link] section and UseMTU= in [DHCP] section are set. Disabling UseMTU=.
/var/log/syslog.2.gz:Dec 10 13:54:19 u systemd-networkd[287]: ens2: Could not set MTU, ignoring: mtu greater than device maximum. Invalid argument
ubuntu@u:~$ ip link
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1441 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether fa:16:3e:30:f3:82 brd ff:ff:ff:ff:ff:ff
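
The mismatch in the output above (netplan pins 1442 while the device reports 1441) can also be checked for mechanically. The following is only a minimal sketch, assuming the simple file layout that cloud-init generates. The helper names `netplan_mtus`, `kernel_mtu` and `stale_mtus` are hypothetical and not part of cloud-init or netplan, and the parser is indentation-based rather than a real YAML parser.

```python
from pathlib import Path

# Sample mirroring the layout cloud-init writes; values taken from the output above.
SAMPLE = """\
network:
    version: 2
    ethernets:
        ens2:
            dhcp4: true
            match:
                macaddress: fa:16:3e:30:f3:82
            mtu: 1442
            set-name: ens2
"""


def netplan_mtus(text: str) -> dict:
    """Return {interface: mtu} for interfaces listed under 'ethernets:'.

    Indentation-based and only meant for the simple files cloud-init
    writes; it is not a general YAML parser.
    """
    mtus = {}
    eth_indent = None    # indentation of the 'ethernets:' key, once seen
    iface = None         # interface block currently being read
    iface_indent = 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "ethernets" and not value:
            eth_indent, iface = indent, None
        elif eth_indent is None:
            continue
        elif indent <= eth_indent:
            eth_indent, iface = None, None      # left the ethernets block
        elif iface is None or indent <= iface_indent:
            iface, iface_indent = key, indent   # a new interface entry
        elif key == "mtu" and value.isdigit():
            mtus[iface] = int(value)
    return mtus


def kernel_mtu(iface: str, sysfs: Path = Path("/sys/class/net")) -> int:
    """Read the MTU the kernel currently reports for an interface (Linux)."""
    return int((sysfs / iface / "mtu").read_text().strip())


def stale_mtus(netplan_text: str, actual: dict) -> dict:
    """Return {interface: (configured, actual)} where the two values differ."""
    return {
        name: (mtu, actual[name])
        for name, mtu in netplan_mtus(netplan_text).items()
        if name in actual and actual[name] != mtu
    }


if __name__ == "__main__":
    print(stale_mtus(SAMPLE, {"ens2": 1441}))
```

On a real instance, the `actual` mapping could be built with `kernel_mtu` for each interface found in `/etc/netplan/50-cloud-init.yaml`.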

@ubuntu-server-builder

Launchpad user Artom Lifshitz(notartom) wrote on 2022-08-24T00:22:00.297264+00:00

While Nova indeed exposes the MTU in our metadata, our source of truth for that information is Neutron, via the mtu field on the Neutron network.

[As an aside, we've had a long-standing issue wherein Neutron allows the MTU to be mutable, but there's no real support for changing the MTU within Nova, necessitating the cold migration workaround that Frode has mentioned.]

As Neutron controls both the DHCP agent and the network API-level resource, I'm not sure how we'd ever get into a situation where DHCP provides an MTU value that's different from what's set in the network, but maybe there's a bug in Neutron?

So since Nova's metadata is just a proxy for Neutron's MTU in this case, I think Brett's question in comment #21 is better asked to the Neutron folks.

@ubuntu-server-builder

Launchpad user Slawek Kaplonski(slaweq) wrote on 2022-09-02T14:02:06.058260+00:00

I read through this bug today and here are my thoughts about it.
@Artom - Neutron allows updating the MTU of a network through the API. It may also happen that, e.g., during a migration from the OVS to the OVN backend, networks are changed from e.g. vxlan to geneve, which changes their MTU as well. This is reflected in the network info returned from Neutron.
Now, IIUC, the problem here is with a case like this:

  1. A new VM is booted and plugged into the network with mtu=1450.
  2. cloud-init in the VM gets mtu=1450 from the Nova metadata service and writes it into the netplan configuration.
  3. The VM gets its other network configuration from DHCP. So far so good, everything works.
  4. The MTU of the network changes for some reason. If you check with Neutron's API, the new value is returned, and it is also updated in Neutron's DHCP service.
  5. Because the MTU in the VM was configured in netplan's config based on the old data from the metadata service (see point 2), it is not updated from DHCP.

I think Nova provides the MTU data from the network_info_cache, so the new value will be visible in the metadata some time (a few seconds in my tests) after it was changed in Neutron. But that doesn't solve anything, because cloud-init already configured it during the boot process and will not check it again.
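
The sequence above can be captured in a toy model. This is only a sketch of the precedence rule described in this thread (a configured MTU wins over the DHCP MTU option); the `Guest` class and the lowered value 1442 are made up for illustration, and the model ignores the device-maximum enforcement by the hypervisor mentioned earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guest:
    """Toy model of a guest whose MTU may be pinned at first boot."""

    pinned_mtu: Optional[int] = None

    def first_boot(self, metadata_mtu: Optional[int]) -> None:
        # cloud-init renders the metadata MTU into netplan once, at first boot.
        self.pinned_mtu = metadata_mtu

    def effective_mtu(self, dhcp_mtu: int) -> int:
        # A configured MTU overrides the DHCP MTU option; the DHCP value
        # only applies when nothing was pinned.
        return dhcp_mtu if self.pinned_mtu is None else self.pinned_mtu


vm = Guest()
vm.first_boot(metadata_mtu=1450)                 # points 1 and 2
assert vm.effective_mtu(dhcp_mtu=1450) == 1450   # point 3: all is fine
assert vm.effective_mtu(dhcp_mtu=1442) == 1450   # points 4 and 5: stale value

unpinned = Guest()
unpinned.first_boot(metadata_mtu=None)           # metadata carries no MTU
assert unpinned.effective_mtu(dhcp_mtu=1442) == 1442
```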

@frode: is my understanding correct? If yes, could the solution be to provide mtu value in the metadata ONLY if all subnets on port don't have dhcp enabled? And provide null otherwise?
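
To make that proposal concrete, here is a rough sketch of the filtering it implies, applied to a network_data.json document shaped like the one posted earlier in this bug. It is not Nova code. The function name `null_mtu_for_dhcp_links` is hypothetical, and treating every network type containing "dhcp" as DHCP-managed is an assumption (only `ipv4_dhcp` appears in this thread).

```python
import copy


def null_mtu_for_dhcp_links(network_data: dict) -> dict:
    """Return a copy where links with any DHCP-managed network get mtu=None.

    Assumption: a network counts as DHCP-managed when its 'type' contains
    the substring 'dhcp' (e.g. 'ipv4_dhcp').
    """
    result = copy.deepcopy(network_data)
    dhcp_links = {
        net.get("link")
        for net in result.get("networks", [])
        if "dhcp" in str(net.get("type", ""))
    }
    for link in result.get("links", []):
        if link.get("id") in dhcp_links:
            link["mtu"] = None   # let the DHCP MTU option decide
    return result


if __name__ == "__main__":
    data = {
        "links": [{"id": "tapc352887e-0f", "type": "ovs", "mtu": 8950,
                   "ethernet_mac_address": "fa:16:3e:a3:34:78"}],
        "networks": [{"id": "network0", "type": "ipv4_dhcp",
                      "link": "tapc352887e-0f"}],
        "services": [],
    }
    print(null_mtu_for_dhcp_links(data)["links"][0]["mtu"])   # prints None
```

With a null MTU in the metadata, nothing would be pinned in netplan and the guest would follow whatever MTU DHCP hands out.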

@ubuntu-server-builder

Launchpad user Chad Smith(chad.smith) wrote on 2022-09-07T20:06:15.656111+00:00

Thanks @alexander @frode @Artom and @slawek && Brett for weighing in on this long-standing bug and trying to sort long-term maintenance/migration use-cases and how cloud-init could better interact in those changing conditions/settings with neutron.

@slawek "If yes, could the solution be to provide mtu value in the metadata ONLY if all subnets on port don't have dhcp enabled? And provide null otherwise?"

I think that suggestion would only cover needs at initial deployment time. The other gap is long-term maintenance: what should an admin do when changing the MTU in Neutron or migrating OVS -> OVN, when a network config refresh from cloud-init needs to be triggered?

"But that doesn't solve anything because cloud-init already configured it during [first boot] boot process and will not check it again."

Correct, cloud-init by default won't perform any network config operations after first boot on OpenStack unless the configuration was added to /etc/cloud/cloud.cfg.d/80-some-file.cfg as Brett mentioned in #21.

Per Frode's comment #23: "the proposed workaround is only effective if you know about this situation beforehand."

Agreed. But, in the cases where an admin is changing an MTU value through Neutron or migrating from OVS to OVN, I think those use-cases are ones where we now know ahead of time we'll be changing network configuration that may need to be re-written by cloud-init.

In these cases where we know our existing VM will soon contain stale MTU data, I think we can prime the cloud-init system in one of two ways:

Option 1. Configure cloud-init to regenerate network on every boot and trigger that reboot manually or automatically after MTU settings have changed

a. Brett's comment makes sense here. Add /etc/cloud/cloud.cfg.d/80-openstack-network-per-boot.cfg to the affected VM prior to network migration or MTU change:

cat > /etc/cloud/cloud.cfg.d/80-openstack-network-per-boot.cfg <<EOF
# apply network config on every boot
updates:
  network:
    when: ['boot']
EOF

b. Make MTU change or perform OVS->OVN migration steps
c. Trigger a VM reboot after either MTU changed or migration operation is complete.

Option 2. Configure cloud-init to react to udev hotplug events (v. 21.3 or later) and re-render network on such udev events on the running system without reboot

-- Do we know if migration from OVS -> OVN results in add|remove udev events? If so, a VM configured for cloud-init hotplug would automatically read and apply new network_data.json values upon receipt of that udev event.

If udev events occur along this migration path, we can:
a. Write config to enable cloud-init to react to network hotplug add|remove events
cat > /etc/cloud/cloud.cfg.d/80-openstack-network-hotplug.cfg <<EOF
# Allow datasource to regenerate network config on any udev hotplug events
updates:
  network:
    when: ['hotplug']
EOF

b. Restart the VM prior to MTU change or OVS->OVN migration which will start systemd cloud-init-hotplug.socket and cloud-init-hotplug.service on the system to listen for udev hotplug events

c. Make changes to MTU or OVS-OVN migration to trigger udev events which will re-apply network config

If migration or MTU changes don't result in udev events, it is possible to manually 'fake' such a hotplug event to force re-rendering of the network if you still have console/ssh access to the VM:

# get a current physical interface name on the system
PHYS_INTERFACE=$(python3 -c 'from cloudinit.net import get_interfaces; print([i[0] for i in get_interfaces() if i[0] != "lo"][0])')

# fake a hotplug add event on a known physical NIC, triggering cloud-init to crawl and apply network_data.json metadata
sudo cloud-init devel hotplug-hook -s net handle --udevaction add -d ${PHYS_INTERFACE}

@ubuntu-server-builder

Launchpad user Chad Smith(chad.smith) wrote on 2022-09-08T02:21:35.685717+00:00

I guess the third option that could trigger network re-write across migration or MTU changes would be that OpenStack updates the instance-id at http://169.254.169.254/latest/meta-data/instance-id whenever the MTU value changes. By default, cloud-init re-provisions an OpenStack machine with any user-data, vendor-data and network_data.json anytime the instance-id changes.

It's a pretty big hammer, and no additional configuration would be needed for default cloud-init behavior. On boot, if cloud-init detects a change in http://169.254.169.254/latest/meta-data/instance-id, it would re-provision the system based on the current network_data.json and user_data provided by OpenStack at that time. I'd be wary of this full re-provision scenario, as it will also involve cloud-init re-running SSH host key generation, user creation, and password setting based on user-data. That may introduce obstacles for SSH automation due to unrecognized SSH host keys when trying to connect to the migrated instance, or reset initial user passwords which could have changed over the life of the VM.

@ubuntu-server-builder

Launchpad user OpenStack Infra(hudson-openstack) wrote on 2023-03-06T09:21:13.127123+00:00

This issue was fixed in the openstack/nova 27.0.0.0rc1 release candidate.

@github-actions github-actions bot added the Stale label Oct 2, 2024
@aciba90 aciba90 removed the Stale label Oct 3, 2024