Ubuntu/devel #4659

Merged
merged 49 commits into from
Dec 5, 2023

aciba90
Contributor

@aciba90 aciba90 commented Dec 4, 2023

do not squash

new_upstream_snapshot.py --commitish 23.4 --no-sru-bug
dch -r -D noble ''
git commit -m 'releasing cloud-init version 23.4-0ubuntu1' debian/changelog
build-package -- -S -d
sbuild --dist=noble --arch=amd64 --arch-all ../out/cloud-init_23.4-0ubuntu1.dsc

quilt push -a
tox -e py3
quilt pop -a

holmanb and others added 30 commits November 9, 2023 03:13
Bcachefs is a COW filesystem that was recently merged in the upstream
Linux kernel. Add resize support to cloud-init.
This function is imported as part of the util module at runtime, but is
not actually used by runtime code. It is also a one line function and
can therefore be replaced at callsites by the contents of the one
function.
…al#4597)

Update workflows: check_format, unit and integration to run
tests each time a branch lands in main.

Update our status badges in README.md to represent the project-level
status of this CI instead of per pull request.
In f2f530e the default was changed. This was never reflected to the
docs and may mislead users.
There are Azure Stack implementations which do not support IMDS.  The
only way to detect this is by checking for the presence of the
static route to IMDS.  If we don't see the route to IMDS, limit the
failure report to the host via KVP so we don't report failure to
wireserver in these cases.

There will be future work to detect that we are running on Azure
Stack and have explicit configuration passed along via ovf-env.xml
to toggle IMDS support as the environment dictates.  Until then,
this approach minimizes the risk of regression on Azure public cloud
while allowing Azure Stack VMs to provision, albeit with a logged
error that it could not fetch metadata.

- refactor _check_if_primary() to track configured routes

- use this to toggle reporting behavior

- add duration check to TestGetMetadataFromImds tests in case we
  want to toggle duration in the future.

Example diagnostic on Azure public cloud:

```
2023-10-27 12:37:04,634 - azure.py[DEBUG]: Obtained DHCP lease on interface 'eth0' (primary=True driver='hv_netvsc' router='10.0.0.1' routes=[('0.0.0.0/0', '10.0.0.1'), ('168.63.129.16/32', '10.0.0.1'), ('169.254.169.254/32', '10.0.0.1')] lease={'interface': 'eth0', 'fixed-address': '10.0.0.4', 'server-name': 'IAD011091108004SOC', 'subnet-mask': '255.255.255.0', 'dhcp-lease-time': '4294967295', 'routers': '10.0.0.1', 'dhcp-message-type': '5', 'domain-name-servers': '168.63.129.16', 'dhcp-server-identifier': '168.63.129.16', 'dhcp-renewal-time': '4294967295', 'rfc3442-classless-static-routes': '0,10,0,0,1,32,168,63,129,16,10,0,0,1,32,169,254,169,254,10,0,0,1', 'dhcp-rebinding-time': '4294967295', 'unknown-245': 'a8:3f:81:10', 'domain-name': 'uejkdvkrjiqe1jxrijlvafqihe.bx.internal.cloudapp.net', 'renew': '1 2159/12/03 19:05:19', 'rebind': '1 2159/12/03 19:05:19', 'expire': '1 2159/12/03 19:05:19'} imds_routed=True wireserver_routed=True)
```

Example diagnostic on Azure Stack:

```
2023-10-27 12:35:47,363 - azure.py[DEBUG]: Obtained DHCP lease on interface 'eth0' (primary=True driver='hv_netvsc' router='10.126.64.1' routes=[('0.0.0.0/0', '10.126.64.1'), ('168.63.129.16/32', '10.126.64.1')] lease={'interface': 'eth0', 'fixed-address': '10.126.64.35', 'subnet-mask': '255.255.252.0', 'routers': '10.126.64.1', 'dhcp-lease-time': '4294967295', 'dhcp-message-type': '5', 'domain-name-servers': '10.50.10.50,10.50.50.50', 'dhcp-server-identifier': '168.63.129.16', 'interface-mtu': '1500', 'dhcp-renewal-time': '4294967295', 'unknown-245': 'a8:3f:81:10', 'rfc3442-classless-static-routes': '0,10,126,64,1,32,168,63,129,16,10,126,64,1', 'dhcp-rebinding-time': '4294967295', 'domain-name': 'corp.microsoft.com', 'renew': '1 2159/12/03 19:04:02', 'rebind': '1 2159/12/03 19:04:02', 'expire': '1 2159/12/03 19:04:02'} imds_routed=False wireserver_routed=True)
```

We can see that IMDS routing is detected appropriately.

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
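The route detection described in this commit can be illustrated by decoding the `rfc3442-classless-static-routes` value from the two lease logs above. This is a minimal sketch, not cloud-init's `_check_if_primary()` code: the function names `parse_classless_static_routes` and `has_imds_route` are hypothetical, and the sketch assumes the option arrives as comma-separated decimal octets exactly as printed in the logs.

```python
from typing import List, Tuple

# Destination that appears in the Azure public cloud lease above.
IMDS_ROUTE = "169.254.169.254/32"


def parse_classless_static_routes(option: str) -> List[Tuple[str, str]]:
    """Decode an RFC 3442 option given as comma-separated decimal octets.

    Each route is encoded as: prefix length, the significant octets of the
    destination (ceil(prefix / 8) of them), then four gateway octets.
    """
    octets = [int(part) for part in option.split(",") if part.strip()]
    routes = []
    i = 0
    while i < len(octets):
        prefix_len = octets[i]
        if not 0 <= prefix_len <= 32:
            raise ValueError(f"invalid prefix length: {prefix_len}")
        significant = (prefix_len + 7) // 8
        end = i + 1 + significant + 4
        if end > len(octets):
            raise ValueError("truncated route entry")
        # Pad the destination with zeros up to four octets.
        dest = octets[i + 1 : i + 1 + significant] + [0] * (4 - significant)
        gateway = octets[i + 1 + significant : end]
        routes.append(
            (
                ".".join(map(str, dest)) + f"/{prefix_len}",
                ".".join(map(str, gateway)),
            )
        )
        i = end
    return routes


def has_imds_route(routes: List[Tuple[str, str]]) -> bool:
    """Hypothetical helper: True when a static route to IMDS is present."""
    return any(dest == IMDS_ROUTE for dest, _gateway in routes)


public_cloud = "0,10,0,0,1,32,168,63,129,16,10,0,0,1,32,169,254,169,254,10,0,0,1"
azure_stack = "0,10,126,64,1,32,168,63,129,16,10,126,64,1"

print(has_imds_route(parse_classless_static_routes(public_cloud)))  # True
print(has_imds_route(parse_classless_static_routes(azure_stack)))   # False
```

The decoded routes match the `routes=[...]` lists shown in the two diagnostics, and the boolean corresponds to the `imds_routed` field.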
…ration()" (canonical#4607)

This reverts commit 518047a.

This commit could cause issues in bringing up the network, especially
in non-netplan NetworkManager environments, where may-fail=false.
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: James Falcon <james.falcon@canonical.com>
Co-authored-by: Chad Smith <chad.smith@canonical.com>
…nical#4604)

Drop the limit for connection errors if a route to IMDS is detected.
Since we know we've brought up the primary NIC and connectivity should
work, continue to retry until retry_deadline is up.

If there is no route to IMDS detected, such as Azure Stack, keep the
current limit of 11.

Fetching reprovision data behavior is unchanged.

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
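The retry policy in this commit message can be sketched as follows. This is an illustration under stated assumptions rather than the datasource's actual code: `fetch_with_retries`, its parameters, and the one-second pause are hypothetical, and the sketch assumes the limit of 11 counts failed attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Limit quoted in the commit message; assumed here to count failed attempts.
MAX_CONNECTION_ERRORS = 11


def fetch_with_retries(
    fetch: Callable[[], T],
    *,
    imds_routed: bool,
    retry_deadline: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `fetch` on connection errors.

    With a route to IMDS, keep retrying until `retry_deadline` passes.
    Without one (for example on Azure Stack), also stop after
    MAX_CONNECTION_ERRORS failures.
    """
    connection_errors = 0
    while True:
        try:
            return fetch()
        except ConnectionError:
            connection_errors += 1
            if not imds_routed and connection_errors >= MAX_CONNECTION_ERRORS:
                raise
            if clock() >= retry_deadline:
                raise
            sleep(1)  # hypothetical pause between attempts
```

Passing `sleep` and `clock` as parameters is a design choice for testability only; it lets a test drive the loop without real delays.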
This is normal behavior, don't warn.

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
As part of 29ed5f5, we also need to revert the integration test changes
After reprovisioning, IMDS should return None value for PPS data
type.

Introduce a check to verify whether IMDS is returning stale PPS
data. In case of stale data, report the error to the host and allow
the VM to continue with provisioning. This is part of a larger effort
to improve error handling and error reporting during VM provisioning.
Some snap tests were missing the top-level "snap" key. Because
additional properties are allowed at the top-level, this is passing
when it shouldn't.
Add missing schema definitions that are viable cloud-config.
Some misc test changes and docs move to ensure restricting top-level
properties in the cloud-config schema doesn't fail
New top-level properties key is necessary to restrict additional
top-level keys. Simpler schemes won't work because our documentation
inspects the module definitions in the schema.
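Why an unrestricted top level hides mistakes such as the missing "snap" key can be shown with a small check. This is only a sketch: the real validation is done with JSON Schema, and both `KNOWN_TOP_LEVEL_KEYS` and `unknown_top_level_keys` are hypothetical names with a deliberately tiny key set.

```python
from typing import Iterable, List

# Hypothetical, deliberately small subset of allowed top-level keys.
KNOWN_TOP_LEVEL_KEYS = frozenset({"snap", "packages", "runcmd"})


def unknown_top_level_keys(
    config: dict, known: Iterable[str] = KNOWN_TOP_LEVEL_KEYS
) -> List[str]:
    """Return top-level keys that are not in the allowed set."""
    return sorted(set(config) - set(known))


# A config that forgot the "snap" wrapper: nothing complains while
# additional top-level properties are allowed, but this check flags it.
missing_wrapper = {"commands": ["snap install vlc"]}
wrapped = {"snap": {"commands": ["snap install vlc"]}}

print(unknown_top_level_keys(missing_wrapper))  # ['commands']
print(unknown_top_level_keys(wrapped))          # []
```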
)

Do not perform schema validation on merged system config as it
contains base config from /etc/cloud/cloud.cfg* which is not
strictly honored as cloud-config user-data.

Only perform schema validation on the processed user-data provided to
the instance.
This avoids schema errors for base config keys such as datasource_list,
def_log_file, _logs present in /etc/cloud/cloud.cfg*.
Previously, users without the Python jsonschema dependency installed
would appear to validate a user configuration successfully. Fix it.
The version has been the same for 3 releases now, so use it as a
default. It is used by test_versioned_packages_are_installed test.
These two tests won't be fixed anytime soon, so there's no reason to
continue the jenkins noise.
Previously two subcommands used a different format than the rest of them.
Unify behavior and code.
…ical#4617)

Assertions about the host's kernel in a container are useless anyways.
holmanb and others added 19 commits November 27, 2023 10:43
A deprecate log was called before logging was set up.
)

EphemeralIPv{4,6} failure is not always an error, therefore do not log
this event as an error in the context manager. Allow call sites to
determine log level.

Fixes canonical#4540
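The pattern described here, where the context manager stays silent and the call site picks the log level, might look like the sketch below. It does not reproduce cloud-init's EphemeralIPv4/EphemeralIPv6 classes: `ephemeral_network`, `EphemeralNetworkError`, and `probe_metadata` are hypothetical stand-ins.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator

LOG = logging.getLogger("ephemeral_sketch")


class EphemeralNetworkError(Exception):
    """Hypothetical error raised when the temporary network cannot come up."""


@contextmanager
def ephemeral_network(
    bring_up: Callable[[], None], tear_down: Callable[[], None]
) -> Iterator[None]:
    """Bring up a temporary network without logging failures.

    Setup errors propagate unchanged so the caller decides how loud to be.
    """
    bring_up()  # may raise EphemeralNetworkError
    try:
        yield
    finally:
        tear_down()


def probe_metadata(
    bring_up: Callable[[], None], tear_down: Callable[[], None]
) -> bool:
    """Call site for which a failed probe is expected, so it logs at debug."""
    try:
        with ephemeral_network(bring_up, tear_down):
            return True
    except EphemeralNetworkError as exc:
        LOG.debug("Ephemeral network unavailable: %s", exc)
        return False
```

A different call site, for which a missing lease is fatal, could catch the same exception and log at error level instead.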
…han the new mode (canonical#4250)"

This reverts commit a0e4ec1.

This commit introduced a broken `compare_permissions()` function. With
this implementation, a log file with a mode of 0o407 would remain
readable to other users on the system.

If the unit tests included in this commit expected the correct output values,
the tests would have caught this error.
Previous implementations loosened permissions in non-default scenarios.

Fixes canonicalGH-4243
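The 0o407 case shows why file modes cannot be ranked as plain integers. The sketch below does not reproduce the reverted `compare_permissions()`; it assumes a numeric comparison as one way such a bug can arise, uses 0o640 as a hypothetical target mode, and the helper names are illustrative.

```python
def is_no_more_permissive(mode: int, target: int) -> bool:
    """True when `mode` grants no permission bit that `target` lacks."""
    return (mode & 0o777) & ~(target & 0o777) == 0


def tighten(mode: int, target: int) -> int:
    """Keep only the bits allowed by `target`; never adds permissions."""
    return mode & target & 0o777


existing = 0o407  # owner: r, group: none, other: rwx
target = 0o640    # hypothetical desired log file mode

# A numeric comparison wrongly suggests 0o407 is the stricter mode.
print(existing < target)                        # True
# A bitwise check sees that "other" still has access.
print(is_no_more_permissive(existing, target))  # False
print(oct(tighten(existing, target)))           # 0o400
```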
When non-root user calls cloud-init schema to validate cfg files,
they will get a permissions error on /var/lib/cloud/instance dir
trying to source the cached datasource. Fallback to default
read_cfg_paths in this case and log debug level msg about this case.

Since cloud-init schema CLI is used in the context of validating
config files and not really searching for cfg path overrides, using
default paths to find instance-data.json isn't critical or even a
warning condition.

Additionally, avoid print() and use LOG.warning when no datasource is
currently detected, as this should indicate that scripts or the user
may be trying to read instance-data.json, which does not exist yet.

Fixes canonicalGH-4620
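The fallback for non-root users can be sketched as below. This is a hedged illustration, not the actual `read_cfg_paths` code: `load_paths` and its two callables are hypothetical stand-ins for "read the cached datasource" and "use default paths".

```python
import logging
from typing import Callable, TypeVar

LOG = logging.getLogger("schema_sketch")
T = TypeVar("T")


def load_paths(read_cached: Callable[[], T], read_default: Callable[[], T]) -> T:
    """Prefer paths from the cached datasource; fall back when unreadable.

    A PermissionError is expected for non-root users, so it is logged at
    debug level rather than as a warning or error.
    """
    try:
        return read_cached()
    except PermissionError as exc:
        LOG.debug("Cannot read cached datasource (%s); using default paths", exc)
        return read_default()
```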
It's magic that makes it hard to follow call hierarchies without saving
much code.
- Add Python netifaces dependency
- Double quote to prevent globbing and word splitting
- Add CodeBleu as a contributor
Since adding the top-level properties key to the schema, the fuzz test
will only generate empty tests. Update the test to remove/ignore the
top-level keys along with ensuring fuzzed additionalProperties are
removed from the test.
…4635)

Only write /var/lib/cloud/instance/network-config.json once
datasource is detected. The /var/lib/cloud/instance symlink is
created by Init.instancify after datasource detection is complete.

Move creation of /var/lib/cloud/instance/network-config.json into
a separate method _write_network_config_json. It will be called by
any call to apply_network_config.

apply_network_config is called in both Local and Network stages.

In Local stage, apply_network_config is used to either:
 - render the final network config of datasource detected in Local
 - in absence of Local datasource, render basic fallback DHCP on
   primary NIC to allow network to come up before detecting a
   Network datasource

For Network datasources, they will not have been discovered or
instancify'd in Local boot stage, so apply_network_config cannot
yet persist network-config.json.

Defer creation of network-config.json for Network datasources
until the link /var/lib/cloud/instance exists and
apply_network_config is called in Network stage to render final
network config.

Fixes canonicalGH-4630
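The deferral described in this commit can be sketched as a guard on the instance link. This is an assumption-laden illustration, not the body of `_write_network_config_json`: the function name `write_network_config_json`, its signature, and the boolean return value are hypothetical.

```python
import json
import os


def write_network_config_json(instance_link: str, netcfg: dict) -> bool:
    """Persist network-config.json only once the instance link resolves.

    Returns False (and writes nothing) when the link does not exist yet,
    as is the case for Network datasources during the Local stage.
    """
    if not os.path.isdir(instance_link):
        return False
    path = os.path.join(instance_link, "network-config.json")
    with open(path, "w", encoding="utf-8") as stream:
        json.dump(netcfg, stream)
    return True
```

Because the guard runs on every call, invoking it from both the Local and Network stages is harmless: the early call is a no-op and the later call writes the file.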
)

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
Restart time after reboot is slower than 30 seconds on
some platforms. Give the DataSourceNone detection a total
of 90 seconds.
…al#4635)

Earlier iterations of this test exposed failure cases where
DataSourceNone and cloud-init status --wait would block indefinitely
in some failure modes. The primary failure mode was DataSourceNone
config on LXD containers due to invalid network config on virtual
NICs due to LP:#2022947 and newer netplan resulting in
tracebacks preventing init local and network boot stages from
persisting completion status to /run/cloud-init/status.json.

This is resolved already within the test by copying working
network config prior to DataSourceNone detection on lxd_containers.

The test can now rely on client.restart default blocking and retries
to avoid expiring all retries on platforms with slower reboots.
Bump the version in cloudinit/version.py to 23.4 and
update ChangeLog.
@aciba90
Contributor Author

aciba90 commented Dec 4, 2023

While sbuilding I got the following error:

W: cloud-init changes: unknown-architecture amd64_translations

See the full sbuild output: https://pastebin.ubuntu.com/p/QFnBb3Hkkr/

@aciba90
Contributor Author

aciba90 commented Dec 5, 2023

The previous issue/comment happened while using a chroot from sbuild-launchpad-chroot. I have been able to execute a successful sbuild with no lintian warnings using a normal chroot (from mk-sbuild).

@aciba90 aciba90 marked this pull request as ready for review December 5, 2023 09:00
Member

@TheRealFalcon TheRealFalcon left a comment

LGTM!

@TheRealFalcon TheRealFalcon merged commit ab07628 into canonical:ubuntu/devel Dec 5, 2023
24 checks passed
@aciba90 aciba90 deleted the ubuntu/devel branch December 5, 2023 17:44
10 participants