-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Determining route metric based on NIC name #5070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @ld9379435, for making cloud-init better. The general approach looks good to me. I left some inline comments / concerns.
This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS
? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.
As I have not access to Alibaba, could you please paste the output of cloud-init collect-logs
?
Unified proposed patch
diff --git a/cloudinit/sources/DataSourceAliYun.py b/cloudinit/sources/DataSourceAliYun.py
index 7cacd67e1..45cc0b164 100644
--- a/cloudinit/sources/DataSourceAliYun.py
+++ b/cloudinit/sources/DataSourceAliYun.py
@@ -34,6 +34,10 @@ class DataSourceAliYun(EC2.DataSourceEc2):
self.default_update_events[EventScope.NETWORK].add(EventType.BOOT)
self.default_nic_order = NicOrder.NIC_NAME
+ def _unpickle(self, ci_pkl_version: int) -> None:
+ super()._unpickle(ci_pkl_version)
+ self.default_nic_order = NicOrder.NIC_NAME
+
def get_hostname(self, fqdn=False, resolve_ip=False, metadata_only=False):
hostname = self.metadata.get("hostname")
is_default = False
diff --git a/cloudinit/sources/DataSourceEc2.py b/cloudinit/sources/DataSourceEc2.py
index d78e97a02..64c328737 100644
--- a/cloudinit/sources/DataSourceEc2.py
+++ b/cloudinit/sources/DataSourceEc2.py
@@ -123,6 +123,8 @@ class DataSourceEc2(sources.DataSource):
def _unpickle(self, ci_pkl_version: int) -> None:
super()._unpickle(ci_pkl_version)
self.extra_hotplug_udev_rules = _EXTRA_HOTPLUG_UDEV_RULES
+ if not hasattr(self, "default_nic_order"):
+ self.default_nic_order = NicOrder.NIC_NAME
def _get_cloud_name(self):
"""Return the cloud name as identified during _get_data."""
@@ -912,8 +914,7 @@ def _build_nic_order(
@return: Dictionary with macs as keys and nic orders as values.
"""
nic_order: Dict[str, int] = {}
- macs = sorted(macs_to_nics.keys())
- if len(macs) == 0 or len(macs_metadata) == 0:
+ if len(macs_to_nics) == 0 or len(macs_metadata) == 0:
return nic_order
valid_macs_metadata = filter(
@@ -921,7 +922,8 @@ def _build_nic_order(
lambda mmd: mmd[1] is not None,
# filter by macs
map(
- lambda mac: (mac, macs_metadata.get(mac), macs_to_nics[mac]), macs
+ lambda mac: (mac, macs_metadata.get(mac), macs_to_nics[mac]),
+ macs_to_nics.keys(),
),
)
@@ -939,7 +941,7 @@ def _build_nic_order(
# function.
return {
mac: i
- for i, (mac, _mac_metadata, nic_name) in enumerate(
+ for i, (mac, _mac_metadata, _nic_name) in enumerate(
sorted(
valid_macs_metadata,
key=lambda mmd: (
diff --git a/tests/unittests/sources/test_ec2.py b/tests/unittests/sources/test_ec2.py
index 5f8310474..fa5736742 100644
--- a/tests/unittests/sources/test_ec2.py
+++ b/tests/unittests/sources/test_ec2.py
@@ -948,7 +948,10 @@ class TestBuildNicOrder:
@pytest.mark.parametrize(
["macs_metadata", "macs_to_nics", "default_nic_order", "expected"],
[
- pytest.param({}, {}, None, {}, id="all_empty"),
+ pytest.param({}, {}, NicOrder.MAC, {}, id="all_empty"),
+ pytest.param(
+ {}, {}, NicOrder.NIC_NAME, {}, id="all_empty_sort_by_nic_name"
+ ),
pytest.param(
{},
{"0a:f7:8d:96:f2:a1": "eth0"},
diff --git a/tests/unittests/test_upgrade.py b/tests/unittests/test_upgrade.py
index a7dd8538f..0aca3683b 100644
--- a/tests/unittests/test_upgrade.py
+++ b/tests/unittests/test_upgrade.py
@@ -34,12 +34,7 @@ class TestUpgrade:
# The presence of these attributes existed in 20.1.
ds_expected_unpickle_attrs = {
"AltCloud": {"seed", "supported_seed_starts"},
- "AliYun": {
- "identity",
- "metadata_address",
- "default_update_events",
- "default_nic_order",
- },
+ "AliYun": {"identity", "metadata_address", "default_update_events"},
"Azure": {
"_ephemeral_dhcp_ctx",
"_iso_dev",
@@ -80,7 +75,7 @@ class TestUpgrade:
"use_ip4LL",
"wait_retry",
},
- "Ec2": {"identity", "metadata_address", "default_nic_order"},
+ "Ec2": {"identity", "metadata_address"},
"Exoscale": {
"api_version",
"extra_config",
cloud-init.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changeset LGTM. Thanks for addressing the comments. I left a couple of inline nit/minor requests.
Unfortunately, I cannot open https://github.com/canonical/cloud-init/files/14789494/cloud-init.tar.gz, could you please reattach the logs?
This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.
This question is still open. Is this a bug fix or a change of behavior?
Thanks!
Regarding the failing |
bfd16ba
to
456322d
Compare
In the Alibaba Cloud scenario, we do not wish to define routing priority based on MAC addresses. In a cloud environment where the kernel parameter net.ifnames=0 has been configured, network interface card (NIC) names are determined by default according to their underlying Bus, Device, and Function (BDF) numbers, incrementing from eth0 to ethN, with eth0 acting as the default primary NIC name. In the previous logic, network-card has the highest priority, followed by device-number as the second priority. When _fallback_nic_order is set to NicOrder.MAC, the mac address takes the third priority. On the other hand, when _fallback_nic_order is set to NicOrder.NIC_NAME, the NIC name becomes the third priority. In AWS environments, the default setting remains as _fallback_nic_order = NicOrder.MAC, maintaining the original behavior. However, in Alibaba Cloud scenarios, we set _fallback_nic_order = NicOrder.NIC_NAME.
cloud-init.tar.gz |
I personally think that in the Alibaba Cloud scenario, it is a change in behavior that does not affect the behavior of other clouds like AWS. In the event that the meta-server does not provide network-card or device-number, and in a multi-network card scenario, it is currently observed that all distributions configure route metrics prioritization according to MAC address. However, we hope to use the network card name for sorting in the Alibaba Cloud scenario. Furthermore, I have also encountered another issue in RHEL series systems. When the libnm-settings-plugin-ifcfg-rh.so (NetworkManager can handle the ifcfg files) is present, cloud-init configures ifcfg-eth* files for multi-network cards. However, it seems that configuring the metrics as METRICS=100 or METRICS=200 in these files does not take effect. Upon consulting the nm-settings-ifcfg-rh manual, it is indicated that the configuration should be IPV4_ROUTE_METRIC or IPV6_ROUTE_METRIC. Hope your response, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
I personally think that in the Alibaba Cloud scenario, it is a change in behavior that does not affect the behavior of other clouds like AWS.
In the event that the meta-server does not provide network-card or device-number, and in a multi-network card scenario, it is currently observed that all distributions configure route metrics prioritization according to MAC address. However, we hope to use the network card name for sorting in the Alibaba Cloud scenario.
That makes sense, thanks for the context. Then, secondary nic route-metrics will change only in Aliyun/Alibaba (sorted by NIC name instead of MAC in the absence of more info to sort by). In my opinion, that's probably okay to be included as-is in stable Ubuntu releases.
Furthermore, I have also encountered another issue in RHEL series systems. When the libnm-settings-plugin-ifcfg-rh.so (NetworkManager can handle the ifcfg files) is present, cloud-init configures ifcfg-eth* files for multi-network cards. However, it seems that configuring the metrics as METRICS=100 or METRICS=200 in these files does not take effect. Upon consulting the nm-settings-ifcfg-rh manual, it is indicated that the configuration should be IPV4_ROUTE_METRIC or IPV6_ROUTE_METRIC.
Please, feel free to create a new issue.
Proposed Commit Message
Additional Context
Test Steps
Checklist