Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Determining route metric based on NIC name #5070

Merged
merged 1 commit into from
Apr 8, 2024

Conversation

ld9379435
Copy link
Contributor

@ld9379435 ld9379435 commented Mar 19, 2024

Proposed Commit Message

feat: Determining route metric based on NIC name

In the Alibaba Cloud scenario, we do not wish to define routing priority based on MAC addresses.
In a cloud environment where the kernel parameter net.ifnames=0 has been configured,
network interface card (NIC) names are determined by default according to their underlying Bus, Device, and Function (BDF) numbers,
incrementing from eth0 to ethN, with eth0 acting as the default primary NIC name.

In the previous logic, network-card has the highest priority, followed by device-number as the second priority.
When default_nic_order is set to NicOrder.MAC, the mac address takes the third priority.
On the other hand, when default_nic_order is set to NicOrder.NIC_NAME, the NIC name becomes the third priority.

In AWS environments, the default setting remains as default_nic_order = NicOrder.MAC, maintaining the original behavior.
However, in Alibaba Cloud scenarios, we set default_nic_order = NicOrder.NIC_NAME.

In the earlier code snippet, before calling _build_nic_order, the input macs were sorted.
Therefore, in the test cases, there should not be a scenario where macs are given as ["0a:f7:8d:96:f2:a1", "0a:0d:dd:44:cd:7b"].
    macs = sorted(macs_to_nics.keys())
    nic_order = _build_nic_order(macs_metadata, macs)

For the new code implementation, I move the line macs = sorted(macs_to_nics.keys()) inside the function and pass the macs_to_nics dictionary as a parameter instead,
which allows for easy access to the nic names.

Additional Context

Test Steps

Checklist

@aciba90 aciba90 self-assigned this Mar 19, 2024
Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ld9379435, for making cloud-init better. The general approach looks good to me. I left some inline comments / concerns.

This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.

As I have not access to Alibaba, could you please paste the output of cloud-init collect-logs?

Unified proposed patch
diff --git a/cloudinit/sources/DataSourceAliYun.py b/cloudinit/sources/DataSourceAliYun.py
index 7cacd67e1..45cc0b164 100644
--- a/cloudinit/sources/DataSourceAliYun.py
+++ b/cloudinit/sources/DataSourceAliYun.py
@@ -34,6 +34,10 @@ class DataSourceAliYun(EC2.DataSourceEc2):
         self.default_update_events[EventScope.NETWORK].add(EventType.BOOT)
         self.default_nic_order = NicOrder.NIC_NAME
 
+    def _unpickle(self, ci_pkl_version: int) -> None:
+        super()._unpickle(ci_pkl_version)
+        self.default_nic_order = NicOrder.NIC_NAME
+
     def get_hostname(self, fqdn=False, resolve_ip=False, metadata_only=False):
         hostname = self.metadata.get("hostname")
         is_default = False
diff --git a/cloudinit/sources/DataSourceEc2.py b/cloudinit/sources/DataSourceEc2.py
index d78e97a02..64c328737 100644
--- a/cloudinit/sources/DataSourceEc2.py
+++ b/cloudinit/sources/DataSourceEc2.py
@@ -123,6 +123,8 @@ class DataSourceEc2(sources.DataSource):
     def _unpickle(self, ci_pkl_version: int) -> None:
         super()._unpickle(ci_pkl_version)
         self.extra_hotplug_udev_rules = _EXTRA_HOTPLUG_UDEV_RULES
+        if not hasattr(self, "default_nic_order"):
+            self.default_nic_order = NicOrder.NIC_NAME
 
     def _get_cloud_name(self):
         """Return the cloud name as identified during _get_data."""
@@ -912,8 +914,7 @@ def _build_nic_order(
     @return: Dictionary with macs as keys and nic orders as values.
     """
     nic_order: Dict[str, int] = {}
-    macs = sorted(macs_to_nics.keys())
-    if len(macs) == 0 or len(macs_metadata) == 0:
+    if len(macs_to_nics) == 0 or len(macs_metadata) == 0:
         return nic_order
 
     valid_macs_metadata = filter(
@@ -921,7 +922,8 @@ def _build_nic_order(
         lambda mmd: mmd[1] is not None,
         # filter by macs
         map(
-            lambda mac: (mac, macs_metadata.get(mac), macs_to_nics[mac]), macs
+            lambda mac: (mac, macs_metadata.get(mac), macs_to_nics[mac]),
+            macs_to_nics.keys(),
         ),
     )
 
@@ -939,7 +941,7 @@ def _build_nic_order(
     # function.
     return {
         mac: i
-        for i, (mac, _mac_metadata, nic_name) in enumerate(
+        for i, (mac, _mac_metadata, _nic_name) in enumerate(
             sorted(
                 valid_macs_metadata,
                 key=lambda mmd: (
diff --git a/tests/unittests/sources/test_ec2.py b/tests/unittests/sources/test_ec2.py
index 5f8310474..fa5736742 100644
--- a/tests/unittests/sources/test_ec2.py
+++ b/tests/unittests/sources/test_ec2.py
@@ -948,7 +948,10 @@ class TestBuildNicOrder:
     @pytest.mark.parametrize(
         ["macs_metadata", "macs_to_nics", "default_nic_order", "expected"],
         [
-            pytest.param({}, {}, None, {}, id="all_empty"),
+            pytest.param({}, {}, NicOrder.MAC, {}, id="all_empty"),
+            pytest.param(
+                {}, {}, NicOrder.NIC_NAME, {}, id="all_empty_sort_by_nic_name"
+            ),
             pytest.param(
                 {},
                 {"0a:f7:8d:96:f2:a1": "eth0"},
diff --git a/tests/unittests/test_upgrade.py b/tests/unittests/test_upgrade.py
index a7dd8538f..0aca3683b 100644
--- a/tests/unittests/test_upgrade.py
+++ b/tests/unittests/test_upgrade.py
@@ -34,12 +34,7 @@ class TestUpgrade:
     # The presence of these attributes existed in 20.1.
     ds_expected_unpickle_attrs = {
         "AltCloud": {"seed", "supported_seed_starts"},
-        "AliYun": {
-            "identity",
-            "metadata_address",
-            "default_update_events",
-            "default_nic_order",
-        },
+        "AliYun": {"identity", "metadata_address", "default_update_events"},
         "Azure": {
             "_ephemeral_dhcp_ctx",
             "_iso_dev",
@@ -80,7 +75,7 @@ class TestUpgrade:
             "use_ip4LL",
             "wait_retry",
         },
-        "Ec2": {"identity", "metadata_address", "default_nic_order"},
+        "Ec2": {"identity", "metadata_address"},
         "Exoscale": {
             "api_version",
             "extra_config",

tests/unittests/test_upgrade.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceEc2.py Outdated Show resolved Hide resolved
tests/unittests/sources/test_ec2.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceEc2.py Outdated Show resolved Hide resolved
cloudinit/sources/__init__.py Outdated Show resolved Hide resolved
@ld9379435
Copy link
Contributor Author

ld9379435 commented Mar 28, 2024

cloud-init.tar.gz
@aciba90 Sorry for the late reply.
This is the cloud-init log running on ubuntu22 with the commit feature.

Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changeset LGTM. Thanks for addressing the comments. I left a couple of inline nit/minor requests.

Unfortunately, I cannot open https://github.com/canonical/cloud-init/files/14789494/cloud-init.tar.gz, could you please reattach the logs?

This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.

This question is still open. Is this a bug fix or a change of behavior?

Thanks!

tests/unittests/test_upgrade.py Outdated Show resolved Hide resolved
cloudinit/sources/__init__.py Outdated Show resolved Hide resolved
@aciba90
Copy link
Contributor

aciba90 commented Apr 4, 2024

Regarding the failing linkcheck, it was already fixed. A rebase of main will make it green.

@ld9379435 ld9379435 force-pushed the devel/aliyun branch 2 times, most recently from bfd16ba to 456322d Compare April 7, 2024 06:05
In the Alibaba Cloud scenario, we do not wish to define routing priority based on MAC addresses.
In a cloud environment where the kernel parameter net.ifnames=0 has been configured,
network interface card (NIC) names are determined by default according to their underlying Bus, Device, and Function (BDF) numbers,
incrementing from eth0 to ethN, with eth0 acting as the default primary NIC name.

In the previous logic, network-card has the highest priority, followed by device-number as the second priority.
When _fallback_nic_order is set to NicOrder.MAC, the mac address takes the third priority.
On the other hand, when _fallback_nic_order is set to NicOrder.NIC_NAME, the NIC name becomes the third priority.

In AWS environments, the default setting remains as _fallback_nic_order = NicOrder.MAC, maintaining the original behavior.
However, in Alibaba Cloud scenarios, we set _fallback_nic_order = NicOrder.NIC_NAME.
@ld9379435
Copy link
Contributor Author

Changeset LGTM. Thanks for addressing the comments. I left a couple of inline nit/minor requests.

Unfortunately, I cannot open https://github.com/canonical/cloud-init/files/14789494/cloud-init.tar.gz, could you please reattach the logs?

This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.

This question is still open. Is this a bug fix or a change of behavior?

Thanks!

cloud-init.tar.gz
I re-uploaded the file named cloud-init.tar.gz. Please review it.

@ld9379435
Copy link
Contributor Author

Changeset LGTM. Thanks for addressing the comments. I left a couple of inline nit/minor requests.

Unfortunately, I cannot open https://github.com/canonical/cloud-init/files/14789494/cloud-init.tar.gz, could you please reattach the logs?

This PR changes slightly the behavior of cloud-init. In order to assess if this new behavior should be patched out for stable ubuntu releases, I am trying to understand if this is a fix or a change of behavior. Is this something that is broken in every possible distro in Alibaba Cloud ECS? If there is any distro with predictable NIC naming, this new ordering might be considered a change of behavior.

This question is still open. Is this a bug fix or a change of behavior?

Thanks!

I personally think that in the Alibaba Cloud scenario, it is a change in behavior that does not affect the behavior of other clouds like AWS.

In the event that the meta-server does not provide network-card or device-number, and in a multi-network card scenario, it is currently observed that all distributions configure route metrics prioritization according to MAC address. However, we hope to use the network card name for sorting in the Alibaba Cloud scenario.

Furthermore, I have also encountered another issue in RHEL series systems. When the libnm-settings-plugin-ifcfg-rh.so (NetworkManager can handle the ifcfg files) is present, cloud-init configures ifcfg-eth* files for multi-network cards. However, it seems that configuring the metrics as METRICS=100 or METRICS=200 in these files does not take effect. Upon consulting the nm-settings-ifcfg-rh manual, it is indicated that the configuration should be IPV4_ROUTE_METRIC or IPV6_ROUTE_METRIC.

Hope your response, thanks

@ld9379435 ld9379435 requested a review from aciba90 April 8, 2024 05:55
Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I personally think that in the Alibaba Cloud scenario, it is a change in behavior that does not affect the behavior of other clouds like AWS.

In the event that the meta-server does not provide network-card or device-number, and in a multi-network card scenario, it is currently observed that all distributions configure route metrics prioritization according to MAC address. However, we hope to use the network card name for sorting in the Alibaba Cloud scenario.

That makes sense, thanks for the context. Then, secondary nic route-metrics will change only in Aliyun/Alibaba (sorted by NIC name instead of MAC in the absence of more info to sort by). In my opinion, that's probably okay to be included as-is in stable Ubuntu releases.

Furthermore, I have also encountered another issue in RHEL series systems. When the libnm-settings-plugin-ifcfg-rh.so (NetworkManager can handle the ifcfg files) is present, cloud-init configures ifcfg-eth* files for multi-network cards. However, it seems that configuring the metrics as METRICS=100 or METRICS=200 in these files does not take effect. Upon consulting the nm-settings-ifcfg-rh manual, it is indicated that the configuration should be IPV4_ROUTE_METRIC or IPV6_ROUTE_METRIC.

Please, feel free to create a new issue.

@aciba90 aciba90 merged commit 0a3a5e2 into canonical:main Apr 8, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants