-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/dhcp: handle timeouts for dhcpcd #5022
Conversation
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
linkcheck CI looks broken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjp256 Thanks for this fix! We're putting together a 24.1.1 due to some feedback received on early testing, I think this makes sense for inclusion.
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry.
Oof. Thanks for correcting this behavior, I missed the TimeoutExpired exception.
I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider.
I didn't feel comfortable using it due to NetworkConfiguration/dhcpcd#276. If carrier is not acquired, it would prevent retry (block indefinitely). Manually retrying via subprocess in this case would be more reliable, I think.
linkcheck CI looks broken
I'll get that fixed.
There may be some issues with dhcpcd retry fwiw based on #5023 - will see for sure after we flight this change |
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
unit test failure looks unrelated:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for this fix!
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry. I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider. In the meantime: - catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback - update the "dhclient" logs to "dhcpcd" ``` Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data crawled_data = util.log_time( ^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2825, in log_time ret = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 630, in crawl_metadata self._setup_ephemeral_networking(timeout_minutes=timeout_minutes) File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking lease = self._ephemeral_dhcp_ctx.obtain_lease() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 288, in obtain_lease self.lease = maybe_perform_dhcp_discovery( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 102, in maybe_perform_dhcp_discovery return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 649, in dhcp_discovery out, err = subp.subp( ^^^^^^^^^^ File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 276, in subp out, err = sp.communicate(data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 1209, in communicate stdout, stderr = self._communicate(input, endtime, timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/subprocess.py", line 2114, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib/python3.12/subprocess.py", line 1253, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '[b'/usr/sbin/dhcpcd', b'--ipv4only', b'--waitip', b'--persistent', b'--noarp', b'--script=/bin/true', b'eth0']' timed out after 10 seconds ``` Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
I'm not sure 10 seconds is enough of a timeout. Azure will retry DHCP failures, but I expect the reduction from 300s/60s -> 10s will result in unexpected failures for other clouds where there is no retry.
I see that dhcpcd offers a --timeout option which might be better than relying on subprocess timeout? Something to consider.
In the meantime:
catch the TimeoutExpired exception and raise NoDhcpLeaseError to address below traceback
update the "dhclient" logs to "dhcpcd"