Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add _netdev option to mount Azure ephemeral disk #1213

Merged
merged 4 commits into from
Feb 4, 2022

Conversation

otubo
Copy link
Contributor

@otubo otubo commented Jan 26, 2022

Proposed Commit Message

The ephemeral disk depends on a functional network to be mounted. Even
though it depends on cloud-init.service, sometimes an ordering cycle is
noticed on the instance. If the option "_netdev" is added the problem is
gone.

rhbz: #1998445

Signed-off-by: Eduardo Otubo otubo@redhat.com

Additional Context

Test Steps

Deploy a RHEL-9/cloud-init-21.1 instance on Azure and observe that sometimes there's an ordering cycle on the logs and there's no network connectivity.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

otubo added 2 commits January 26, 2022 12:31
The ephemeral disk depends on a functional network to be mounted. Even
though it depends on cloud-init.service, sometimes an ordering cycle is
noticed on the instance. If the option "_netdev" is added the problem is
gone.

rhbz: #1998445

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
@otubo
Copy link
Contributor Author

otubo commented Jan 26, 2022

For some reason tox didn't complain about the line length on my local copy. Anyways, just updated. Thanks!

@anhvoms
Copy link
Contributor

anhvoms commented Jan 30, 2022

@otubo Can you elaborate on why this is the case "The ephemeral disk depends on a functional network to be mounted"? Why is this an issue for RHEL 9 and not other distros (or earlier versions of RHEL)?
If this is indeed an issue that is specific to Azure, I think we should just scope it to DSAzure. DSAzure should pass in the mount options when it tries to mount the ephemeral resource disk.

@otubo
Copy link
Contributor Author

otubo commented Jan 31, 2022

@anhvoms you probably want to check the whole discussion, more specifically David's comment on the original BZ. TL;DR he says a recent update on systemd made it visible, but the problem might have been there all along.

@cjp256
Copy link
Contributor

cjp256 commented Jan 31, 2022

@anhvoms you probably want to check the whole discussion, more specifically David's comment on the original BZ. TL;DR he says a recent update on systemd made it visible, but the problem might have been there all along.

Anh is out out for a couple of days. I don't have access to that bug so I cannot read along, but the mount already has a dependency on cloud-init.service, which itself should have a dependency on network being online. Has that changed for RHEL9?

RHEL 8.5:

# /usr/lib/systemd/system/cloud-init.service

[Unit]
Description=Initial cloud-init job (metadata service crawler)
Wants=cloud-init-local.service
Wants=sshd-keygen.service
Wants=sshd.service
After=cloud-init-local.service
After=NetworkManager.service network.service
After=NetworkManager-wait-online.service
...

Can you share the error you are seeing? I'm also not quite sure why networking needs to be online to mount the disk...

Thanks!

@cjp256
Copy link
Contributor

cjp256 commented Jan 31, 2022

To summarize from the BZ...

A change in systemd (systemd/systemd@fa138f5) triggers an ordering cycle because local-fs.target will now set After=mnt.mount (and local-fs.target is required for cloud-init). Previously, it would just be Wants=mnt.mount, avoiding the order cycling.

systemd-248-7.el9: - No mnt.mount in "After"

systemctl show -p After local-fs.target

After=systemd-remount-fs.service tmp.mount run-user-0.mount
systemd-fsck-root.service boot-efi.mount local-fs-pre.target

systemd-249-2.el9: - Have mnt.mount in "After"

systemctl show -p After local-fs.target

After=systemd-fsck-root.service mnt.mount run-user-0.mount boot-efi.mount
run-user-1000.mount local-fs-pre.target systemd-remount-fs.service tmp.mount
-.mount

The argument made in the BZ is that due to a dependency on cloud-init.service, the ephemeral disk should not be considered local, but instead mark it as _netdev so that is not considered a local filesystem.

It seems to me that this grouping is slightly at odds with the x-systemd.requires configuration, but from what I gather reading the systemd.mount manpage, mounts fall into two groups: local-fs.target or remote-fs.target, depending whether the file system is local or remote. If our mount cannot make the local-fs.target, it should declare _netdev to group in with remote-fs.target.

Another option would be to instead use nofail so that it is not required by either local or remote targets - but that would likely cause other issues :)

From my perspective given what I've seen, this seems a reasonable change (and not specific to Azure). 👍 Are there possible side-effects from grouping with remote-fs.target?

@anhvoms
Copy link
Contributor

anhvoms commented Feb 2, 2022

@cjp256 thanks for the explanation. This looks reasonable to me

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @cjp256 . Given that this module runs after networking has been applied, it looks like a reasonable change.

@TheRealFalcon
Copy link
Member

TheRealFalcon commented Feb 2, 2022

@otubo , looks like there's still a formatting issue. Can you run tox -e format to fix it (or after a rebase it would be tox -e do_format)?

@otubo
Copy link
Contributor Author

otubo commented Feb 4, 2022

I hate style issues, sorry for the inconvenience guys. Should be good to go now. Thanks!

@TheRealFalcon TheRealFalcon merged commit a1cfd1b into canonical:main Feb 4, 2022
blackboxsw pushed a commit that referenced this pull request Feb 10, 2022
Fixes the spaces introduced in #1213

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants