Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud-id: publish /run/cloud-init/cloud-id-<cloud-type> files #1244

Merged
merged 3 commits into from
Feb 10, 2022

Conversation

blackboxsw
Copy link
Collaborator

@blackboxsw blackboxsw commented Feb 4, 2022

Once a valid datasource is detected, publish the following artifacts
to expedite cloud-identification without having to invoke cloud-id from
shell scripts or sheling out from python.

These files can also be relied on in systemd ConditionPathExists
directives to limit execution of services and units to specific
clouds.

/run/cloud-init/cloud-id:

  • A symlink with content that is the canonical cloud-id of the datasource detected. This content is the same lower-case value
    as the output of /usr/bin/cloud-id.

/run/cloud-init/cloud-id-<canonical-cloud-id>:

  • A single file which will contain the canonical cloud-id encoded
    in the filename

Proposed Commit Message

Once a valid datasource is detected, publish the following artifacts
to expedite cloud-identification without having to invoke cloud-id from
shell scripts or sheling out from python.
    
These files can also be relied on in systemd ConditionPathExists
directives to limit execution of services and units to specific
clouds.
    
/run/cloud-init/cloud-id:
 - A symlink with content that is the canonical cloud-id of the
   datasource detected. This content is the same lower-case value
   as the output of /usr/bin/cloud-id.

/run/cloud-init/cloud-id-<canonical-cloud-id>:
 - A single file which will contain the canonical cloud-id encoded
   in the filename

Additional Context

Test Steps

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@blackboxsw blackboxsw added the wip Work in progress, do not land label Feb 4, 2022
@orndorffgrant
Copy link
Contributor

Looks good to me!

@blackboxsw Would something like this be sufficient to make sure the file would be in place and condition on it in a systemd unit?

[Unit]
After=cloud-config.target
ConditionPathExists=/run/cloud-init/cloud-id-gce

Also, since the content of this new file is the cloud-id as well, could the cloud-id command become just:

cat /run/cloud-init/cloud-id

?
Not necessarily in this PR, but could be an easy/small performance win as a result of this feature.

@blackboxsw
Copy link
Collaborator Author

blackboxsw commented Feb 4, 2022

Looks good to me!

@blackboxsw Would something like this be sufficient to make sure the file would be in place and condition on it in a systemd unit?

[Unit]
After=cloud-config.target
ConditionPathExists=/run/cloud-init/cloud-id-gce

I think you might have to use a systemd.path type unit because you want to get activated via inotify events on a running system, not just initial systemd boot goal compilation.

Here's what I'm thinking:

cat > your.service <<EOF
[Unit]
Description=Your file-triggered service

# Order after cloud-init is done with all it's configuration
After: cloud-config.target

# only run when on gce
ConditionPathExists=/run/cloud-init/cloud-id-gce         

# Don't run if already attached                          
ConditionPathExists=!/var/lib/ubuntu-advantage/private/machine-token.json

[Service]
Type=oneshot
ExecStart=...
TimeoutSec=0
EOF

cat > your.path <<EOF
[Unit]
Description=systemd.path monitoring for X active on GCP only
                                                                                
[Path]                                                    
# trigger systemd.unit when on GCE cloud-id
PathExists=/run/cloud-init/cloud-id-gce          
# Trigger systemd.unit when machine-token.json shows up or is removed                  
PathChanged=/var/lib/ubuntu-advantage/private/machine-token.json
Unit=your.service

[Install]   
WantedBy=multi-user.target 
EOF
systemctl enable your.path
systemctl enable your.service
systemctl reload-daemon
reboot

Also, since the content of this new file is the cloud-id as well, could the cloud-id command become just:

cat /run/cloud-init/cloud-id

? Not necessarily in this PR, but could be an easy/small performance win as a result of this feature.

@orndorffgrant thanks for the once over here. Yes I was definitely thinking we can address that optimization in a separate PR. There is a potential for inconsistency in cloud-id continuing to properly represent the 'disabled' status if the /run/cloud-init/cloud-id file was also present on the system. I was thinking we'd follow up with a separate PR for that discussion as it could lead to some refactoring that I expect will be a bit of back and forth.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments inline, but LGTM. I think you can go ahead with integration tests.

cloudinit/sources/__init__.py Outdated Show resolved Hide resolved
prev_cloud_id_file = cloud_id_file
util.sym_link(f"{cloud_id_file}-{cloud_id}", cloud_id_file, force=True)
if prev_cloud_id_file != cloud_id_file:
util.del_file(prev_cloud_id_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? /run/cloud-init gets removed every boot. Do we support changing a datasource post-boot? It doesn't hurt anything though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know of some AWS customers whose upgrade and deployment strategy involves executing cloud-init clean --logs; cloud-init init --locl; cloud-init init. If someone injected NoCloud seed files before upgrade I could see a stale /run/cloud-init/cloud-id-ec2 left around instead of /run/cloud-init/cloud-id-no-cloud. Just didn't want occasion to have artifactts around that are no longer correct. There also may be folks that live-migrate a VM to a sandboxed environment which may no longer have visibility to the original IMDS. If they trigger a persist_instance_data() we'd want to cleanup any no longer applicable DS in case something else reacts to it's presence.

@TheRealFalcon
Copy link
Member

There is a potential for inconsistency in cloud-id continuing to properly represent the 'disabled' status if the /run/cloud-init/cloud-id file was also present on the system.

Not sure there's an easy way around that. If I write /etc/cloud/cloud-init.disabled, until I reboot cloud-id will show disabled whereas /run/cloud-init/cloud-id will show the originally detected datasource.

@blackboxsw blackboxsw force-pushed the persist-run-cloud-id branch 2 times, most recently from 9989124 to 48ec41f Compare February 6, 2022 04:33
@blackboxsw blackboxsw removed the wip Work in progress, do not land label Feb 7, 2022
@blackboxsw blackboxsw marked this pull request as ready for review February 7, 2022 15:39
@blackboxsw blackboxsw changed the title WIP: cloud-id: publish /run/cloud-init/cloud-id-<cloud-type> files cloud-id: publish /run/cloud-init/cloud-id-<cloud-type> files Feb 7, 2022
@cpaelzer
Copy link
Collaborator

cpaelzer commented Feb 8, 2022

@blackboxsw / @orndorffgrant

PathExists=/run/cloud-init/cloud-id-gce

Trigger systemd.unit when machine-token.json shows up or is removed

PathChanged=/var/lib/ubuntu-advantage/private/machine-token.json

I think having both of these in the path unit will make it react to either of them - it isn't a logical and.
It will start on GCE in any case (PathExists=/run/cloud-init/cloud-id-gce) or at any time later when machine-token.json changes.
The conditionpathexists in the service blocks the service in other environments - that seem fine
Just to be sure is that what you wanted?

A potential issue I see could be the type=oneshot, that considers you "active" after exiting.
Could it happen that:

  1. PathExists=/run/cloud-init/cloud-id-gce triggers you, but nothing is ready yet
  2. exiting will make systemd consider your foo.service running
  3. machine-token.json changes, but since from systemds POV you already run nothing happens

This PR is about cloud-init providing that data, right now I do not have in mind how your daemon works - maybe it is enough to start and stay up? Could you ensure when trying that a late update of the json after the service has been started once already indeed does what you want?

@TheRealFalcon
Copy link
Member

@blackboxsw I have one more integration test suggestion. I think we should add a check (probably as a separate test in the test combined class) that /run/cloud-init/cloud-id exists and is a symlink.

@orndorffgrant
Copy link
Contributor

@blackboxsw / @orndorffgrant

PathExists=/run/cloud-init/cloud-id-gce
# Trigger systemd.unit when machine-token.json shows up or is removed
PathChanged=/var/lib/ubuntu-advantage/private/machine-token.json

I think having both of these in the path unit will make it react to either of them - it isn't a logical and. It will start on GCE in any case (PathExists=/run/cloud-init/cloud-id-gce) or at any time later when machine-token.json changes. The conditionpathexists in the service blocks the service in other environments - that seem fine Just to be sure is that what you wanted?

A potential issue I see could be the type=oneshot, that considers you "active" after exiting. Could it happen that:

1. PathExists=/run/cloud-init/cloud-id-gce triggers you, but nothing is ready yet

2. exiting will make systemd consider your foo.service running

3. machine-token.json changes, but since from systemds POV you already run nothing happens

This PR is about cloud-init providing that data, right now I do not have in mind how your daemon works - maybe it is enough to start and stay up? Could you ensure when trying that a late update of the json after the service has been started once already indeed does what you want?

Hi sorry I missed this comment yesterday.

Based on my testing, ua only needs

After=cloud-config.target
ConditionPathExists=/run/cloud-init/cloud-id-gce

Because cloud-inits services are oneshot (they aren't "done" until the process exits), the After is sufficient to make sure the file is in place before the system tries to start the ua service. So we don't need a .path for that.

Moving beyond the scope of this PR, but as far as the other dependency on machine-token.json, that is handled with a

ConditionPathExists=!/var/lib/ubuntu-advantage/private/machine-token.json

That is sufficient to make sure the daemon doesn't start when the machine is already attached to UA.

Instead of using a .path to detect when that file changes automagically, I'm just adding a call to start/stop the daemon appropriately when ua attach or ua detach is run.

Once a valid datasource is detected, publish the following artifacts
to expedite cloud-identification without having to invoke cloud-id from
shell scripts or sheling out from python.

These files can also be relied on in systemd ConditionPathExists
directives to limit execution of services and units to specific
clouds.

/run/cloud-init/cloud-id:
 - A symlink with content that is the canonical cloud-id of the
   datasource detected. This content is the same lower-case value
   as the output of /usr/bin/cloud-id.

/run/cloud-init/cloud-id-<canonical-cloud-id>:
 - A single file which will contain the canonical cloud-id encoded
   in the filename
instnace.read_from_file rstrips white space to integration tests
will expect dropped trailing newline.
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@TheRealFalcon TheRealFalcon merged commit 217ef6b into canonical:main Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants