Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

declare dependency on ssh-import-id (SC-864) #1334

Merged
merged 9 commits into from
Apr 19, 2022

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Mar 15, 2022

cc_ssh_import_id: Warn and skip module if missing dep

Add ssh-import-id as a suggested package dependency for debs

Context:

We have a module that uses ssh-import-id. Users of ssh-import-id will want this package installed.

@holmanb holmanb changed the title declare soft dependency on ssh-import-id declare dependency on ssh-import-id Mar 15, 2022
@holmanb holmanb changed the title declare dependency on ssh-import-id declare dependency on ssh-import-id (SC-864) Mar 15, 2022
@holmanb
Copy link
Member Author

holmanb commented Mar 15, 2022

More context:

It looks like the LXD image ubuntu:bionic includes the ssh-import-id dependency, but images:ubuntu/bionic/cloud does not.

@holmanb
Copy link
Member Author

holmanb commented Mar 15, 2022

Update: It looks like the same is true for software-properties-common (which is currently defined as Recommends by the cloud-init deb). I just ran across an image in a large cloud provider that didn't have it installed which caused cloud-init to fail to add a repo.

@blackboxsw
Copy link
Collaborator

Looking over apt-cache rdepends ssh-import-id the only package which depends on that is openssh-server.

Given that the image you ran across didn't have openssh-server installed either, I think it's fair to expect cloud-init to not work in this scenario where a dependency is mssing and I also don't want to push ssh-import-id (and openssh-server) onto every image cloud-init is installed on as there are many use cases where ssh_import_id is not provided in user-data and there also may be use-cases where openssh-server is undesired on a slim image.

Free free to disagree with these points,

  1. I don't think we want to grow this strict dependency by default for all cloud-init installed images because some images my not contain openssh-server and the only case where we would need ssh-import-id is when that specific user-data is provided to an instance.
  2. I think cloud-init should be a little more helpful either with (2a)logging to explain the failure and skip imports or (2b) actually installing the necessary ssh-import-id dependency and setting up openssh-server

2a: think we could be a bit more proactive in this scenario to inform users better of the underlying dependency issue

if not sub.which('ssh-import-id'):
    log.warning(f"Command not found: ssh-import-id. Unable to import user keys {ids}")
    return

Also add docs to cc_ssh_import_id module that this will only run on systems with ssh-import-id and openssh-server installed

2b: If ssh-import-id doesn't exist we could document that we will install that package and openssh-server if ssh_import_ids are provided in user-data.

Option 2a is easy and clears up expectations instead of just logging an ugly traceback about the failure.
Option 2b may be fragile as we could run into apt locks retries, network timeouts etc, but we do have prior art in many modules where we calling cloud.distro.install_packages to add the conditional dependencies only when specific user-data is provided to this VM.

  • cc_lxd, cc_ubuntu_advantage, cc_snap, cc_salt_minion, cc_ntp, cc_mcollective, cc_fan, cc_ubuntu_drivers etc.

What do folks think?

@TheRealFalcon
Copy link
Member

I don't think we want to grow this strict dependency by default for all cloud-init installed images because some images my not contain openssh-server and the only case where we would need ssh-import-id is when that specific user-data is provided to an instance.

This is surprising to me. How else do you connect to your instance? I get that there are use cases where all non-essential services are removed and only the ports open necessary for your app, but aren't those usually heavily customized images anyway?

Option 2b may be fragile as we could run into apt locks retries, network timeouts etc, but we do have prior art in many modules where we calling cloud.distro.install_packages to add the conditional dependencies only when specific user-data is provided to this VM.

Our prior art looks to be ubuntu/debian modules exclusively, so installing dependencies in those modules is somewhat safer. Technically, our ssh-import-id module is debian/ubuntu specific too, but there's no reason it has to be.

I'd much prefer we go the route of 2a. We could have a docs note in modules that they require an additional dependency depending on your distro, and to add that dependency to the install packages section of the user data. Is there a reason we can't move the install packages module to be earlier in our boot cycle? Some modules are already invoking it earlier than when we run it.

@holmanb
Copy link
Member Author

holmanb commented Mar 29, 2022

Looking over apt-cache rdepends ssh-import-id the only package which depends on that is openssh-server.

It's worth noting that openssh-server recommends ssh-import-id, it doesn't depend on it.

The functionality of ssh-import-id is something akin to:

curl https://github.com/$GH_USER.keys >> /home/$USER/.ssh/authorized_keys
curl https://launchpad.net/$LP_USER/+sshkeys >> /home/$USER/.ssh/authorized_keys

This is an auxiliary to openssh-server, not a core requirement. I'd argue that similarly cloud-init should "recommend" ssh-import-id.

Per https://www.debian.org/doc/debian-policy/ch-relationships.html:

Recommends

    This declares a strong, but not absolute, dependency.

    The Recommends field should list packages that would be found together with this one in all but unusual installations.

If we aren't going to declare it a full dependency, this sounds right to me.

On the other hand, is 1.5MB (the dependency of openssh-server + ssh-import-id) of disk even a concern in cloud deployments these days? I'm fine with saying "no we value the space right now", but I figure it's worth asking - this industry changes fast.

If we'd rather not deal with the dependencies, maybe we just curl $SITE.keys >> ~/.ssh/authorized_keys ourselves and don't use import-ssh-id? (half joking)

What do folks think?

I was unaware we had modules that installed packages at runtime. Agreed, that sounds fragile. I'd prefer we stay away from 2b.

Currently, I think 2a + Recommends makes the most sense, but also I'm no packaging expert so feel free to tell me why I'm wrong.

@cpaelzer
Copy link
Collaborator

cpaelzer commented Mar 30, 2022 via email

@cpaelzer
Copy link
Collaborator

cpaelzer commented Mar 30, 2022 via email

@holmanb holmanb force-pushed the holmanb/ssh-import-id-dep branch from fd2fc08 to e356062 Compare April 8, 2022 22:10
@holmanb holmanb marked this pull request as draft April 11, 2022 14:37
@holmanb holmanb force-pushed the holmanb/ssh-import-id-dep branch from eb682cf to 04abd5d Compare April 14, 2022 17:22
@holmanb holmanb marked this pull request as ready for review April 14, 2022 17:23
@holmanb
Copy link
Member Author

holmanb commented Apr 15, 2022

@cpaelzer @TheRealFalcon @blackboxsw - Thanks for the input on this.

I think it's ready for review.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit inline but approve either way

cloudinit/config/cc_ssh_import_id.py Outdated Show resolved Hide resolved
@holmanb holmanb force-pushed the holmanb/ssh-import-id-dep branch 3 times, most recently from 43d01d9 to 1dd6da3 Compare April 16, 2022 02:40
@holmanb
Copy link
Member Author

holmanb commented Apr 18, 2022

@blackboxsw - Any reason we can't just walk the config checking for keys matching ssh_import_id? I just pushed some code that explores this approach. It appears to work under some limited manual tests.

@@ -149,4 +157,21 @@ def import_ssh_ids(ids, user, log):
raise exc


# vi: ts=4 expandtab
def is_key_in_nested_dict(config: dict, search_key: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable. It's probably a non-issue, but here are the costs of this generic function:

  • false positives if there are ssh_import_id keys under paths besides "users", and "user". (This implies invalid user-data schema anyway)
  • this function does more work that a specific and targeted function which only looks for the supported ssh_import_id paths. (user-data is generally small for systems and a small recursive loop across cloud-config dict really doesn't cost too much)

The only alternative I could think of here would be only looking at the places where we support this key with something like the following:

if "ssh_import_id" in config:
   return True
if "user" in config and isinstance(config["user"], dict):
   if "ssh_import_id" in config["user"]:
       return True
if "users" in config:
   if isinstance(config["users"], dict):
       for user, user_config:
           if "ssh_import_id" in user_config:
               return True
   if isinstance(config["users"], list):
       for user_config in config["users"]:
           if "ssh_import_id" in user_config:
               return True
return False

When I run timeit over the following I see the generic function is twice as long (though it's a tiny cost per run amount)

# {"ssh_import_id": "xxxx", "packages": ["sl"]}
specific  1.0499926080001387
general recursive  1.904686696000681
# d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
specific  2.0969378739991953
general recursive  4.546597008999925

# d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}
specific  3.2126418679999915
general recursive  6.143558423999821

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My junker timeit test

#!/usr/bin/python3

def is_ssh_import_id_set(config, key="ssh_import_id"):

   if key in config:
      return True
   if "user" in config and isinstance(config["user"], dict):
      if key in config["user"]:
          return True
   if "users" in config:
      if isinstance(config["users"], dict):
          for _username, user_config in config["users"].items():
              if key in user_config:
                  return True
      if isinstance(config["users"], list):
          for user_config in config["users"]:
              if key in user_config:
                  return True
   return False

def is_key_in_nested_dict(config: dict, search_key: str) -> bool:
    """Search for key nested in config.

    Note: A dict embedded in a list of lists will not be found walked - but in
    this case we don't need it.
    """
    for config_key in config.keys():
        if search_key == config_key:
            return True
        if isinstance(config[config_key], dict):
            return is_key_in_nested_dict(config[config_key], search_key)
        if isinstance(config[config_key], list):
            # this code could probably be generalized to walking the whole
            # config by iterating lists in search of dictionaries
            for item in config[config_key]:
                if isinstance(item, dict):
                     return is_key_in_nested_dict(item, search_key)
    return False


import timeit

d1 = {"ssh_import_id": "yep", "packages": ["sl"]}
d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}



print("specific ", timeit.timeit('is_ssh_import_id_set(d3, "ssh_import_id")', globals=globals(), number=10000000))
print("general recursive ", timeit.timeit('is_key_in_nested_dict(d3, "ssh_import_id")', globals=globals(), number=10000000))

@holmanb holmanb force-pushed the holmanb/ssh-import-id-dep branch from ef40baa to bec04bf Compare April 18, 2022 19:35
Co-authored-by: ubuntu-server-builder <blackboxsw+builder@gmail.com>
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Assuming CI passes, should be good here.

@holmanb holmanb merged commit 437cb0a into canonical:main Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants