-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
declare dependency on ssh-import-id (SC-864) #1334
Conversation
More context: It looks like the LXD image |
Update: It looks like the same is true for |
Looking over Given that the image you ran across didn't have openssh-server installed either, I think it's fair to expect cloud-init to not work in this scenario where a dependency is mssing and I also don't want to push ssh-import-id (and openssh-server) onto every image cloud-init is installed on as there are many use cases where ssh_import_id is not provided in user-data and there also may be use-cases where openssh-server is undesired on a slim image. Free free to disagree with these points,
2a: think we could be a bit more proactive in this scenario to inform users better of the underlying dependency issue if not sub.which('ssh-import-id'):
log.warning(f"Command not found: ssh-import-id. Unable to import user keys {ids}")
return Also add docs to cc_ssh_import_id module that this will only run on systems with ssh-import-id and openssh-server installed 2b: If ssh-import-id doesn't exist we could document that we will install that package and openssh-server if ssh_import_ids are provided in user-data. Option 2a is easy and clears up expectations instead of just logging an ugly traceback about the failure.
What do folks think? |
This is surprising to me. How else do you connect to your instance? I get that there are use cases where all non-essential services are removed and only the ports open necessary for your app, but aren't those usually heavily customized images anyway?
Our prior art looks to be ubuntu/debian modules exclusively, so installing dependencies in those modules is somewhat safer. Technically, our ssh-import-id module is debian/ubuntu specific too, but there's no reason it has to be. I'd much prefer we go the route of 2a. We could have a docs note in modules that they require an additional dependency depending on your distro, and to add that dependency to the install packages section of the user data. Is there a reason we can't move the install packages module to be earlier in our boot cycle? Some modules are already invoking it earlier than when we run it. |
It's worth noting that openssh-server recommends ssh-import-id, it doesn't depend on it. The functionality of ssh-import-id is something akin to:
This is an auxiliary to openssh-server, not a core requirement. I'd argue that similarly cloud-init should "recommend" ssh-import-id. Per https://www.debian.org/doc/debian-policy/ch-relationships.html:
If we aren't going to declare it a full dependency, this sounds right to me. On the other hand, is 1.5MB (the dependency of openssh-server + ssh-import-id) of disk even a concern in cloud deployments these days? I'm fine with saying "no we value the space right now", but I figure it's worth asking - this industry changes fast. If we'd rather not deal with the dependencies, maybe we just
I was unaware we had modules that installed packages at runtime. Agreed, that sounds fragile. I'd prefer we stay away from 2b. Currently, I think 2a + Recommends makes the most sense, but also I'm no packaging expert so feel free to tell me why I'm wrong. |
2b: If ssh-import-id doesn't exist we could document that we will install
that package and openssh-server if ssh_import_ids are provided in user-data.
I'm rather strictly against silently adding any kind of potential open door
to any system.
Gladly the discussion has kind of settled on variations of 2a.
|
This is an auxiliary to openssh-server, not a core requirement. I'd argue that similarly cloud-init should "recommend" ssh-import-id
I was first thinking the same, but let us consider the possible scenarios:
a) system has no ssh-server and thereby also not the ssh-import-id that it
recommends.
Cloud-init now brings ssh-import-id via this new dependencies ...
People think it will work (positive log message) but it won't
We consume space on disk for nothing.
b) system has ssh-server, ssh-import-id was installed as recommends -
cloud-init works fine
c) system has ssh-server but someone took explicit measures to remove or
not install it in the first place
I'd be more tempted to call this a "Suggests" dependency on cloud-init
combined with Chads suggestion of
1) harden cloud-init to report well about the problem
2) mention in the documentation that ssh-import-id and an ssh server will
need to be present
2bonus) if there is a way to ensure via cloud init config that they are
present for the unlikely case they are not, mention that as an example
there. This allows users/admins to make a choice instead of forcing it onto
them.
Finally, yes every byte counts for image size considerations.
It is more often the summary of "that bit won't hurt" than a single big
chunk.
P.S. I have no deciding voice here, but wanted to add my POV.
|
fd2fc08
to
e356062
Compare
eb682cf
to
04abd5d
Compare
@cpaelzer @TheRealFalcon @blackboxsw - Thanks for the input on this. I think it's ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit inline but approve either way
43d01d9
to
1dd6da3
Compare
@blackboxsw - Any reason we can't just walk the config checking for keys matching |
@@ -149,4 +157,21 @@ def import_ssh_ids(ids, user, log): | |||
raise exc | |||
|
|||
|
|||
# vi: ts=4 expandtab | |||
def is_key_in_nested_dict(config: dict, search_key: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable. It's probably a non-issue, but here are the costs of this generic function:
- false positives if there are
ssh_import_id
keys under paths besides "users", and "user". (This implies invalid user-data schema anyway) - this function does more work that a specific and targeted function which only looks for the supported ssh_import_id paths. (user-data is generally small for systems and a small recursive loop across cloud-config dict really doesn't cost too much)
The only alternative I could think of here would be only looking at the places where we support this key with something like the following:
if "ssh_import_id" in config:
return True
if "user" in config and isinstance(config["user"], dict):
if "ssh_import_id" in config["user"]:
return True
if "users" in config:
if isinstance(config["users"], dict):
for user, user_config:
if "ssh_import_id" in user_config:
return True
if isinstance(config["users"], list):
for user_config in config["users"]:
if "ssh_import_id" in user_config:
return True
return False
When I run timeit over the following I see the generic function is twice as long (though it's a tiny cost per run amount)
# {"ssh_import_id": "xxxx", "packages": ["sl"]}
specific 1.0499926080001387
general recursive 1.904686696000681
# d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
specific 2.0969378739991953
general recursive 4.546597008999925
# d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}
specific 3.2126418679999915
general recursive 6.143558423999821
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My junker timeit test
#!/usr/bin/python3
def is_ssh_import_id_set(config, key="ssh_import_id"):
if key in config:
return True
if "user" in config and isinstance(config["user"], dict):
if key in config["user"]:
return True
if "users" in config:
if isinstance(config["users"], dict):
for _username, user_config in config["users"].items():
if key in user_config:
return True
if isinstance(config["users"], list):
for user_config in config["users"]:
if key in user_config:
return True
return False
def is_key_in_nested_dict(config: dict, search_key: str) -> bool:
"""Search for key nested in config.
Note: A dict embedded in a list of lists will not be found walked - but in
this case we don't need it.
"""
for config_key in config.keys():
if search_key == config_key:
return True
if isinstance(config[config_key], dict):
return is_key_in_nested_dict(config[config_key], search_key)
if isinstance(config[config_key], list):
# this code could probably be generalized to walking the whole
# config by iterating lists in search of dictionaries
for item in config[config_key]:
if isinstance(item, dict):
return is_key_in_nested_dict(item, search_key)
return False
import timeit
d1 = {"ssh_import_id": "yep", "packages": ["sl"]}
d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}
print("specific ", timeit.timeit('is_ssh_import_id_set(d3, "ssh_import_id")', globals=globals(), number=10000000))
print("general recursive ", timeit.timeit('is_key_in_nested_dict(d3, "ssh_import_id")', globals=globals(), number=10000000))
ef40baa
to
bec04bf
Compare
Co-authored-by: ubuntu-server-builder <blackboxsw+builder@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Assuming CI passes, should be good here.
Context:
We have a module that uses ssh-import-id. Users of ssh-import-id will want this package installed.