-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Avoid uploading public key #2119
Conversation
2fe9617
to
07ee384
Compare
author Zhanghao Wu <zhanghao.wu@outlook.com> 1687057727 -0700 committer Zhanghao Wu <zhanghao.wu@outlook.com> 1687488369 -0700 parent ca9fed1 author Zhanghao Wu <zhanghao.wu@outlook.com> 1687057727 -0700 committer Zhanghao Wu <zhanghao.wu@outlook.com> 1687488294 -0700 Add document for minimal permission start permission Fix permission check note about the service account setup Add missing permissions Add missing permissions add permissions for tpu Update docs/source/cloud-setup/cloud-permissions.rst Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> Update docs/source/cloud-setup/cloud-permissions.rst Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> address comments
fix lambda Use ssh-key metadata for instance fix ssh for lambda format revert lambda remove unused permission
f71e185
to
c33d3f5
Compare
…-public-key-uploading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll, left some questions. We should probably do back compat + smoke.
@@ -50,7 +50,7 @@ def bootstrap_instance_config(self, node_config): | |||
instance_config['serverType'] = node_config['InstanceType'] | |||
instance_config['contractId'] = "None" | |||
instance_config['initialScript'] = self._get_vm_init_script( | |||
node_config['auth']['ssh_public_key']) | |||
node_config['AuthorizedKey']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we pick AuthorizedKey on our own, or is it a convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we log into a remote linux server, it will look for its local ~/.ssh/authorized_keys
file. I guess it is more accurate to use AuthorizedKey
here, instead of the ssh_public_key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is it seems a bit weird that the different *.j2 use different field names for this.
One option is to conform to AWS's j2, i.e., cloud-init's "ssh_authorized_keys" field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is that we want to keep the naming style the same as the other keys in the yaml file. For example, in oci-ray.yml.j2
, all the keys are camel-case. Having a snake-case ssh_authorized_keys
among them might be a bit weird?
…-public-key-uploading
Thanks for the review @concretevitamin! Just tested it and it passed the smoke tests. For backward compatibility, the changes in this PR will not affect the existing clusters, as the config restoring in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, some minor questions.
sky/authentication.py
Outdated
"""Module to enable a single SkyPilot key for all VMs in each cloud.""" | ||
"""Module to enable a single SkyPilot key for all VMs in each cloud. | ||
|
||
The `setup_<cloud>_authentication` functions will be called after the ray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `setup_<cloud>_authentication` functions will be called after the ray | |
The `setup_<cloud>_authentication` functions will be called on every new cluster provision request. | |
Specifically, after the ray yaml template file `<cloud>-ray.yml.j2` is filled in with resource specific | |
information, these functions are called with the filled in ray yaml config as input, |
Please double check if "new cluster provision request" is correct, e.g., what about launch -c existing
or start
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions will be called no matter whether the cluster exists or not. I applied the change, and changed the wording a bit. We can probably fix when the functions should be called in a future PR. : )
@@ -50,7 +50,7 @@ def bootstrap_instance_config(self, node_config): | |||
instance_config['serverType'] = node_config['InstanceType'] | |||
instance_config['contractId'] = "None" | |||
instance_config['initialScript'] = self._get_vm_init_script( | |||
node_config['auth']['ssh_public_key']) | |||
node_config['AuthorizedKey']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is it seems a bit weird that the different *.j2 use different field names for this.
One option is to conform to AWS's j2, i.e., cloud-init's "ssh_authorized_keys" field?
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @Michaelvll!
…-public-key-uploading
Fixes #2025
Uploading the
sky-key.pub
to the~/.ssh/sky-key.pub
will cause problem for spot controller. This PR gets rid of the public_key upload, except for the Lambda, as it is hardcoded and hard to modify at the moment.This also reduces the permission for the GCP account to launch a VM, as we no longer need to set the metadata for the key pairs.
Tested (run the relevant ones):
bash format.sh
sky launch --cloud aws -c test-aws --num-nodes 2 -i 0 echo hi; sleep 90; sky status -r test-aws
sky launch --cloud gcp -c test-gcp --num-nodes 2 -i 0 echo hi; sleep 90; sky status -r test-gcp
with a new pair of keyssky launch --cloud gcp -c test-gcp -i 0; sky status -r
without the setCommonInstanceMetadata permission.sky launch --cloud azure --num-nodes 2 -c test-azure -i 0 echo hi; sleep 120; sky status -r test-azure
sky launch --cloud oci --num-nodes 2 --cpus 2 -c test-oci -i 0 echo hi; sleep 90; sky status -r test-oci
sky launch --cloud lambda -c test-lambda -i 0 --num-nodes 2 --down echo hi; sleep 90; sky status -r test-lambda
sky launch --cloud ibm --num-nodes 2 -c test-ibm -i 0 echo hi; sleep 90; sky status -r test-ibm
sky spot launch --cpus 2 --cloud gcp echo hi
with Azure to be the spot controller [Spot/Auth] Public key should not be uploaded to the remote VM #2025sky launch --cloud scp -c test-scp -i 0 echo hi; sleep 90; sky status -r test-scp
(Failed due to test account not able to launch new instance)pytest tests/test_smoke.py
(except [Examples] TF training code for ResNet no longer working #2207)