Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for conditional exclusion of features (specific to a cloud) during provisioning #2293

Closed

Conversation

hemildesai
Copy link
Contributor

@hemildesai hemildesai commented Jul 23, 2023

This PR adds a backbone to conditionally exclude some unsupported features in a cloud instead of raising an error if even one of the requested features is unsupported. It accomplishes this by maintaining a per cloud map of excludable features -> exclusion condition. During provisioning, if the exclusion condition is met for any of the requested features, it excludes that feature from the corresponding check_features_are_supported. As of now, this aims to address #2249

This is a proof of concept PR and I've tested in locally (only have k8s cloud enabled). If this is a direction that we want to proceed in, I can work on adding tests for it and running existing smoke/integration tests.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@concretevitamin concretevitamin requested a review from cblmemo July 23, 2023 16:30
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is excellent! It is also beneficial when users want to exclude some minor features. IIUC, this should partly resolve #2251 as well. Left some suggestions:


if self._requested_features != granted_features:
logger.info(
f'{colorama.Fore.CYAN}The following features will be skipped since they are not supported by {to_provision.cloud.__class__.__name__} and were deemed optional : '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f'{colorama.Fore.CYAN}The following features will be skipped since they are not supported by {to_provision.cloud.__class__.__name__} and were deemed optional : '
f'{colorama.Fore.CYAN}The following features will be skipped since they are not supported by {to_provision.cloud._REPR} and were deemed optional : '

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked and print(to_provision.cloud) gives Kubernetes so just {to_provision.cloud} should be fine

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC using cloud._REPR is some code convention. cc @concretevitamin for a look

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's supposed to be used either via {to_provision.cloud} or {repr(to_provision.cloud)} to not access private attrs directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to {to_provision.cloud}

cls, requested_features: Set[CloudImplementationFeatures],
config: ExcludableFeatureCheckConfig
) -> Set[CloudImplementationFeatures]:
"""Returns the features that can be granted by the cloud implementation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if adding a comment saying the cloud.check_features_are_supported checking is moved to here 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also refactor the implementation in here?

self.check_features_are_supported(resources_required_features)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I can work on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked on this, looks like cluster_name is not available in this method.

Comment on lines +169 to +174
lambda config: config.cluster_name.startswith(
'sky-spot-controller-')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lambda config: config.cluster_name.startswith(
'sky-spot-controller-')
lambda config: config.cluster_name == spot.SPOT_CONTROLLER_NAME

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but it was giving a circular import error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ic... Maybe you could copy-paste the f'sky-spot-controller-{common_utils.get_user_hash()}' to here? Or if @concretevitamin has any other ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah happy to explicitly check for f'sky-spot-controller-{common_utils.get_user_hash()}', I'll go with whatever is the preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with either way.

I just realized this PR is merging into the k8s branch. cc @romilbhardwaj to take a look too. On the surface it seems like we should make k8s.py importing spot work (the latter should not import k8s?).

@hemildesai hemildesai force-pushed the hd/k8s-spot-no-autostop branch from 5bc3e57 to c6085dd Compare August 1, 2023 06:04
@hemildesai hemildesai force-pushed the hd/k8s-spot-no-autostop branch from c6085dd to 7daed9d Compare August 1, 2023 06:12
@romilbhardwaj romilbhardwaj deleted the branch skypilot-org:k8s_cloud August 2, 2023 10:57
@romilbhardwaj
Copy link
Collaborator

Hey @hemildesai - this PR got closed when we merged #2096 into master.

For our next sprint, we'll be switching to k8s_cloud_beta1 branch. Can you re-open this PR on that branch?

@hemildesai
Copy link
Contributor Author

Couldn't reopen so created #2348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants