BGD-3813 - limit the number of executors #8

Merged
6 commits merged on Oct 11, 2023

Conversation

alextarasov-spot
@alextarasov-spot alextarasov-spot commented Sep 27, 2023

Jira ticket

https://spotinst.atlassian.net/browse/3813

Description

We should cap both the number of executors that the spark-operator processes and the size of the executor list in the status section of the SparkApplication CR.
This keeps us from slowing down SparkApplication CR processing with our own executor load (effectively a self-inflicted DoS); submitting apps and tracking driver pods is the spark-operator's highest priority. It also keeps the executor list in the CR from growing so large that patching or updating the CR causes etcd errors.
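A minimal sketch of the capping idea described above, not the code from this PR. It assumes the executor list in the CR status can be modeled as a map from executor pod name to a state string; the helper name `capExecutorState` and the "limit <= 0 means no cap" convention are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// capExecutorState returns a copy of state that holds at most limit entries.
// A limit <= 0 is treated as "no cap" (an assumption made for this sketch).
// Entries are kept in sorted-name order so the result is deterministic.
func capExecutorState(state map[string]string, limit int) map[string]string {
	out := make(map[string]string, len(state))
	if limit <= 0 || len(state) <= limit {
		for name, s := range state {
			out[name] = s
		}
		return out
	}

	names := make([]string, 0, len(state))
	for name := range state {
		names = append(names, name)
	}
	sort.Strings(names)

	// Keep only the first `limit` executors; the rest are not tracked.
	for _, name := range names[:limit] {
		out[name] = state[name]
	}
	return out
}

func main() {
	state := map[string]string{}
	for i := 1; i <= 8; i++ {
		state[fmt.Sprintf("exec-%d", i)] = "RUNNING"
	}
	capped := capExecutorState(state, 5)
	fmt.Println(len(capped)) // prints 5
}
```

Which executors to keep once the cap is reached is a design choice; sorting by name is used here only to make the sketch deterministic.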

Demo

Run the spark-operator with the --executors-processing-limit=5 argument, which limits the number of processed executors to 5.
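For illustration, here is one way such an argument could be read with Go's standard `flag` package. Only the flag name `--executors-processing-limit` comes from this PR; the helper `parseExecutorLimit`, the default of 0, and the help text are assumptions for this sketch and may differ from what main.go actually does.

```go
package main

import (
	"flag"
	"fmt"
)

// parseExecutorLimit reads the executors processing limit from args.
// The default of 0 (interpreted as "no limit") is an assumption of this sketch.
func parseExecutorLimit(args []string) (int, error) {
	fs := flag.NewFlagSet("spark-operator", flag.ContinueOnError)
	limit := fs.Int("executors-processing-limit", 0,
		"maximum number of executors to process (0 = no limit; assumed default)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *limit, nil
}

func main() {
	limit, err := parseExecutorLimit([]string{"--executors-processing-limit=5"})
	if err != nil {
		panic(err)
	}
	fmt.Println(limit) // prints 5
}
```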

Screen.Recording.2023-09-21.at.18.20.15.mov

Checklist

  • I have added a Jira ticket link
  • I have filled in the test plan
  • I have executed the tests and filled in the test results
  • (For release PRs) I have reviewed the data plane release instructions
  • (For release PRs) I have updated the changelog (infra/setup/CHANGELOG.md)
  • (For release PRs) I have/will create a changelog PR in the documentation repo (spotinst/help)

How to test

Description of environment setup necessary to exercise the feature and perform tests

Test plan and results

Feel free to add screenshots showing test results

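| Test | Description       | Result | Notes                     |
|------|-------------------|--------|---------------------------|
| 1    | Test with input A | Pass   | Some notes about the test |
| 2    | Test with input B | Pass   | Some notes about the test |
| 3    | Test with input C | Pass   | Some notes about the test |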

@alextarasov-spot alextarasov-spot requested a review from a team as a code owner September 27, 2023 10:55
@alextarasov-spot alextarasov-spot temporarily deployed to dev September 27, 2023 11:00 — with GitHub Actions Inactive
@arnarpall arnarpall left a comment

lgtm

Review thread on main.go (outdated, resolved)
alextarasov-spot and others added 2 commits October 2, 2023 12:22
Co-authored-by: Sébastien Maintrot <3097030+ImpSy@users.noreply.github.com>
Co-authored-by: Sébastien Maintrot <3097030+ImpSy@users.noreply.github.com>
@alextarasov-spot alextarasov-spot temporarily deployed to dev October 2, 2023 09:27 — with GitHub Actions Inactive
Alex Tarasov added 2 commits October 3, 2023 09:51
…tors' into BGD-3813-cap-the-number-of-executors

# Conflicts:
#	pkg/controller/sparkapplication/controller.go
@alextarasov-spot alextarasov-spot temporarily deployed to dev October 3, 2023 06:58 — with GitHub Actions Inactive
@alextarasov-spot alextarasov-spot temporarily deployed to dev October 3, 2023 09:23 — with GitHub Actions Inactive
@alextarasov-spot alextarasov-spot merged commit 1aad721 into ocean-spark Oct 11, 2023
5 checks passed
@alextarasov-spot alextarasov-spot deleted the BGD-3813-cap-the-number-of-executors branch October 11, 2023 08:54