WIP controller <-> scheduler coordination on server status updates #5886

Draft
wants to merge 1 commit into
base: v2

lc525
Member

@lc525 lc525 commented Sep 10, 2024

Work in progress for better state sharing between controller and scheduler,
taking into account possible failures on both ends.

@lc525 lc525 requested a review from sakoush as a code owner September 10, 2024 09:41
@lc525 lc525 added the v2 label Sep 10, 2024
@lc525 lc525 marked this pull request as draft September 10, 2024 09:45
@lc525 lc525 force-pushed the shared/server-status-updates branch from 29f3945 to 998bb9a Compare September 10, 2024 09:55
@lc525
Member Author

lc525 commented Sep 12, 2024

This PR will not be merged. It is used just to share in-progress code, and will be closed when the equivalent functionality has been implemented through other PRs.

sakoush added a commit to sakoush/seldon-core that referenced this pull request Sep 17, 2024
sakoush added a commit that referenced this pull request Sep 24, 2024
* Start envoy xDS server last

* Organise starter cmd for scheduler

* Add synchroniser interface and simple timer-based impl

* Separate out logic when agent connects

* Integrate simple synchroniser in code

* Add first sync option to ServerNotify

* Adjust controller to set isFirstSync

* Allow ServerNotify initial sync to start the sync process in scheduler

* Rename start to signal in synchroniser

* Fix test

* Add test for ServerNotify

* Change interface to allow for number of signals to wait for

* Changes from #5886 (to include server events)

* Add testing for servers (and other events) in hub

* Add test for AddServerReplica

* Add server sync impl

* Add test for server sync

* Add more testing coverage

* Add logging

* Wire up server changes in starter cmd

* Add extra logging

* Skip logging if not required

* Start timer from the beginning.

* Tidy up logic and add more tests

* Wire up simple sync for the non k8s case

* Use wait group instead of a channel for sync

* Tidy up log messages

* Lint fixes

* Set default timeout for scheduler readiness in docker-compose setup

* Add explicit env var for scheduler ready timeout (compose)

* Fix lint

* Fix test

* Add architecture design at the start of the file for server sync

* Add new line

* Tidy up name in test

* Add parametrisation for helm for SCHEDULER_READY_TIMEOUT_SECONDS

* Add log message for variable

* Add note why xDS starts last

* Add extra wait for routes to be established.

* Tidy up event hub code

* Tidy up event handling code