Sorry for the long delay. General observation: unlike cloud computing with microservices, the Theia backend is a monolithic application per user, consisting of potentially many processes running co-located and persisting state into a shared local filesystem (the workspace). This makes it difficult to scale Theia like other applications. For example, there is no remote LSP service that a Theia pod could connect to that encapsulates the service of providing compile errors and auto-completions. The LSP server typically runs locally in the user's pod so that it can access the current (potentially unique) state of the code, which may not yet be pushed to Git. The underlying problem is that we are basically reusing tooling that was never intended to run in a cloud context. IMHO this will take another decade to be fixed :)
-
Baseline: currently there is a test C++ Kubernetes service deployed, fronted with an OAuth2 authentication service (Keycloak being the provider used).
The service can be scaled, and in a small limited trial with three active pods supporting the service, each of three independent users was automatically allocated a spare pod. This CANNOT be relied upon, and for good reason.
Che workspaces manufacture a specific pod for a user-defined workspace at launch time. This approach has lengthy startup times and, IMO, does not allow for the full scalability capability of Kubernetes.
IMO the general problem Kubernetes solves at the service level is how to multiplex n users through the compute of m pods supporting a service, where m can be less than n. This is very typical, of course, for things like websites and web services with non-sticky sessions.
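To make the n-users-over-m-pods idea concrete, here is a minimal sketch (all names are illustrative, not part of any Theia or Kubernetes API): once no per-user state lives inside a pod, sessions stop being sticky and any request can be routed to any pod, e.g. the least-loaded one.

```typescript
// Sketch only: stateless dispatch of user requests over a pod pool.
interface Pod {
    name: string;
    activeSessions: number;
}

// With non-sticky sessions any pod can serve any user, so the
// dispatcher is free to pick the least-loaded pod each time.
function pickPod(pods: Pod[]): Pod {
    return pods.reduce((best, p) =>
        p.activeSessions < best.activeSessions ? p : best);
}
```

The point is not the dispatch policy itself but the precondition: the dispatcher can only be this free once per-user state has been moved out of the pods.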
I believe we have to follow the same model for maximum scalability and performance at the right price point.
I believe the key is understanding what the extent of the Theia state data is, where it resides, and how it can be tracked (and serialised) across Kubernetes management events. A persistent volume may play a part in this; initially I think it is important to understand what the state is and the options for persisting it.
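As a starting point for pinning down what that state might consist of, here is a hypothetical inventory sketched as a serialisable TypeScript type. The field names are assumptions for illustration, not actual Theia structures:

```typescript
// Hypothetical model of per-session Theia state; everything here
// must survive a pod being reclaimed, either by living on a
// persistent volume or via explicit serialisation.
interface TerminalState {
    cwd: string;
    commandHistory: string[];
}

interface SessionState {
    workspacePath: string;      // workspace files, e.g. on a persistent volume
    dirtyFiles: string[];       // edits not yet saved or committed to Git
    terminals: TerminalState[]; // open terminals and their history
    openEditors: string[];      // editor layout / open files
}

// Because the model is plain data it round-trips through JSON.
function serialise(state: SessionState): string {
    return JSON.stringify(state);
}
```

Keeping the model as plain data is what makes the later serialisation options (JSON to a volume, an object store, etc.) interchangeable.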
So fundamentally I think we ask ourselves: what if we can get a handle on the state of a Theia session, front and back (back most importantly), allowing nothing to be lost when Kubernetes detects we are idle and gives another service user the compute?
So at login there is a 'session', and there is also persistence in the form of Git. The session has a history of the workspace: terminals in use and their command history, changes being tracked by Git, etc. As I say, a persistent volume may help in the first instance, but once we understand how to access this transitory data, other serialisation options may be more appropriate from a performance perspective.
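One concrete hook for capturing that transitory data is pod termination: Kubernetes sends SIGTERM and waits `terminationGracePeriodSeconds` before killing the pod, which gives the backend a window to flush a snapshot. A minimal sketch, assuming a hypothetical `collectSessionState()` and an assumed persistent-volume mount path:

```typescript
import { writeFileSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

interface SessionState {
    terminals: { cwd: string; commandHistory: string[] }[];
    gitHead: string;
}

// Placeholder: a real implementation would query the live terminal
// and Git services for their current state.
function collectSessionState(): SessionState {
    return { terminals: [{ cwd: '/workspace', commandHistory: [] }], gitHead: 'HEAD' };
}

// Serialise the transient session state so it can be restored when
// the user's next request lands on a (possibly different) pod.
function snapshotSession(state: SessionState): string {
    return JSON.stringify(state, null, 2);
}

// SNAPSHOT_DIR would point at a persistent-volume mount in a real
// deployment; tmpdir() is only a safe local fallback.
const SNAPSHOT_DIR = process.env.SNAPSHOT_DIR ?? tmpdir();

// Kubernetes sends SIGTERM before SIGKILL on pod termination;
// flush the session snapshot inside that grace period.
process.on('SIGTERM', () => {
    writeFileSync(join(SNAPSHOT_DIR, 'session.json'),
                  snapshotSession(collectSessionState()));
    process.exit(0);
});
```

Whether the snapshot goes to a persistent volume or to a faster external store is exactly the serialisation-options question raised above.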