Sorry for the long delay. General observation: unlike cloud computing with microservices, the Theia backend is a monolithic application per user, consisting of potentially many processes running co-located and persisting state into a shared local filesystem (the workspace). This makes it difficult to scale Theia like other applications. For example, there is no remote LSP service that a Theia pod could connect to that encapsulates the service of providing compile errors and auto-completions. The LSP server typically runs locally in the user's pod so that it can access the current (potentially unique) state of the code, which may not yet be pushed to Git. The underlying problem is that we are basically reusing tooling that was never intended to run in a cloud context. IMHO this will take another decade to be fixed :)
-
Baseline: currently there is a test C++ Kubernetes service deployed, fronted with an OAuth2 authentication service (Keycloak being the provider used).
The service can be scaled, and in a small limited trial with three active pods supporting the service, each of three independent users was automatically allocated a spare pod. This CANNOT be relied upon, and for good reason.
Che workspaces manufacture a specific pod for a user-defined workspace at launch time. This approach has lengthy startup times and, IMO, does not allow for the full scalability capability of Kubernetes.
IMO the general problem Kubernetes solves at the service level is how to multiplex n users through the compute of m pods supporting a service, where m can be less than n. This is very typical, of course, for things like websites and web services with non-sticky sessions.
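To make the n-users-over-m-pods idea concrete, here is a minimal sketch (all names are illustrative, not part of any Theia or Kubernetes API): once no per-user state lives inside a pod, sessions stop being sticky and any request can be routed to any pod, e.g. the least-loaded one.

```typescript
// Sketch only: stateless dispatch of user requests over a pod pool.
interface Pod {
    name: string;
    activeSessions: number;
}

// With non-sticky sessions any pod can serve any user, so the
// dispatcher is free to pick the least-loaded pod each time.
function pickPod(pods: Pod[]): Pod {
    return pods.reduce((best, p) =>
        p.activeSessions < best.activeSessions ? p : best);
}
```

The point is not the dispatch policy itself but the precondition: the dispatcher can only be this free once per-user state has been moved out of the pods.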
I believe we have to follow the same model for maximum scalability and performance at the right price point.
I believe the key is understanding what the extent of the Theia state data is, where it resides, and how it can be tracked (and serialised) across Kubernetes management events. A persistent volume may play a part in this; initially I think it is important to understand what the state is and the options for persisting it.
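As a starting point for pinning down what that state might consist of, here is a hypothetical inventory sketched as a serialisable TypeScript type. The field names are assumptions for illustration, not actual Theia structures:

```typescript
// Hypothetical model of per-session Theia state; everything here
// must survive a pod being reclaimed, either by living on a
// persistent volume or via explicit serialisation.
interface TerminalState {
    cwd: string;
    commandHistory: string[];
}

interface SessionState {
    workspacePath: string;      // workspace files, e.g. on a persistent volume
    dirtyFiles: string[];       // edits not yet saved or committed to Git
    terminals: TerminalState[]; // open terminals and their history
    openEditors: string[];      // editor layout / open files
}

// Because the model is plain data it round-trips through JSON.
function serialise(state: SessionState): string {
    return JSON.stringify(state);
}
```

Keeping the model as plain data is what makes the later serialisation options (JSON to a volume, an object store, etc.) interchangeable.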
So fundamentally I think we ask ourselves: what if we can get a handle on the state of a Theia session, front and back (back most importantly), allowing nothing to be lost when Kubernetes detects we are idle and gives another service user the compute?
So at login there is a 'session', and there is also persistence in the form of Git. The session has a history of the workspace: terminals in use and their command history, changes being tracked by Git, etc. As I say, a persistent volume may help in the first instance, but once we understand how to access this transitory data, other serialisation options may be more appropriate from a performance perspective.
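One concrete hook for capturing that transitory data is pod termination: Kubernetes sends SIGTERM and waits `terminationGracePeriodSeconds` before killing the pod, which gives the backend a window to flush a snapshot. A minimal sketch, assuming a hypothetical `collectSessionState()` and an assumed persistent-volume mount path:

```typescript
import { writeFileSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

interface SessionState {
    terminals: { cwd: string; commandHistory: string[] }[];
    gitHead: string;
}

// Placeholder: a real implementation would query the live terminal
// and Git services for their current state.
function collectSessionState(): SessionState {
    return { terminals: [{ cwd: '/workspace', commandHistory: [] }], gitHead: 'HEAD' };
}

// Serialise the transient session state so it can be restored when
// the user's next request lands on a (possibly different) pod.
function snapshotSession(state: SessionState): string {
    return JSON.stringify(state, null, 2);
}

// SNAPSHOT_DIR would point at a persistent-volume mount in a real
// deployment; tmpdir() is only a safe local fallback.
const SNAPSHOT_DIR = process.env.SNAPSHOT_DIR ?? tmpdir();

// Kubernetes sends SIGTERM before SIGKILL on pod termination;
// flush the session snapshot inside that grace period.
process.on('SIGTERM', () => {
    writeFileSync(join(SNAPSHOT_DIR, 'session.json'),
                  snapshotSession(collectSessionState()));
    process.exit(0);
});
```

Whether the snapshot goes to a persistent volume or to a faster external store is exactly the serialisation-options question raised above.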