Add an ability to configure health checks for containers in Workspace.Next ChePlugin #10273

Closed
garagatyi opened this issue Jul 4, 2018 · 7 comments
Labels
kind/task Internal things, technical debt, and to-do tasks to be performed. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@garagatyi

Description

Right now we use hardcoded health checks, but with Workspace.Next we need to make them configurable.


@l0rd l0rd mentioned this issue Jul 4, 2018
24 tasks
@garagatyi garagatyi added kind/task Internal things, technical debt, and to-do tasks to be performed. team/osio labels Jul 10, 2018
@l0rd l0rd mentioned this issue Aug 3, 2018
57 tasks
@skabashnyuk skabashnyuk self-assigned this Oct 1, 2018
@sleshchenko
Member

sleshchenko commented Oct 1, 2018

@garagatyi
The title mentions container health checks, while the description says we currently use hardcoded health checks. Actually, we don't have any health checks for containers, only for Che servers (ws-agent, terminal, exec). Correct me if I'm wrong.
So, could you please clarify whether we should:

  1. Introduce health checks for plugin containers;
  2. Introduce health checks for plugin endpoints, since they will be transformed into Che servers and it would be useful to have actual statuses there, especially for the Workspace Loader, which waits for the Editor endpoint before opening it in an iframe; or
  3. Both plugin containers and endpoints.

@garagatyi
Author

I think we need to add container health checks. We couldn't do that in Che 6 because, due to installers, each container could run several apps. In Che 7 we should put one app per container, so container liveness checks would be enough. Apart from that, we would be able to reuse the existing functionality of Kubernetes/OpenShift/Docker.
In any case, since the WS.NEXT flow is not implemented for Docker, we should add liveness checks for containers only.
I'm talking about plugin sidecar containers only. If a user needs health checks in the workspace recipe, they can use the native health check mechanism of the recipe type.

Health checks (liveness probes in Kubernetes terms) check the app running in a container, not the container itself, so they can check app state. If the app state is OK, I consider it fair to set the statuses of all the servers of that container to RUNNING. Maybe later we can remove those statuses altogether.
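
A minimal sketch of what a Kubernetes liveness probe on a plugin sidecar container could look like (the container name, image, port, and timings below are illustrative assumptions, not taken from an actual Che plugin definition):

containers:
  - name: plugin-sidecar                 # hypothetical sidecar name
    image: example.org/plugin-sidecar:latest
    ports:
      - containerPort: 4444
        protocol: TCP
    livenessProbe:
      # Probes the single app in the container (one app per container in Che 7),
      # so its result can stand in for the statuses of that container's servers.
      tcpSocket:
        port: 4444
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3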

@BarryDrez

@garagatyi For custom stacks, could this be something the user can configure in their recipe - e.g.,

"recipe": {
  "type": "kubernetes",
  "content": "kind: List\nitems:\n - \n  kind: Service\n  apiVersion: v1\n  metadata:\n   name: isservice\n  spec:\n   selector:\n    name: IS103\n   ports:\n    - \n     name: isadmin\n     protocol: TCP\n     port: 5555\n     targetPort: 5555\n - \n  kind: Pod\n  apiVersion: v1\n  metadata:\n   name: is103\n  spec:\n   containers:\n    - \n     image: 'daerepository03.eur.ad.sag:4443/design-server/is:10.3.0.0xa'\n     name: integrationserver\n     ports:\n      - \n       containerPort: 5555\n       protocol: TCP\n     resources:\n      limits:\n       memory: 2048Mi\n     livenessProbe:\n       failureThreshold: 11\n       initialDelaySeconds: 5\n       periodSeconds: 5\n       successThreshold: 1\n       tcpSocket:\n         port: 5555\n       timeoutSeconds: 45\n     readinessProbe:\n       failureThreshold: 10\n       initialDelaySeconds: 20\n       periodSeconds: 5\n       successThreshold: 1\n       tcpSocket:\n         port: 5555\n       timeoutSeconds: 120\n",
  "contentType": "text/x-yaml"
}

Formatted content:

kind: List
items:
 - 
  kind: Service
  apiVersion: v1
  metadata:
   name: isservice
  spec:
   selector:
    name: IS103
   ports:
    - 
     name: isadmin
     protocol: TCP
     port: 5555
     targetPort: 5555
 - 
  kind: Pod
  apiVersion: v1
  metadata:
   name: is103
  spec:
   containers:
    - 
     image: 'daerepository03.eur.ad.sag:4443/design-server/is:10.3.0.0xa'
     name: integrationserver
     ports:
      - 
       containerPort: 5555
       protocol: TCP
     resources:
      limits:
       memory: 2048Mi
     livenessProbe:
       failureThreshold: 11
       initialDelaySeconds: 5
       periodSeconds: 5
       successThreshold: 1
       tcpSocket:
         port: 5555
       timeoutSeconds: 45
     readinessProbe:
       failureThreshold: 10
       initialDelaySeconds: 10
       periodSeconds: 3
       successThreshold: 1
       tcpSocket:
         port: 5555
       timeoutSeconds: 2

I have tried this, but it does not work.

@garagatyi
Author

@BarryDrez If you are talking about the user recipe, then it should already be supported. @sleshchenko, correct me if I'm mistaken.

The original suggestion was about configuring health checks for IDE plugins. Such configuration should be similar to what is defined in a Kubernetes Deployment.
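
A purely hypothetical sketch of how such plugin-level configuration could mirror the Deployment probe fields (none of these field names are an existing Che plugin format; they only illustrate the shape):

containers:
  - name: plugin-sidecar               # illustrative name, not from this issue
    image: example.org/plugin:latest
    livenessProbe:                      # same structure as in a k8s Deployment
      httpGet:
        path: /healthz
        port: 3000
      initialDelaySeconds: 15
      periodSeconds: 10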

@sleshchenko
Member

@garagatyi You're right, it should be supported.
@BarryDrez

I have tried this, but it does not work.

What is the error? Could you provide the workspace-related Deployment that is created by the Che server?
It would be better if you create a dedicated issue, and we'll continue investigating your problem there. Thanks.

@BarryDrez

@garagatyi, @sleshchenko Thank you for clarifying this. I have done some more experimenting with my liveness and readiness probes, and it looks like I needed to add a longer delay for the liveness probe. If this still looks like a bug, I will open a new issue as you suggest, but it is beginning to look like it is working well (as designed).
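
For reference, with the recipe above the fix amounts to raising initialDelaySeconds on the liveness probe; the 60-second value here is only an example, and the right delay depends on how long the Integration Server takes to start:

livenessProbe:
  failureThreshold: 11
  initialDelaySeconds: 60    # was 5; give the server time to start before probing
  periodSeconds: 5
  successThreshold: 1
  tcpSocket:
    port: 5555
  timeoutSeconds: 45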

@che-bot
Contributor

che-bot commented Sep 17, 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2019
@che-bot che-bot closed this as completed Sep 25, 2019