
When creating new workspace, UI is stuck on "Loading...", though workspace is successfully created #10501

Closed
magbj opened this issue Jul 20, 2018 · 10 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@magbj

magbj commented Jul 20, 2018

Description

After creating a workspace, whether I click "Open IDE" on the creation screen or select the workspace afterwards, the UI is stuck showing "Loading...".

The workspace is still created successfully in the background. Once it reaches the running state, I can always bring it up by reloading the browser, but the UI never detects the change in state on its own and does not bring the IDE up automatically.

Earlier in my testing, I used to get a "disconnected" IDE screen, and once the workspace was sufficiently provisioned the status and bootstrap log started to show. I am not sure what changed so that this no longer appears and only the loading page is shown.

This is what it looks like while hanging:

[screenshot: workspace UI stuck on "Loading..."]

If I do a browser refresh, everything comes up:

[screenshot: workspace IDE loaded after a browser refresh]

Deployed into AWS in a private subnet (1 AZ/subnet), running on top of AWS EKS/Kubernetes
ELB for incoming traffic; HTTPS with SSL termination on Nginx.
Using Nginx Ingress Controller (0.16.2).
Keycloak version: 3.4.3.Final
Eclipse Che version: 6.8.0-SNAPSHOT
Created a storage class for EBS/GP2 (a sketch is included below)
All ports/IPs are accessible within the private subnet
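
For reference, a storage class for EBS/GP2 on EKS generally looks like the following sketch; the metadata name and reclaim policy here are placeholders rather than the values from this deployment:

    # StorageClass for dynamically provisioned EBS gp2 volumes (name is a placeholder)
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: che-workspace-storage
    provisioner: kubernetes.io/aws-ebs
    parameters:
      type: gp2        # general purpose SSD
      fsType: ext4
    reclaimPolicy: Delete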

I see the same behavior in both Chrome and Firefox.

This is what I see in the browser (Chrome) console:

[screenshot: Chrome developer console output]

@garagatyi

As far as I understand, the issue might be in how the Kubernetes infrastructure works combined with the latest changes to the tracking of workspace status.
I believe issue #10365 describes the same problem you have, and it is in progress now.

@ghost

ghost commented Jul 24, 2018

@magbj @garagatyi yes, it's a known problem with Che on K8S.

@ghost ghost added the kind/question Questions that haven't been identified as being feature requests or bugs. label Jul 24, 2018
@garagatyi garagatyi added kind/bug Outline of a bug - must adhere to the bug report template. and removed kind/question Questions that haven't been identified as being feature requests or bugs. labels Jul 24, 2018
@magbj
Author

magbj commented Jul 24, 2018

@garagatyi @eivantsov

It sounds great that you are adding the reconnect ability for WS.

In the meantime, by setting these flags on the nginx ingress controller deployment:

  • --watch-namespace=$(POD_NAMESPACE)
  • --force-namespace-isolation=true
  • --enable-dynamic-configuration=true

I was able to make it automatically switch from "Loading..." to the IDE, though it did not go through the nicer "disconnected IDE" flow.
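
For anyone reproducing this, a rough sketch of where those flags sit in a typical nginx-ingress-controller Deployment follows; the container name, image tag, and surrounding fields are assumptions rather than an exact copy of this deployment:

    # excerpt of an nginx-ingress-controller Deployment spec (illustrative only)
    spec:
      containers:
      - name: nginx-ingress-controller                  # assumed container name
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.16.2
        args:
        - /nginx-ingress-controller
        - --watch-namespace=$(POD_NAMESPACE)            # only watch the controller's own namespace
        - --force-namespace-isolation=true              # isolate secrets/configmaps per namespace
        - --enable-dynamic-configuration=true           # update upstreams without full nginx reloads
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace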

@garagatyi

@magbj an interesting workaround! But doesn't that mean we would need to change the k8s infrastructure configuration to run Che on it?

@ramene

ramene commented Sep 5, 2018

I've managed to stand up Che (while bumping postgres to 10.5) with a setup similar to @magbj's, and I am running into the same ingress issues with workspaces, along with the websocket connection issues also noted by @magbj.

Deployed on AWS in a private subnet across 3 AZs using eksctl create cluster, running on top of AWS EKS/Kubernetes
AWS ALB for incoming traffic, HTTP only
I opted out of using NGINX ingress controller due to numerous issues
Keycloak version: 4.2.1.Final - From stable/keycloak
Eclipse Che version: 6.11.0-SNAPSHOT
Created a storage class using EFS

index.ts:145 Error: Failed to run the workspace: "Waiting for ingress 'ingressle14ialw' reached timeout"
    at index.ts:307
    at che-json-rpc-master-api.ts:198
    at json-rpc-client.ts:191
    at Array.forEach (<anonymous>)
    at e.processNotification (json-rpc-client.ts:190)
    at e.processResponse (json-rpc-client.ts:177)
    at json-rpc-client.ts:94
    at websocket-client.ts:107
    at Array.forEach (<anonymous>)
    at e.callHandlers (websocket-client.ts:107)
(anonymous) @ index.ts:145
vendor-73784b1bca.js:28856 

I think this may point back to my own mishandling of ALB rules and the requisite annotations; but even after making the necessary updates to the subnets, it fails shortly thereafter.

Before annotations:

4m          40m          13        che-ingress.15513722adda5a3f            Ingress                             Warning   ERROR                   aws-alb-ingress-controller                               error parsing annotations: Retrieval of subnets failed to resolve 2 qualified subnets. Subnets must contain the kubernetes.io/cluster/<cluster name> tag with a value of shared or owned and the kubernetes.io/role/internal-elb tag signifying it should be used for ALBs Additionally, there must be at least 2 subnets with unique availability zones as required by ALBs. Either tag subnets to meet this requirement or use the subnets annotation on the ingress resource to explicitly call out what subnets to use for ALB creation. The subnets that did resolve were [].
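
For reference, the error describes two possible fixes; a rough sketch of each is below, with placeholder subnet IDs and tag values that are assumptions rather than values from this cluster:

    # Option 1: tag the VPC subnets the internal ALB should use (values are placeholders)
    #   kubernetes.io/cluster/eclipse-che = shared
    #   kubernetes.io/role/internal-elb   = 1
    #
    # Option 2: name at least two subnets (in different AZs) explicitly on the Che ingress
    metadata:
      name: che-ingress
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internal
        alb.ingress.kubernetes.io/subnets: subnet-aaaa1111,subnet-bbbb2222   # placeholder subnet IDs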

After annotations:

12m         13m          2         che-ingress.15513b806be04b34   Ingress               Normal    UPDATE    aws-alb-ingress-controller   Ingress default/che-ingress
13m         13m          1         che-ingress.15513b845eae6530   Ingress               Normal    CREATE    aws-alb-ingress-controller   6d494ba2-default-cheingres-b25b created
13m         13m          2         che-ingress.15513b84912da5e5   Ingress               Normal    CREATE    aws-alb-ingress-controller   6d494ba2-02b495ee1e76880abde target group created
13m         13m          1         che-ingress.15513b84b7caa419   Ingress               Normal    CREATE    aws-alb-ingress-controller   80 listener created
13m         13m          1         che-ingress.15513b84b93efaa3   Ingress               Normal    CREATE    aws-alb-ingress-controller   1 rule created
12m         12m          1         che-ingress.15513b896375cbaf   Ingress               Normal    MODIFY    aws-alb-ingress-controller   6d494ba2-default-cheingres-b25b tags modified
$ kubectl describe po -n kube-system alb-ingress-controller-5596d9bf8-lk7hm
Name:           alb-ingress-controller-5596d9bf8-lk7hm
Namespace:      kube-system
Node:           ip-192-168-118-34.us-west-2.compute.internal/192.168.118.34
Start Time:     Mon, 03 Sep 2018 15:43:03 -0400
Labels:         app=alb-ingress-controller
                pod-template-hash=115285694
Annotations:    <none>
Status:         Running
IP:             192.168.124.250
Controlled By:  ReplicaSet/alb-ingress-controller-5596d9bf8
Containers:
  server:
    Container ID:  docker://96f7d1e877276743c8d39581f55870369d08c1b8e0ae0413fdf6b4b9b552702e
    Image:         quay.io/coreos/alb-ingress-controller:1.0-beta.6
    Image ID:      docker-pullable://quay.io/coreos/alb-ingress-controller@sha256:1c934a32ee5e3aad925dbe0ff37cb50ae04d99e33c4d878186d603c1901ad644
    Port:          <none>
    Host Port:     <none>
    Args:
      /server
      --ingress-class=alb
      --cluster-name=eclipse-che
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Tue, 04 Sep 2018 21:30:38 -0400
      Finished:     Tue, 04 Sep 2018 21:30:40 -0400
    Ready:          False
    Restart Count:  14
    Environment:
      AWS_REGION:             us-west-2
      POD_NAME:               alb-ingress-controller-5596d9bf8-lk7hm (v1:metadata.name)
      POD_NAMESPACE:          kube-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from alb-ingress-token-hwvl7 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  alb-ingress-token-hwvl7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alb-ingress-token-hwvl7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                 From                                                   Message
  ----     ------   ----                ----                                                   -------
  Normal   Pulling  46m (x5 over 1d)    kubelet, ip-192-168-118-34.us-west-2.compute.internal  pulling image "quay.io/coreos/alb-ingress-controller:1.0-beta.6"
  Normal   Pulled   46m (x5 over 1d)    kubelet, ip-192-168-118-34.us-west-2.compute.internal  Successfully pulled image "quay.io/coreos/alb-ingress-controller:1.0-beta.6"
  Normal   Created  46m (x5 over 1d)    kubelet, ip-192-168-118-34.us-west-2.compute.internal  Created container
  Normal   Started  46m (x5 over 1d)    kubelet, ip-192-168-118-34.us-west-2.compute.internal  Started container
  Warning  BackOff  3m (x201 over 48m)  kubelet, ip-192-168-118-34.us-west-2.compute.internal  Back-off restarting failed container
$ clear && kubectl get events --sort-by=.metadata.creationTimestamp
 $ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o alb-[a-zA-Z0-9-]+)
-------------------------------------------------------------------------------
AWS ALB Ingress controller
  Release:    1.0-beta.6
  Build:      git-f740c293
  Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller
-------------------------------------------------------------------------------

I0905 01:30:38.069833       1 flags.go:132] Watching for Ingress class: alb
W0905 01:30:38.070148       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0905 01:30:38.070387       1 main.go:159] Creating API client for https://10.100.0.1:443
I0905 01:30:38.085917       1 main.go:203] Running in Kubernetes cluster version v1.10 (v1.10.3) - git (clean) commit 2bba0127d85d5a46ab4b778548be28623b32d0b0 - platform linux/amd64
I0905 01:30:38.086664       1 alb.go:85] ALB resource names will be prefixed with 6d494ba2
I0905 01:30:38.094112       1 alb.go:158] Starting AWS ALB Ingress controller
I0905 01:30:39.295400       1 leaderelection.go:185] attempting to acquire leader lease  kube-system/ingress-controller-leader-alb...
I0905 01:30:39.305289       1 leaderelection.go:194] successfully acquired lease kube-system/ingress-controller-leader-alb
I0905 01:30:39.305322       1 status.go:152] new leader elected: alb-ingress-controller-5596d9bf8-lk7hm
I0905 01:30:39.540996       1 albingresses.go:77] Building list of existing ALBs
I0905 01:30:39.719512       1 albingresses.go:85] Fetching information on 1 ALBs
E0905 01:30:40.076241       1 albingresses.go:211] Failed to find related managed instance SG. Was it deleted from AWS? Error: Didn't find exactly 1 matching (managed) instance SG. Found 3
I0905 01:30:40.076929       1 albingresses.go:101] Assembled 0 ingresses from existing AWS resources in 535.905766ms
F0905 01:30:40.076977       1 albingresses.go:103] Assembled 0 ingresses from 1 load balancers
$ kubectl version --short
Client Version: v1.10.3
Server Version: v1.10.3

$ eksctl version
2018-09-04T21:24:25-04:00 [ℹ]  versionInfo = map[string]string{"builtAt":"2018-08-31T14:44:02Z", "gitCommit":"0578d6cd44d8c5a4ebe17825db882ad194f0bee4", "gitTag":"0.1.1"}

$ helm version --short
Client: v2.10.0+g9ad53aa
Server: v2.10.0+g9ad53aa

Ultimately, I've not been able to get as far as @magbj (noted in #10370), but I'm pretty close. I guess I have to take a step back here and ask: does Che still have issues running atop K8S (EKS), or has anyone managed to get this working end-to-end in a live environment? Locally, of course, it all runs beautifully.

@antonbabenko

I think the problem is related to security group rules. Try to do what is described here - kubernetes-sigs/aws-load-balancer-controller#236 (comment)
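
As a starting point for that check, one way to enumerate the candidate security groups behind the "Found 3" message in the controller log might be something like the command below; the tag key is a guess based on the cluster name used in this thread, not something confirmed by the linked comment:

    # list security groups tagged for the cluster so leftover/duplicate ones can be identified
    $ aws ec2 describe-security-groups \
        --filters Name=tag-key,Values=kubernetes.io/cluster/eclipse-che \
        --query 'SecurityGroups[].[GroupId,GroupName,Description]' \
        --output table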

@garagatyi

@magbj issue #10365 is fixed, so, hopefully, you won't need any additional configuration to have the IDE load after a workspace start. What do you think, can we close this issue?

@ramene your issue with the volumes is not related to this topic. If you are still looking for help from the community, I would recommend opening a new issue and describing your difficulties there. Maybe @magbj would be able to help, since it seems that he managed to run Che on a similar setup successfully.

@ramene

ramene commented Sep 7, 2018

Thanks @antonbabenko, I'd not seen the issue you referenced; I'll stand it back up, futz with the SGs, and report back.

@garagatyi, I'm eager to get to the point of the IDE loading after a workspace start, and my apologies for lumping multiple issues together; I'll remove this bit and open a new issue about AWS EFS and storage classes accordingly. I may simply have to default to using the same storage class as @magbj. I appreciate you chiming in nonetheless.

@garagatyi

@ramene NP, just trying to split issues since it helps avoid mixing discussions.

@che-bot
Contributor

che-bot commented Sep 7, 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2019
@che-bot che-bot closed this as completed Sep 17, 2019