Commit

Add troubleshooting guide for notebooks (kubeflow#1008)
* feat: add template of troubleshooting guide for notebooks

* feat: improve the notebooks troubleshooting guide

* fix: improve the note for GCP users

* fix: minor bug in the link

* fix: improve the troubleshooting guide

* fix: fix troubleshooting guides
Tabrizian authored and k8s-ci-robot committed Aug 2, 2019
1 parent 3bc674e commit 1dd668e
Showing 3 changed files with 99 additions and 31 deletions.
4 changes: 3 additions & 1 deletion content/docs/notebooks/setup.md
@@ -265,4 +265,6 @@ exposed to the internet and is an unsecured endpoint by default.
* Learn the advanced features available from a Kubeflow notebook, such as
[submitting Kubernetes resources](/docs/notebooks/submit-kubernetes/) or
[building Docker images](/docs/notebooks/submit-docker-image/).

* Visit the [troubleshooting guide](/docs/notebooks/troubleshoot) to fix common
errors when creating Jupyter notebooks in Kubeflow.

95 changes: 95 additions & 0 deletions content/docs/notebooks/troubleshoot.md
@@ -0,0 +1,95 @@
+++
title = "Troubleshooting Guide"
description = "Fixing common problems in Kubeflow notebooks"
weight = 50
+++

## Persistent Volumes and Persistent Volume Claims

First, make sure that the PVCs used by your Jupyter notebooks are bound. This
should not be a problem on managed Kubernetes. If you are running Kubeflow
on-prem in a multi-node environment, see the guide to [Kubeflow on-prem in a multi-node Kubernetes cluster](/docs/use-cases/kubeflow-on-multinode-cluster/). Otherwise, see the [Pods stuck in Pending State](/docs/other-guides/troubleshooting/#pods-stuck-in-pending-state) guide to troubleshoot this problem.
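
As a quick check of whether any PVC is unbound, a minimal sketch along the following lines may help. The helper name `unbound_pvcs` and the default namespace `kubeflow` are assumptions for illustration, not part of Kubeflow.

```shell
# Hypothetical helper: list PVCs in a namespace that are not in the Bound phase.
# Assumes kubectl is already configured for the cluster.
# The default namespace "kubeflow" is an assumption; pass your own as $1.
unbound_pvcs() {
  # With --no-headers, column 1 is the PVC name and column 2 is its STATUS.
  kubectl get pvc -n "${1:-kubeflow}" --no-headers \
    | awk '$2 != "Bound" { print $1 " " $2 }'
}
```

Calling `unbound_pvcs ${NAMESPACE}` prints one line per PVC that is still `Pending` or `Lost`. Empty output means every PVC in the namespace is bound.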

## Check the status of notebooks

Run the commands below.

```
kubectl get notebooks -o yaml ${NOTEBOOK}
kubectl describe notebooks ${NOTEBOOK}
```

Check the `events` section to make sure that there are no errors.
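
If the `describe` output is long, the events can also be listed on their own. The sketch below assumes a hypothetical helper name, `notebook_events`, and an assumed default namespace of `kubeflow`. It selects events by the name of the object they refer to, so it also returns events for any other resource that shares the notebook's name.

```shell
# Hypothetical helper: list events whose involved object is named like the notebook.
# $1 is the notebook name; $2 is the namespace (the default "kubeflow" is an assumption).
notebook_events() {
  kubectl get events -n "${2:-kubeflow}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

For example, `notebook_events ${NOTEBOOK} ${NAMESPACE}` lists the matching events sorted by their last timestamp, so the most recent ones appear at the bottom.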

## Check the status of statefulsets

Make sure that the current number of replicas in the StatefulSet equals the
desired number. If it does not, check for errors using `kubectl describe`.


```
kubectl get statefulsets -o yaml ${NOTEBOOK}
kubectl describe statefulsets ${NOTEBOOK}
```


Without `-o yaml`, the output of `kubectl get statefulsets ${NOTEBOOK}` should look like this:
```
NAME            DESIRED   CURRENT   AGE
your-notebook   1         1         9m4s
```
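
The comparison can also be scripted. The following is a sketch under stated assumptions: the helper name `statefulset_ready` and the default namespace `kubeflow` are hypothetical, and it compares the desired replica count with the number of ready replicas rather than the `CURRENT` column shown above.

```shell
# Hypothetical helper: succeed only when ready replicas match the desired count.
# $1 is the StatefulSet (notebook) name; $2 is the namespace (default is an assumption).
statefulset_ready() {
  desired="$(kubectl get statefulset "$1" -n "${2:-kubeflow}" -o jsonpath='{.spec.replicas}')"
  ready="$(kubectl get statefulset "$1" -n "${2:-kubeflow}" -o jsonpath='{.status.readyReplicas}')"
  # readyReplicas is omitted from the status when no replica is ready, so treat empty as 0.
  ready="${ready:-0}"
  echo "desired=${desired} ready=${ready}"
  [ "$desired" = "$ready" ]
}
```

A call such as `statefulset_ready ${NOTEBOOK} ${NAMESPACE} || echo "not ready yet"` prints both counts and reports a mismatch through the exit status.
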
## Check the status of Pods

If the StatefulSet's current count does not match the desired count, check
whether the number of Pods matches the desired number reported by the first command.
If it does not, follow the steps below to investigate further.

```
kubectl get pod -o yaml ${NOTEBOOK}-0
```

* The name of the Pod should start with `jupyter`.
* If you are using username/password auth with Jupyter, the Pod will be named

```
jupyter-${USERNAME}
```

* If you are using IAP on GKE, the Pod will be named

```
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
```
* Where USER@DOMAIN.EXT is the Google account used with IAP

Once you know the name of the Pod, run:

```
kubectl describe pod ${NOTEBOOK}-0
```

* Look at the `events` to see if there are any errors trying to schedule the pod
* One common error is not being able to schedule the pod because there aren’t enough resources in the cluster.


If the error persists, check the container logs for errors.

```
kubectl logs ${NOTEBOOK}-0
```
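
When a container keeps restarting, the logs of its previous instance are often more informative than the current ones. A minimal sketch, assuming a hypothetical helper name `notebook_logs` and an assumed default namespace of `kubeflow`:

```shell
# Hypothetical helper: print recent logs for every container in the notebook Pod,
# followed by logs from the previous container instances if there are any.
# $1 is the notebook name; $2 is the namespace (the default "kubeflow" is an assumption).
notebook_logs() {
  pod="$1-0"   # The Pod name used in this guide: ${NOTEBOOK}-0.
  kubectl logs "$pod" -n "${2:-kubeflow}" --all-containers --tail=100
  # Previous-instance logs exist only after a restart, so a failure here is ignored.
  kubectl logs "$pod" -n "${2:-kubeflow}" --all-containers --tail=100 --previous 2>/dev/null || true
}
```

For example, `notebook_logs ${NOTEBOOK} ${NAMESPACE}` prints up to 100 lines per container for both the current and the previous instance.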

## Note for GCP Users

You may encounter the error below:
```
Type     Reason        Age                     From                    Message
----     ------        ----                    ----                    -------
Warning  FailedCreate  2m19s (x26 over 7m39s)  statefulset-controller  create Pod test1-0 in StatefulSet test1 failed error: pods "test1-0" is forbidden: error looking up service account kubeflow/default-editor: serviceaccount "default-editor" not found
```

To fix this problem, create a service account named `default-editor` in the `kubeflow` namespace (the namespace named in the error) and bind it to the `cluster-admin` role.

```
kubectl create sa default-editor -n kubeflow
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --serviceaccount=kubeflow:default-editor
```
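
To confirm that the service account exists in the namespace named in the error message, a check like the one below can be used. This is a sketch: the helper name `service_account_exists` is hypothetical, and the default namespace `kubeflow` is taken from the error shown above.

```shell
# Hypothetical helper: succeed if the named service account exists in the namespace.
# $1 is the service account name; $2 is the namespace (defaults to "kubeflow",
# the namespace that appears in the error message above).
service_account_exists() {
  kubectl get serviceaccount "$1" -n "${2:-kubeflow}" >/dev/null 2>&1
}
```

For example, `service_account_exists default-editor kubeflow || echo "default-editor is missing"` reports a missing account without printing the full resource.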
31 changes: 1 addition & 30 deletions content/docs/other-guides/troubleshooting.md
@@ -78,37 +78,8 @@ how RBAC interacts with IAM on GCP.

## Problems spawning Jupyter pods

If you're having trouble spawning Jupyter notebooks, check that the pod is getting
scheduled
This section has been moved to the [Jupyter Notebooks Troubleshooting Guide](/docs/notebooks/troubleshoot/).

```
kubectl -n ${NAMESPACE} get pods
```

* Look for pods whose name starts with jupyter
* If you are using username/password auth with Jupyter the pod will be named

```
jupyter-${USERNAME}
```

* If you are using IAP on GKE the pod will be named

```
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
```

* Where USER@DOMAIN.EXT is the Google account used with IAP

Once you know the name of the pod do

```
kubectl -n ${NAMESPACE} describe pods ${PODNAME}
```

* Look at the events to see if there are any errors trying to schedule the pod
* One common error is not being able to schedule the pod because there aren't
enough resources in the cluster.

## Pods stuck in Pending state
