Add troubleshooting guide for notebooks #1008
Changes from 8 commits
e63dc10
7d572d8
18e280f
9d28d89
237d57b
38cce38
886eb71
c601b9a
88e7803
90f2e57
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
+++ | ||
Review comment: In the "Next steps" section of the following page: Add a (relative) link pointing to your new page, so that people know there's a troubleshooting page available to help when setting up their notebooks. |
||
title = "Troubleshooting Guide" | ||
description = "Fixing common problems in Kubeflow notebooks" | ||
Review comment: Merge the content from the following guide into your new page: And then add a link from the above troubleshooting guide pointing to your new page. |
||
weight = 50 | ||
+++ | ||
|
||
## Persistent Volumes and Persistent Volume Claims | |
|
||
First, make sure that PVCs are bound when using Jupyter notebooks. This should | |
not be a problem when using managed Kubernetes. If you are running Kubeflow in a | |
multi-node on-prem environment, see the guide to [Kubeflow on-prem in a multi-node Kubernetes cluster](/docs/use-cases/kubeflow-on-multinode-cluster/). Otherwise, see the [Pods stuck in Pending State](/docs/other-guides/troubleshooting/#pods-stuck-in-pending-state) guide to troubleshoot this problem. | |
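The bound check above can be scripted. The following is a minimal sketch, not part of the original guide: the helper name `unbound_pvcs` is hypothetical, and the namespace is whatever namespace your notebooks run in.

```shell
# Hypothetical helper: print every PVC in a namespace whose phase is not "Bound".
# Usage: unbound_pvcs <namespace>
unbound_pvcs() {
  kubectl get pvc -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '$2 != "Bound"'
}
```

Calling `unbound_pvcs kubeflow` prints nothing when every claim in that namespace is bound. Any line it does print names a claim to investigate.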
|
||
## Check the status of notebooks | ||
|
||
Run the commands below. | ||
|
||
``` | ||
kubectl get notebooks -o yaml ${NOTEBOOK} | ||
kubectl describe notebooks ${NOTEBOOK} | ||
``` | ||
|
||
Check the `events` section to make sure that there are no errors. | ||
|
||
## Check the status of statefulsets | ||
|
||
Make sure that the current number of replicas in the StatefulSet equals the | |
desired number. If it does not, check for errors using `kubectl describe`. | |
|
||
|
||
``` | ||
kubectl get statefulsets -o yaml ${NOTEBOOK} | ||
kubectl describe statefulsets ${NOTEBOOK} | ||
``` | ||
|
||
|
||
The output of `kubectl get statefulsets ${NOTEBOOK}` (without `-o yaml`) should look like this: | |
``` | ||
NAME DESIRED CURRENT AGE | ||
your-notebook 1 1 9m4s | ||
``` | ||
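The desired and current counts can also be compared in a script. This is a sketch under stated assumptions: the helper name `check_statefulset` is hypothetical, the namespace defaults to `kubeflow`, and "current" is read from `.status.replicas` (the Pods the controller has created).

```shell
# Hypothetical helper: compare desired and current replicas of a StatefulSet.
# Usage: check_statefulset <name> [namespace]
# The default namespace "kubeflow" is an assumption; pass your own if it differs.
check_statefulset() {
  name="$1"
  ns="${2:-kubeflow}"
  desired="$(kubectl get statefulset "$name" -n "$ns" -o jsonpath='{.spec.replicas}')"
  current="$(kubectl get statefulset "$name" -n "$ns" -o jsonpath='{.status.replicas}')"
  # Treat a missing status value as zero replicas.
  current="${current:-0}"
  if [ "$desired" = "$current" ]; then
    echo "ok: $current/$desired replicas created"
  else
    echo "mismatch: $current/$desired replicas created"
    return 1
  fi
}
```

For example, `check_statefulset "${NOTEBOOK}"` returns a non-zero status on a mismatch, so it can gate further steps in a script.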
## Check the status of Pods | ||
|
||
If the current number of replicas doesn't match the desired number, check whether | |
the number of Pods matches the desired number reported by the first command. | |
If it doesn't, follow the steps below to investigate the issue further. | |
|
||
``` | ||
kubectl get pod -o yaml ${NOTEBOOK}-0 | ||
``` | ||
|
||
* The name of the Pod should start with `jupyter` | |
* If you are using username/password auth with Jupyter, the pod will be named | |
|
||
``` | ||
jupyter-${USERNAME} | ||
``` | ||
|
||
* If you are using IAP on GKE, the pod will be named | |
|
||
``` | ||
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT | ||
``` | ||
* Where USER@DOMAIN.EXT is the Google account used with IAP | ||
|
||
Once you know the name of the pod, run: | |
|
||
``` | ||
kubectl describe pod ${NOTEBOOK}-0 | ||
``` | ||
|
||
* Look at the `events` to see if there are any errors trying to schedule the pod | ||
* One common error is not being able to schedule the pod because there aren’t enough resources in the cluster. | ||
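When the `describe` output is long, it can help to list only the warning events for the pod. This is a minimal sketch, assuming the pod lives in the `kubeflow` namespace unless one is passed. The helper name `pod_warnings` is hypothetical.

```shell
# Hypothetical helper: list only Warning events that involve a given pod.
# Usage: pod_warnings <pod-name> [namespace]
# The default namespace "kubeflow" is an assumption.
pod_warnings() {
  kubectl get events -n "${2:-kubeflow}" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.lastTimestamp
}
```

For example, `pod_warnings "${NOTEBOOK}-0"` lists scheduling failures such as insufficient resources, oldest first.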
|
||
|
||
If the error persists, check the container logs for errors. | |
|
||
``` | ||
kubectl logs ${NOTEBOOK}-0 | ||
``` | ||
|
||
## Note for GCP Users | ||
|
||
You may encounter the error below: | |
``` | ||
Type Reason Age From Message | ||
---- ------ ---- ---- ------- | ||
Warning FailedCreate 2m19s (x26 over 7m39s) statefulset-controller create Pod test1-0 in StatefulSet test1 failed error: pods "test1-0" is forbidden: error looking up service account kubeflow/default-editor: serviceaccount "default-editor" not found | ||
``` | ||
|
||
To fix this problem, create a service account named `default-editor` with the cluster-admin role. | |
|
||
``` | ||
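# The error names the kubeflow namespace, so create and bind the account there. | |
kubectl create sa default-editor -n kubeflow | |
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --serviceaccount=kubeflow:default-editor | |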
``` |
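After creating the service account, a check like the following can confirm that it exists and whether it may create pods. This is a sketch, not part of the original guide. The helper name `verify_default_editor` is hypothetical, and the `kubeflow` default is taken from the error message above. `kubectl auth can-i --as` needs impersonation rights for the user running it.

```shell
# Hypothetical helper: confirm the default-editor service account exists and
# report whether it is allowed to create pods.
# Usage: verify_default_editor [namespace]   (defaults to "kubeflow")
verify_default_editor() {
  ns="${1:-kubeflow}"
  kubectl get serviceaccount default-editor -n "$ns" >/dev/null || {
    echo "serviceaccount default-editor not found in $ns"
    return 1
  }
  # Prints "yes" or "no"; requires permission to impersonate the account.
  kubectl auth can-i create pods -n "$ns" \
    --as="system:serviceaccount:$ns:default-editor"
}
```

If this prints `no`, the role binding is not attached to the service account in that namespace.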
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,37 +78,8 @@ how RBAC interacts with IAM on GCP. | |
|
||
## Problems spawning Jupyter pods | ||
|
||
If you're having trouble spawning Jupyter notebooks, check that the pod is getting | ||
scheduled | ||
This section has been moved to [Jupyter Notebooks Troubleshooting Guide] (docs/notebooks/troubleshoot/). | ||
Review comment: Link gives a 404. |
||
|
||
``` | ||
kubectl -n ${NAMESPACE} get pods | ||
``` | ||
|
||
* Look for pods whose name starts with jupyter | |
* If you are using username/password auth with Jupyter the pod will be named | ||
|
||
``` | ||
jupyter-${USERNAME} | ||
``` | ||
|
||
* If you are using IAP on GKE the pod will be named | ||
|
||
``` | ||
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT | ||
``` | ||
|
||
* Where USER@DOMAIN.EXT is the Google account used with IAP | ||
|
||
Once you know the name of the pod, run: | |
|
||
``` | ||
kubectl -n ${NAMESPACE} describe pods ${PODNAME} | ||
``` | ||
|
||
* Look at the events to see if there are any errors trying to schedule the pod | ||
* One common error is not being able to schedule the pod because there aren't | ||
enough resources in the cluster. | ||
|
||
## Pods stuck in Pending state | ||
|
||
|
Review comment: The link gives a 404. Please follow the pattern of the link above.
See preview:
https://deploy-preview-1008--competent-brattain-de2d6d.netlify.com/docs/notebooks/setup/#next-steps