Improve auto deploy infrastructure (kubeflow#641)
* Add to playbook for auto deploy infrastructure.

* Fix link

* Fix checkout link.

* Support downloading unzipped binaries.

* Playbook.

* Latest.

* Log changing permissions.
jlewi authored Apr 24, 2020
1 parent 9c2bbd7 commit 1298c97
Showing 3 changed files with 33 additions and 5 deletions.
15 changes: 14 additions & 1 deletion playbook/auto_deploy.md
@@ -8,6 +8,18 @@ Playbook for the auto deployed instances of Kubeflow.
1. Check logs

* [Reconciler logs](https://console.cloud.google.com/logs/viewer?project=kubeflow-ci&folder&organizationId&minLogLevel=0&expandAll=false&interval=PT1H&resource=k8s_container%2Fcluster_name%2Fkubeflow-testing%2Fnamespace_name%2Ftest-pod&advancedFilter=resource.type%3D%22k8s_container%22%0Aresource.labels.cluster_name%3D%22kf-ci-v1%22%0Aresource.labels.namespace_name%3D%22auto-deploy%22%0Alabels.%22k8s-pod%2Fapp%22%3D%22auto-deploy%22%0Aresource.labels.container_name%3D%22reconciler%22)

* You can filter by version name to see entries for a specific version; e.g

```
resource.type="k8s_container"
resource.labels.cluster_name="kf-ci-v1"
resource.labels.namespace_name="auto-deploy"
labels."k8s-pod/app"="auto-deploy"
resource.labels.container_name="reconciler"
jsonPayload.version_name="v1"
```

* [Server logs](https://console.cloud.google.com/logs/viewer?project=kubeflow-ci&folder&organizationId&minLogLevel=0&expandAll=false&interval=PT1H&resource=k8s_container%2Fcluster_name%2Fkubeflow-testing%2Fnamespace_name%2Ftest-pod&advancedFilter=resource.type%3D%22k8s_container%22%0Aresource.labels.cluster_name%3D%22kf-ci-v1%22%0Aresource.labels.namespace_name%3D%22auto-deploy%22%0Alabels.%22k8s-pod%2Fapp%22%3D%22auto-deploy%22%0Aresource.labels.container_name%3D%22server%22)

1. Connect to the **kf-ci-v1** cluster
@@ -18,7 +30,8 @@ Playbook for the auto deployed instances of Kubeflow.
```

* fetch those logs
* Go to the [GKE Workloads Dashboard](https://cloud.console.google.com/kubernetes/workload?project=kubeflow-ci&pageState=(%22workload_list_table%22:(%22f%22:%22%255B%257B_22k_22_3A_22Is%2520system%2520object_22_2C_22t_22_3A11_2C_22v_22_3A_22_5C_22False_~*false_5C_22_22_2C_22i_22_3A_22is_system_22%257D_2C%257B_22k_22_3A_22cluster_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22kf-ci-v1_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metadata%252FclusterReference%252Fname_22%257D_2C%257B_22k_22_3A_22namespace_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22auto-deploy_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metadata%252Fnamespace_22%257D%255D%22) and navigate to the job
* Go to the [GKE Workloads Dashboard](https://cloud.console.google.com/kubernetes/workload?project=kubeflow-ci&pageState=\(%22workload_list_table%22:(%22f%22:%22%255B%257B_22k_22_3A_22Is%2520system%2520object_22_2C_22t_22_3A11_2C_22v_22_3A_22_5C_22False_~*false_5C_22_22_2C_22i_22_3A_22is_system_22%257D_2C%257B_22k_22_3A_22cluster_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22kf-ci-v1_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metadata%252FclusterReference%252Fname_22%257D_2C%257B_22k_22_3A_22namespace_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22auto-deploy_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metadata%252Fnamespace_22%257D%255D%22\)) and navigate to the job
* Click through to the pod and then the link to view the logs
* If you look in the reconciler logs you can filter by version name to get most recent job name
* TODO(jlewi): Unfortunately the dashboard doesn't appear to allow sorting by
creation timestamp which makes it hard to find latest ones
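
The version filter added to the playbook can be generated instead of typed by hand. The following is a minimal sketch, assuming the field names shown in the playbook filter are the ones to use; `build_reconciler_filter` is a hypothetical helper and is not part of kubeflow/testing.

```python
def build_reconciler_filter(version_name, cluster="kf-ci-v1",
                            namespace="auto-deploy"):
  """Return a Cloud Logging filter for one auto-deploy version.

  Hypothetical helper: field names and default values are copied from the
  playbook filter. Each clause goes on its own line, as in the playbook.
  """
  clauses = [
    ("resource.type", "k8s_container"),
    ("resource.labels.cluster_name", cluster),
    ("resource.labels.namespace_name", namespace),
    ('labels."k8s-pod/app"', "auto-deploy"),
    ("resource.labels.container_name", "reconciler"),
    ("jsonPayload.version_name", version_name),
  ]
  return "\n".join('{0}="{1}"'.format(key, value) for key, value in clauses)


if __name__ == "__main__":
  # Paste the printed text into the logs viewer's advanced filter box.
  print(build_reconciler_filter("v1"))
```

Calling it with a different `version_name` changes only the last clause, which is the one the playbook asks you to edit.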
19 changes: 16 additions & 3 deletions py/kubeflow/testing/create_unique_kf_instance.py
@@ -431,14 +431,27 @@ def _gcloud_list():
   else:
     if args.kfctl_path.startswith("http"):
       temp_dir = tempfile.mkdtemp()
-      util.run(["curl", "-L", "-o", "kfctl.tar.gz", args.kfctl_path],
+
+      filename = "kfctl"
+
+      zipped = False
+      if args.kfctl_path.endswith(".tar.gz"):
+        zipped = True
+        filename = filename + ".tar.gz"
+
+      util.run(["curl", "-L", "-o", filename, args.kfctl_path],
                cwd=temp_dir)
-      util.run(["tar", "-xvf", "kfctl.tar.gz"], cwd=temp_dir)
+      if zipped:
+        util.run(["tar", "-xvf", "kfctl.tar.gz"], cwd=temp_dir)
+
       kfctl_path = os.path.join(temp_dir, "kfctl")
-      git_describe = util.run([kfctl_path, "version"])
+      logging.info("Changing permissions on %s", kfctl_path)
+      os.chmod(kfctl_path, 0o777)
     else:
       kfctl_path = args.kfctl_path
+
+    git_describe = util.run([kfctl_path, "version"])
 
   logging.info("kfctl path set to %s", kfctl_path)
 
   # We need to keep the name short to avoid hitting limits with certificates.
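
The new download path treats a `.tar.gz` URL as an archive and anything else as a bare binary. The following is a minimal sketch of that decision as a pure function, so it can be checked without network access. `plan_kfctl_download` is a hypothetical name; the argument lists mirror the ones passed to `util.run` in this hunk, and the sketch only builds them without running anything.

```python
import os


def plan_kfctl_download(kfctl_url, temp_dir):
  """Return (commands, kfctl_path) for fetching kfctl from a URL.

  Hypothetical helper. As in the commit, it assumes an archive unpacks to a
  file named "kfctl" at the top level of temp_dir.
  """
  filename = "kfctl"
  zipped = kfctl_url.endswith(".tar.gz")
  if zipped:
    filename += ".tar.gz"

  # Commands are meant to be run with cwd=temp_dir.
  commands = [["curl", "-L", "-o", filename, kfctl_url]]
  if zipped:
    # Unpack the same file that curl wrote.
    commands.append(["tar", "-xvf", filename])

  return commands, os.path.join(temp_dir, "kfctl")


if __name__ == "__main__":
  # example.com URLs are placeholders, not real release locations.
  for url in ("https://example.com/kfctl_linux.tar.gz",
              "https://example.com/kfctl"):
    print(plan_kfctl_download(url, "/tmp/kfctl-download"))
```

Passing `filename` to `tar` is equivalent to the hard-coded `"kfctl.tar.gz"` in the commit, because `tar` only runs on the archive branch, where `filename` has that value.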
4 changes: 3 additions & 1 deletion test-infra/auto-deploy/manifest/config/deploy-kubeflow.yaml
@@ -23,8 +23,10 @@ spec:
 - command:
   - /usr/local/bin/checkout_repos.sh
   # TOODO(jlewi): We should really switch to tekton and use resources.
+  # TODO(https://github.com/kubeflow/testing/pull/641): Switch to kubeflow/testing@HEAD
+  # after 641 is merged
   - --depth=all
-  - --repos=kubeflow/kfctl@HEAD,kubeflow/testing@auto_update
+  - --repos=kubeflow/kfctl@HEAD,jlewi/testing@playbook
   - --src_dir=/src
   - --links
   env:
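
The `--repos` value is a comma-separated list of `owner/repo@ref` entries, and this change temporarily points the testing repo at a fork branch. The following is a minimal sketch for reading such a value, assuming the `owner/repo@ref` form seen in this hunk is the whole syntax; `checkout_repos.sh` may accept more than that. `parse_repos` is a hypothetical helper.

```python
def parse_repos(repos_arg):
  """Split "owner/repo@ref,..." into a list of (owner, repo, ref) tuples.

  Hypothetical helper; it only understands the form shown in the manifest.
  """
  parsed = []
  for spec in repos_arg.split(","):
    full_name, _, ref = spec.partition("@")
    owner, _, repo = full_name.partition("/")
    if not (owner and repo and ref):
      raise ValueError("Expected owner/repo@ref, got %r" % spec)
    parsed.append((owner, repo, ref))
  return parsed


if __name__ == "__main__":
  for owner, repo, ref in parse_repos(
      "kubeflow/kfctl@HEAD,jlewi/testing@playbook"):
    print(owner, repo, ref)
```

A check like this makes it easy to flag entries whose owner is not `kubeflow`, which is the state the TODO in the manifest asks to undo after the pull request is merged.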