Argo workflow to run E2E tests (#72)
* Create an Argo workflow to run the E2E test for Kubeflow deployment
* Create a ksonnet app for deploying Argo in our test infrastructure
* Create a ksonnet component to trigger the E2E workflow.
* Add tensorflow/k8s as a git submodule because we want to reuse some python scripts in that project to write our tests. 
* bootstrap.sh is the entrypoint for our prow jobs.
    It checks out the repo at the commit corresponding to the prow job and then invokes
    a test script in the repo. This ensures that the bulk of our test logic is pulled from
    the repo at the commit being tested.
* checkout.sh is a script for checking out the source; it is used as the first step in our workflows.
    The Argo workflow uses an NFS share to store test data so that we can have multiple steps running
    in parallel and accessing the same files.
jlewi authored Jan 6, 2018
1 parent 7b543ef commit 17aa4ce
Showing 28 changed files with 77,025 additions and 14 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "tensorflow_k8s"]
path = tensorflow_k8s
url = https://github.com/tensorflow/k8s.git
1 change: 1 addition & 0 deletions tensorflow_k8s
Submodule tensorflow_k8s added at ed7cae
97 changes: 97 additions & 0 deletions testing/Dockerfile
@@ -0,0 +1,97 @@
# Docker image for running E2E tests using Argo.

FROM python:2.7-slim
MAINTAINER Jeremy Lewi

# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Define en_US.
ENV LANGUAGE=en_US.UTF-8 \
    LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8 \
    LC_CTYPE=en_US.UTF-8 \
    LC_MESSAGES=en_US.UTF-8


# buildDeps should be packages needed only to build some other packages as
# these packages are purged in a later step.
#
# gcc & python-dev are needed so we can install crcmod for gsutil
RUN set -ex \
&& apt-get update -yqq \
&& apt-get install -yqq --no-install-recommends \
curl \
locales \
wget \
ca-certificates \
git \
zip \
unzip \
gcc python-dev \
python-setuptools \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base

# Set the locale
RUN sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
&& locale-gen \
&& update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8

# Install go
RUN cd /tmp && \
wget -O /tmp/go.tar.gz https://redirector.gvt1.com/edgedl/go/go1.9.2.linux-amd64.tar.gz && \
tar -C /usr/local -xzf go.tar.gz

# Install gcloud
ENV PATH=/google-cloud-sdk/bin:/workspace:${PATH} \
CLOUDSDK_CORE_DISABLE_PROMPTS=1

RUN wget -q https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz && \
tar xzf google-cloud-sdk.tar.gz -C / && \
rm google-cloud-sdk.tar.gz && \
/google-cloud-sdk/install.sh \
--disable-installation-options \
--bash-completion=false \
--path-update=false \
--usage-reporting=false && \
gcloud components install alpha beta kubectl

# Install CRCMOD for gsutil
RUN easy_install -U pip && \
pip install -U crcmod

# Install Helm
RUN wget -O /tmp/get_helm.sh \
https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get && \
chmod 700 /tmp/get_helm.sh && \
/tmp/get_helm.sh && \
rm /tmp/get_helm.sh

# Initialize helm
RUN helm init --client-only

# Install ksonnet
RUN curl -o /usr/local/bin/ks -L \
https://github.com/ksonnet/ksonnet/releases/download/v0.8.0/ks-linux-amd64 && \
chmod a+x /usr/local/bin/ks

# Install various python libraries.
RUN pip install --upgrade six pyyaml google-api-python-client \
google-cloud-storage google-auth-httplib2 pylint kubernetes==4.0.0 mock retrying

COPY bootstrap.sh /usr/local/bin
RUN chmod a+x /usr/local/bin/bootstrap.sh

COPY checkout.sh /usr/local/bin
RUN chmod a+x /usr/local/bin/checkout.sh

ENTRYPOINT ["/usr/local/bin/bootstrap.sh"]
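
For a quick local check of this image (a sketch; the Makefile below is the supported way to build and push it):

```
# Build from the testing/ directory and poke around inside the image.
docker build -t kubeflow-testing:local testing/
docker run --rm -it --entrypoint=/bin/bash kubeflow-testing:local
```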
35 changes: 35 additions & 0 deletions testing/Makefile
@@ -0,0 +1,35 @@
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Requirements:
# https://github.com/mattrobenolt/jinja2-cli
# pip install jinja2-cli
IMG = gcr.io/mlkube-testing/kubeflow-testing
TAG := $(shell date +v%Y%m%d)-$(shell git describe --tags --always --dirty)-$(shell git diff | sha256sum | cut -c -6)
DIR := ${CURDIR}

all: build

# To build without the cache set the environment variable
# export DOCKER_BUILD_OPTS=--no-cache
build:
	@echo {\"image\": \"$(IMG):$(TAG)\"} > version.json
	docker build ${DOCKER_BUILD_OPTS} -t $(IMG):$(TAG) .
	docker tag $(IMG):$(TAG) $(IMG):latest
	@echo Built $(IMG):$(TAG) and tagged with latest

push: build
	gcloud docker -- push $(IMG):$(TAG)
	gcloud docker -- push $(IMG):latest
	@echo Pushed $(IMG) with :latest and :$(TAG) tags
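
A typical invocation might look like this (assuming you have push access to gcr.io/mlkube-testing):

```
cd testing
export DOCKER_BUILD_OPTS=--no-cache   # optional: build without the Docker cache
make push
```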
145 changes: 141 additions & 4 deletions testing/README.md
@@ -14,21 +14,158 @@ The current thinking is this will work as follows
* Each step in the pipeline can write outputs and junit.xml files to a test directory in the volume
* A final step in the Argo pipeline will upload the outputs to GCS so they are available in gubernator

## Accessing Argo UI

You can access the Argo UI over the API Server proxy.

We currently use the cluster

```
PROJECT=mlkube-testing
ZONE=us-east1-d
CLUSTER=kubeflow-testing
NAMESPACE=kubeflow-test-infra
```

After starting `kubectl proxy`, you can connect to the UI via the proxy at

```
http://127.0.0.1:8001/api/v1/proxy/namespaces/kubeflow-test-infra/services/argo-ui:80/
```
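
For example (a sketch, assuming the cluster above and the default proxy port), fetching credentials and starting the proxy might look like:

```
gcloud --project=${PROJECT} container clusters get-credentials ${CLUSTER} --zone=${ZONE}
kubectl proxy --port=8001
```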

TODO(jlewi): We can probably make the UI publicly available since I don't think it offers any ability to launch workflows.


## Running the tests

### Run a presubmit

```
ks param set workflows name e2e-test-pr-`date '+%Y%m%d-%H%M%S'`
ks param set workflows prow_env REPO_OWNER=google,REPO_NAME=kubeflow,PULL_NUMBER=${PULL_NUMBER},PULL_PULL_SHA=${COMMIT}
ks param set workflows commit ${COMMIT}
ks apply prow -c workflows
```
* You can set COMMIT to `pr` to check out the latest change on the PR.
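
For example, to run the presubmit against this PR (a sketch; the values are illustrative):

```
export PULL_NUMBER=72   # the pull request to test
export COMMIT=pr        # or a specific commit SHA on the PR
```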

### Run a postsubmit

```
ks param set workflows name e2e-test-postsubmit-`date '+%Y%m%d-%H%M%S'`
ks param set workflows prow_env REPO_OWNER=google,REPO_NAME=kubeflow,PULL_BASE_SHA=${COMMIT}
ks param set workflows commit ${COMMIT}
ks apply prow -c workflows
```
* You can set COMMIT to `master` to use HEAD.
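
After submitting either workflow, you can check on its progress with kubectl (a sketch; this assumes the workflows run in the `kubeflow-test-infra` namespace):

```
kubectl get workflows -n kubeflow-test-infra
kubectl describe workflow <workflow-name> -n kubeflow-test-infra
```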


## Setting up the Test Infrastructure

Our tests require a K8s cluster with Argo installed. This section provides the instructions
for setting this up.

Create a GKE cluster

```
PROJECT=mlkube-testing
ZONE=us-east1-d
CLUSTER=kubeflow-testing
NAMESPACE=kubeflow-test-infra
gcloud --project=${PROJECT} container clusters create \
--zone=${ZONE} \
--machine-type=n1-standard-8 \
--cluster-version=1.8.4-gke.1 \
${CLUSTER}
```
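
The later steps create secrets in the `kubeflow-test-infra` namespace, so you may also need to create it if it does not already exist (a sketch):

```
kubectl create namespace ${NAMESPACE}
```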


### Create a GCP service account

* The tests need a GCP service account to upload data to GCS for Gubernator

```
SERVICE_ACCOUNT=kubeflow-testing
gcloud iam service-accounts --project=mlkube-testing create ${SERVICE_ACCOUNT} --display-name "Kubeflow testing account"
gcloud projects add-iam-policy-binding ${PROJECT} \
--member serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com --role roles/container.developer
```
* The service account needs to be able to create K8s resources as part of the test.
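
The commands above only grant `roles/container.developer`; if the account also needs to write test outputs to the Gubernator GCS bucket, you may additionally need something like the following (an assumption; the exact role and bucket depend on how Gubernator is configured):

```
# Hypothetical bucket name; substitute the bucket Gubernator reads from.
gsutil iam ch \
  serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com:objectAdmin \
  gs://your-gubernator-bucket
```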


Create a secret key for the service account

```
gcloud iam service-accounts keys create ~/tmp/key.json \
--iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
kubectl create secret generic kubeflow-testing-credentials \
--namespace=kubeflow-test-infra --from-file=`echo ~/tmp/key.json`
rm ~/tmp/key.json
```
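
Presumably the Argo workflow consumes this secret as a mounted file. Inside a pod that mounts it, activating the account might look roughly like this (the mount path below is an assumption, not something defined by this commit):

```
# Hypothetical mount path; the real path is set by the workflow's volume mounts.
gcloud auth activate-service-account --key-file=/secret/gcp-credentials/key.json
```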

Make the service account a cluster admin

```
kubectl create clusterrolebinding ${SERVICE_ACCOUNT}-admin --clusterrole=cluster-admin \
--user=${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
```
* The service account is used to deploy Kubeflow, which entails creating various roles; so
it needs sufficient RBAC permissions to do so.

### Create a GitHub Token

You need to use a GitHub token with ksonnet; otherwise the test quickly runs into GitHub API rate limits.

TODO(jlewi): We should create a GitHub bot account to use with our tests and then create API tokens for that bot.

You can use the GitHub API to create a token

* The token doesn't need any scopes because it only accesses public data and is only needed for API metering.
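
One way to do this (a sketch; at the time of writing, GitHub's OAuth authorizations API allows creating a token with no scopes) is:

```
# Prompts for your GitHub password; creates a token with no scopes.
curl -u your-github-user https://api.github.com/authorizations \
  -d '{"scopes": [], "note": "kubeflow-testing"}'
```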

To create the secret run

```
kubectl create secret generic github-token --namespace=kubeflow-test-infra --from-literal=github_token=${TOKEN}
```

### Create a PD for NFS

Create a PD to act as the backing storage for the NFS filesystem that will be used to store data from
the test runs.

```
gcloud --project=${PROJECT} compute disks create \
--zone=${ZONE} kubeflow-testing --description="PD to back NFS storage for kubeflow testing." --size=1TB
```

### Create K8s Resources for Testing

The ksonnet app `test-infra` contains ksonnet configs to deploy the test infrastructure.

You can deploy Argo as follows (you don't need to use the Argo CLI):

```
ks apply prow -c argo
```

Deploy NFS & Jupyter

```
ks apply prow -c nfs-jupyter
```

* This creates the NFS share
* We use JupyterHub as a convenient way to access the NFS share for manual inspection of the file contents.
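
To see what the component created (and to find the JupyterHub endpoint for manual inspection), something like the following should work:

```
kubectl get pods,svc,pvc -n kubeflow-test-infra
```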

#### Troubleshooting

The user or service account deploying the test infrastructure needs sufficient permissions to create the roles that are created as part of the deployment. So you may need to run the following command before using ksonnet to deploy the test infrastructure.

```
kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=user@gmail.com
```

## Managing namespaces

All namespaces created for the tests should be labeled with `app=kubeflow-e2e-test`.
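
For example, labeled namespaces left over from previous runs can be listed (and, if desired, deleted) with:

```
kubectl get namespaces -l app=kubeflow-e2e-test
kubectl delete namespaces -l app=kubeflow-e2e-test
```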
Empty file added testing/__init__.py
Empty file.
35 changes: 35 additions & 0 deletions testing/bootstrap.sh
@@ -0,0 +1,35 @@
#!/bin/bash
#
# This script is used to bootstrap our prow jobs.
# The point of this script is to check out the google/kubeflow repo
# at the commit corresponding to the Prow job. We can then
# invoke the launcher script at that commit to submit and
# monitor an Argo workflow
set -xe

mkdir -p /src
git clone https://github.com/google/kubeflow.git /src/google_kubeflow

cd /src/google_kubeflow

echo Job Name = ${JOB_NAME}

# See https://github.com/kubernetes/test-infra/tree/master/prow#job-environment-variables
if [ ! -z ${PULL_NUMBER} ]; then
  git fetch origin pull/${PULL_NUMBER}/head:pr
  git checkout ${PULL_PULL_SHA}
else
  if [ ! -z ${PULL_BASE_SHA} ]; then
    # It's a postsubmit; check out the commit to test.
    git checkout ${PULL_BASE_SHA}
  fi
fi

# Update submodules.
git submodule init
git submodule update

# Print out the commit so we can tell from logs what we checked out.
echo Repo is at `git describe --tags --always --dirty`
git submodule
git status