Argo workflow to run E2E tests (#72)
* Create an Argo workflow to run the E2E test for Kubeflow deployment
* Create a ksonnet app for deploying Argo in our test infrastructure
* Create a ksonnet component to trigger the E2E workflow.
* Add tensorflow/k8s as a git submodule because we want to reuse some python scripts in that project to write our tests. 
* bootstrap.sh is the entrypoint for our prow jobs.
    It checks out the repo at the commit corresponding to the prow job and then invokes
    a test script in the repo. This ensures that the bulk of our test logic is pulled from
    the repo at the commit being tested.
* checkout.sh is a script for checking out the source; it is used as the first step in our workflows.
    The Argo workflow uses an NFS share to store test data so that we can have multiple steps running
    in parallel and accessing the same files.
jlewi authored Jan 6, 2018
1 parent 7b543ef commit 17aa4ce
Showing 28 changed files with 77,025 additions and 14 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "tensorflow_k8s"]
path = tensorflow_k8s
url = https://github.com/tensorflow/k8s.git
1 change: 1 addition & 0 deletions tensorflow_k8s
Submodule tensorflow_k8s added at ed7cae
97 changes: 97 additions & 0 deletions testing/Dockerfile
@@ -0,0 +1,97 @@
# Docker image for running E2E tests using Argo.

FROM python:2.7-slim
MAINTAINER Jeremy Lewi

# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Define en_US.
ENV LANGUAGE=en_US.UTF-8 \
    LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8 \
    LC_CTYPE=en_US.UTF-8 \
    LC_MESSAGES=en_US.UTF-8


# buildDeps should be packages needed only to build some other packages as
# these packages are purged in a later step.
#
# gcc & python-dev are needed so we can install crcmod for gsutil
RUN set -ex \
&& apt-get update -yqq \
&& apt-get install -yqq --no-install-recommends \
curl \
locales \
wget \
ca-certificates \
git \
zip \
unzip \
gcc python-dev \
python-setuptools \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base

# Set the locale
RUN sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
&& locale-gen \
&& update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8

# Install go
RUN cd /tmp && \
wget -O /tmp/go.tar.gz https://redirector.gvt1.com/edgedl/go/go1.9.2.linux-amd64.tar.gz && \
tar -C /usr/local -xzf go.tar.gz

# Install gcloud
ENV PATH=/google-cloud-sdk/bin:/workspace:${PATH} \
CLOUDSDK_CORE_DISABLE_PROMPTS=1

RUN wget -q https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz && \
tar xzf google-cloud-sdk.tar.gz -C / && \
rm google-cloud-sdk.tar.gz && \
/google-cloud-sdk/install.sh \
--disable-installation-options \
--bash-completion=false \
--path-update=false \
--usage-reporting=false && \
gcloud components install alpha beta kubectl

# Install CRCMOD for gsutil
RUN easy_install -U pip && \
pip install -U crcmod

# Install Helm
RUN wget -O /tmp/get_helm.sh \
https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get && \
chmod 700 /tmp/get_helm.sh && \
/tmp/get_helm.sh && \
rm /tmp/get_helm.sh

# Initialize helm
RUN helm init --client-only

# Install ksonnet
RUN curl -o /usr/local/bin/ks -L \
https://github.com/ksonnet/ksonnet/releases/download/v0.8.0/ks-linux-amd64 && \
chmod a+x /usr/local/bin/ks

# Install various python libraries.
RUN pip install --upgrade six pyyaml google-api-python-client \
google-cloud-storage google-auth-httplib2 pylint kubernetes==4.0.0 mock retrying

COPY bootstrap.sh /usr/local/bin
RUN chmod a+x /usr/local/bin/bootstrap.sh

COPY checkout.sh /usr/local/bin
RUN chmod a+x /usr/local/bin/checkout.sh

ENTRYPOINT ["/usr/local/bin/bootstrap.sh"]
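
For a quick local check of this image (a sketch; the Makefile below is the supported way to build and push it):

```
# Build from the testing/ directory and poke around inside the image.
docker build -t kubeflow-testing:local testing/
docker run --rm -it --entrypoint=/bin/bash kubeflow-testing:local
```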
35 changes: 35 additions & 0 deletions testing/Makefile
@@ -0,0 +1,35 @@
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Requirements:
# https://github.com/mattrobenolt/jinja2-cli
# pip install jinja2-cli
IMG = gcr.io/mlkube-testing/kubeflow-testing
TAG := $(shell date +v%Y%m%d)-$(shell git describe --tags --always --dirty)-$(shell git diff | sha256sum | cut -c -6)
DIR := ${CURDIR}

all: build

# To build without the cache set the environment variable
# export DOCKER_BUILD_OPTS=--no-cache
build:
	@echo {\"image\": \"$(IMG):$(TAG)\"} > version.json
	docker build ${DOCKER_BUILD_OPTS} -t $(IMG):$(TAG) .
	docker tag $(IMG):$(TAG) $(IMG):latest
	@echo Built $(IMG):$(TAG) and tagged with latest

push: build
	gcloud docker -- push $(IMG):$(TAG)
	gcloud docker -- push $(IMG):latest
	@echo Pushed $(IMG) with :latest and :$(TAG) tags
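
A typical invocation might look like this (assuming you have push access to gcr.io/mlkube-testing):

```
cd testing
export DOCKER_BUILD_OPTS=--no-cache   # optional: build without the Docker cache
make push
```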
145 changes: 141 additions & 4 deletions testing/README.md
@@ -14,21 +14,158 @@ The current thinking is this will work as follows
* Each step in the pipeline can write outputs and junit.xml files to a test directory in the volume
* A final step in the Argo pipeline will upload the outputs to GCS so they are available in gubernator

## Accessing Argo UI

You can access the Argo UI over the API Server proxy.

We currently use the cluster

```
PROJECT=mlkube-testing
ZONE=us-east1-d
CLUSTER=kubeflow-testing
NAMESPACE=kubeflow-test-infra
```

After starting `kubectl proxy`, you can connect to the UI via the proxy at

```
http://127.0.0.1:8001/api/v1/proxy/namespaces/kubeflow-test-infra/services/argo-ui:80/
```
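
For example (a sketch, assuming the cluster above and the default proxy port), fetching credentials and starting the proxy might look like:

```
gcloud --project=${PROJECT} container clusters get-credentials ${CLUSTER} --zone=${ZONE}
kubectl proxy --port=8001
```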

TODO(jlewi): We can probably make the UI publicly available since I don't think it offers any ability to launch workflows.


## Running the tests

### Run a presubmit

```
ks param set workflows name e2e-test-pr-`date '+%Y%m%d-%H%M%S'`
ks param set workflows prow_env REPO_OWNER=google,REPO_NAME=kubeflow,PULL_NUMBER=${PULL_NUMBER},PULL_PULL_SHA=${COMMIT}
ks param set workflows commit ${COMMIT}
ks apply prow -c workflows
```
* You can set COMMIT to `pr` to check out the latest change on the PR.
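
For example, to run the presubmit against this PR (a sketch; the values are illustrative):

```
export PULL_NUMBER=72   # the pull request to test
export COMMIT=pr        # or a specific commit SHA on the PR
```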

### Run a postsubmit

```
ks param set workflows name e2e-test-postsubmit-`date '+%Y%m%d-%H%M%S'`
ks param set workflows prow_env REPO_OWNER=google,REPO_NAME=kubeflow,PULL_BASE_SHA=${COMMIT}
ks param set workflows commit ${COMMIT}
ks apply prow -c workflows
```
* You can set COMMIT to `master` to use HEAD.
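
After submitting either workflow, you can check on its progress with kubectl (a sketch; this assumes the workflows run in the `kubeflow-test-infra` namespace):

```
kubectl get workflows -n kubeflow-test-infra
kubectl describe workflow <workflow-name> -n kubeflow-test-infra
```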


## Setting up the Test Infrastructure

Our tests require a K8s cluster with Argo installed. This section provides the instructions
for setting this up.

Create a GKE cluster

```
PROJECT=mlkube-testing
ZONE=us-east1-d
CLUSTER=kubeflow-testing
NAMESPACE=kubeflow-test-infra
gcloud --project=${PROJECT} container clusters create \
--zone=${ZONE} \
--machine-type=n1-standard-8 \
--cluster-version=1.8.4-gke.1 \
${CLUSTER}
```
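
The later steps create secrets in the `kubeflow-test-infra` namespace, so you may also need to create it if it does not already exist (a sketch):

```
kubectl create namespace ${NAMESPACE}
```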


### Create a GCP service account

* The tests need a GCP service account to upload data to GCS for Gubernator

```
SERVICE_ACCOUNT=kubeflow-testing
gcloud iam service-accounts --project=mlkube-testing create ${SERVICE_ACCOUNT} --display-name "Kubeflow testing account"
gcloud projects add-iam-policy-binding ${PROJECT} \
--member serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com --role roles/container.developer
```
* The service account needs to be able to create K8s resources as part of the test.
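
The commands above only grant `roles/container.developer`; if the account also needs to write test outputs to the Gubernator GCS bucket, you may additionally need something like the following (an assumption; the exact role and bucket depend on how Gubernator is configured):

```
# Hypothetical bucket name; substitute the bucket Gubernator reads from.
gsutil iam ch \
  serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com:objectAdmin \
  gs://your-gubernator-bucket
```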


Create a secret key for the service account

```
gcloud iam service-accounts keys create ~/tmp/key.json \
--iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
kubectl create secret generic kubeflow-testing-credentials \
--namespace=kubeflow-test-infra --from-file=`echo ~/tmp/key.json`
rm ~/tmp/key.json
```
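
Presumably the Argo workflow consumes this secret as a mounted file. Inside a pod that mounts it, activating the account might look roughly like this (the mount path below is an assumption, not something defined by this commit):

```
# Hypothetical mount path; the real path is set by the workflow's volume mounts.
gcloud auth activate-service-account --key-file=/secret/gcp-credentials/key.json
```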

Make the service account a cluster admin

```
kubectl create clusterrolebinding ${SERVICE_ACCOUNT}-admin --clusterrole=cluster-admin \
--user=${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
```
* The service account is used to deploy Kubeflow, which entails creating various roles; so
it needs sufficient RBAC permissions to do so.

### Create a GitHub Token

You need to use a GitHub token with ksonnet; otherwise the test quickly runs into GitHub API rate limits.

TODO(jlewi): We should create a GitHub bot account to use with our tests and then create API tokens for that bot.

You can use the GitHub API to create a token

* The token doesn't need any scopes because it only accesses public data and is only needed for API metering.
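
One way to do this (a sketch; at the time of writing, GitHub's OAuth authorizations API allows creating a token with no scopes) is:

```
# Prompts for your GitHub password; creates a token with no scopes.
curl -u your-github-user https://api.github.com/authorizations \
  -d '{"scopes": [], "note": "kubeflow-testing"}'
```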

To create the secret run

```
kubectl create secret generic github-token --namespace=kubeflow-test-infra --from-literal=github_token=${TOKEN}
```

### Create a PD for NFS

Create a PD to act as the backing storage for the NFS filesystem that will be used to store data from
the test runs.

```
gcloud --project=${PROJECT} compute disks create \
--zone=${ZONE} kubeflow-testing --description="PD to back NFS storage for kubeflow testing." --size=1TB
```

### Create K8s Resources for Testing

The ksonnet app `test-infra` contains ksonnet configs to deploy the test infrastructure.

You can deploy Argo as follows (you don't need to use the Argo CLI):

```
ks apply prow -c argo
```

Deploy NFS & Jupyter

```
ks apply prow -c nfs-jupyter
```

* This creates the NFS share
* We use JupyterHub as a convenient way to access the NFS share for manual inspection of the file contents.
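
To see what the component created (and to find the JupyterHub endpoint for manual inspection), something like the following should work:

```
kubectl get pods,svc,pvc -n kubeflow-test-infra
```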

#### Troubleshooting

The user or service account deploying the test infrastructure needs sufficient permissions to create the roles that are created as part of the deployment. So you may need to run the following command before using ksonnet to deploy the test infrastructure.

```
kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=user@gmail.com
```

## Managing namespaces

All namespaces created for the tests should be labeled with `app=kubeflow-e2e-test`.
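
For example, labeled namespaces left over from previous runs can be listed (and, if desired, deleted) with:

```
kubectl get namespaces -l app=kubeflow-e2e-test
kubectl delete namespaces -l app=kubeflow-e2e-test
```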
Empty file added testing/__init__.py
Empty file.
35 changes: 35 additions & 0 deletions testing/bootstrap.sh
@@ -0,0 +1,35 @@
#!/bin/bash
#
# This script is used to bootstrap our prow jobs.
# The point of this script is to check out the google/kubeflow repo
# at the commit corresponding to the Prow job. We can then
# invoke the launcher script at that commit to submit and
# monitor an Argo workflow
set -xe

mkdir -p /src
git clone https://github.com/google/kubeflow.git /src/google_kubeflow

cd /src/google_kubeflow

echo Job Name = ${JOB_NAME}

# See https://github.com/kubernetes/test-infra/tree/master/prow#job-environment-variables
if [ ! -z ${PULL_NUMBER} ]; then
  git fetch origin pull/${PULL_NUMBER}/head:pr
  git checkout ${PULL_PULL_SHA}
else
  if [ ! -z ${PULL_BASE_SHA} ]; then
    # It's a postsubmit; check out the commit to test.
    git checkout ${PULL_BASE_SHA}
  fi
fi

# Update submodules.
git submodule init
git submodule update

# Print out the commit so we can tell from logs what we checked out.
echo Repo is at `git describe --tags --always --dirty`
git submodule
git status