Add instructions for setting up infrastructure for releasing. (#40)
Our release infra is pretty much a mirror of our test infra, except more restricted (e.g., we don't expose the Argo UI).

We also need to grant the service account permissions on projects used as
GCR registries.

Update the instructions for setting up our test infra.
Provide gcloud commands for some steps.

Related to kubeflow/training-operator#400
jlewi authored Feb 28, 2018
1 parent 2a12725 commit a413d63
Showing 10 changed files with 75,674 additions and 23 deletions.
100 changes: 80 additions & 20 deletions README.md
```
NAMESPACE=kubeflow-test-infra
gcloud --project=${PROJECT} container clusters create \
--zone=${ZONE} \
--machine-type=n1-standard-8 \
--cluster-version=1.8.4-gke.1 \
${CLUSTER}
```
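
Once the cluster is up, the later `kubectl` and `ks` steps need credentials for it. A sketch of the usual next step (all values are placeholders; the live `gcloud` call is left as a comment because it requires auth):

```shell
# Placeholders matching the variables defined earlier in this guide.
PROJECT=my-test-project
ZONE=us-east1-d
CLUSTER=kubeflow-testing

# Point kubectl at the new cluster (requires gcloud auth):
#   gcloud container clusters get-credentials ${CLUSTER} \
#     --zone=${ZONE} --project=${PROJECT}
echo "cluster ${CLUSTER} zone ${ZONE} project ${PROJECT}"
```
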


### Create a static ip for the Argo UI

```
gcloud compute --project=${PROJECT} addresses create argo-ui --global
```

### Enable GCP APIs

```
gcloud services --project=${PROJECT} enable cloudbuild.googleapis.com
gcloud services --project=${PROJECT} enable containerregistry.googleapis.com
gcloud services --project=${PROJECT} enable container.googleapis.com
```
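
To confirm the APIs took effect, you can list the enabled services. A sketch (the project name is a placeholder, and the live `gcloud` call is left as a comment because it requires auth):

```shell
# Placeholder project for illustration only.
PROJECT=my-test-project

# The APIs the test infra depends on.
APIS="cloudbuild.googleapis.com containerregistry.googleapis.com container.googleapis.com"

# Live check (requires gcloud auth):
#   gcloud services list --project=${PROJECT} --enabled
for API in ${APIS}; do
  echo "${API}"
done
```
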
### Create a GCP service account

* The tests need a GCP service account to upload data to GCS for Gubernator

```
SERVICE_ACCOUNT=kubeflow-testing
gcloud iam service-accounts --project=${PROJECT} create ${SERVICE_ACCOUNT} --display-name "Kubeflow testing account"
for ROLE in roles/container.admin roles/viewer roles/cloudbuild.builds.editor \
  roles/logging.viewer roles/storage.admin; do
  gcloud projects add-iam-policy-binding ${PROJECT} \
    --member serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com \
    --role ${ROLE}
done
```
* Kubernetes Engine Admin: our tests create K8s resources (e.g. namespaces), and some tests create GKE clusters.
* Project Viewer: GCB requires this when invoked via gcloud.
* Cloud Build Editor: to submit GCB builds.
* Logs Viewer: to read GCB logs.
* Storage Admin: for GCR.


```
GCE_DEFAULT=${PROJECT_NUMBER}-compute@developer.gserviceaccount.com
FULL_SERVICE=${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
gcloud --project=${PROJECT} iam service-accounts add-iam-policy-binding \
${GCE_DEFAULT} --member="serviceAccount:${FULL_SERVICE}" \
--role=roles/iam.serviceAccountUser
```
* Service Account User of the Compute Engine Default Service account (to avoid this [error](https://stackoverflow.com/questions/40367866/gcloud-the-user-does-not-have-access-to-service-account-default))
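
As a sanity check after granting the roles above, you can inspect which roles the account actually holds. A sketch (project and account names are placeholders matching the variables above; the live `gcloud` call is shown as a comment because it needs credentials):

```shell
# Placeholder values for illustration only.
PROJECT=my-test-project
SERVICE_ACCOUNT=kubeflow-testing
MEMBER="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"

# Live check (requires gcloud auth):
#   gcloud projects get-iam-policy ${PROJECT} \
#     --flatten="bindings[].members" \
#     --format="table(bindings.role)" \
#     --filter="bindings.members:${MEMBER}"
echo "${MEMBER}"
```
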


Create a secret key containing a GCP private key for the service account

```
KEY_FILE=<path to key>
SECRET_NAME=gcp-credentials
gcloud iam service-accounts keys create ${KEY_FILE} \
--iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
kubectl create secret generic ${SECRET_NAME} \
--namespace=${NAMESPACE} --from-file=key.json=${KEY_FILE}
```
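
To verify the secret round-trips correctly: `kubectl` stores the key file base64-encoded under the `key.json` entry. A sketch using a throwaway key file (the live `kubectl` call is commented out since it needs a cluster):

```shell
# Create a throwaway "key" to illustrate the round trip.
KEY_FILE=$(mktemp)
printf '%s' '{"type": "service_account"}' > "${KEY_FILE}"

# What kubectl stores: the file content, base64-encoded.
ENCODED=$(base64 < "${KEY_FILE}" | tr -d '\n')
rm -f "${KEY_FILE}"

# Live check against the cluster (requires a kubectl context):
#   kubectl get secret ${SECRET_NAME} --namespace=${NAMESPACE} \
#     -o jsonpath='{.data.key\.json}' | base64 --decode
echo "${ENCODED}" | base64 --decode
```
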

Make the service account a cluster admin
```
kubectl create clusterrolebinding ${SERVICE_ACCOUNT}-admin --clusterrole=cluster-admin \
  --user=${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
```
* The service account is used to deploy Kubeflow, which entails creating various roles, so it needs sufficient RBAC permissions to do so.


### Create a GitHub Token

You need to use a GitHub token with ksonnet; otherwise the tests quickly run into GitHub API limits.
You can use the GitHub API to create a token.
To create the secret run

```
kubectl create secret generic github-token --namespace=${NAMESPACE} --from-literal=github_token=${GITHUB_TOKEN}
```
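
You can confirm the token secret round-trips correctly; secret values come back base64-encoded. A sketch with a dummy token (the live `kubectl` call is commented out since it needs a cluster):

```shell
# Dummy token for illustration only.
GITHUB_TOKEN=dummy0123456789

# kubectl stores the literal base64-encoded under github_token.
ENCODED=$(printf '%s' "${GITHUB_TOKEN}" | base64 | tr -d '\n')

# Live check (requires a kubectl context):
#   kubectl get secret github-token --namespace=${NAMESPACE} \
#     -o jsonpath='{.data.github_token}' | base64 --decode
printf '%s' "${ENCODED}" | base64 --decode
```
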

### Deploy NFS
Make sure `kubectl` is configured to point to your cluster.

You can deploy Argo as follows (you don't need to use Argo's CLI).

Set up the environment

```
NFS_SERVER=<Internal GCE IP address of the NFS Server>
ks env add ${ENV}
ks param set --env=${ENV} argo namespace ${NAMESPACE}
ks param set --env=${ENV} debug-worker namespace ${NAMESPACE}
ks param set --env=${ENV} nfs-external namespace ${NAMESPACE}
ks param set --env=${ENV} nfs-external nfsServer ${NFS_SERVER}
```
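
`ks param set` writes plain strings, so it is easy to pass a hostname where the NFS component expects an internal IP. A quick sanity check you might run first (the address is a placeholder):

```shell
# Placeholder: the internal IP of the NFS server created above.
NFS_SERVER=10.0.0.2

# Check that the value looks like an IPv4 address before wiring it
# into the nfs-external component.
if echo "${NFS_SERVER}" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
  echo "ok: ${NFS_SERVER}"
else
  echo "not an IPv4 address: ${NFS_SERVER}" >&2
fi
```
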

In the testing environment (but not the release environment) we also expose the UI

```
ks param set --env=${ENV} argo exposeUi true
```

Then deploy Argo

```
ks apply ${ENV} -c argo
```

Create the PVs corresponding to external NFS

```
ks apply ${ENV} -c nfs-external
```

### Release infrastructure

Our release infrastructure is largely identical to our test infrastructure,
except it is more locked down.

In particular, we don't expose the Argo UI publicly.

Additionally, we need to grant the service account access to the GCR
registry used to host our images.

```
GCR_PROJECT=kubeflow-images-staging
gcloud projects add-iam-policy-binding ${GCR_PROJECT} \
  --member serviceAccount:${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com \
  --role=roles/storage.admin
```

We also need to give the GCB service account access to the registry

```
GCR_PROJECT=kubeflow-images-staging
GCB_SERVICE_ACCOUNT=${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com
gcloud projects add-iam-policy-binding ${GCR_PROJECT} \
  --member serviceAccount:${GCB_SERVICE_ACCOUNT} \
  --role=roles/storage.admin
```
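
The Cloud Build service account is named after the project *number*, not the project ID, which is easy to get wrong. A sketch of how the member string is assembled (the number is a placeholder; the live lookup is shown as a comment):

```shell
# Placeholder; look up the real number with (requires gcloud auth):
#   gcloud projects describe ${PROJECT} --format='value(projectNumber)'
PROJECT_NUMBER=123456789012

# The GCB default service account is <project-number>@cloudbuild.gserviceaccount.com.
GCB_SERVICE_ACCOUNT=${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com
echo "serviceAccount:${GCB_SERVICE_ACCOUNT}"
```
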

#### Troubleshooting
9 changes: 6 additions & 3 deletions test-infra/components/argo.jsonnet
local k = import 'k.libsonnet';
local argo = import 'argo.libsonnet';
local namespace = params.namespace;

local ingress = if params.exposeUi then
  [argo.parts(namespace).uiIngress]
else [];

std.prune(k.core.v1.list.new([
  argo.parts(namespace).crd,
  argo.parts(namespace).config,
  argo.parts(namespace).deploy,
  argo.parts(namespace).deployUi,
  argo.parts(namespace).uiService,
  argo.parts(namespace).serviceAccount,
  argo.parts(namespace).roleBinding,
  argo.parts(namespace).defaultRoleBinding,
] + ingress))
1 change: 1 addition & 0 deletions test-infra/components/params.libsonnet
// Each object below should correspond to a component in the components/ directory
argo: {
  namespace: "kubeflow-test-infra",
  exposeUi: false,
},
"nfs-external": {
  name: "nfs-external",
3 changes: 3 additions & 0 deletions test-infra/environments/kubeflow-ci/params.libsonnet
params + {
  "nfs-external"+: {
    nfsServer: "10.128.0.3",
  },
  argo+: {
    exposeUi: true,
  },
},
}
80 changes: 80 additions & 0 deletions test-infra/environments/releasing/.metadata/k.libsonnet
local k8s = import "k8s.libsonnet";

local apps = k8s.apps;
local core = k8s.core;
local extensions = k8s.extensions;

local hidden = {
  mapContainers(f):: {
    local podContainers = super.spec.template.spec.containers,
    spec+: {
      template+: {
        spec+: {
          // IMPORTANT: This overwrites the 'containers' field
          // for this deployment.
          containers: std.map(f, podContainers),
        },
      },
    },
  },

  mapContainersWithName(names, f)::
    local nameSet =
      if std.type(names) == "array"
      then std.set(names)
      else std.set([names]);
    local inNameSet(name) = std.length(std.setInter(nameSet, std.set([name]))) > 0;
    self.mapContainers(
      function(c)
        if std.objectHas(c, "name") && inNameSet(c.name)
        then f(c)
        else c
    ),
};

k8s + {
  apps:: apps + {
    v1beta1:: apps.v1beta1 + {
      local v1beta1 = apps.v1beta1,

      daemonSet:: v1beta1.daemonSet + {
        mapContainers(f):: hidden.mapContainers(f),
        mapContainersWithName(names, f):: hidden.mapContainersWithName(names, f),
      },

      deployment:: v1beta1.deployment + {
        mapContainers(f):: hidden.mapContainers(f),
        mapContainersWithName(names, f):: hidden.mapContainersWithName(names, f),
      },
    },
  },

  core:: core + {
    v1:: core.v1 + {
      list:: {
        new(items)::
          {apiVersion: "v1"} +
          {kind: "List"} +
          self.items(items),

        items(items):: if std.type(items) == "array" then {items+: items} else {items+: [items]},
      },
    },
  },

  extensions:: extensions + {
    v1beta1:: extensions.v1beta1 + {
      local v1beta1 = extensions.v1beta1,

      daemonSet:: v1beta1.daemonSet + {
        mapContainers(f):: hidden.mapContainers(f),
        mapContainersWithName(names, f):: hidden.mapContainersWithName(names, f),
      },

      deployment:: v1beta1.deployment + {
        mapContainers(f):: hidden.mapContainers(f),
        mapContainersWithName(names, f):: hidden.mapContainersWithName(names, f),
      },
    },
  },
}
