Merge batch #4414

Merged
merged 74 commits into from
Sep 25, 2018
Changes from 72 commits
f063f98
initial revision
cseed Jul 2, 2018
fc65bad
wip
cseed Jul 6, 2018
9a422b3
added deployment
cseed Jul 7, 2018
1bd0e48
added service
cseed Jul 7, 2018
2037e4e
wip
cseed Jul 7, 2018
a55af85
wip
cseed Jul 7, 2018
d641c13
cancellation, primitive client library
cseed Jul 8, 2018
1068eac
batches, higher level api
cseed Jul 8, 2018
7babf6d
fixed readme
cseed Jul 8, 2018
0a0f5eb
minor
cseed Jul 9, 2018
82d75bc
added tests, reorg
cseed Jul 9, 2018
7a23054
tests run in itself
cseed Jul 9, 2018
53226dc
wip
cseed Jul 9, 2018
fdbb8ba
added callbacks, callback test
cseed Jul 9, 2018
bda2ca9
makefile changes
cseed Jul 9, 2018
071d17e
wip
cseed Jul 9, 2018
5822792
tweaks
cseed Jul 9, 2018
1250ed0
callback test works on k8s
cseed Jul 10, 2018
bae3c51
starting to play with containerized spark
cseed Jul 16, 2018
f16101b
Create setup.py
Jul 18, 2018
81f4fae
move setup.py to correct location
Jul 18, 2018
a4a71eb
ignore compiled pyc files
Jul 23, 2018
604f7fd
describe how to use minikube with local docker images
Jul 23, 2018
95c0727
add special note about `imagePullPolicy`
Jul 23, 2018
6f3bc95
add an environment yaml
Jul 23, 2018
3140d76
add a getting started section
Jul 23, 2018
8b26281
add a dockerignore file
Jul 23, 2018
c51ff53
do not fail if callback fails
Jul 23, 2018
7c3caf1
expose pod/container volumes
Jul 20, 2018
8aaa162
fix volumeMounts field name
Jul 20, 2018
fdb9a38
add resources
Jul 27, 2018
07c5a62
add tolerations
Jul 27, 2018
b33b7b9
add jobs listing
Jul 27, 2018
bcee9d7
add jobs test
Jul 27, 2018
4edfecd
fix the environment
Jul 27, 2018
f2d3741
update Batch.create_job for all the new parameters
Jul 27, 2018
8fd70d5
fix missing id
Jul 27, 2018
495f91a
avoid crashing on bad event types
Jul 27, 2018
ea56322
update dockerignore
Jul 30, 2018
7ff29ae
retry event loops
cseed Jul 28, 2018
cfbddcb
Update server.py
danking Jul 30, 2018
a801317
fix dockerignore
Jul 30, 2018
85706f5
fix dockerignore
Jul 30, 2018
cbe3393
stash the attributes sent by api
Jul 30, 2018
0fa2e7a
cache the status
danking Aug 22, 2018
b687232
fixed tests (#28)
cseed Aug 26, 2018
99b12ab
Pr builder image (#27)
cseed Aug 27, 2018
ab07f4e
add hail-ci-build.sh (#30)
cseed Aug 27, 2018
2be8185
don't log the (pod) log (#29)
cseed Aug 27, 2018
464a1dc
added bash to pr-builder image (#32)
cseed Aug 28, 2018
2714944
Add git (#35)
cseed Aug 28, 2018
8d6d5c8
add python alias in pr-builder image
cseed Aug 28, 2018
c529ad0
added curl to pr-builder image
cseed Aug 28, 2018
25d77bd
Job delete (#36)
cseed Aug 29, 2018
d901524
tag executable images (#38)
danking Aug 29, 2018
a892d6e
update deployment (#39)
danking Aug 29, 2018
6d6ded3
make batch single threaded (#40)
cseed Aug 29, 2018
513d7c8
label batch job pods (#41)
cseed Aug 29, 2018
386ecdf
add hail-ci-deploy.sh (#42)
cseed Sep 11, 2018
d75b9b8
install kubectl in image (#44)
cseed Sep 11, 2018
f413878
make batch subproject (#46)
cseed Sep 12, 2018
547b05c
authenticate docker to push to gcr.io (#48)
cseed Sep 13, 2018
40da72d
expose pod name as well (#49)
danking Sep 14, 2018
e8ee2f1
restart ci on deploy (#50)
cseed Sep 17, 2018
6826a55
Fix SHA check (#51)
danking Sep 17, 2018
4fd17aa
add /jobs/<id>/log endpoint (#52)
cseed Sep 17, 2018
88c2186
prep to merge into monorepo
cseed Sep 22, 2018
34e68c4
updated build image
cseed Sep 22, 2018
541e67c
Merge remote-tracking branch 'batch/master'
cseed Sep 24, 2018
2fe67ea
fixes
cseed Sep 24, 2018
b94e01f
fixed typo
cseed Sep 24, 2018
1542c6f
fixed README.md conflict.
cseed Sep 24, 2018
4441c45
activate environment
cseed Sep 24, 2018
0e3185c
add cloudtools to list of project-changed.py projects
cseed Sep 25, 2018
2 changes: 1 addition & 1 deletion README.md
@@ -91,4 +91,4 @@ open-source by providing free hosting, and YourKit, LLC for generously providing
free licenses for <a href="https://www.yourkit.com/java/profiler/">YourKit Java
Profiler</a> for open-source development.

<img src="https://www.yourkit.com/images/yklogo.png" align="right" />
<img src="https://www.yourkit.com/images/yklogo.png" align="right" />
5 changes: 5 additions & 0 deletions batch/.dockerignore
@@ -0,0 +1,5 @@
batch.log
logs
*~
*.pyc
**/__pycache__
5 changes: 5 additions & 0 deletions batch/.gitignore
@@ -0,0 +1,5 @@
batch.log
logs
*~
*.pyc
**/__pycache__
14 changes: 14 additions & 0 deletions batch/Dockerfile
@@ -0,0 +1,14 @@
FROM alpine:3.8

RUN apk update
RUN apk add python3 py3-cffi py3-cryptography
RUN pip3 install -U pip
RUN pip install flask
RUN pip install kubernetes
RUN pip install cerberus

COPY batch /batch

EXPOSE 5000

CMD ["python3", "/batch/server.py"]
13 changes: 13 additions & 0 deletions batch/Dockerfile.test
@@ -0,0 +1,13 @@
FROM alpine:3.8

RUN apk update
RUN apk add python3 # python3=3.6.4-r1
RUN pip3 install -U pip
RUN pip install flask
RUN pip install kubernetes
RUN pip install cerberus

COPY batch /batch
COPY test /test

CMD ["python3", "-m", "unittest", "/test/test_batch.py"]
40 changes: 40 additions & 0 deletions batch/Makefile
@@ -0,0 +1,40 @@
.PHONY: hail-ci-build-image push-hail-ci-build-image

# Build the PR-builder image and record its name, tagged by image ID,
# in ../hail-ci-build-image.
hail-ci-build-image:
	docker build -t batch-pr-builder -f Dockerfile.pr-builder .
	echo "gcr.io/broad-ctsa/batch-pr-builder:`docker images -q --no-trunc batch-pr-builder | sed -e 's,[^:]*:,,'`" > ../hail-ci-build-image
	docker tag batch-pr-builder `cat ../hail-ci-build-image`

push-hail-ci-build-image: hail-ci-build-image
	docker push `cat ../hail-ci-build-image`

build: build-batch build-batch-test

build-batch:
	docker build -t batch .

build-batch-test:
	docker build -t batch-test -f Dockerfile.test .

push: push-batch push-batch-test

# IMAGE is the registry name tagged with the local image ID (the sed strips the "sha256:" prefix).
push-batch: IMAGE="gcr.io/broad-ctsa/batch:$(shell docker images -q --no-trunc batch | sed -e 's,[^:]*:,,')"
push-batch: build-batch
	echo $(IMAGE) > batch-image
	docker tag batch $(IMAGE)
	docker push $(IMAGE)

push-batch-test: IMAGE="gcr.io/broad-ctsa/batch-test:$(shell docker images -q --no-trunc batch-test | sed -e 's,[^:]*:,,')"
push-batch-test: build-batch-test
	echo $(IMAGE) > batch-test-image
	# Tag the test image (not the batch image) before pushing it.
	docker tag batch-test $(IMAGE)
	docker push $(IMAGE)

# Run the server in docker, using the kube credentials from the host.
run-docker:
	docker run -e BATCH_USE_KUBE_CONFIG=1 -i -v $(HOME)/.kube:/root/.kube -p 5000:5000 -t batch

run:
	BATCH_USE_KUBE_CONFIG=1 python batch/server.py

test-local:
	POD_IP='127.0.0.1' BATCH_URL='http://127.0.0.1:5000' python -m unittest -v test/test_batch.py
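The push targets name each image by its image ID: `docker images -q --no-trunc` prints an ID of the form `sha256:<hex>`, and the `sed` expression drops everything up to and including the first colon. A minimal Python sketch of that string handling follows. The function names are hypothetical and not part of this PR, and the sample ID is made up for illustration.

```python
def strip_digest_prefix(image_id: str) -> str:
    """Drop everything up to and including the first colon, like sed 's,[^:]*:,,'."""
    _, sep, rest = image_id.partition(":")
    # With no colon present, sed leaves the input unchanged; so do we.
    return rest if sep else image_id


def image_ref(repository: str, image_id: str) -> str:
    """Build '<repository>:<hex digest>', the form the push targets write out."""
    return "{}:{}".format(repository, strip_digest_prefix(image_id))


# Made-up image ID, for illustration only.
print(image_ref("gcr.io/broad-ctsa/batch", "sha256:0123abcd"))
```

Tagging by image ID means an unchanged build produces the same tag, so a redeploy only sees a new name when the image content changes.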
196 changes: 196 additions & 0 deletions batch/README.md
@@ -0,0 +1,196 @@
Getting Started
---

Start a `minikube` k8s cluster and configure your `kubectl` to point at that k8s
cluster:

```
minikube start
```

If `minikube start` fails with an unexplained error, try resetting minikube:

```
minikube delete
rm -rf ~/.minikube
brew cask reinstall minikube # or equivalent on your OS
minikube start
```

When you want to return to using a Google k8s cluster (GKE), run:

```
gcloud container clusters get-credentials CLUSTER_NAME
```

Set some environment variables so that your local `docker` client talks to the
Docker daemon inside the `minikube` cluster, and images you build land in the
cluster's local image cache:

```
eval $(minikube docker-env)
```

Build the batch and test images:

```
make build-batch build-batch-test
```

Edit the `deployment.yaml` so that the container named `batch` has
`imagePullPolicy: Never`. This ensures that k8s does not go look for the image
in the Google Container Registry and instead uses the local image cache (which
you just updated when you ran `make build-batch build-batch-test`).

Give way too many privileges to the default service account so that `batch` can
start new pods:

```
kubectl create clusterrolebinding \
cluster-admin-default \
--clusterrole cluster-admin \
--serviceaccount=default:default
```

Create a batch service:

```
kubectl create -f deployment.yaml
```

If you ever need to shut down the service, execute:

```
kubectl delete -f deployment.yaml
```

Look for the newly created batch pod:

```
kubectl get pods
```

And create a port forward from the k8s cluster to your local machine (this works
for clusters in GKE too):

```
kubectl port-forward POD_NAME 5000:5000
```

The former port is the local one and the latter port is the remote one (i.e. in
the k8s pod). Now you can load the conda environment for testing and run the
tests against this deployment:

```
conda env create -f environment.yaml
conda activate hail-batch
make test-local
```
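`make test-local` assumes the forwarded port is already accepting connections. A small sketch of a readiness check that could run before the tests is shown here. It uses only the standard library; the function names are hypothetical and not part of this PR, and `127.0.0.1:5000` is the local end of the port forward described above.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not ready".
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


# Example (not executed here): wait_for_port("127.0.0.1", 5000)
```

A successful TCP connect only shows that something is listening; it does not prove the batch server itself is healthy.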



---

Kubernetes [Python client](https://github.com/kubernetes-client/python/blob/master/kubernetes/README.md)
- [V1Pod](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Pod.md)
- [create_namespaced_pod](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#create_namespaced_pod)
- [delete_namespaced_pod](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#delete_namespaced_pod)

To get kubectl credentials for a GKE cluster:

```
$ gcloud container clusters get-credentials <cluster>
```

To authorize docker to push to GCR:

```
$ gcloud auth configure-docker
```

To run batch locally, using the local kube credentials:

```
$ docker run -e BATCH_USE_KUBE_CONFIG=1 -i -v $HOME/.kube:/root/.kube -p 5000:5000 -t batch
```

On OSX, the port will be accessible on the docker-machine:

```
$(docker-machine ip default):5000
```

Get a shell in a running pod:

```
$ kubectl exec -it <pod> -- /bin/sh
```

Hit a Flask REST endpoint with curl:

```
$ curl -X POST -H "Content-Type: application/json" -d <data> <url>
$ curl -X POST -H "Content-Type: application/json" -d '{"name": "batchtest", "image": "gcr.io/broad-ctsa/true"}' batch/jobs/create
```
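The same POST can be sketched in Python with only the standard library. The helper name `build_json_post` and the base URL `http://127.0.0.1:5000` are assumptions for illustration; the JSON body is copied from the curl example above. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def build_json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; substitute wherever the batch service is reachable.
req = build_json_post(
    "http://127.0.0.1:5000/jobs/create",
    {"name": "batchtest", "image": "gcr.io/broad-ctsa/true"},
)
# urllib.request.urlopen(req) would actually send the request.
```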

Give default:default serviceaccount cluster-admin privileges:

```
$ kubectl create clusterrolebinding cluster-admin-default --clusterrole cluster-admin --serviceaccount=default:default
```

Run an image in a new pod:

```
$ kubectl run <name> --restart=Never --image <image> -- <cmd>
```

For example, run a shell in a new pod:

```
$ kubectl run -i --tty alpine --image=alpine --restart=Never -- sh
```

Forward from a local port to a port on pod:

```
$ kubectl port-forward jupyter-deployment-5f54cff675-msr85 8888:8888 # <local port>:<remote port>
```

Run a container with a given hostname:

    $ docker run -d --rm --name spark-m -h spark-m -p 8080:8080 -p 7077:7077 spark-m

List all containers, including stopped containers:

    $ docker ps -a

Remove all stopped containers:

    $ docker ps -aq --no-trunc -f status=exited | xargs docker rm

Run a docker container linked to another:

    $ docker run -d --rm --cpus 0.5 --name spark-w-0 --link spark-m spark-w -c 1 -m 2g

Get the IP of a container:

    $ docker inspect <container-id> | grep IPAddress

---

The following will set some environment variables so that future invocations of
`docker build` will make images available to the minikube cluster. This allows
you to test images without pushing them to a remote container registry.

```
eval $(minikube docker-env)
make build-batch build-batch-test
```

NB: you must also set the `imagePullPolicy` of any `container` you `kubectl
create` to `Never` if you're using the `:latest` image tag (which is implicitly
used if no tag is specified on the image name). Otherwise, k8s defaults the
policy to `Always` and tries to pull the image from a registry every time a pod
starts. An explicit `imagePullPolicy: IfNotPresent` also avoids the pull, but
only when the image is already in the node's local cache.
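The defaulting rule that Kubernetes applies when `imagePullPolicy` is omitted can be sketched as a small function. This is an illustration of the rule as described in the Kubernetes documentation, not code from this PR; the function name is hypothetical and the image-name parsing is deliberately simplified.

```python
def default_image_pull_policy(image: str) -> str:
    """Policy Kubernetes assigns when `imagePullPolicy` is omitted (simplified)."""
    # A digest reference (name@sha256:...) pins exact content.
    if "@" in image:
        return "IfNotPresent"
    # Only the last path component can carry a tag; a colon earlier in
    # the string belongs to a registry port such as localhost:5000.
    last = image.rsplit("/", 1)[-1]
    tag = last.split(":", 1)[1] if ":" in last else "latest"
    return "Always" if tag == "latest" else "IfNotPresent"


for name in ("batch", "batch:latest", "gcr.io/broad-ctsa/batch:0123abcd"):
    print(name, "->", default_image_pull_policy(name))
```

This is why a locally built, untagged `batch` image needs an explicit policy in `deployment.yaml`, while an image tagged by its ID would default to `IfNotPresent`.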
2 changes: 2 additions & 0 deletions batch/batch/__init__.py
@@ -0,0 +1,2 @@
import batch.client
import batch.api
64 changes: 64 additions & 0 deletions batch/batch/api.py
@@ -0,0 +1,64 @@
import json
import time
import random
import requests

def create_job(url, spec, attributes, batch_id, callback):
    d = {'spec': spec}
    if attributes:
        d['attributes'] = attributes
    if batch_id:
        d['batch_id'] = batch_id
    if callback:
        d['callback'] = callback

    r = requests.post(url + '/jobs/create', json=d)
    r.raise_for_status()
    return r.json()

def list_jobs(url):
    r = requests.get(url + '/jobs')
    r.raise_for_status()
    return r.json()

def get_job(url, job_id):
    r = requests.get(url + '/jobs/{}'.format(job_id))
    r.raise_for_status()
    return r.json()

def get_job_log(url, job_id):
    r = requests.get(url + '/jobs/{}/log'.format(job_id))
    r.raise_for_status()
    return r.text

def delete_job(url, job_id):
    r = requests.delete(url + '/jobs/{}/delete'.format(job_id))
    r.raise_for_status()
    return r.json()

def cancel_job(url, job_id):
    r = requests.post(url + '/jobs/{}/cancel'.format(job_id))
    r.raise_for_status()
    return r.json()

def create_batch(url, attributes):
    d = {}
    if attributes:
        d['attributes'] = attributes
    r = requests.post(url + '/batches/create', json=d)
    r.raise_for_status()
    return r.json()

def get_batch(url, batch_id):
    r = requests.get(url + '/batches/{}'.format(batch_id))
    r.raise_for_status()
    return r.json()

def delete_batch(url, batch_id):
    r = requests.delete(url + '/batches/{}'.format(batch_id))
    r.raise_for_status()
    return r.json()

def refresh_k8s_state(url):
    r = requests.post(url + '/refresh_k8s_state')
    r.raise_for_status()
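`create_job` builds its request body by adding each optional field only when it is truthy. The sketch below isolates that step so it can be exercised without a running server. The helper `build_job_payload` is hypothetical and not part of this PR, and the `spec` contents are placeholders, since the real schema is defined by the server.

```python
def build_job_payload(spec, attributes=None, batch_id=None, callback=None):
    """Assemble the JSON body that create_job posts to /jobs/create."""
    payload = {'spec': spec}
    # Optional fields are left out entirely when falsy, as in create_job.
    if attributes:
        payload['attributes'] = attributes
    if batch_id:
        payload['batch_id'] = batch_id
    if callback:
        payload['callback'] = callback
    return payload


# Placeholder spec, for illustration only.
print(build_job_payload({'containers': []}, attributes={'name': 'example'}))
```

One consequence of the truthiness test is that a `batch_id` of `0` would be dropped from the payload. Whether batch ids can be `0` is not visible in this diff; if they can, testing `batch_id is not None` would be the safer check.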