Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backend]Initial execution cache #3036

Merged
merged 54 commits into from
Mar 4, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
6994984
Initial execution cache
rui5i Feb 10, 2020
19a61e7
Add initial server logic
rui5i Feb 10, 2020
0448a12
Add const
rui5i Feb 10, 2020
d2fad48
Change folder name
rui5i Feb 10, 2020
c561ca4
Change execution key name
rui5i Feb 11, 2020
55ee8a7
Fix unit test
rui5i Feb 11, 2020
0f83144
Add Dockerfile and OWNERS file
rui5i Feb 14, 2020
1ecc855
fix go.sum
rui5i Feb 15, 2020
1b5b681
Add local deployment scripts
rui5i Feb 18, 2020
83126c7
Merge branch 'execution_cache' of https://github.com/rui5i/pipelines …
rui5i Feb 18, 2020
ae472d7
refactor src code
rui5i Feb 19, 2020
8c0cac8
Add standalone deployment scripts and yamls
rui5i Feb 20, 2020
cb075b2
Minor fix
rui5i Feb 20, 2020
4b0831a
Add execution cache image build in test folder
rui5i Feb 20, 2020
e1bedf7
Merge branch 'master' into execution_cache
rui5i Feb 20, 2020
ed1aa0e
fix test cloudbuild
rui5i Feb 20, 2020
3b1323c
Fix cloudbuild
rui5i Feb 20, 2020
da3eb42
Add execution cache deployer image to test folder
rui5i Feb 20, 2020
37521ec
Add copyright
rui5i Feb 20, 2020
5efe589
Fix deployer build
rui5i Feb 20, 2020
fa12c40
Add license for execution cache and cloudbuild for execution cache
rui5i Feb 21, 2020
3c302af
Refactor license intermediate data
rui5i Feb 21, 2020
17f714e
Fix execution cache image manifest
rui5i Feb 21, 2020
79e4711
Typo fix for cache and cache deployer images
rui5i Feb 21, 2020
a16335c
Add arguments in ca generation scripts and change deployer base image…
rui5i Feb 22, 2020
bb796c1
minor fix
rui5i Feb 22, 2020
db797b3
fix arg
rui5i Feb 22, 2020
57523c0
Mirror source code with MPL in execution_cache image
rui5i Feb 24, 2020
893e8b1
Minor fix
rui5i Feb 24, 2020
851ce25
minor refactor on error handling
rui5i Feb 24, 2020
1720ee6
Refactor cache source code, Docker image and manifest
rui5i Feb 25, 2020
f482c9f
Fix variable names
rui5i Feb 26, 2020
6523570
Add images in .release.cloudbuild.yaml
rui5i Feb 26, 2020
5c5298e
resolve merge conflict
rui5i Feb 27, 2020
914d9fd
Change execution_cache to generic name
rui5i Feb 27, 2020
4e7e3a9
revice readme
rui5i Feb 27, 2020
ad560f4
Move deployer job out of upgrade script
rui5i Feb 28, 2020
ba26e48
fix tests
rui5i Feb 28, 2020
065af7a
fix tests
rui5i Feb 28, 2020
a8aa15c
Seperate cache service and cache deployer job
rui5i Feb 28, 2020
e7aefad
mysql set up
rui5i Feb 28, 2020
5cebf4e
Delete cache service in manifest, only test in presubmit tests
rui5i Mar 2, 2020
2ccf3b3
fix
rui5i Mar 2, 2020
5b65988
Merge branch 'master' of https://github.com/kubeflow/pipelines into e…
rui5i Mar 2, 2020
2175fb8
fix presubmit tests
rui5i Mar 2, 2020
65c4f53
fix
rui5i Mar 2, 2020
a147e19
fix
rui5i Mar 2, 2020
b282938
Merge branch 'master' of https://github.com/kubeflow/pipelines into e…
rui5i Mar 3, 2020
75b0c0a
Merge branch 'execution_cache' of https://github.com/rui5i/pipelines …
rui5i Mar 3, 2020
a13c318
revert unnecessary change
rui5i Mar 3, 2020
86ebe17
fix cache image tag
rui5i Mar 3, 2020
e7ba5dc
change image gcr to ml-pipeline-test
rui5i Mar 3, 2020
60972bd
Remove namespace in standalone manifest and add to test manifest
rui5i Mar 3, 2020
e33cef8
Merge branch 'master' of https://github.com/kubeflow/pipelines into e…
rui5i Mar 3, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions .cloudbuild.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,18 @@ steps:
'--build-arg', 'COMMIT_HASH=$COMMIT_SHA', '-f',
'/workspace/backend/metadata_writer/Dockerfile', '/workspace']
waitFor: ["-"]
- id: 'buildCacheServer'
name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA',
'--build-arg', 'COMMIT_HASH=$COMMIT_SHA', '-f',
'/workspace/backend/Dockerfile.cacheserver', '/workspace']
waitFor: ["-"]
- id: 'buildCacheDeployer'
name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA',
'--build-arg', 'COMMIT_HASH=$COMMIT_SHA', '-f',
'/workspace/backend/src/cache/deployer/Dockerfile', '/workspace']
waitFor: ["-"]

# Build marketplace deployer
- id: 'buildMarketplaceDeployer'
Expand Down Expand Up @@ -215,6 +227,8 @@ images:
- 'gcr.io/$PROJECT_ID/inverse-proxy-agent:$COMMIT_SHA'
- 'gcr.io/$PROJECT_ID/visualization-server:$COMMIT_SHA'
- 'gcr.io/$PROJECT_ID/metadata-writer:$COMMIT_SHA'
- 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA'
- 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA'

# Images for Marketplace
- 'gcr.io/$PROJECT_ID/deployer:$COMMIT_SHA'
Expand Down
68 changes: 68 additions & 0 deletions .release.cloudbuild.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -297,6 +297,70 @@ steps:
docker push gcr.io/ml-pipeline/google/pipelines/metadatawriter:$(cat /workspace/mm.ver)
docker push gcr.io/ml-pipeline/google/pipelines-test/metadatawriter:$(cat /workspace/mm.ver)

- id: 'pullCacheServer'
name: 'gcr.io/cloud-builders/docker'
args: ['pull', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA']
waitFor: ['-']
- id: 'tagCacheServerVersionNumber'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA', 'gcr.io/ml-pipeline/cache-server:$TAG_NAME']
waitFor: ['pullCacheServer']
- id: 'tagCacheServerCommitSHA'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA', 'gcr.io/ml-pipeline/cache-server:$COMMIT_SHA']
waitFor: ['pullCacheServer']
- id: 'tagCacheServerForMarketplace'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA', 'gcr.io/ml-pipeline/google/pipelines/cacheserver:$TAG_NAME']
waitFor: ['pullCacheServer']
- id: 'tagCacheServerForMarketplaceTest'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA', 'gcr.io/ml-pipeline/google/pipelines-test/cacheserver:$TAG_NAME']
waitFor: ['pullCacheServer']
- id: 'tagCacheServerForMarketplaceMajorMinor'
waitFor: ['pullCacheServer', 'parseMajorMinorVersion']
name: 'gcr.io/cloud-builders/docker'
entrypoint: bash
args:
- -ceux
- |
docker tag gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA gcr.io/ml-pipeline/google/pipelines/cacheserver:$(cat /workspace/mm.ver)
docker tag gcr.io/$PROJECT_ID/cache-server:$COMMIT_SHA gcr.io/ml-pipeline/google/pipelines-test/cacheserver:$(cat /workspace/mm.ver)
docker push gcr.io/ml-pipeline/google/pipelines/cacheserver:$(cat /workspace/mm.ver)
docker push gcr.io/ml-pipeline/google/pipelines-test/cacheserver:$(cat /workspace/mm.ver)

- id: 'pullCacheDeployer'
name: 'gcr.io/cloud-builders/docker'
args: ['pull', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA']
waitFor: ['-']
- id: 'tagCacheDeployerVersionNumber'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA', 'gcr.io/ml-pipeline/cache-deployer:$TAG_NAME']
waitFor: ['pullCacheDeployer']
- id: 'tagCacheDeployerCommitSHA'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA', 'gcr.io/ml-pipeline/cache-deployer:$COMMIT_SHA']
waitFor: ['pullCacheDeployer']
- id: 'tagCacheDeployerForMarketplace'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA', 'gcr.io/ml-pipeline/google/pipelines/cachedeployer:$TAG_NAME']
waitFor: ['pullCacheDeployer']
- id: 'tagCacheDeployerForMarketplaceTest'
name: 'gcr.io/cloud-builders/docker'
args: ['tag', 'gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA', 'gcr.io/ml-pipeline/google/pipelines-test/cachedeployer:$TAG_NAME']
waitFor: ['pullCacheDeployer']
- id: 'tagCacheDeployerForMarketplaceMajorMinor'
waitFor: ['pullCacheDeployer', 'parseMajorMinorVersion']
name: 'gcr.io/cloud-builders/docker'
entrypoint: bash
args:
- -ceux
- |
docker tag gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA gcr.io/ml-pipeline/google/pipelines/cachedeployer:$(cat /workspace/mm.ver)
docker tag gcr.io/$PROJECT_ID/cache-deployer:$COMMIT_SHA gcr.io/ml-pipeline/google/pipelines-test/cachedeployer:$(cat /workspace/mm.ver)
docker push gcr.io/ml-pipeline/google/pipelines/cachedeployer:$(cat /workspace/mm.ver)
docker push gcr.io/ml-pipeline/google/pipelines-test/cachedeployer:$(cat /workspace/mm.ver)

rui5i marked this conversation as resolved.
Show resolved Hide resolved
- name: 'gcr.io/cloud-builders/docker'
args: ['pull', 'gcr.io/$PROJECT_ID/metadata-envoy:$COMMIT_SHA']
id: 'pullMetadataEnvoy'
Expand Down Expand Up @@ -558,6 +622,8 @@ images:
- 'gcr.io/ml-pipeline/google/pipelines/metadataenvoy:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines/metadatawriter:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines/deployer:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines/cacheserver:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines/cachedeployer:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/frontend:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/apiserver:$TAG_NAME'
Expand All @@ -574,6 +640,8 @@ images:
- 'gcr.io/ml-pipeline/google/pipelines-test/argoworkflowcontroller:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/metadataenvoy:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/metadatawriter:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/cacheserver:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/cachedeployer:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test/deployer:$TAG_NAME'
- 'gcr.io/ml-pipeline/google/pipelines-test:$TAG_NAME'
timeout: '1200s'
Expand Down
20 changes: 20 additions & 0 deletions backend/Dockerfile.cacheserver
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Dockerfile for building the source code of cache_server
FROM golang:1.11-alpine3.7 as builder

RUN apk update && apk upgrade && \
apk add --no-cache bash git openssh gcc musl-dev

WORKDIR /go/src/github.com/kubeflow/pipelines
COPY . .

RUN GO111MODULE=on go build -o /bin/cache_server backend/src/cache/*.go
RUN git clone https://github.com/hashicorp/golang-lru.git /kfp/cache/golang-lru/

FROM alpine:3.8
WORKDIR /bin

COPY --from=builder /bin/cache_server /bin/cache_server
COPY --from=builder /go/src/github.com/kubeflow/pipelines/third_party/license.txt /bin/license.txt
COPY --from=builder /kfp/cache/golang-lru/* /bin/golang-lru/

ENTRYPOINT [ "/bin/cache_server" ]
6 changes: 6 additions & 0 deletions backend/src/cache/OWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
approvers:
- Ark-kun
- rui5i
reviewers:
- Ark-kun
- rui5i
23 changes: 23 additions & 0 deletions backend/src/cache/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
## Build src image
To build the Docker image of cache server, run the following Docker command from the pipelines directory:

```
docker build -t gcr.io/ml-pipeline/cache-server:latest -f backend/Dockerfile.cacheserver .
```

## Deploy cache service to an existing KFP deployment
1. Configure kubectl to talk to your newly created cluster. Refer to [Configuring cluster access for kubectl](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl).
2. Run deploy shell script to generate certificates and create MutatingWebhookConfiguration:

```
# Assume KFP is deployed in the namespace kubeflow
export NAMESPACE=kubeflow
./deployer/deploy-cache-service.sh
```

3. Go to pipelines/manifests/kustomize/base/cache folder and run following scripts:

```
kubectl apply -f cache-deployment.yaml --namespace $NAMESPACE
kubectl apply -f cache-service.yaml --namespace $NAMESPACE
```
178 changes: 178 additions & 0 deletions backend/src/cache/admission.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"log"
"net/http"

"k8s.io/api/admission/v1beta1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/serializer"
"k8s.io/apimachinery/pkg/types"
)

type OperationType string

const (
OperationTypeAdd OperationType = "add"
)

// patchOperation is an operation of a JSON patch, see https://tools.ietf.org/html/rfc6902 .
type patchOperation struct {
Op OperationType `json:"op"`
Path string `json:"path"`
Value interface{} `json:"value,omitempty"`
}

// admitFunc is a callback for admission controller logic. Given an AdmissionRequest, it returns the sequence of patch
// operations to be applied in case of success, or the error that will be shown when the operation is rejected.
type admitFunc func(*v1beta1.AdmissionRequest) ([]patchOperation, error)

const (
ContentType string = "Content-Type"
JsonContentType string = "application/json"
)

var (
universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
)

// isKubeNamespace checks if the given namespace is a Kubernetes-owned namespace.
func isKubeNamespace(ns string) bool {
return ns == metav1.NamespacePublic || ns == metav1.NamespaceSystem
}

// doServeAdmitFunc parses the HTTP request for an admission controller webhook, and -- in case of a well-formed
// request -- delegates the admission control logic to the given admitFunc. The response body is then returned as raw
// bytes.
func doServeAdmitFunc(w http.ResponseWriter, r *http.Request, admit admitFunc) ([]byte, error) {
// Step 1: Request validation. Only handle POST requests with a body and json content type.

if r.Method != http.MethodPost {
w.WriteHeader(http.StatusMethodNotAllowed)
return nil, fmt.Errorf("Invalid method %q, only POST requests are allowed", r.Method)
}

body, err := ioutil.ReadAll(r.Body)
if err != nil {
w.WriteHeader(http.StatusBadRequest)
return nil, fmt.Errorf("Could not read request body: %v", err)
}

if contentType := r.Header.Get(ContentType); contentType != JsonContentType {
w.WriteHeader(http.StatusBadRequest)
return nil, fmt.Errorf("Unsupported content type %q, only %q is supported", contentType, JsonContentType)
}

// Step 2: Parse the AdmissionReview request.

var admissionReviewReq v1beta1.AdmissionReview

_, _, err = universalDeserializer.Decode(body, nil, &admissionReviewReq)

if err != nil {
w.WriteHeader(http.StatusBadRequest)
return nil, fmt.Errorf("Could not deserialize request: %v", err)
}
if admissionReviewReq.Request == nil {
w.WriteHeader(http.StatusBadRequest)
return nil, errors.New("Malformed admission review request: request body is nil")
}

// Step 3: Construct the AdmissionReview response.

// Apply the admit() function only for non-Kubernetes namespaces. For objects in Kubernetes namespaces, return
// an empty set of patch operations.
if isKubeNamespace(admissionReviewReq.Request.Namespace) {
return allowedResponse(admissionReviewReq.Request.UID, nil), nil
}

var patchOps []patchOperation

patchOps, err = admit(admissionReviewReq.Request)
if err != nil {
return errorResponse(admissionReviewReq.Request.UID, err), nil
}

patchBytes, err := json.Marshal(patchOps)
if err != nil {
w.WriteHeader(http.StatusInternalServerError)
return nil, fmt.Errorf("Could not marshal JSON patch: %v", err)
}

return allowedResponse(admissionReviewReq.Request.UID, patchBytes), nil
}

// serveAdmitFunc is a wrapper around doServeAdmitFunc that adds error handling and logging.
func serveAdmitFunc(w http.ResponseWriter, r *http.Request, admit admitFunc) {
log.Print("Handling webhook request ...")

var writeErr error
if bytes, err := doServeAdmitFunc(w, r, admit); err != nil {
log.Printf("Error handling webhook request: %v", err)
w.WriteHeader(http.StatusInternalServerError)
_, writeErr = w.Write([]byte(err.Error()))
} else {
log.Print("Webhook request handled successfully")
_, writeErr = w.Write(bytes)
}

if writeErr != nil {
log.Printf("Could not write response: %v", writeErr)
}
}

// admitFuncHandler takes an admitFunc and wraps it into a http.Handler by means of calling serveAdmitFunc.
func admitFuncHandler(admit admitFunc) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
serveAdmitFunc(w, r, admit)
})
}

func allowedResponse(uid types.UID, patchBytes []byte) []byte {
admissionReviewResponse := v1beta1.AdmissionReview{
Response: &v1beta1.AdmissionResponse{
UID: uid,
},
}
admissionReviewResponse.Response.Allowed = true
admissionReviewResponse.Response.Patch = patchBytes

bytes, err := json.Marshal(&admissionReviewResponse)
if err != nil {
return errorResponse(uid, err)
}
return bytes
}

func errorResponse(uid types.UID, err error) []byte {
admissionReviewResponse := v1beta1.AdmissionReview{
Response: &v1beta1.AdmissionResponse{
UID: uid,
},
}
admissionReviewResponse.Response.Allowed = false
admissionReviewResponse.Response.Result = &metav1.Status{
Message: err.Error(),
}
bytes, _ := json.Marshal(&admissionReviewResponse)
return bytes
}
Loading