Skip to content

Commit

Permalink
Onboarding on openshift-ci
Browse files Browse the repository at this point in the history
These are the needed changes to have openshift-ci running the E2E tests successfully.

There are several groups of E2E tests that can be deduced from the .github/workflows/e2e-test.yaml file: fast, slow, explainer, transformer-mms, qpext, grpc, helm, raw and kourier. For ODH, the `fast`, `slow` and `grpc` groups are the ones that cover the features that are going to be supported in the initial adoption of ODH.

This commit contains the needed adaptations to the E2E tests of the `fast` and `slow` groups to successfully run them in an openshift cluster. It also adds a few scripts on test/scripts/openshift-ci to run these E2Es in the openshift-ci operator.

Some of these changes should be seen as provisional and should be rolled back:
* test/e2e/common/utils.py: because of the networking/DNS expectations, that are currently not covered in ODH's installation.
* test/e2e/predictor/*:
  * In general all changes under this path should be seen as provisional. However, since ODH won't support all ServingRuntimes, it is possible that some of the tests will stay out.
  * There are some GRPC-related tests marked as skipped. Since this work is not enabling the `grpc` group, a subsequent commit/PR for enabling GRPC E2Es should remove/revert those skip marks.
  * Also, there are some tests skipped with the `Not testable in ODH at the moment` reason. The root cause of the failure should be investigated to re-enable these tests.
* python/kserve/kserve/models/v1beta1_inference_service.py: This is injecting an annotation that is required given the specifics of OSSM/Maistra and OpenShift-Serverless that are used in ODH. This annotation is, currently, user responsibility and this was the cleanest way to add it in the E2Es. Being platform-specific, it's been discussed that this (and some other) annotation should be injected by some controller to relief the user from this responsibility. If this happens, this change should be reverted.

Also, ideally, changes to the following files should be contributed back to upstream. Those changes are not required in upstream and should have no effect, but in openshift-ci become required because a different builder image is being used:
* Dockerfile
* agent.Dockerfile

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
  • Loading branch information
israel-hdez committed Jul 27, 2023
1 parent caea0d6 commit cffa750
Show file tree
Hide file tree
Showing 12 changed files with 470 additions and 12 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ COPY cmd/ cmd/
COPY pkg/ pkg/

# Build
RUN CGO_ENABLED=0 GOOS=linux go build -a -o manager ./cmd/manager
RUN CGO_ENABLED=0 GOOS=linux GOFLAGS=-mod=mod go build -a -o manager ./cmd/manager

# Copy the controller-manager into a thin image
FROM gcr.io/distroless/static:nonroot
Expand Down
2 changes: 1 addition & 1 deletion agent.Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ COPY pkg/ pkg/
COPY cmd/ cmd/

# Build
RUN CGO_ENABLED=0 GOOS=linux go build -a -o agent ./cmd/agent
RUN CGO_ENABLED=0 GOOS=linux GOFLAGS=-mod=mod go build -a -o agent ./cmd/agent

# Copy the inference-agent into a thin image
FROM gcr.io/distroless/static:nonroot
Expand Down
2 changes: 2 additions & 0 deletions python/kserve/kserve/models/v1beta1_inference_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,8 @@ def metadata(self, metadata):
:param metadata: The metadata of this V1beta1InferenceService. # noqa: E501
:type: V1ObjectMeta
"""
if metadata is not None:
metadata.annotations = {"serving.knative.openshift.io/enablePassthrough": "true"}

self._metadata = metadata

Expand Down
19 changes: 10 additions & 9 deletions test/e2e/common/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,22 +80,23 @@ def predict_str(service_name, input_json, protocol_version="v1",
)
# temporary sleep until this is fixed https://github.com/kserve/kserve/issues/604
time.sleep(10)
cluster_ip = get_cluster_ip()
host = urlparse(isvc["status"]["url"]).netloc
path = urlparse(isvc["status"]["url"]).path
# cluster_ip = get_cluster_ip()
host = urlparse(isvc["status"]["components"]["predictor"]["url"]).netloc
path = urlparse(isvc["status"]["components"]["predictor"]["url"]).path
cluster_ip = host
headers = {"Host": host, "Content-Type": "application/json"}

if model_name is None:
model_name = service_name

url = f"http://{cluster_ip}{path}/v1/models/{model_name}:predict"
url = f"https://{cluster_ip}{path}/v1/models/{model_name}:predict"
if protocol_version == "v2":
url = f"http://{cluster_ip}{path}/v2/models/{model_name}/infer"
url = f"https://{cluster_ip}{path}/v2/models/{model_name}/infer"

logging.info("Sending Header = %s", headers)
logging.info("Sending url = %s", url)
logging.info("Sending request data: %s", input_json)
response = requests.post(url, input_json, headers=headers)
response = requests.post(url, input_json, headers=headers, verify=False)
logging.info("Got response code %s, content %s", response.status_code, response.content)
if response.status_code == 200:
preds = json.loads(response.content.decode("utf-8"))
Expand All @@ -118,7 +119,7 @@ def predict_ig(ig_name, input_json, protocol_version="v1",
)

cluster_ip = get_cluster_ip()
host = urlparse(ig["status"]["url"]).netloc
host = urlparse(ig["status"]["components"]["predictor"]["url"]).netloc
headers = {"Host": host}
url = f"http://{cluster_ip}"

Expand Down Expand Up @@ -154,7 +155,7 @@ def explain_response(service_name, input_json):
# temporary sleep until this is fixed https://github.com/kserve/kserve/issues/604
time.sleep(10)
cluster_ip = get_cluster_ip()
host = urlparse(isvc["status"]["url"]).netloc
host = urlparse(isvc["status"]["components"]["predictor"]["url"]).netloc
url = "http://{}/v1/models/{}:explain".format(cluster_ip, service_name)
headers = {"Host": host}
with open(input_json) as json_file:
Expand Down Expand Up @@ -217,7 +218,7 @@ def predict_grpc(service_name, payload, parameters=None, version=constants.KSERV
namespace=KSERVE_TEST_NAMESPACE,
version=version,
)
host = urlparse(isvc["status"]["url"]).netloc
host = urlparse(isvc["status"]["components"]["predictor"]["url"]).netloc
if ":" not in cluster_ip:
cluster_ip = cluster_ip + ":80"

Expand Down
1 change: 1 addition & 0 deletions test/e2e/predictor/test_paddle.py
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,7 @@ def test_paddle_v2_kserve():


@pytest.mark.slow
@pytest.mark.skip("GRPC tests are failing in ODH at the moment")
def test_paddle_v2_grpc():
service_name = "isvc-paddle-v2-grpc"
model_name = "paddle"
Expand Down
5 changes: 5 additions & 0 deletions test/e2e/predictor/test_sklearn.py
Original file line number Diff line number Diff line change
Expand Up @@ -206,6 +206,7 @@ def test_sklearn_v2():


@pytest.mark.slow
@pytest.mark.skip("GRPC tests are failing in ODH at the moment")
def test_sklearn_v2_grpc():
service_name = "isvc-sklearn-v2-grpc"
model_name = "sklearn"
Expand Down Expand Up @@ -254,7 +255,10 @@ def test_sklearn_v2_grpc():
kserve_client.delete(service_name, KSERVE_TEST_NAMESPACE)


# In ODH, this test generates the following response:
# Code 500 - 'ColumnTransformer' object has no attribute '_name_to_fitted_passthrough'
@pytest.mark.slow
@pytest.mark.skip("Not testable in ODH at the moment")
def test_sklearn_v2_mixed():
service_name = "isvc-sklearn-v2-mixed"
predictor = V1beta1PredictorSpec(
Expand Down Expand Up @@ -291,6 +295,7 @@ def test_sklearn_v2_mixed():


@pytest.mark.slow
@pytest.mark.skip("GRPC tests are failing in ODH at the moment")
def test_sklearn_v2_mixed_grpc():
service_name = "isvc-sklearn-v2-mixed-grpc"
model_name = "sklearn"
Expand Down
4 changes: 3 additions & 1 deletion test/e2e/predictor/test_tensorflow.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,10 @@ def test_tensorflow_kserve():
# Delete the InferenceService
kserve_client.delete(service_name, namespace=KSERVE_TEST_NAMESPACE)


# In ODH, this test generates the following response:
# 502 Server Error: Bad Gateway for url
@pytest.mark.slow
@pytest.mark.skip("Not testable in ODH at the moment")
def test_tensorflow_runtime_kserve():
service_name = 'isvc-tensorflow-runtime'
predictor = V1beta1PredictorSpec(
Expand Down
1 change: 1 addition & 0 deletions test/e2e/predictor/test_torchserve.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
from ..common.utils import KSERVE_TEST_NAMESPACE
from ..common import inference_pb2

pytest.skip("ODH does not support torchserve at the moment", allow_module_level=True)

@pytest.mark.slow
def test_torchserve_kserve():
Expand Down
1 change: 1 addition & 0 deletions test/e2e/predictor/test_triton.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ def test_triton():


@pytest.mark.fast
@pytest.mark.skip(reason="Not testable until the following issue is solved: https://github.com/opendatahub-io/odh-model-controller/issues/59")
def test_triton_runtime_with_transformer():
service_name = 'isvc-triton-runtime'
predictor = V1beta1PredictorSpec(
Expand Down
160 changes: 160 additions & 0 deletions test/scripts/openshift-ci/deploy.ossm.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
#!/usr/bin/env bash
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -eu

waitforpodlabeled() {
local ns=${1?namespace is required}; shift
local podlabel=${1?pod label is required}; shift

echo "Waiting for pod -l $podlabel to be created"
until oc get pod -n "$ns" -l $podlabel -o=jsonpath='{.items[0].metadata.name}' >/dev/null 2>&1; do
sleep 1
done
}

waitpodready() {
local ns=${1?namespace is required}; shift
local podlabel=${1?pod label is required}; shift

waitforpodlabeled "$ns" "$podlabel"
echo "Waiting for pod -l $podlabel to become ready"
oc wait --for=condition=ready --timeout=180s pod -n $ns -l $podlabel
}


# Deploy Distributed tracing operator (Jaeger)
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: openshift-distributed-tracing
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: openshift-distributed-tracing
namespace: openshift-distributed-tracing
spec:
upgradeStrategy: Default
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: jaeger-product
namespace: openshift-distributed-tracing
spec:
channel: stable
installPlanApproval: Automatic
name: jaeger-product
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF

waitpodready "openshift-distributed-tracing" "name=jaeger-operator"

# Deploy Kiali operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: kiali-ossm
namespace: openshift-operators
spec:
channel: stable
installPlanApproval: Automatic
name: kiali-ossm
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF

waitpodready "openshift-operators" "app=kiali-operator"

# Deploy OSSM operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: servicemeshoperator
namespace: openshift-operators
spec:
channel: stable
installPlanApproval: Automatic
name: servicemeshoperator
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF

waitpodready "openshift-operators" "name=istio-operator"

# Install OSSM
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: istio-system
---
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
name: basic
namespace: istio-system
spec:
version: v2.3
tracing:
type: Jaeger
sampling: 10000
# Uncomment if on ROSA
# security:
# identity:
# type: ThirdParty
addons:
jaeger:
name: jaeger
install:
storage:
type: Memory
kiali:
enabled: true
name: kiali
grafana:
enabled: true
EOF

# Waiting for OSSM minimum start
waitpodready "istio-system" "app=istiod"

# Create SMMR to enroll namespaces via a label. Also, set mTLS policy to strict by default.
cat <<EOF | oc apply -f -
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
name: default
namespace: istio-system
spec:
memberSelectors:
- matchLabels:
testing.kserve.io/add-to-mesh: "true"
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
EOF

echo -e "\n OSSM has partially started and should be fully ready soon."
Loading

0 comments on commit cffa750

Please sign in to comment.