Update e2e tests #2161

Merged
merged 4 commits into from Sep 13, 2024
21 changes: 3 additions & 18 deletions .github/workflows/integration.yaml
@@ -72,23 +72,8 @@ jobs:
       - name: Run unit tests
         run: make unit-test

-      - name: Build Spark-Operator Docker Image
-        run: make docker-build IMAGE_TAG=latest
-
-      - name: Check changes in resources used in docker file
-        run: |
-          DOCKERFILE_RESOURCES=$(cat Dockerfile | grep -P -o "COPY [a-zA-Z0-9].*? " | cut -c6-)
-          for resource in $DOCKERFILE_RESOURCES; do
-            # If the resource is different
-            if ! git diff --quiet origin/master -- $resource; then
-              ## And the appVersion hasn't been updated
-              if ! git diff origin/master -- charts/spark-operator-chart/Chart.yaml | grep +appVersion; then
-                echo "resource used in docker.io/kubeflow/spark-operator has changed in $resource, need to update the appVersion in charts/spark-operator-chart/Chart.yaml"
-                git diff origin/master -- $resource;
-                echo "failing the build... " && false
-              fi
-            fi
-          done
+      - name: Build Spark operator
+        run: make build-operator

   build-helm-chart:
     runs-on: ubuntu-latest
@@ -131,7 +116,7 @@ jobs:
       - name: Run chart-testing (lint)
         if: steps.list-changed.outputs.changed == 'true'
         env:
-            BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
         run: ct lint --check-version-increment=false --target-branch $BRANCH

       - name: Detect CRDs drift between chart and manifest
7 changes: 1 addition & 6 deletions examples/spark-pi-configmap.yaml
@@ -33,18 +33,13 @@ spec:
       configMap:
         name: test-configmap
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
-    serviceAccount: spark-operator-spark
     volumeMounts:
       - name: config-vol
         mountPath: /opt/spark/config
+    serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 1
     cores: 1
     memory: 512m
22 changes: 4 additions & 18 deletions examples/spark-pi-custom-resource.yaml
@@ -28,27 +28,13 @@ spec:
   sparkVersion: 3.5.2
   restartPolicy:
     type: Never
-  volumes:
-    - name: test-volume
-      hostPath:
-        path: /tmp
-        type: Directory
   driver:
-    labels:
-      version: 3.5.2
-    cores: 1
-    coreLimit: 1200m
+    coreRequest: "0.5"
+    coreLimit: 800m
     memory: 512m
     serviceAccount: spark-operator-spark
-    volumeMounts:
-      - name: test-volume
-        mountPath: /tmp
   executor:
-    labels:
-      version: 3.5.2
     instances: 1
-    cores: 1
+    coreRequest: "1200m"
+    coreLimit: 1500m
     memory: 512m
-    volumeMounts:
-      - name: test-volume
-        mountPath: /tmp
8 changes: 0 additions & 8 deletions examples/spark-pi-dynamic-allocation.yaml
@@ -26,21 +26,13 @@ spec:
   mainClass: org.apache.spark.examples.SparkPi
   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
   sparkVersion: 3.5.2
-  arguments:
-    - "50000"
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 1
     cores: 1
-    coreLimit: 1200m
     memory: 512m
   dynamicAllocation:
     enabled: true
6 changes: 0 additions & 6 deletions examples/spark-pi-kube-scheduler.yaml
@@ -27,17 +27,11 @@ spec:
   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
   sparkVersion: 3.5.2
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
   batchScheduler: kube-scheduler
1 change: 0 additions & 1 deletion examples/spark-pi-prometheus.yaml
@@ -33,7 +33,6 @@ spec:
     type: Never
   driver:
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     labels:
       version: 3.1.1
6 changes: 0 additions & 6 deletions examples/spark-pi-python.yaml
@@ -27,16 +27,10 @@ spec:
   mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
   sparkVersion: 3.5.2
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 1
     cores: 1
-    coreLimit: 1200m
     memory: 512m
6 changes: 0 additions & 6 deletions examples/spark-pi-scheduled.yaml
@@ -33,16 +33,10 @@ spec:
   restartPolicy:
     type: Never
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 1
     cores: 1
-    coreLimit: 1200m
     memory: 512m
6 changes: 0 additions & 6 deletions examples/spark-pi-volcano.yaml
@@ -27,17 +27,11 @@ spec:
   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
   sparkVersion: 3.5.2
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
   batchScheduler: volcano
6 changes: 0 additions & 6 deletions examples/spark-pi-yunikorn.yaml
@@ -27,18 +27,12 @@ spec:
   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
   sparkVersion: 3.5.2
   driver:
-    labels:
-      version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
-    labels:
-      version: 3.5.2
     instances: 2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
   batchScheduler: yunikorn
   batchSchedulerOptions:
4 changes: 2 additions & 2 deletions examples/spark-pi.yaml
@@ -25,18 +25,18 @@ spec:
   imagePullPolicy: IfNotPresent
   mainClass: org.apache.spark.examples.SparkPi
   mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
+  arguments:
+    - "5000"
   sparkVersion: 3.5.2
   driver:
     labels:
       version: 3.5.2
     cores: 1
-    coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
     labels:
       version: 3.5.2
     instances: 1
     cores: 1
-    coreLimit: 1200m
     memory: 512m
57 changes: 54 additions & 3 deletions test/e2e/sparkapplication_test.go
@@ -26,11 +26,14 @@ import (
 	. "github.com/onsi/gomega"

 	corev1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/api/equality"
+	"k8s.io/apimachinery/pkg/api/resource"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/apimachinery/pkg/util/yaml"

 	"github.com/kubeflow/spark-operator/api/v1beta2"
+	"github.com/kubeflow/spark-operator/pkg/common"
 	"github.com/kubeflow/spark-operator/pkg/util"
 )

@@ -130,13 +133,43 @@ var _ = Describe("Example SparkApplication", func() {
 			}
 		})

-		It("Should complete successfully", func() {
+		It("Should complete successfully with configmap mounted", func() {
 			By("Waiting for SparkApplication to complete")
 			key := types.NamespacedName{Namespace: app.Namespace, Name: app.Name}
 			Expect(waitForSparkApplicationCompleted(ctx, key)).NotTo(HaveOccurred())

-			By("Checking out driver logs")
+			By("Checking out whether volumes are mounted to driver pod")
 			driverPodName := util.GetDriverPodName(app)
+			driverPodKey := types.NamespacedName{Namespace: app.Namespace, Name: driverPodName}
+			driverPod := &corev1.Pod{}
+			Expect(k8sClient.Get(ctx, driverPodKey, driverPod)).NotTo(HaveOccurred())
+			hasVolumes := false
+			hasVolumeMounts := false
+			for _, volume := range app.Spec.Volumes {
+				for _, podVolume := range driverPod.Spec.Volumes {
+					if volume.Name == podVolume.Name {
+						hasVolumes = true
+						break
+					}
+				}
+			}
+			for _, volumeMount := range app.Spec.Driver.VolumeMounts {
+				for _, container := range driverPod.Spec.Containers {
+					if container.Name != common.SparkDriverContainerName {
+						continue
+					}
+					for _, podVolumeMount := range container.VolumeMounts {
+						if equality.Semantic.DeepEqual(volumeMount, podVolumeMount) {
+							hasVolumeMounts = true
+							break
+						}
+					}
+				}
+			}
+			Expect(hasVolumes).To(BeTrue())
+			Expect(hasVolumeMounts).To(BeTrue())
+
+			By("Checking out driver logs")
 			bytes, err := clientset.CoreV1().Pods(app.Namespace).GetLogs(driverPodName, &corev1.PodLogOptions{}).Do(ctx).Raw()
 			Expect(err).NotTo(HaveOccurred())
 			Expect(bytes).NotTo(BeEmpty())
@@ -176,8 +209,26 @@ var _ = Describe("Example SparkApplication", func() {
 			key := types.NamespacedName{Namespace: app.Namespace, Name: app.Name}
 			Expect(waitForSparkApplicationCompleted(ctx, key)).NotTo(HaveOccurred())

-			By("Checking out driver logs")
+			By("Checking out whether resource requests and limits of driver pod are set")
 			driverPodName := util.GetDriverPodName(app)
+			driverPodKey := types.NamespacedName{Namespace: app.Namespace, Name: driverPodName}
+			driverPod := &corev1.Pod{}
+			Expect(k8sClient.Get(ctx, driverPodKey, driverPod)).NotTo(HaveOccurred())
+			for _, container := range driverPod.Spec.Containers {
+				if container.Name != common.SparkDriverContainerName {
+					continue
+				}
+				if app.Spec.Driver.CoreRequest != nil {
+					Expect(container.Resources.Requests.Cpu().Equal(resource.MustParse(*app.Spec.Driver.CoreRequest))).To(BeTrue())
+				}
+				if app.Spec.Driver.CoreLimit != nil {
+					Expect(container.Resources.Limits.Cpu().Equal(resource.MustParse(*app.Spec.Driver.CoreLimit))).To(BeTrue())
+				}
+				Expect(container.Resources.Requests.Memory()).NotTo(BeNil())
+				Expect(container.Resources.Limits.Memory()).NotTo(BeNil())
+			}
Member:

I'm curious: was there any specific reason to add this assertion? The request and limit fields on the pods are set via spark-submit args and aren't patched by the webhook, which is the issue I believe you raised this PR as part of.

Contributor Author:

Yes, the CPU/memory requests/limits are not patched by the webhook. I added these assertions just to ensure that spark-pi-custom-resource.yaml works as expected in setting resource requests/limits correctly.

Member:

What do you think about removing the coreLimit field from the existing example applications, creating a new application that specifies both coreRequest and coreLimit, creating a new end-to-end test for that, and then moving this assertion logic to that test? I feel like that captures the intention better, but open to your opinion too, obviously.

Contributor Author:

@jacobsalway Sounds good. I have removed the coreLimit field from all the example applications except for spark-pi-custom-resource.yaml. For that, I added the coreRequest and coreLimit fields to it.
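
For reference, the resource settings this discussion converged on, as an excerpt of examples/spark-pi-custom-resource.yaml assembled from the diff above. As noted in the thread, these fields are passed through spark-submit rather than patched by the webhook; in Spark's Kubernetes mode, coreRequest and coreLimit become the pod's CPU request and limit (spark.kubernetes.{driver,executor}.request.cores and .limit.cores):

  driver:
    coreRequest: "0.5"    # pod-level CPU request
    coreLimit: 800m       # pod-level CPU limit
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    coreRequest: "1200m"  # pod-level CPU request
    coreLimit: 1500m      # pod-level CPU limit
    memory: 512m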


By("Checking out driver logs")
bytes, err := clientset.CoreV1().Pods(app.Namespace).GetLogs(driverPodName, &corev1.PodLogOptions{}).Do(ctx).Raw()
Expect(err).NotTo(HaveOccurred())
Expect(bytes).NotTo(BeEmpty())
Expand Down
2 changes: 2 additions & 0 deletions test/e2e/suit_test.go
@@ -149,6 +149,8 @@ var _ = BeforeSuite(func() {
 	validatingWebhookKey := types.NamespacedName{Name: ValidatingWebhookName}
 	Expect(waitForMutatingWebhookReady(context.Background(), mutatingWebhookKey)).NotTo(HaveOccurred())
 	Expect(waitForValidatingWebhookReady(context.Background(), validatingWebhookKey)).NotTo(HaveOccurred())
+	// TODO: Remove this when there is a better way to ensure the webhooks are ready before running the e2e tests.
+	time.Sleep(10 * time.Second)
Member @jacobsalway commented on Sep 12, 2024:

I think this is fine as a temporary workaround. Two ways we can solve this in the future (a sketch of the second option follows after the diff):

  • Remove the dependency on the webhook altogether by constructing pod template files on the fly from the existing SparkApplication spec. Maybe via a ConfigMap mounted into the driver pod?
  • If the webhook is enabled, the controller should only start reconciling once the webhook is ready to serve mutation requests.

 })

 var _ = AfterSuite(func() {
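
A minimal sketch of the second option above, to eventually replace the fixed sleep. waitForWebhookServing is a hypothetical helper, not part of this PR or the repository; it assumes the suite's existing k8sClient and that the webhooks are registered with failurePolicy: Fail, so requests are rejected while the webhook backend is not yet serving. A server-side dry-run create passes through the admission webhook chain but is never persisted, which makes it a cheap readiness probe:

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/kubeflow/spark-operator/api/v1beta2"
)

// waitForWebhookServing (hypothetical) polls with dry-run creates of a
// throwaway SparkApplication until a request makes it through the webhook
// chain, instead of sleeping for a fixed duration.
func waitForWebhookServing(ctx context.Context, namespace string) error {
	probe := &v1beta2.SparkApplication{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "webhook-probe-",
			Namespace:    namespace,
		},
	}
	return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			err := k8sClient.Create(ctx, probe.DeepCopy(), client.DryRunAll)
			if err != nil && apierrors.IsInternalError(err) {
				// "failed calling webhook ... connection refused" surfaces
				// as an internal error while the backend is unreachable.
				return false, nil
			}
			// Success, or any non-internal error (e.g. spec validation),
			// means the webhook chain handled the request.
			return true, nil
		})
}

BeforeSuite could then call Expect(waitForWebhookServing(context.Background(), "default")).NotTo(HaveOccurred()) in place of the time.Sleep, with the namespace being whatever the suite already provisions ("default" here is an assumption).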