Update e2e tests #2161

Merged: 4 commits, Sep 13, 2024
Changes from 3 commits
21 changes: 3 additions & 18 deletions .github/workflows/integration.yaml
@@ -72,23 +72,8 @@ jobs:
       - name: Run unit tests
         run: make unit-test
 
-      - name: Build Spark-Operator Docker Image
-        run: make docker-build IMAGE_TAG=latest
-
-      - name: Check changes in resources used in docker file
-        run: |
-          DOCKERFILE_RESOURCES=$(cat Dockerfile | grep -P -o "COPY [a-zA-Z0-9].*? " | cut -c6-)
-          for resource in $DOCKERFILE_RESOURCES; do
-            # If the resource is different
-            if ! git diff --quiet origin/master -- $resource; then
-              ## And the appVersion hasn't been updated
-              if ! git diff origin/master -- charts/spark-operator-chart/Chart.yaml | grep +appVersion; then
-                echo "resource used in docker.io/kubeflow/spark-operator has changed in $resource, need to update the appVersion in charts/spark-operator-chart/Chart.yaml"
-                git diff origin/master -- $resource;
-                echo "failing the build... " && false
-              fi
-            fi
-          done
+      - name: Build Spark operator
+        run: make build-operator
 
   build-helm-chart:
     runs-on: ubuntu-latest
@@ -131,7 +116,7 @@ jobs:
       - name: Run chart-testing (lint)
         if: steps.list-changed.outputs.changed == 'true'
         env:
-          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
 
       - name: Detect CRDs drift between chart and manifest
57 changes: 54 additions & 3 deletions test/e2e/sparkapplication_test.go
@@ -26,11 +26,14 @@ import (
. "github.com/onsi/gomega"

corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/equality"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/yaml"

"github.com/kubeflow/spark-operator/api/v1beta2"
"github.com/kubeflow/spark-operator/pkg/common"
"github.com/kubeflow/spark-operator/pkg/util"
)

@@ -130,13 +133,43 @@ var _ = Describe("Example SparkApplication", func() {
 		}
 	})
 
-	It("Should complete successfully", func() {
+	It("Should complete successfully with configmap mounted", func() {
 		By("Waiting for SparkApplication to complete")
 		key := types.NamespacedName{Namespace: app.Namespace, Name: app.Name}
 		Expect(waitForSparkApplicationCompleted(ctx, key)).NotTo(HaveOccurred())
 
-		By("Checking out driver logs")
+		By("Checking out whether volumes are mounted to driver pod")
 		driverPodName := util.GetDriverPodName(app)
+		driverPodKey := types.NamespacedName{Namespace: app.Namespace, Name: driverPodName}
+		driverPod := &corev1.Pod{}
+		Expect(k8sClient.Get(ctx, driverPodKey, driverPod)).NotTo(HaveOccurred())
+		hasVolumes := false
+		hasVolumeMounts := false
+		for _, volume := range app.Spec.Volumes {
+			for _, podVolume := range driverPod.Spec.Volumes {
+				if volume.Name == podVolume.Name {
+					hasVolumes = true
+					break
+				}
+			}
+		}
+		for _, volumeMount := range app.Spec.Driver.VolumeMounts {
+			for _, container := range driverPod.Spec.Containers {
+				if container.Name != common.SparkDriverContainerName {
+					continue
+				}
+				for _, podVolumeMount := range container.VolumeMounts {
+					if equality.Semantic.DeepEqual(volumeMount, podVolumeMount) {
+						hasVolumeMounts = true
+						break
+					}
+				}
+			}
+		}
+		Expect(hasVolumes).To(BeTrue())
+		Expect(hasVolumeMounts).To(BeTrue())
+
+		By("Checking out driver logs")
 		bytes, err := clientset.CoreV1().Pods(app.Namespace).GetLogs(driverPodName, &corev1.PodLogOptions{}).Do(ctx).Raw()
 		Expect(err).NotTo(HaveOccurred())
 		Expect(bytes).NotTo(BeEmpty())
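
A minimal sketch of a more declarative way to write the volume checks in the hunk above, using Gomega's ContainElement and HaveField matchers and reusing the test's app and driverPod variables. Note it is slightly stricter than the diff: it requires every spec volume and mount to be present, not just at least one match.

for _, volume := range app.Spec.Volumes {
	// Pod volumes gain defaulted fields, so match by name rather than deep equality.
	Expect(driverPod.Spec.Volumes).To(ContainElement(HaveField("Name", volume.Name)))
}
for _, container := range driverPod.Spec.Containers {
	if container.Name != common.SparkDriverContainerName {
		continue
	}
	for _, volumeMount := range app.Spec.Driver.VolumeMounts {
		// ContainElement with a plain value compares by deep equality, like the loop above.
		Expect(container.VolumeMounts).To(ContainElement(volumeMount))
	}
}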
@@ -176,8 +209,26 @@ var _ = Describe("Example SparkApplication", func() {
 		key := types.NamespacedName{Namespace: app.Namespace, Name: app.Name}
 		Expect(waitForSparkApplicationCompleted(ctx, key)).NotTo(HaveOccurred())
 
-		By("Checking out driver logs")
+		By("Checking out whether resource requests and limits of driver pod are set")
 		driverPodName := util.GetDriverPodName(app)
+		driverPodKey := types.NamespacedName{Namespace: app.Namespace, Name: driverPodName}
+		driverPod := &corev1.Pod{}
+		Expect(k8sClient.Get(ctx, driverPodKey, driverPod)).NotTo(HaveOccurred())
+		for _, container := range driverPod.Spec.Containers {
+			if container.Name != common.SparkDriverContainerName {
+				continue
+			}
+			if app.Spec.Driver.CoreRequest != nil {
+				Expect(container.Resources.Requests.Cpu().Equal(resource.MustParse(*app.Spec.Driver.CoreRequest))).To(BeTrue())
+			}
+			if app.Spec.Driver.CoreLimit != nil {
+				Expect(container.Resources.Limits.Cpu().Equal(resource.MustParse(*app.Spec.Driver.CoreLimit))).To(BeTrue())
+			}
+			Expect(container.Resources.Requests.Memory().IsZero()).To(BeFalse())
+			Expect(container.Resources.Limits.Memory().IsZero()).To(BeFalse())
+		}
Member:

I'm curious: was there any specific reason to add this assertion? The request and limit fields on the pods are set via spark-submit args and aren't patched by the webhook, which is the issue I believe you raised this PR as part of.

Contributor (author):

Yes, the cpu/memory requests/limits are not patched by the webhook. I added these assertions just to ensure that spark-pi-custom-resource.yaml works as expected in setting resource requests/limits correctly.
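
For context on "set via spark-submit args": the operator passes these spec fields to spark-submit as Spark conf flags rather than patching the pod afterwards. An illustrative sketch of that mapping with hypothetical values; the exact flags the operator emits may differ:

--conf spark.kubernetes.driver.request.cores=500m   # from .spec.driver.coreRequest
--conf spark.kubernetes.driver.limit.cores=1200m    # from .spec.driver.coreLimit
--conf spark.driver.memory=512m                     # from .spec.driver.memory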

Member:

What do you think about removing the coreLimit field from the existing example applications and creating a new application that specifies both coreRequest and coreLimit, creating a new end-to-end test for that, then moving this assertion logic to that test? I feel like that captures the intention better, but I'm open to your opinion too, obviously.

Contributor (author):

@jacobsalway Sounds good. I have removed the coreLimit field from all the example applications except for spark-pi-custom-resource.yaml, and added the coreRequest and coreLimit fields to that one.
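
With that change, the driver section of spark-pi-custom-resource.yaml would look roughly like the sketch below; the values are illustrative, not copied from the example:

spec:
  driver:
    cores: 1
    coreRequest: 500m
    coreLimit: 1200m
    memory: 512m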


By("Checking out driver logs")
bytes, err := clientset.CoreV1().Pods(app.Namespace).GetLogs(driverPodName, &corev1.PodLogOptions{}).Do(ctx).Raw()
Expect(err).NotTo(HaveOccurred())
Expect(bytes).NotTo(BeEmpty())
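
A note on the Quantity comparisons in the assertions above: resource.MustParse plus Quantity.Equal compares numeric values rather than strings, so different spellings of the same amount still match. A small illustration, assuming the resource package imported in this file plus fmt:

a := resource.MustParse("500m")
b := resource.MustParse("0.5")
fmt.Println(a.Equal(b)) // true: 500 millicores equals 0.5 cores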
2 changes: 2 additions & 0 deletions test/e2e/suit_test.go
@@ -149,6 +149,8 @@ var _ = BeforeSuite(func() {
 	validatingWebhookKey := types.NamespacedName{Name: ValidatingWebhookName}
 	Expect(waitForMutatingWebhookReady(context.Background(), mutatingWebhookKey)).NotTo(HaveOccurred())
 	Expect(waitForValidatingWebhookReady(context.Background(), validatingWebhookKey)).NotTo(HaveOccurred())
+	// TODO: Remove this when there is a better way to ensure the webhooks are ready before running the e2e tests.
+	time.Sleep(10 * time.Second)
jacobsalway (Member), Sep 12, 2024:

I think this is fine as a temporary workaround. Two ways we can solve this in the future:

  • Remove the dependency on the webhook altogether by constructing pod template files on the fly from the existing SparkApplication spec. Maybe via a ConfigMap mounted into the driver pod?
  • If the webhook is enabled, the controller should only start reconciling if the webhook is ready to serve mutation requests.
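
A rough sketch of the second idea: instead of sleeping, poll the API server with dry-run creates until the mutating webhook answers. Everything here is hypothetical rather than repo code: the helper name, the probe labels (which must match the webhook's objectSelector), and it assumes failurePolicy is Fail so a down webhook surfaces as an internal admission error. Assumed imports: context, time, corev1, metav1, apierrors "k8s.io/apimachinery/pkg/api/errors", wait "k8s.io/apimachinery/pkg/util/wait", client "sigs.k8s.io/controller-runtime/pkg/client".

func waitForWebhookServing(ctx context.Context, c client.Client) error {
	return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			probe := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{
					GenerateName: "webhook-probe-",
					Namespace:    "default",
					// Illustrative labels; must match the webhook's objectSelector.
					Labels: map[string]string{"spark-role": "driver"},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "probe", Image: "busybox"}},
				},
			}
			// Dry-run create: nothing is persisted, but admission webhooks still run.
			err := c.Create(ctx, probe, client.DryRunAll)
			// A down webhook surfaces as "failed calling webhook" (an internal error);
			// any other outcome means the endpoint answered the admission request.
			return err == nil || !apierrors.IsInternalError(err), nil
		})
}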

 })
 
 var _ = AfterSuite(func() {