Track driver deploy time in e2e test pipeline #815

AndyXiangLi · 2021-03-23T23:38:46Z

Is this a bug fix or adding new feature?
Fixes #804
What is this PR about? / Why do we need it?
Record the driver start time during the e2e test so we have a better understanding of the driver's behavior.
The threshold is set to 20s for now and may be adjusted as we gather more data.

k8s-ci-robot · 2021-03-23T23:38:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndyXiangLi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [AndyXiangLi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coveralls · 2021-03-23T23:43:13Z

Pull Request Test Coverage Report for Build 1765

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 81.789%

Totals
Change from base Build 1757:	0.0%
Covered Lines:	1756
Relevant Lines:	2147

💛 - Coveralls

wongma7 · 2021-03-24T17:52:13Z

hack/e2e/run.sh

@@ -116,6 +118,15 @@ if [[ -r "${EBS_SNAPSHOT_CRD}" ]]; then
  kubectl apply -f "$EBS_SNAPSHOT_CRD"
  # TODO deploy snapshot controller too instead of including in helm chart
 fi
+endSec=$(date +'%s')


We have to wait for the pods to become ready, which is not so easy in bash, to be honest.

Yeah, agreed. I did some research but had no luck finding a tool to track container start-up time.
One thing, though: if we use helm's --wait flag, helm waits for the containersReady condition before exiting. IMO that is good enough for now.

I'm open to any suggestions for tracking this info :-)

Ooh yeah, let's use helm's built-in --wait flag; that is cool if they have it.

Otherwise I was thinking of calling a python/go script from here and using the kube python/go client, which would be really ugly...

Maybe we can use CloudWatch? The tests run in an AWS-internal account, and when running locally I don't mind pushing metrics to my own account. We can fail open in case the metric push fails for whatever reason (a transient CloudWatch issue or whatever).

That's a good find. Does it wait for everything deployed by the chart to be ready?

Ideally there is a way to expose the metric publicly, though; CloudWatch won't be visible the way https://testgrid.k8s.io/provider-aws-efs-csi-driver#e2e-test&width=20 is.

Yes, it will wait for all the containers to be ready
https://helm.sh/docs/intro/using_helm/
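
A minimal sketch of the timing approach discussed in this thread, assuming the driver is installed by a single blocking command such as a helm install or upgrade run with `--wait`. The helper name `time_command` and the variable `ELAPSED_SECONDS` are hypothetical and not part of `run.sh`; the helper only measures wall-clock seconds around whatever command it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from run.sh): run a command and store the number
# of whole seconds it took in the global ELAPSED_SECONDS.
ELAPSED_SECONDS=0

time_command() {
  local start_sec end_sec status=0
  start_sec=$(date +'%s')
  # Capture the exit status without tripping `set -e` in the caller.
  "$@" || status=$?
  end_sec=$(date +'%s')
  ELAPSED_SECONDS=$(( end_sec - start_sec ))
  # Propagate the wrapped command's exit status.
  return "$status"
}
```

Usage might look like `time_command helm upgrade --install "$DRIVER_NAME" <chart> --wait`, where `<chart>` is a placeholder for the chart path the script already uses. Because the measurement wraps the whole command, it includes image pull time and anything else helm waits on.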

ayberk · 2021-03-24T18:42:20Z

hack/e2e/run.sh

@@ -116,6 +118,15 @@ if [[ -r "${EBS_SNAPSHOT_CRD}" ]]; then
  kubectl apply -f "$EBS_SNAPSHOT_CRD"
  # TODO deploy snapshot controller too instead of including in helm chart
 fi
+endSec=$(date +'%s')


That's a good find. Does it wait for everything deployed by the chart to be ready?

ayberk · 2021-03-24T18:42:44Z

hack/e2e/run.sh

@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh

 DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver}
 CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin}
+DRIVER_START_TIME_THRESHOLD=25


How did we come up with this number? I feel like it should be higher? We can adjust later as we go.

On my cluster it usually takes ~15s, so I picked this number. But it makes sense to increase it a little in the initial commit; we can adjust it later.

Yeah looks like it failed on the CI.

I think the delta is the image pull time. Cold start apparently takes longer than I expected lol

ayberk · 2021-03-24T18:43:28Z

hack/e2e/run.sh

+secondUsed=$(( (endSec-startSec)/1 ))
+# Set timeout threshold as 20 seconds for now, usually it takes less than 10s to startup
+if [ $secondUsed -gt $DRIVER_START_TIME_THRESHOLD ]; then
+  loudecho "Driver start timeout, test fail!"


You should log here how long it took and what the threshold is, so we can see the gap immediately without reading the code.
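
One way to surface both numbers in the failure message, sketched under the assumption that the elapsed time and the threshold are whole seconds. `check_start_time` is a hypothetical name, and plain `echo` stands in for the script's `loudecho` helper so that the sketch is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare elapsed seconds against a threshold and
# report both values so the gap is visible directly in the CI log.
check_start_time() {
  local elapsed=$1 threshold=$2
  if [ "$elapsed" -gt "$threshold" ]; then
    echo "Driver start timeout: took ${elapsed}s, threshold is ${threshold}s"
    return 1
  fi
  echo "Driver started in ${elapsed}s (threshold ${threshold}s)"
}
```

A caller would pass the measured duration and the configured threshold, for example `check_start_time "$secondUsed" "$DRIVER_START_TIME_THRESHOLD"`, and fail the test on a non-zero return.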

wongma7 · 2021-03-24T18:49:25Z

I think the main bottleneck will be image pulling, which we don't really control. I guess if we are trying to measure from a UX perspective what a 'cold start' looks like, then the details of what is happening under the hood don't matter, just the overall number. Anyway, I'm OK with merging a variation of this and seeing how it goes.

ayberk · 2021-03-24T19:44:53Z

hack/e2e/run.sh

+secondUsed=$(( (endSec-startSec)/1 ))
+# Set timeout threshold as 20 seconds for now, usually it takes less than 10s to startup
+if [ $secondUsed -gt $DRIVER_START_TIME_THRESHOLD ]; then
+  loudecho "Driver start timeout, Cost $secondUsed but the threshold is $DRIVER_START_TIME_THRESHOLD Fail the test."


nit: s/Cost/Took

Looks like it took ~30s to start up now. Do you think it is OK if I change the threshold to 45s? 60s looks like a bit too much here.

I don't see a problem with going to 60s honestly. For now we can use this to make sure we don't introduce a change that'd increase the startup time too much. So let's start with a high number and go from there.

Sounds good, Thank you!

ayberk · 2021-03-24T20:38:27Z

hack/e2e/run.sh

@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh

 DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver}
 CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin}
+DRIVER_START_TIME_THRESHOLD=60


One last thing: can you add a comment like # seconds here?

Updated; I added seconds to the name to make it clear.

ayberk · 2021-03-24T20:41:43Z

hack/e2e/run.sh

@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh

 DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver}
 CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin}
+DRIVER_START_TIME_THRESHOLD_SECONDS=60


ayberk · 2021-03-24T20:42:15Z

/lgtm

k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels Mar 23, 2021

k8s-ci-robot requested review from d-nishi and gnufied March 23, 2021 23:38

k8s-ci-robot added the approved and size/XS labels Mar 23, 2021

AndyXiangLi force-pushed the track-start-time branch from 4339329 to ffe53a0 Compare March 24, 2021 17:10

k8s-ci-robot added the size/S label and removed the size/XS label Mar 24, 2021

wongma7 reviewed Mar 24, 2021

AndyXiangLi force-pushed the track-start-time branch from ffe53a0 to 2eec738 Compare March 24, 2021 18:23

AndyXiangLi changed the title ~~[WIP] test driver deploy time, do not merge!!!~~ Track driver deploy time in e2e test pipeline Mar 24, 2021

k8s-ci-robot removed the do-not-merge/work-in-progress label Mar 24, 2021

AndyXiangLi force-pushed the track-start-time branch from 2eec738 to a059b1c Compare March 24, 2021 18:38

ayberk suggested changes Mar 24, 2021

AndyXiangLi force-pushed the track-start-time branch from a059b1c to 6260b0a Compare March 24, 2021 18:51

ayberk reviewed Mar 24, 2021

AndyXiangLi force-pushed the track-start-time branch from 6260b0a to 91d1e5c Compare March 24, 2021 20:02

ayberk reviewed Mar 24, 2021

track driver start time in the e2e test

bdf5075

AndyXiangLi force-pushed the track-start-time branch from 91d1e5c to bdf5075 Compare March 24, 2021 20:40

ayberk reviewed Mar 24, 2021

k8s-ci-robot assigned ayberk Mar 24, 2021

k8s-ci-robot added the lgtm label Mar 24, 2021

k8s-ci-robot merged commit cdbec43 into kubernetes-sigs:master Mar 24, 2021

AndyXiangLi mentioned this pull request Mar 25, 2021

Release 0.10 #811

Merged
