Introduce automated performance testing. #1068

Merged
merged 45 commits into from
Jul 23, 2020


bnapolitan
Contributor

Scale pods up and down on different instance types and measure how long each change takes to propagate.
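One scale step could be timed roughly like this. It is a minimal sketch that assumes the pods belong to a Deployment addressed by name, and that `kubectl rollout status` finishing is an acceptable signal that the change has propagated. `KUBECTL_PATH` is the variable the PR's script uses; the helper `time_scale` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale a deployment and print the number of seconds
# until `kubectl rollout status` reports the rollout as complete.
# Assumes KUBECTL_PATH points at kubectl (falls back to "kubectl" on PATH).
# Usage: time_scale <deployment> <replicas> [timeout, e.g. 600s]
time_scale() {
  local deployment="$1" replicas="$2" timeout="${3:-600s}"
  local kubectl="${KUBECTL_PATH:-kubectl}"
  local start=$SECONDS

  "$kubectl" scale "deployment/${deployment}" --replicas="${replicas}" >/dev/null || return 1
  # Blocks until the deployment's replicas are updated and available, or times out.
  "$kubectl" rollout status "deployment/${deployment}" --timeout="${timeout}" >/dev/null || return 1

  echo $((SECONDS - start))
}

# Example (needs a cluster):
#   up=$(time_scale <deployment-name> 130)
#   down=$(time_scale <deployment-name> 0)
```

`$SECONDS` only has one-second resolution, which seems adequate for scale operations that take tens of seconds. For scale-down, `rollout status` may return once pods are marked for deletion rather than fully gone, so it is worth checking whether that matches what "propagated" should mean here.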

bnapolitan marked this pull request as draft on June 30, 2020 18:03
@bnapolitan
Contributor Author

Still in progress, because a shared bucket for the test results has not been decided yet.

Contributor

mogren left a comment


Nice! Some changes needed.

deploy-130-pods.yaml (outdated, resolved)
deploy-130-pods.yaml (outdated, resolved)
scripts/run-integration-tests.sh (resolved)
test/integration/README.md (outdated, resolved)
test/integration/README.md (outdated, resolved)
test/integration/README.md (outdated, resolved)
commit 67b6363
Merge: fd80aff afdb125
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Tue Jul 7 12:22:14 2020 -0400

    Merge branch 'upstream-master' into scale-test-single-node

commit fd80aff
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Tue Jul 7 12:20:56 2020 -0400

    Forgotten readme commit.

commit dae08fd
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Tue Jul 7 12:20:43 2020 -0400

    Fix duration calculation for timeout, remove eksctl, revise readme.

commit 80a50fd
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Tue Jul 7 01:04:09 2020 -0400

    Change image to kubernetes pause.

commit 6104022
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Mon Jul 6 16:59:36 2020 -0400

    Revert back to 98 node startup.

commit c7d9a5f
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Mon Jul 6 14:49:30 2020 -0400

    Reduce initial replicas to 1

commit ddf7cd8
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Mon Jul 6 13:11:50 2020 -0400

    Add timeout to performance tests, add content to readme.

commit 44092a6
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Mon Jul 6 11:56:52 2020 -0400

    Revert image to google.

commit 2c8291e
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Thu Jul 2 15:36:18 2020 -0400

    Don't exit if s3 bucket upload fails.

commit 318101a
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Thu Jul 2 13:37:36 2020 -0400

    Fix file path issue.

commit 16254ad
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Wed Jul 1 17:07:12 2020 -0400

    Fix CircleCI yml syntax error.

commit 43dd11d
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Wed Jul 1 17:05:34 2020 -0400

    Configure weekly performance.

commit d9b58bb
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Wed Jul 1 16:57:17 2020 -0400

    Start mng with 1 node, put metadata into data file names, suppress copy errors.

commit 5bab04d
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Wed Jul 1 02:43:28 2020 -0400

    Changes from PR.

commit 72a8608
Author: Ben Napolitan <bnapolitan@outlook.com>
Date:   Fri Jun 26 11:58:25 2020 -0400

    Squashed commit of the following:

    commit 5aac358
    Merge: 0bcf24b 30f98bd
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Fri Jun 26 11:57:31 2020 -0400

        Merge branch 'upstream-master' into scale-test-single-node-old

    commit 0bcf24b
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Fri Jun 26 11:55:48 2020 -0400

        Revert rolling update change.

    commit 53866a0
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Thu Jun 25 16:22:33 2020 -0400

        Increase rollingupdate limit.

    commit 966466a
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Thu Jun 25 11:01:07 2020 -0400

        Fix unset environment variables.

    commit f429283
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Wed Jun 24 13:26:51 2020 -0400

        Remove sleeps, deleted load balancers in test account.

    commit 166a168
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Wed Jun 24 09:21:17 2020 -0400

        Attempt all scale tests.

    commit 81dd0aa
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Tue Jun 23 12:31:48 2020 -0400

        Try adding all node groups back.

    commit 828f7aa
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Tue Jun 23 11:37:35 2020 -0400

        Attempt only large performance test and no conformance.

    commit 82a80e7
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Mon Jun 22 18:02:59 2020 -0400

        Try deleting other node groups.

    commit 284fcd1
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Mon Jun 22 16:13:47 2020 -0400

        Trying again.

    commit e5ef16b
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Mon Jun 22 16:10:20 2020 -0400

        Alter size again.

    commit d1e0062
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Mon Jun 22 12:53:06 2020 -0400

        Attempt instance size change.

    commit 686e7f2
    Author: Ben Napolitan <bnapolitan@outlook.com>
    Date:   Fri Jun 19 16:47:51 2020 -0400

        Fix duplicate name.

    commit e17358c
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Fri Jun 19 14:04:58 2020 -0400

        Attempt 5000 pod scale test.

    commit e9ea95d
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 18 17:53:28 2020 -0400

        Attempt 730 pods on one node performance test.

    commit cad25af
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 18 13:26:51 2020 -0400

        Fix file output syntax.

    commit 974ac0e
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 18 11:42:30 2020 -0400

        Verify scale test uploading works.

    commit b7efa10
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Wed Jun 17 17:56:32 2020 -0400

        Create data file after scale test.

    commit 3a9eaec
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Mon Jun 15 14:27:37 2020 -0400

        Fix if syntax.

    commit 00d74bc
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Mon Jun 15 11:36:03 2020 -0400

        Run scale tests moved and hidden behind env var.

    commit ef6841e
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Sat Jun 13 21:35:21 2020 -0400

        Fix grep causing failure.

    commit 4fbce7e
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Sat Jun 13 18:37:11 2020 -0400

        Reduce sleep for scale test.

    commit d766018
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Sat Jun 13 13:32:50 2020 -0400

        Try to diagnose polling problem.

    commit 1ac7d35
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Fri Jun 12 17:46:54 2020 -0400

        Run scale test for 130 pods.

    commit 9933a09
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Fri Jun 12 13:29:32 2020 -0400

        Add new nodegroup and move directory copy to proper place.

    commit 470116c
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Fri Jun 12 12:04:48 2020 -0400

        Move to after kubeconfig.

    commit 1f1f0fb
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Fri Jun 12 01:19:04 2020 -0400

        Switch to use KUBECTL_PATH.

    commit 1b43268
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 11 23:46:58 2020 -0400

        Retry with one nodegroup.

    commit b0d3228
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 11 23:00:48 2020 -0400

        Try to create new nodegroup and apply deployment to it.

    commit abd9015
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 11 21:25:40 2020 -0400

        Correct cluster name and change region in CircleCI.

    commit 46fe54f
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Thu Jun 11 19:03:03 2020 -0400

        Get info for eksctl.

    commit bbb3557
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Wed Jun 10 16:08:26 2020 -0400

        Attempt to ssh into test run.

    commit 353130b
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Wed Jun 10 14:22:18 2020 -0400

        Delete eks nodegroup create.

    commit 0ff7589
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Wed Jun 10 13:14:51 2020 -0400

        Try to use eksctl.

    commit 3ec6da4
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Wed Jun 10 12:28:23 2020 -0400

        Syntax fix.

    commit e79b32f
    Author: Ben Napolitan <bennapol@amazon.com>
    Date:   Tue Jun 9 19:55:25 2020 -0400

        Trying to create nodegroup and deploy pods.
bnapolitan force-pushed the scale-test-single-node branch from 67b6363 to a8323cc on July 7, 2020 16:26
bnapolitan marked this pull request as ready for review on July 7, 2020 16:50
echo $((SCALE_UP_DURATION_ARRAY[2])), $((SCALE_DOWN_DURATION_ARRAY[2])) >> $now

cat $now
aws s3 cp $now s3://cni-performance-test-data
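The quoted line appends one CSV row built from index 2 of `SCALE_UP_DURATION_ARRAY` and `SCALE_DOWN_DURATION_ARRAY`. If every iteration should be kept, a sketch like the following could write one row per iteration. `write_results_csv` and its header row are hypothetical, and the two arrays are assumed to be indexed bash arrays of equal length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "iteration,scale_up_seconds,scale_down_seconds"
# rows, one per iteration. Arguments: output file, then the NAMES of two
# indexed arrays. Uses namerefs, so it needs bash 4.3+, and the caller's
# arrays must not themselves be named "ups" or "downs".
write_results_csv() {
  local outfile="$1"
  local -n ups="$2" downs="$3"
  local i

  echo "iteration,scale_up_seconds,scale_down_seconds" > "$outfile"
  for i in "${!ups[@]}"; do
    echo "${i},${ups[$i]},${downs[$i]:-}" >> "$outfile"
  done
}

# Example:
#   write_results_csv "$now" SCALE_UP_DURATION_ARRAY SCALE_DOWN_DURATION_ARRAY
```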
Contributor


I think this bucket needs to be a configurable setting, and if it isn't set, we should skip the upload. Something like:

if [[ -n "${S3_PERF_TEST_BUCKET:-}" ]]; then
    aws s3 cp "$filename" "s3://${S3_PERF_TEST_BUCKET}/"
else
    echo "No S3 bucket name given, not uploading results"
fi
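A slightly fuller version of that suggestion, wrapped in a function, might look like this. It is a sketch assuming `S3_PERF_TEST_BUCKET` holds a bare bucket name (no `s3://` prefix) and that, as in the "Don't exit if s3 bucket upload fails" commit, a failed upload should warn rather than abort the run. The name `upload_results` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a results file only when a bucket is configured.
# Assumes S3_PERF_TEST_BUCKET is a bucket name such as "my-perf-bucket".
upload_results() {
  local file="$1"
  if [[ -z "${S3_PERF_TEST_BUCKET:-}" ]]; then
    echo "No S3 bucket name given, not uploading results"
    return 0
  fi
  # Trailing slash: the local file name is reused as the object key.
  if ! aws s3 cp "$file" "s3://${S3_PERF_TEST_BUCKET}/"; then
    # Warn but do not fail the test run.
    echo "WARNING: upload of ${file} to s3://${S3_PERF_TEST_BUCKET}/ failed" >&2
  fi
}
```

Building the `s3://` URI inside the function keeps the variable a plain bucket name, which matches the wording of the "No S3 bucket name given" message.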

Contributor


Should we check that the S3 bucket is configured before running the test?
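A pre-flight check could run before any node groups or deployments are created. This sketch uses `aws s3api head-bucket`, which exits non-zero when the bucket does not exist or is not accessible. The function name `check_results_bucket` is hypothetical, and so is the design choice to unset the variable on failure so that later upload steps take the "not set" path instead of failing the whole run.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify the results bucket before the
# (slow) performance tests start, instead of finding out at upload time.
check_results_bucket() {
  if [[ -z "${S3_PERF_TEST_BUCKET:-}" ]]; then
    echo "S3_PERF_TEST_BUCKET not set; results will not be uploaded"
    return 0
  fi
  if ! aws s3api head-bucket --bucket "$S3_PERF_TEST_BUCKET" >/dev/null 2>&1; then
    echo "Cannot access bucket ${S3_PERF_TEST_BUCKET}; results will not be uploaded"
    # Later upload steps then see the variable as unset and skip.
    unset S3_PERF_TEST_BUCKET
    return 0
  fi
  echo "Results will be uploaded to s3://${S3_PERF_TEST_BUCKET}"
}
```

Failing fast (returning non-zero) would be the stricter alternative; which one fits depends on whether a run without uploaded results is still useful.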

Comment on lines 60 to 61
now="pod-130-Test#${TEST_ID}-$(date +"%m-%d-%Y-%T").csv"
echo $now
Contributor


Can we call this variable filename instead of now?
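Building the name in one helper would keep the write, `cat`, and upload steps on the same value. In this sketch the helper `results_filename` is hypothetical, and, as an assumption beyond the quoted code, `%T` is replaced with `%H-%M-%S` and the `#` is dropped, because colons are not allowed in Windows file names and both characters are listed by AWS as needing special handling or best avoided in S3 object keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the results file name in one place.
# Usage: results_filename <test-name> <test-id>
results_filename() {
  local test_name="$1" test_id="$2"
  # %H-%M-%S instead of %T keeps ':' out of the name.
  echo "${test_name}-Test${test_id}-$(date +"%m-%d-%Y-%H-%M-%S").csv"
}

# Example:
#   filename=$(results_filename pod-130 "$TEST_ID")
```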

do
ITERATION_START=$SECONDS
$KUBECTL_PATH scale -f ./testdata/deploy-5000-pods.yaml --replicas=5000
sleep 100
Contributor


What is the 100-second sleep based on?
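One alternative to a fixed `sleep 100` is to poll the deployment until its ready count matches the target, with an upper bound on the wait. In this sketch the helper `wait_for_ready_replicas`, the 5-second poll interval, and the 900-second default timeout are hypothetical choices, not values from the PR. `KUBECTL_PATH` is used as in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll .status.readyReplicas until it equals the target,
# then print the elapsed seconds; give up after a timeout.
# Usage: wait_for_ready_replicas <deployment> <replicas> [timeout-s] [interval-s]
wait_for_ready_replicas() {
  local deployment="$1" target="$2" timeout="${3:-900}" interval="${4:-5}"
  local kubectl="${KUBECTL_PATH:-kubectl}"
  local start=$SECONDS ready

  while (( SECONDS - start < timeout )); do
    ready=$("$kubectl" get deployment "$deployment" \
      -o jsonpath='{.status.readyReplicas}' 2>/dev/null)
    # readyReplicas is omitted from the status when it is zero; treat
    # empty or non-numeric output as 0.
    [[ "$ready" =~ ^[0-9]+$ ]] || ready=0
    if (( ready == target )); then
      echo $((SECONDS - start))
      return 0
    fi
    sleep "$interval"
  done
  echo "Timed out after ${timeout}s waiting for ${deployment} to reach ${target} ready replicas" >&2
  return 1
}

# Example, inside the scale loop:
#   $KUBECTL_PATH scale -f ./testdata/deploy-5000-pods.yaml --replicas=5000
#   wait_for_ready_replicas <deployment-name> 5000 || exit 1
```

Polling also makes the measured duration track the cluster rather than the sleep length, at the cost of up to one poll interval of imprecision.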

Comment on lines 223 to 225
$KUBECTL_PATH apply -f ./testdata/deploy-130-pods.yaml
run_performance_test_130_pods
$KUBECTL_PATH delete -f ./testdata/deploy-130-pods.yaml
Contributor


Why aren't the testdata manifests applied and deleted inside the test functions themselves?
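If each test owned its manifest, the apply/delete pairing could live in one wrapper so that cleanup also happens when the test body returns a failure. This is a sketch; `run_with_manifest` is hypothetical and takes the test function's name as an argument, for example `run_performance_test_130_pods` from the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: apply a manifest, run a test function, then delete the
# manifest even if the test failed. Returns the test function's exit status.
# Note: cleanup is skipped if the test function calls `exit` directly.
# Usage: run_with_manifest <manifest.yaml> <test-function>
run_with_manifest() {
  local manifest="$1" test_fn="$2"
  local kubectl="${KUBECTL_PATH:-kubectl}"
  local status=0

  "$kubectl" apply -f "$manifest" || return 1
  "$test_fn" || status=$?
  "$kubectl" delete -f "$manifest" || echo "WARNING: failed to delete ${manifest}" >&2
  return "$status"
}

# Example:
#   run_with_manifest ./testdata/deploy-130-pods.yaml run_performance_test_130_pods
```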

bnapolitan marked this pull request as draft on July 14, 2020 17:42
bnapolitan marked this pull request as ready for review on July 20, 2020 14:55
Contributor

jayanthvn left a comment


Looks good

Contributor

mogren left a comment


Nice! LGTM

mogren merged commit 9dae761 into aws:master on Jul 23, 2020
mogren pushed a commit to mogren/amazon-vpc-cni-k8s that referenced this pull request Aug 11, 2020
* Add version info to file, display start of performance tests.
* Scale up node group before running 5000 pod test.
* Create unique mng names.
* Update data files for performance tests.
* Add failure checking for performance tests.
* Upload files to corresponding folders in s3 bucket.
* Check for slow performance update.
* Weekly performance test (midnight Wednesday)
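The "Add failure checking" and "Check for slow performance" items could take roughly this shape: compare a measured duration with an upper bound and fail the job when the bound is exceeded. This is only a sketch; `check_duration` and the example limit are hypothetical and not taken from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical check: return non-zero when a measured duration exceeds a limit.
# Usage: check_duration <label> <measured-seconds> <max-seconds>
check_duration() {
  local label="$1" measured="$2" max="$3"
  if (( measured > max )); then
    echo "FAIL: ${label} took ${measured}s (limit ${max}s)" >&2
    return 1
  fi
  echo "OK: ${label} took ${measured}s (limit ${max}s)"
}

# Example with an illustrative limit:
#   check_duration "130-pod scale up" "$SCALE_UP_DURATION" 300 || exit 1
```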
SaranBalaji90 pushed a commit that referenced this pull request Aug 11, 2020
3 participants