Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize audit logging #14596

Merged
merged 1 commit into from
Aug 1, 2022
Merged

Conversation

spowelljr
Copy link
Member

@spowelljr spowelljr commented Jul 15, 2022

Related #14543

Base Case (empty audit log):

$ ls -lah ~/.minikube/logs
-rw-r--r--   1 powellsteven  primarygroup     0B Jul 15 12:58 audit.json
$ minikube options
# average time 0.531s

Before (with 5.5 MB audit log):

$ ls -lah ~/.minikube/logs
-rw-r--r--   1 powellsteven  primarygroup   5.5M Jul 15 12:52 audit.json
$ minikube options
# average time 9.81s

After (with 5.5 MB audit log):

$ ls -lah ~/.minikube/logs
-rw-r--r--   1 powellsteven  primarygroup   5.5M Jul 15 12:52 audit.json
$ minikube options
# average time 0.714s

92.8% reduction

Cause of problem:

auditContents += string(auditLog) + "\n"

Appending the contents to a string variable is what was taking up all the time.

Solution:

currentLogFile.WriteString(string(auditLog) + "\n")

Writing to the file every time instead of concatenating to a string resolved the bottleneck.

Also made small changes to the tests, discovered the tests were using the main audit.json file instead of the tmp file the tests were creating.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 15, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 15, 2022
@spowelljr
Copy link
Member Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 15, 2022
Copy link
Member

@medyagh medyagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a way we could test this ?
like could we make a fast minikube command run 10000 times in a for loop? and then expect the audit file to be not more than X ?
for that test we could use a separate MINIKUBE_HOME folder to not mess up other test audit

@minikube-pr-bot
Copy link

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
Docker_Windows TestNoKubernetes/serial/Stop (gopogh) 34.12 (chart)
Docker_Windows TestNoKubernetes/serial/StartNoArgs (gopogh) 39.29 (chart)
Docker_Windows TestAddons/Setup (gopogh) 48.20 (chart)
Docker_Windows TestCertOptions (gopogh) 48.20 (chart)
Docker_Windows TestDockerFlags (gopogh) 48.20 (chart)
Docker_Windows TestForceSystemdEnv (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/CertSync (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/CpCmd (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/FileSync (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageBuild (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListJson (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListShort (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListTable (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListYaml (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageSaveDaemon (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/Setup (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/NonActiveRuntimeDisabled (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/SSHCmd (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/UpdateContextCmd/no_changes (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/UpdateContextCmd/no_clusters (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/UpdateContextCmd/no_minikube_cluster (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/parallel/Version/components (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/serial/CacheCmd/cache/cache_reload (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/serial/CacheCmd/cache/verify_cache_inside_node (gopogh) 48.20 (chart)
Docker_Windows TestFunctional/serial/KubeContext (gopogh) 48.20 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@spowelljr
Copy link
Member Author

/retest-this-please

@minikube-pr-bot
Copy link

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 14596) |
+----------------+----------+---------------------+
| minikube start | 56.8s    | 56.3s               |
| enable ingress | 27.4s    | 27.9s               |
+----------------+----------+---------------------+

Times for minikube start: 58.4s 56.5s 55.6s 56.8s 56.7s
Times for minikube (PR 14596) start: 56.2s 57.5s 56.4s 55.5s 56.0s

Times for minikube (PR 14596) ingress: 29.1s 25.0s 30.0s 30.1s 25.1s
Times for minikube ingress: 28.6s 29.1s 25.1s 28.5s 25.5s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 14596) |
+----------------+----------+---------------------+
| minikube start | 28.0s    | 27.9s               |
| enable ingress | 22.7s    | 23.4s               |
+----------------+----------+---------------------+

Times for minikube start: 27.9s 27.1s 27.7s 28.9s 28.2s
Times for minikube (PR 14596) start: 27.8s 27.8s 27.5s 28.2s 28.1s

Times for minikube ingress: 22.4s 22.4s 22.9s 22.9s 22.9s
Times for minikube (PR 14596) ingress: 22.9s 22.4s 22.9s 25.4s 23.4s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 14596) |
+----------------+----------+---------------------+
| minikube start | 25.8s    | 26.3s               |
| enable ingress | 27.1s    | 27.3s               |
+----------------+----------+---------------------+

Times for minikube ingress: 27.4s 27.4s 26.9s 26.9s 27.0s
Times for minikube (PR 14596) ingress: 27.4s 27.4s 27.4s 27.4s 26.9s

Times for minikube start: 23.7s 23.7s 23.7s 23.5s 34.4s
Times for minikube (PR 14596) start: 24.2s 24.2s 34.3s 24.6s 24.0s

@minikube-pr-bot
Copy link

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
KVM_Linux TestStartStop/group/no-preload/serial/AddonExistsAfterStop (gopogh) 1.32 (chart)
KVM_Linux TestStartStop/group/no-preload/serial/Pause (gopogh) 1.32 (chart)
KVM_Linux TestStartStop/group/no-preload/serial/SecondStart (gopogh) 1.32 (chart)
KVM_Linux TestStartStop/group/no-preload/serial/UserAppExistsAfterStop (gopogh) 1.32 (chart)
KVM_Linux TestStartStop/group/no-preload/serial/VerifyKubernetesImages (gopogh) 1.32 (chart)
Docker_Linux_containerd TestNetworkPlugins/group/auto/DNS (gopogh) 6.40 (chart)
Docker_Windows TestPause/serial/Pause (gopogh) 7.55 (chart)
Docker_Cloud_Shell TestStartStop/group/cloud-shell/serial/AddonExistsAfterStop (gopogh) 14.55 (chart)
Docker_Cloud_Shell TestStartStop/group/cloud-shell/serial/EnableAddonAfterStop (gopogh) 14.55 (chart)
Docker_Cloud_Shell TestStartStop/group/cloud-shell/serial/SecondStart (gopogh) 14.55 (chart)
Docker_Cloud_Shell TestStartStop/group/cloud-shell/serial/Stop (gopogh) 14.55 (chart)
Docker_Cloud_Shell TestStartStop/group/cloud-shell/serial/UserAppExistsAfterStop (gopogh) 14.55 (chart)
Docker_macOS TestFunctional/parallel/DashboardCmd (gopogh) 16.67 (chart)
Docker_Linux_containerd TestPause/serial/Start (gopogh) 19.75 (chart)
Docker_Linux_containerd TestOffline (gopogh) 27.16 (chart)
Docker_Linux_containerd TestAddons/Setup (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/DashboardCmd (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/MountCmd/any-port (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/MySQL (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/PersistentVolumeClaim (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/ServiceCmd (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/ServiceCmdConnect (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/TunnelCmd/serial/AccessDirect (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/parallel/TunnelCmd/serial/WaitService/Setup (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/serial/ExtraConfig (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/serial/SoftStart (gopogh) 27.78 (chart)
Docker_Linux_containerd TestFunctional/serial/StartWithProxy (gopogh) 27.78 (chart)
Docker_Linux_containerd TestIngressAddonLegacy/serial/ValidateIngressAddonActivation (gopogh) 27.78 (chart)
Docker_Linux_containerd TestIngressAddonLegacy/StartLegacyK8sCluster (gopogh) 27.78 (chart)
Docker_Linux_containerd TestJSONOutput/start/Command (gopogh) 27.78 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@spowelljr
Copy link
Member Author

I wonder if there is a way we could test this ? like could we make a fast minikube command run 10000 times in a for loop? and then expect the audit file to be not more than X ? for that test we could use a separate MINIKUBE_HOME folder to not mess up other test audit

This PR only optimizes the audit logging so it's faster, this does not limit the size of the audit log, that can be a follow up PR.

@klaases
Copy link
Contributor

klaases commented Jul 30, 2022

As a part of the approval process, I worked with @spowelljr to test this PR locally.

  • Backup the audit file.
  • Expand the audit log file?
  • Append additional logs to the file.
  • The file with 200,000 lines was approx. 78MB!!!

cat ~/.minikube/logs/audit.json

Before - running without update:

$ time ./out/minikube options
The following options can be passed to any command:

      --add_dir_header=false: If true, adds the file directory to the header of the log messages
      --alsologtostderr=false: log to standard error as well as files (no effect when -logtostderr=true)
  -b, --bootstrapper='kubeadm': The name of the cluster bootstrapper that will set up the Kubernetes cluster.
  -h, --help=false:
      --log_backtrace_at=:0: when logging hits line file:N, emit a stack trace
      --log_dir='': If non-empty, write log files in this directory (no effect when -logtostderr=true)
      --log_file='': If non-empty, use this log file (no effect when -logtostderr=true)
      --log_file_max_size=1800: Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited.
      --logtostderr=true: log to standard error instead of files
      --one_output=false: If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
  -p, --profile='minikube': The name of the minikube VM being used. This can be set to allow having multiple instances of minikube independently.
      --rootless=false: Force to use rootless driver (docker and podman driver only)
      --skip_headers=false: If true, avoid header prefixes in the log messages
      --skip_log_headers=false: If true, avoid headers when opening log files (no effect when -logtostderr=true)
      --stderrthreshold=2: logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false)
      --user='': Specifies the user executing the operation. Useful for auditing operations executed by 3rd party tools. Defaults to the operating system username.
  -v, --v=0: number for the log level verbosity
      --vmodule=: comma-separated list of pattern=N settings for file-filtered logging

real	10m57.713s
user	46m47.903s
sys	6m8.325s

After - running with update:

time ./out/minikube options
The following options can be passed to any command:

      --add_dir_header=false: If true, adds the file directory to the header of the log messages
      --alsologtostderr=false: log to standard error as well as files (no effect when -logtostderr=true)
  -b, --bootstrapper='kubeadm': The name of the cluster bootstrapper that will set up the Kubernetes cluster.
  -h, --help=false:
      --log_backtrace_at=:0: when logging hits line file:N, emit a stack trace
      --log_dir='': If non-empty, write log files in this directory (no effect when -logtostderr=true)
      --log_file='': If non-empty, use this log file (no effect when -logtostderr=true)
      --log_file_max_size=1800: Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited.
      --logtostderr=true: log to standard error instead of files
      --one_output=false: If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
  -p, --profile='minikube': The name of the minikube VM being used. This can be set to allow having multiple instances of minikube independently.
      --rootless=false: Force to use rootless driver (docker and podman driver only)
      --skip_headers=false: If true, avoid header prefixes in the log messages
      --skip_log_headers=false: If true, avoid headers when opening log files (no effect when -logtostderr=true)
      --stderrthreshold=2: logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false)
      --user='': Specifies the user executing the operation. Useful for auditing operations executed by 3rd party tools. Defaults to the operating system username.
  -v, --v=0: number for the log level verbosity
      --vmodule=: comma-separated list of pattern=N settings for file-filtered logging

real	0m3.284s
user	0m1.689s
sys	0m0.472s

From 11 minutes (660 seconds) to about 3 seconds.

This is an efficient change with a marked 99.5455% decrease in time.

Copy link
Contributor

@klaases klaases left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved after reviewing a 99+% improvement.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klaases, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spowelljr spowelljr merged commit 757b877 into kubernetes:master Aug 1, 2022
@spowelljr spowelljr deleted the optimizeAudit branch August 1, 2022 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants