Added standard deviation and standard error of the mean to stats calculated in debezium/perf experiments. (#2117)

* Added stdev and sem stats to cpu samples in debezium/perf experiment output.
* Updated README.
* Corrected an error in the README.
deephaven · Mar 24, 2022 · 17722b3
1 parent 677fe74
commit 17722b3

Showing 3 changed files with 38 additions and 27 deletions.
diff --git a/debezium/perf/README.md b/debezium/perf/README.md
@@ -41,40 +41,42 @@ It will:
 - Set the given pageviews per second rate, and wait a fixed amount of time thereafter for processing to settle.
 - Take multiple samples for CPU and memory utilization over a defined period.
   Output from top in batch mode is sent to a log file and later post-processed.
+  Aside from individual samples, the following statistics are computed:
+  - mean
+  - standard deviation (`sd` in the output).
+  - standard error of the mean (`sem` in the output).
 - Stop and "reset" the containers.
 
 The example
 
 ```
 cd debezium/perf
-./run_experiment.sh dh 5000 20 10 1.0
+./run_experiment.sh dh 50000 20 30 1.0
 ```
 
-will run an experiment for Deephaven (tag `dh`; use tag `mz` for Materialize) with a target rate of 5,000 pageviews per second.
+will run an experiment for Deephaven (tag `dh`; use tag `mz` for Materialize) with a target rate of 50,000 pageviews per second.
 It will wait 20 seconds after setting the target rate to begin sampling CPU and memory utilization using `top` in batch mode.
-10 samples will be obtained, with a delay between samples of 1.0 seconds.
+30 samples will be obtained, with a delay between samples of 1.0 seconds.
 
 Example output from a run:
 
 ```
-cfs@erke 12:18:20 ~/dh/oss3/deephaven-core/debezium/perf
-$ ./run_experiment.sh dh 5000 20 10 1.0
-About to run an experiment for dh with 5000 pageviews/s.
+About to run an experiment for dh with 50000 pageviews/s.
 
 Actions that will be performed in this run:
-1. Start compose services required for for dh.
+1. Start compose services required for for dh and initialize simulation.
 2. Execute demo in dh and setup update delay logging.
-3. Set 5000 pageviews per second rate.
+3. Set 50000 pageviews per second rate.
 4. Wait 20 seconds.
-5. Take 10 samples for mem and CPU utilization, 1.0 seconds between samples.
+5. Take 30 samples for mem and CPU utilization, 1.0 seconds between samples.
 6. Stop and 'reset' (down) compose.
 
 Running experiment.
 
-1. Starting compose.
-PERF_TAG=2022.03.22.16.18.41_UTC_dh_5000
+1. Starting compose and initializing simulation.
+PERF_TAG=2022.03.24.01.59.34_UTC_dh_50000
 
-Logs are being saved to logs/2022.03.22.16.18.41_UTC_dh_5000.
+Logs are being saved to logs/2022.03.24.01.59.34_UTC_dh_50000.
 
 2. Running demo in dh and sampling delays.
 1 compiler directives added
@@ -103,32 +105,32 @@ No displayable variables updated
 
 3. Setting pageviews per second
 LOADGEN Connected.
-Setting pageviews_per_second: old value was 50, new value is 5000.
+Setting pageviews_per_second: old value was 50, new value is 50000.
 Goodbye.
 
 4. Waiting for 20 seconds.
 
-5. Sampling top.
-name=redpanda, tag=CPU_PCT, mean=84.14, samples=80.0, 84.2, 85.0, 87.0, 85.0, 82.0, 85.0, 84.0, 84.2, 85.0
-name=redpanda, tag=RES_GiB, mean=0.77, samples=0.7678, 0.7698, 0.7698, 0.7698, 0.7718, 0.7718, 0.7718, 0.7718, 0.7718, 0.7776
-name=deephaven, tag=CPU_PCT, mean=35.21, samples=66.7, 31.7, 28.0, 31.0, 27.0, 23.0, 46.0, 47.0, 25.7, 26.0
-name=deephaven, tag=RES_GiB, mean=2.40, samples=2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4
+5. Sampling top for 30 * 1.0 seconds.
+name=deephaven, tag=CPU_PCT, mean=86.48, sd=29.84, sem=5.45, samples=[120.0, 64.4, 64.0, 66.0, 108.0, 59.0, 83.0, 68.0, 102.0, 71.0, 83.0, 169.3, 75.0, 63.0, 73.0, 147.0, 66.0, 104.0, 58.0, 70.3, 62.0, 86.0, 88.0, 138.0, 90.0, 78.0, 67.0, 56.4, 72.0, 143.0]
+name=deephaven, tag=RES_GiB, mean=3.10, sd=0.00, sem=0.00, samples=[3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1]
+name=redpanda, tag=CPU_PCT, mean=89.08, sd=2.46, sem=0.45, samples=[80.0, 86.1, 88.0, 88.0, 85.0, 87.0, 86.0, 89.0, 90.0, 89.0, 89.0, 89.1, 90.0, 90.0, 88.0, 90.0, 91.0, 91.0, 91.0, 89.1, 90.0, 91.0, 90.0, 89.0, 92.0, 92.0, 89.0, 90.1, 92.0, 91.0]
+name=redpanda, tag=RES_GiB, mean=0.95, sd=0.05, sem=0.01, samples=[0.8659, 0.8678, 0.8783, 0.8844, 0.8903, 0.8942, 0.9001, 0.9059, 0.9098, 0.9176, 0.9235, 0.9294, 0.9333, 0.9391, 0.945, 0.9489, 0.9548, 0.9606, 0.9665, 0.9704, 0.9763, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
 
 6. Stopping and 'reset' (down) compose.
 Stopping core-debezium-perf_envoy_1      ... done
 Stopping core-debezium-perf_grpc-proxy_1 ... done
 Stopping core-debezium-perf_loadgen_1    ... done
 Stopping core-debezium-perf_debezium_1   ... done
-Stopping core-debezium-perf_server_1     ... done
 Stopping core-debezium-perf_redpanda_1   ... done
+Stopping core-debezium-perf_server_1     ... done
 Stopping core-debezium-perf_mysql_1      ... done
 Stopping core-debezium-perf_web_1        ... done
 Removing core-debezium-perf_envoy_1      ... done
 Removing core-debezium-perf_grpc-proxy_1 ... done
 Removing core-debezium-perf_loadgen_1    ... done
 Removing core-debezium-perf_debezium_1   ... done
-Removing core-debezium-perf_server_1     ... done
 Removing core-debezium-perf_redpanda_1   ... done
+Removing core-debezium-perf_server_1     ... done
 Removing core-debezium-perf_mysql_1      ... done
 Removing core-debezium-perf_web_1        ... done
 Removing network core-debezium-perf_default
@@ -137,7 +139,7 @@ Experiment finished.
 ```
 
 The CPU and memory utilization samples are shown on stdout and also saved to a file in the
-new directory under `logs/`, in this case `logs/2022.03.22.16.18.41_UTC_dh_5000.`
+new directory under `logs/`, in this case `logs/2022.03.24.01.59.34_UTC_dh_50000`.
 
 ## Manual testing
 

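Note on the new output format: each line in the samples log now carries `mean`, `sd`, `sem` and a bracketed `samples` list. A minimal sketch of reading such a line back, assuming the format stays exactly as shown in the README example above; `parse_line` and `LINE_RE` are hypothetical names and not part of this commit:

```python
import re

# Matches the line format shown in the README example output, e.g.
# name=redpanda, tag=CPU_PCT, mean=89.08, sd=2.46, sem=0.45, samples=[80.0, 86.1]
LINE_RE = re.compile(
    r'name=(?P<name>[^,]+), tag=(?P<tag>[^,]+), mean=(?P<mean>[^,]+), '
    r'sd=(?P<sd>[^,]+), sem=(?P<sem>[^,]+), samples=\[(?P<samples>.*)\]$'
)

def parse_line(line):
    """Parse one samples-log line into a dict (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f'unrecognized line: {line!r}')
    return {
        'name': m['name'],
        'tag': m['tag'],
        'mean': float(m['mean']),
        'sd': float(m['sd']),
        'sem': float(m['sem']),
        # An empty bracket pair yields an empty list.
        'samples': [float(x) for x in m['samples'].split(',') if x.strip()],
    }

record = parse_line(
    'name=redpanda, tag=CPU_PCT, mean=89.08, sd=2.46, sem=0.45, '
    'samples=[80.0, 86.1, 88.0]'
)
print(record['name'], record['tag'], record['sem'], len(record['samples']))
```

Lines written before this commit have no `sd`/`sem` fields and no brackets around the samples, so this pattern rejects them with a `ValueError` instead of misreading them.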
diff --git a/debezium/perf/run_experiment.sh b/debezium/perf/run_experiment.sh
@@ -24,7 +24,7 @@ top_delay="$5"
 echo "About to run an experiment for ${engine} with ${rate_per_s} pageviews/s."
 echo
 echo "Actions that will be performed in this run:"
-echo "1. Start compose services required for for ${engine}."
+echo "1. Start compose services required for for ${engine} and initialize simulation."
 echo "2. Execute demo in ${engine} and setup update delay logging."
 echo "3. Set ${rate_per_s} pageviews per second rate."
 echo "4. Wait ${wait_s} seconds."
@@ -33,7 +33,7 @@ echo "6. Stop and 'reset' (down) compose."
 echo
 echo "Running experiment."
 echo
-echo "1. Starting compose."
+echo "1. Starting compose and initializing simulation."
 export PERF_TAG=$(./start_perf_run.sh "$engine" "$rate_per_s")
 echo "PERF_TAG=${PERF_TAG}"
 echo
@@ -61,7 +61,7 @@ echo "4. Waiting for $wait_s seconds."
 sleep "$wait_s"
 echo
 
-echo "5. Sampling top."
+echo "5. Sampling top for ${top_samples} * ${top_delay} seconds."
 ./sample_top.sh "$engine" "$top_samples" "$top_delay"
 echo
 

diff --git a/debezium/perf/sample_top.py b/debezium/perf/sample_top.py
@@ -1,5 +1,6 @@
 import argparse
 import datetime as dt
+import math
 import re
 import os
 import subprocess
@@ -28,14 +29,17 @@
 pids = []
 pids_args = []
 name_pidre = {}
+names = []
 for proc_spec_str in args.proc_specs_strs:
     name, pid = proc_spec_str.split(':', maxsplit=1)
+    names.append(name)
     pids.append(pid)
     pids_args.append('-p')
     pids_args.append(pid)
     name_pidre[name] = re.compile(f'^{pid} ')
     proc_specs[name] = pid
 
+names.sort()
 top_out = subprocess.run(
     ['top', '-Eg', '-n', f'{args.nsamples}', '-b', '-c', '-d', f'{args.delay_s}'] + pids_args,
     stdout=subprocess.PIPE).stdout.decode('utf-8').splitlines()
@@ -48,6 +52,7 @@
 
 cputag = 'CPU_PCT'   # CPU utilization in percentage
 restag = 'RES_GiB'   # Resident size in GiB
+tags = [cputag, restag]
 
 kib_2_gib = 1.0/(1024 * 1024)
 
@@ -85,11 +90,15 @@ def format_samples(precision : int, samples):
     return s
 
 with open(f'{out_dir}/{now_str}_top_samples.log', 'w') as f:
-    for name, result in results.items():
-        for tag, samples in result.items():
+    for name in names:
+        result = results[name]
+        for tag in tags:
+            samples = result[tag]
             mean = stats.mean(samples)
+            sd = stats.stdev(samples, mean)
+            sem = sd / math.sqrt(len(samples))
             precision = 4 if tag == restag else -1
             samples_str = format_samples(precision, samples)
-            line = f'name={name}, tag={tag}, mean={mean:.2f}, samples={samples_str}'
+            line = f'name={name}, tag={tag}, mean={mean:.2f}, sd={sd:.2f}, sem={sem:.2f}, samples=[{samples_str}]'
             print(line)
             f.write(line + '\n')
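Note on the statistics: the change above computes the sample standard deviation (n - 1 denominator) and divides it by the square root of the sample count to get the standard error of the mean. A standalone sketch of the same calculation, assuming `stats` in `sample_top.py` is the standard library `statistics` module (the import is not visible in this diff); `summarize` is a hypothetical helper, and the data is the redpanda `CPU_PCT` sample list from the README example:

```python
import math
import statistics as stats  # assumption: this is what `stats` refers to in sample_top.py

# redpanda CPU_PCT samples copied from the README example output.
samples = [
    80.0, 86.1, 88.0, 88.0, 85.0, 87.0, 86.0, 89.0, 90.0, 89.0,
    89.0, 89.1, 90.0, 90.0, 88.0, 90.0, 91.0, 91.0, 91.0, 89.1,
    90.0, 91.0, 90.0, 89.0, 92.0, 92.0, 89.0, 90.1, 92.0, 91.0,
]

def summarize(samples):
    """Return (mean, sd, sem) for a list of numeric samples."""
    # statistics.stdev raises StatisticsError for fewer than two values,
    # so fail early with a clearer message.
    if len(samples) < 2:
        raise ValueError('need at least two samples to estimate sd and sem')
    mean = stats.mean(samples)
    sd = stats.stdev(samples, mean)      # sample standard deviation, mean passed as xbar
    sem = sd / math.sqrt(len(samples))   # standard error of the mean
    return mean, sd, sem

mean, sd, sem = summarize(samples)
print(f'mean={mean:.2f}, sd={sd:.2f}, sem={sem:.2f}')
```

On these samples the printed values should agree with the `redpanda` `CPU_PCT` line in the README example. A run with a single sample would hit the `stdev` restriction noted in the comment, so a guard like the one in `summarize` may be worth considering in `sample_top.py` as well.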