Instrument the instance and shell using Swift Metrics #56 #70

ktoso · 2020-09-17T14:16:16Z

Motivation:

SWIM is a great example of a deeply internal piece of cluster infrastructure that should expose metrics well.

It calls for several kinds of metrics: gauges for the membership, counters for total messages, recorders for how much data we sent, and so on.

This PR explores the best patterns for instrumenting such middleware with swift-metrics.

Modifications:

adds metrics configuration
instruments the specific places with the metrics calls

Still work in progress

Result:

Resolves SWIM + Metrics: Expose SWIM metrics, like how long pings take to come back etc #8

ktoso · 2020-09-17T14:28:50Z

This adds testing infrastructure for the metrics. I'll finish adding the remaining metrics soon, as I'd like to put them to good use.

ktoso · 2020-10-01T08:12:24Z

Example metrics:



# TYPE swim_members gauge
swim_members 0.0
swim_members{status="unreachable"} 0.0
swim_members{status="suspect"} 0.0
swim_members{status="alive"} 3.0
# TYPE swim_members_total counter
swim_members_total 0
# TYPE swim_removedmembertombstones gauge
swim_removedmembertombstones 0.0
# TYPE swim_lha gauge
swim_lha 0.0
# TYPE swim_incarnation gauge
swim_incarnation 0.0
# TYPE swim_probe_ping counter
swim_probe_ping 0
swim_probe_ping{type="successful"} 37
# TYPE swim_probe_pingrequest counter
swim_probe_pingrequest 0
# TYPE swim_roundtriptime_ping summary
swim_roundtriptime_ping{quantile="0.01"} 7261898.0
swim_roundtriptime_ping{quantile="0.05"} 7529004.0
swim_roundtriptime_ping{quantile="0.5"} 10351879.0
swim_roundtriptime_ping{quantile="0.9"} 16244390.0
swim_roundtriptime_ping{quantile="0.95"} 18839738.0
swim_roundtriptime_ping{quantile="0.99"} 56799421.0
swim_roundtriptime_ping{quantile="0.999"} 56799421.0
swim_roundtriptime_ping_count 37
swim_roundtriptime_ping_sum 452125972.0
# TYPE swim_roundtriptime_pingrequest summary
swim_roundtriptime_pingrequest{quantile="0.01"} 0.0
swim_roundtriptime_pingrequest{quantile="0.05"} 0.0
swim_roundtriptime_pingrequest{quantile="0.5"} 0.0
swim_roundtriptime_pingrequest{quantile="0.9"} 0.0
swim_roundtriptime_pingrequest{quantile="0.95"} 0.0
swim_roundtriptime_pingrequest{quantile="0.99"} 0.0
swim_roundtriptime_pingrequest{quantile="0.999"} 0.0
swim_roundtriptime_pingrequest_count 0
swim_roundtriptime_pingrequest_sum 0.0
# TYPE swim_message_count counter
swim_message_count 0
swim_message_count{direction="out"} 76
swim_message_count{direction="in"} 76
# TYPE swim_message_bytes histogram
swim_message_bytes_bucket{le="0.005"} 0.0
swim_message_bytes_bucket{le="0.01"} 0.0
swim_message_bytes_bucket{le="0.025"} 0.0
swim_message_bytes_bucket{le="0.05"} 0.0
swim_message_bytes_bucket{le="0.1"} 0.0
swim_message_bytes_bucket{le="0.25"} 0.0
swim_message_bytes_bucket{le="0.5"} 0.0
swim_message_bytes_bucket{le="1.0"} 0.0
swim_message_bytes_bucket{le="2.5"} 0.0
swim_message_bytes_bucket{le="5.0"} 0.0
swim_message_bytes_bucket{le="10.0"} 0.0
swim_message_bytes_bucket{le="+Inf"} 152.0
swim_message_bytes_count 152.0
swim_message_bytes_sum 34650.0
swim_message_bytes_bucket{le="0.005", direction="in"} 0.0
swim_message_bytes_bucket{le="0.01", direction="in"} 0.0
swim_message_bytes_bucket{le="0.025", direction="in"} 0.0
swim_message_bytes_bucket{le="0.05", direction="in"} 0.0
swim_message_bytes_bucket{le="0.1", direction="in"} 0.0
swim_message_bytes_bucket{le="0.25", direction="in"} 0.0
swim_message_bytes_bucket{le="0.5", direction="in"} 0.0
swim_message_bytes_bucket{le="1.0", direction="in"} 0.0
swim_message_bytes_bucket{le="2.5", direction="in"} 0.0
swim_message_bytes_bucket{le="5.0", direction="in"} 0.0
swim_message_bytes_bucket{le="10.0", direction="in"} 0.0
swim_message_bytes_bucket{le="+Inf", direction="in"} 76.0
swim_message_bytes_count{direction="in"} 76.0
swim_message_bytes_sum{direction="in"} 17119.0
swim_message_bytes_bucket{le="0.005", direction="out"} 0.0
swim_message_bytes_bucket{le="0.01", direction="out"} 0.0
swim_message_bytes_bucket{le="0.025", direction="out"} 0.0
swim_message_bytes_bucket{le="0.05", direction="out"} 0.0
swim_message_bytes_bucket{le="0.1", direction="out"} 0.0
swim_message_bytes_bucket{le="0.25", direction="out"} 0.0
swim_message_bytes_bucket{le="0.5", direction="out"} 0.0
swim_message_bytes_bucket{le="1.0", direction="out"} 0.0
swim_message_bytes_bucket{le="2.5", direction="out"} 0.0
swim_message_bytes_bucket{le="5.0", direction="out"} 0.0
swim_message_bytes_bucket{le="10.0", direction="out"} 0.0
swim_message_bytes_bucket{le="+Inf", direction="out"} 76.0
swim_message_bytes_count{direction="out"} 76.0
swim_message_bytes_sum{direction="out"} 17531.0
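
One way to read the sample above: summaries and histograms expose `_sum` and `_count` series, so a mean can be derived as their quotient. The following is a minimal sketch using numbers copied from the dump. The helper name `mean` is hypothetical, and treating the round-trip values as nanoseconds is an assumption, since the dump itself does not state a unit.

```swift
/// Mean of a summary or histogram, derived from its `_sum` and `_count` series.
/// Returns nil when nothing has been recorded yet (count == 0).
func mean(sum: Double, count: Int) -> Double? {
    guard count > 0 else { return nil }
    return sum / Double(count)
}

// Values copied from the sample output above.
let pingRTT = mean(sum: 452_125_972, count: 37)   // swim_roundtriptime_ping
let pingRequestRTT = mean(sum: 0, count: 0)       // swim_roundtriptime_pingrequest (no samples)
let inboundBytes = mean(sum: 17_119, count: 76)   // swim_message_bytes{direction="in"}
let outboundBytes = mean(sum: 17_531, count: 76)  // swim_message_bytes{direction="out"}

if let rtt = pingRTT {
    // Assumption: the summary values are nanoseconds.
    print("mean ping RTT ≈ \(rtt / 1_000_000) ms")
}
if let bytes = inboundBytes {
    print("mean inbound message size ≈ \(bytes) bytes")
}
```

Note that in the sample all 152 `swim_message_bytes` observations land in the `+Inf` bucket: the largest finite bucket is 10.0, while the mean message size works out to well over 200 bytes.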

Sources/SWIMNIOExample/Metrics+Extensions.swift

ktoso · 2020-10-01T08:37:18Z

Tests/SWIMNIOExampleTests/SWIMNIOMetricsTests.swift

+import SWIMTestKit
+import XCTest
+
+final class SWIMNIOMetricsTests: RealClusteredXCTestCase {


fully integration tested metrics, yay! 🥳

ktoso · 2020-10-01T08:55:32Z

Samples/Package.swift

@@ -9,6 +9,7 @@ var targets: [PackageDescription.Target] = [
        dependencies: [
            "SWIM",
            "SWIMNIOExample",
+            "SwiftPrometheus",


SwiftPrometheus is a dependency only of the example, since it offers an easy way to just "print" metrics to the CLI.

ktoso · 2020-10-01T08:56:42Z

Sources/SWIM/Metrics.swift

+    /// Object containing all metrics a SWIM instance and shell should be reporting.
+    ///
+    /// - SeeAlso: `SWIM.Metrics.Shell` for metrics that a specific implementation should emit
+    public struct Metrics {


We list all metrics we offer here.

Documenting them like this is a good pattern. Because the metrics are declared up front rather than created ad hoc where they are used, we can also test and mock them nicely, even when many instances live in the same process. We would not be able to do this if we went with purely global state.
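
A minimal, self-contained sketch of that pattern follows. `TestGauge` and `InstanceMetrics` are hypothetical stand-ins, not the swift-metrics API or the types in this PR, and the `prefix` parameter is an assumption used here to keep instances apart.

```swift
/// Hypothetical stand-in for a gauge handle; it just remembers the last value.
final class TestGauge {
    let label: String
    private(set) var lastValue: Double?

    init(label: String) {
        self.label = label
    }

    func record(_ value: Double) {
        self.lastValue = value
    }
}

/// All metrics one instance reports, declared and documented in a single place.
struct InstanceMetrics {
    /// Current incarnation number of the local member.
    let incarnation: TestGauge
    /// Current local health multiplier.
    let localHealthMultiplier: TestGauge

    /// `prefix` keeps the labels of separate instances apart (assumed parameter).
    init(prefix: String) {
        self.incarnation = TestGauge(label: "\(prefix)_incarnation")
        self.localHealthMultiplier = TestGauge(label: "\(prefix)_lha")
    }
}

// Two instances in the same process, each with its own inspectable metrics.
let first = InstanceMetrics(prefix: "first_swim")
let second = InstanceMetrics(prefix: "second_swim")
first.incarnation.record(3)
```

Because each instance owns its metric objects, a test can assert on `first` without being affected by whatever `second` records.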

ktoso · 2020-10-01T08:57:08Z

Sources/SWIM/Metrics.swift

+            case .unreachable:
+                unreachables += 1
+            case .dead:
+                () // dead is reported as a removal when they're removed and tombstoned, not as a gauge


Dead members are not counted in the gauge, since they have already been removed. Note, however, that we do count the tombstones.
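
As an illustration of that counting rule, here is a small self-contained sketch. `MemberStatus`, `MembershipCounts`, and `countMembers` are hypothetical names, not the types used in `Sources/SWIM/Metrics.swift`.

```swift
/// Hypothetical status enum mirroring the cases discussed above.
enum MemberStatus {
    case alive, suspect, unreachable, dead
}

/// Per-status tallies that would feed the membership gauges.
struct MembershipCounts: Equatable {
    var alive = 0
    var suspect = 0
    var unreachable = 0
}

/// Tally members per status; `.dead` is skipped on purpose, because dead
/// members are reported when they are removed and tombstoned instead.
func countMembers(_ statuses: [MemberStatus]) -> MembershipCounts {
    var counts = MembershipCounts()
    for status in statuses {
        switch status {
        case .alive:
            counts.alive += 1
        case .suspect:
            counts.suspect += 1
        case .unreachable:
            counts.unreachable += 1
        case .dead:
            () // not part of the gauge
        }
    }
    return counts
}

let counts = countMembers([.alive, .alive, .alive, .dead])
```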

ktoso · 2020-10-01T08:59:26Z

Sources/SWIM/SWIMInstance.swift

+
+            self.metrics.incarnation.record(self.incarnation)
+            self.metrics.localHealthMultiplier.record(self.localHealthMultiplier)
+            self.metrics.updateMembership(self.members)


It is important to record some initial values, so that the metrics are not completely empty and confusing when people first look at them.
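
A rough sketch of the idea, with hypothetical stand-in types (`StubGauge`, `Instance`) rather than the real `SWIM.Instance`: the initializer publishes a baseline, and later changes overwrite it.

```swift
/// Hypothetical gauge stand-in that keeps every recorded value for inspection.
final class StubGauge {
    private(set) var values: [Double] = []

    func record(_ value: Double) {
        self.values.append(value)
    }
}

final class Instance {
    let incarnationGauge = StubGauge()
    let localHealthMultiplierGauge = StubGauge()

    private(set) var incarnation: UInt64 = 0
    private(set) var localHealthMultiplier = 0

    init() {
        // Publish a baseline right away so the gauges are never blank.
        self.incarnationGauge.record(Double(self.incarnation))
        self.localHealthMultiplierGauge.record(Double(self.localHealthMultiplier))
    }

    func bumpIncarnation() {
        self.incarnation += 1
        self.incarnationGauge.record(Double(self.incarnation))
    }
}

let instance = Instance()
```

Without the calls in `init`, a freshly started instance would report nothing for these gauges until the first state change.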

ktoso · 2020-10-01T09:44:31Z

Please also have a look at whether the metrics (all listed here https://github.com/apple/swift-cluster-membership/pull/70/files#r498088594 ) look good. I think those are the main things we care about, @budde 👍

--

A sanity check of the style and of how we test and instrument such systems is welcome;
if you are busy, no worries, there is nothing tremendously tricky or weird here 👍 // @avolokhov @yim-lee @tomerd @drexin

I'm mostly pinging because this is the first time we introduce more instrumentation into any of our OSS packages, and I believe this general pattern is the right one to suggest and apply as we instrument other systems too. Open to opinions though!

I'll follow up a bit in another project and then merge and release this SWIM update as a minor bump.

Sources/SWIM/Metrics.swift

Tests/SWIMNIOExampleTests/SWIMNIOMetricsTests.swift

Tests/SWIMTestKit/TestMetrics.swift

Tests/SWIMTests/SWIMMetricsTests.swift

drexin

👍

Co-authored-by: Yim Lee <yim_lee@apple.com>

ktoso · 2020-10-02T07:56:24Z

Thank you for reviews everyone!

ktoso marked this pull request as draft September 17, 2020 14:28

ktoso mentioned this pull request Sep 17, 2020

[WIP] Instrument the instance and shell using Swift Metrics #56

Closed

ktoso force-pushed the wip-metrics branch from 12525c6 to fbde696 Compare October 1, 2020 07:01

ktoso added 5 commits October 1, 2020 18:39

+metrics,swim implement metrics in SWIM instance and NIO Shell

8b74c8a

prepared testing infra for metrics and prepared most of them

637efd5

implementest test infra properly for metrics; dead is not a Gauge but…

ef3a524

… Counter

more metrics, lha value as well as shell specific values

5608950

+testkit move testing utilities to shared module, since we need to reuse them implement more metrics, failures and timing intervals

adjust labels to be unique per type, a prometheus requirement

6c66fae

comment out in example by default cleanup

ktoso requested a review from avolokhov October 1, 2020 09:40

ktoso force-pushed the wip-metrics branch from c6d5881 to 6c66fae Compare October 1, 2020 09:41

yim-lee approved these changes Oct 1, 2020

drexin approved these changes Oct 1, 2020

budde approved these changes Oct 1, 2020

ktoso mentioned this pull request Oct 2, 2020

Additional sugar: Timer.recordInterval(since: DispatchTime, now: DispatchTime = .now()) apple/swift-metrics#79

Closed

Apply suggestions from code review

a5cf57d

Co-authored-by: Yim Lee <yim_lee@apple.com>

ktoso changed the title ~~[WIP] Instrument the instance and shell using Swift Metrics #56~~ Instrument the instance and shell using Swift Metrics #56 Oct 2, 2020

ktoso marked this pull request as ready for review October 2, 2020 07:49

ktoso merged commit 343e2ad into main Oct 2, 2020

ktoso deleted the wip-metrics branch October 2, 2020 07:56

BasThomas mentioned this pull request Oct 2, 2020

[170] Issue #170 - October 8, 2020 SwiftWeekly/swiftweekly.github.io#548

Closed

avolokhov mentioned this pull request Oct 17, 2020

API for setting list of dimensions on record, rather than on Metric creation apple/swift-metrics#85

Open
