Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument the instance and shell using Swift Metrics #56 #70

Merged
merged 6 commits into from
Oct 2, 2020
Merged

Conversation

ktoso
Copy link
Member

@ktoso ktoso commented Sep 17, 2020

Motivation:

SWIM is a great example of a very internal piece of clusters which should expose metrics well.

And there's all kinds of types of metrics, gauges for the membership, counters for total messages etc, and recorders for how many data we sent etc.

In this PR trying to explore the best patterns for instrumenting such middleware with swift-metrics.

Modifications:

  • adds metrics configuration
  • instruments the specific places with the metrics calls

Still work in progress

Result:

@ktoso
Copy link
Member Author

ktoso commented Sep 17, 2020

This added testing infra for the metrics and I'll finish up adding all metrics soon as I'd like to put them to good use.

@ktoso
Copy link
Member Author

ktoso commented Oct 1, 2020

Example metrics:



# TYPE swim_members gauge
swim_members 0.0
swim_members{status="unreachable"} 0.0
swim_members{status="suspect"} 0.0
swim_members{status="alive"} 3.0
# TYPE swim_members_total counter
swim_members_total 0
# TYPE swim_removedmembertombstones gauge
swim_removedmembertombstones 0.0
# TYPE swim_lha gauge
swim_lha 0.0
# TYPE swim_incarnation gauge
swim_incarnation 0.0
# TYPE swim_probe_ping counter
swim_probe_ping 0
swim_probe_ping{type="successful"} 37
# TYPE swim_probe_pingrequest counter
swim_probe_pingrequest 0
# TYPE swim_roundtriptime_ping summary
swim_roundtriptime_ping{quantile="0.01"} 7261898.0
swim_roundtriptime_ping{quantile="0.05"} 7529004.0
swim_roundtriptime_ping{quantile="0.5"} 10351879.0
swim_roundtriptime_ping{quantile="0.9"} 16244390.0
swim_roundtriptime_ping{quantile="0.95"} 18839738.0
swim_roundtriptime_ping{quantile="0.99"} 56799421.0
swim_roundtriptime_ping{quantile="0.999"} 56799421.0
swim_roundtriptime_ping_count 37
swim_roundtriptime_ping_sum 452125972.0
# TYPE swim_roundtriptime_pingrequest summary
swim_roundtriptime_pingrequest{quantile="0.01"} 0.0
swim_roundtriptime_pingrequest{quantile="0.05"} 0.0
swim_roundtriptime_pingrequest{quantile="0.5"} 0.0
swim_roundtriptime_pingrequest{quantile="0.9"} 0.0
swim_roundtriptime_pingrequest{quantile="0.95"} 0.0
swim_roundtriptime_pingrequest{quantile="0.99"} 0.0
swim_roundtriptime_pingrequest{quantile="0.999"} 0.0
swim_roundtriptime_pingrequest_count 0
swim_roundtriptime_pingrequest_sum 0.0
# TYPE swim_message_count counter
swim_message_count 0
swim_message_count{direction="out"} 76
swim_message_count{direction="in"} 76
# TYPE swim_message_bytes histogram
swim_message_bytes_bucket{le="0.005"} 0.0
swim_message_bytes_bucket{le="0.01"} 0.0
swim_message_bytes_bucket{le="0.025"} 0.0
swim_message_bytes_bucket{le="0.05"} 0.0
swim_message_bytes_bucket{le="0.1"} 0.0
swim_message_bytes_bucket{le="0.25"} 0.0
swim_message_bytes_bucket{le="0.5"} 0.0
swim_message_bytes_bucket{le="1.0"} 0.0
swim_message_bytes_bucket{le="2.5"} 0.0
swim_message_bytes_bucket{le="5.0"} 0.0
swim_message_bytes_bucket{le="10.0"} 0.0
swim_message_bytes_bucket{le="+Inf"} 152.0
swim_message_bytes_count 152.0
swim_message_bytes_sum 34650.0
swim_message_bytes_bucket{le="0.005", direction="in"} 0.0
swim_message_bytes_bucket{le="0.01", direction="in"} 0.0
swim_message_bytes_bucket{le="0.025", direction="in"} 0.0
swim_message_bytes_bucket{le="0.05", direction="in"} 0.0
swim_message_bytes_bucket{le="0.1", direction="in"} 0.0
swim_message_bytes_bucket{le="0.25", direction="in"} 0.0
swim_message_bytes_bucket{le="0.5", direction="in"} 0.0
swim_message_bytes_bucket{le="1.0", direction="in"} 0.0
swim_message_bytes_bucket{le="2.5", direction="in"} 0.0
swim_message_bytes_bucket{le="5.0", direction="in"} 0.0
swim_message_bytes_bucket{le="10.0", direction="in"} 0.0
swim_message_bytes_bucket{le="+Inf", direction="in"} 76.0
swim_message_bytes_count{direction="in"} 76.0
swim_message_bytes_sum{direction="in"} 17119.0
swim_message_bytes_bucket{le="0.005", direction="out"} 0.0
swim_message_bytes_bucket{le="0.01", direction="out"} 0.0
swim_message_bytes_bucket{le="0.025", direction="out"} 0.0
swim_message_bytes_bucket{le="0.05", direction="out"} 0.0
swim_message_bytes_bucket{le="0.1", direction="out"} 0.0
swim_message_bytes_bucket{le="0.25", direction="out"} 0.0
swim_message_bytes_bucket{le="0.5", direction="out"} 0.0
swim_message_bytes_bucket{le="1.0", direction="out"} 0.0
swim_message_bytes_bucket{le="2.5", direction="out"} 0.0
swim_message_bytes_bucket{le="5.0", direction="out"} 0.0
swim_message_bytes_bucket{le="10.0", direction="out"} 0.0
swim_message_bytes_bucket{le="+Inf", direction="out"} 76.0
swim_message_bytes_count{direction="out"} 76.0
swim_message_bytes_sum{direction="out"} 17531.0

import SWIMTestKit
import XCTest

final class SWIMNIOMetricsTests: RealClusteredXCTestCase {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully integration tested metrics, yay! 🥳

@@ -9,6 +9,7 @@ var targets: [PackageDescription.Target] = [
dependencies: [
"SWIM",
"SWIMNIOExample",
"SwiftPrometheus",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only in the example, since it has an easy way to just "print" metrics to CLI

/// Object containing all metrics a SWIM instance and shell should be reporting.
///
/// - SeeAlso: `SWIM.Metrics.Shell` for metrics that a specific implementation should emit
public struct Metrics {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We list all metrics we offer here.

It's a good pattern to document them like this, and also -- thanks to this, rather than ad hoc creation when used, we're able to nicely test and mock them, even if there are many instances in the same process. We would not be able to do this if we went with super global stuff.

case .unreachable:
unreachables += 1
case .dead:
() // dead is reported as a removal when they're removed and tombstoned, not as a gauge
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deads are not counted, as they are removed already; however notice that we do count the tombstones


self.metrics.incarnation.record(self.incarnation)
self.metrics.localHealthMultiplier.record(self.localHealthMultiplier)
self.metrics.updateMembership(self.members)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important to set some initial values, not to be completely empty and confusing when people look at it

@ktoso
Copy link
Member Author

ktoso commented Oct 1, 2020

Please have a look as well if the metrics (all listed here https://github.com/apple/swift-cluster-membership/pull/70/files#r498088594 ) look good. I think that's the main things we care about, @budde 👍

--

Sanity check for style and testing how we instrument such systems welcome;
If you are busy no worries, nothing tremendously tricky or weird here 👍 // @avolokhov @yim-lee @tomerd @drexin

Mostly pinging since this is the first time we introduce more instrumentation into any of our OSS packages, and I believe this general pattern is the right to suggest and apply as we instrument other systems too, open to opinions though!

I'll follow up a bit in another project where and then merge and release this SWIM update as a minor bump.

Sources/SWIM/Metrics.swift Outdated Show resolved Hide resolved
Tests/SWIMNIOExampleTests/SWIMNIOMetricsTests.swift Outdated Show resolved Hide resolved
Tests/SWIMTestKit/TestMetrics.swift Outdated Show resolved Hide resolved
Tests/SWIMTests/SWIMMetricsTests.swift Outdated Show resolved Hide resolved
Copy link
Member

@drexin drexin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Co-authored-by: Yim Lee <yim_lee@apple.com>
@ktoso ktoso changed the title [WIP] Instrument the instance and shell using Swift Metrics #56 Instrument the instance and shell using Swift Metrics #56 Oct 2, 2020
@ktoso ktoso marked this pull request as ready for review October 2, 2020 07:49
@ktoso
Copy link
Member Author

ktoso commented Oct 2, 2020

Thank you for reviews everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SWIM + Metrics: Expose SWIM metrics, like how long pings take to come back etc
5 participants