Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IORuntimeMetrics #3317

Merged
merged 4 commits into from
Nov 26, 2024
Merged

Conversation

iRevive
Copy link
Contributor

@iRevive iRevive commented Dec 16, 2022

There are a few notable notes regarding this PR:

  • The existing implementation of CpuStarvation MBean somewhat breaks the consistency. All previous MBeans were under cats.effect.unsafe.metrics package, while CpuStarvation is available under cats.effect.metrics. I didn't update the MBean name yet.
  • Platform-specific metrics can be defined in the IORuntimeMetricsPlatform. For example, JVM IORuntimeMetrics offers compute and cpuStarvation metrics. While JS and Native have only cpuStarvation.

Personally, the whole implementation feels clunky, hopefully we can find the right direction together.

This PR will not pass MiMa checks. Once we agree on a direction, I will address the bin compat issue.

@iRevive iRevive changed the title d of IORuntimeMetrics Draft implementation of IORuntimeMetrics Dec 16, 2022
@iRevive iRevive force-pushed the runtime-metrics branch 2 times, most recently from c10bee1 to 29ebdfa Compare December 16, 2022 14:22
@iRevive
Copy link
Contributor Author

iRevive commented Dec 16, 2022

And a small demo:

Demo
import cats.effect.{IO, IOApp}
import cats.effect.std.Random
import cats.effect.unsafe.IORuntimeMetrics
import cats.syntax.foldable._

import scala.concurrent.duration._

object Test extends IOApp.Simple {

  def run: IO[Unit] = {
    IO.delay(print(runtime.metrics)).flatMap(IO.println).delayBy(500.millis).foreverM.background.surround {
      Random.scalaUtilRandom[IO].flatMap { random =>
        val a = random.betweenInt(10, 3000).flatMap { int =>
          if (int % 2 == 0) IO.blocking(Thread.sleep(int)) else IO.delay(Thread.sleep(int))
        }.start.replicateA(100)

        a.flatMap(_.traverse_(_.join))
      } >> IO.never
    }
  }
  
  def print(metrics: IORuntimeMetrics): String = {
    s"""
      |Compute:
      |workerThreadCount: ${metrics.compute.map(_.workerThreadCount())}
      |activeThreadCount: ${metrics.compute.map(_.activeThreadCount())}
      |searchingThreadCount: ${metrics.compute.map(_.searchingThreadCount())}
      |blockedWorkerThreadCount: ${metrics.compute.map(_.blockedWorkerThreadCount())}
      |localQueueFiberCount: ${metrics.compute.map(_.localQueueFiberCount())}
      |suspendedFiberCount: ${metrics.compute.map(_.suspendedFiberCount())}
      |
      |CPU Starvation:
      |starvationCount: ${metrics.cpuStarvation.starvationCount()}
      |maxClockDriftMs: ${metrics.cpuStarvation.maxClockDriftMs()}
      |currentClockDriftMs: ${metrics.cpuStarvation.currentClockDriftMs()}
      |""".stripMargin
  }
}

Output:

Compute:
workerThreadCount: Some(8)
activeThreadCount: Some(8)
searchingThreadCount: Some(0)
blockedWorkerThreadCount: Some(13)
localQueueFiberCount: Some(0)
suspendedFiberCount: Some(2)

CPU Starvation:
starvationCount: 0
maxClockDriftMs: 0
currentClockDriftMs: 0

....

Compute:
workerThreadCount: Some(8)
activeThreadCount: Some(1)
searchingThreadCount: Some(0)
blockedWorkerThreadCount: Some(13)
localQueueFiberCount: Some(0)
suspendedFiberCount: Some(2)

CPU Starvation:
starvationCount: 0
maxClockDriftMs: 11
currentClockDriftMs: 3

@djspiewak djspiewak added this to the v3.5.0 milestone Jan 16, 2023
@iRevive
Copy link
Contributor Author

iRevive commented Jan 31, 2023

Mima is sad:

[error] cats-effect: Failed binary compatibility check against org.typelevel:cats-effect_3:3.4.2 (e:info.apiURL=https://typelevel.org/cats-effect/api/3.x/, e:info.versionScheme=early-semver)! Found 7 potential problems (filtered 2)
[error]  * class cats.effect.metrics.CpuStarvation does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.CpuStarvation")
[error]  * object cats.effect.metrics.CpuStarvation does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.CpuStarvation$")
[error]  * interface cats.effect.metrics.CpuStarvationMBean does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.CpuStarvationMBean")
[error]  * interface cats.effect.metrics.CpuStarvationMetrics does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.CpuStarvationMetrics")
[error]  * class cats.effect.metrics.JvmCpuStarvationMetrics does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.JvmCpuStarvationMetrics")
[error]  * object cats.effect.metrics.JvmCpuStarvationMetrics does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.JvmCpuStarvationMetrics$")
[error]  * class cats.effect.metrics.JvmCpuStarvationMetrics#NoOpCpuStarvationMetrics does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("cats.effect.metrics.JvmCpuStarvationMetrics$NoOpCpuStarvationMetrics")

Since these classes were private to metrics package, I assume it's safe to add an exclusion.

@iRevive
Copy link
Contributor Author

iRevive commented Apr 11, 2023

With #3526 the linking issue in ScalaJS is solved.

Copy link
Member

@djspiewak djspiewak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey it only took me two years to get to this. New record?

Sorry. :( So the implementation overall looks good. I think this sort of thing is always somewhat clunky so that's fine. We should probably make the mbean paths consistent and just publish the change in the release notes (so people know to update any monitoring keys). The unsafe packages are similarly annoying but I think the optimal solution to this.

So long as all the mbean paths are made consistent, I'm pretty okay merging this. Anyone have any strong objections?

@iRevive iRevive changed the title Draft implementation of IORuntimeMetrics Implement IORuntimeMetrics Nov 22, 2024
@iRevive
Copy link
Contributor Author

iRevive commented Nov 22, 2024

@djspiewak thanks for the review! Better late than never :)

I changed the namespace of the CPUStarvation MBean to cats.effect.unsafe.metrics.

The names are the following:

  • CPU starvation: cats.effect.unsafe.metrics:type=CpuStarvation
  • WSTP compute pool: cats.effect.unsafe.metrics:type=ComputePoolSampler-$hash
  • WSTP local queue: cats.effect.unsafe.metrics:type=LocalQueueSampler-$hash-$i
  • Live fiber dump: cats.effect.unsafe.metrics:type=LiveFiberSnapshotTrigger-$hash

@djspiewak
Copy link
Member

Looks good! Once we sort the merge conflicts we can land it.

@iRevive
Copy link
Contributor Author

iRevive commented Nov 22, 2024

Done. Waiting for CI. Nvm, waiting for #4180 first 😅

@armanbilge armanbilge closed this Nov 24, 2024
@armanbilge armanbilge reopened this Nov 24, 2024
@armanbilge
Copy link
Member

I think we should add metrics for the TimerHeap as well (number of outstanding timers, next one due to fire, maybe last one due to fire?).

I'd also like to add polling system metrics. All polling systems should be able to support # of outstanding I/O operations.

@iRevive
Copy link
Contributor Author

iRevive commented Nov 25, 2024

May we do this in the follow-up PR? The current PR is already big, and I would prefer to operate smaller changesets.


Getting stats (at least the number of timers) from the TimerHeap seems relatively simple. However, PollingSystem could require some changes
The naive idea is:

abstract class PollingSystem {
  def stats: PollingSystem.Stats
}
object PollingSystem {
  sealed trait Stats { 
    def outstandingOperations(): Long
  }
}

That way, we can enforce implementations to provide metrics.

I also found this one: #3686.

@iRevive iRevive requested a review from djspiewak November 25, 2024 18:58
@djspiewak djspiewak closed this Nov 26, 2024
@djspiewak djspiewak reopened this Nov 26, 2024
@djspiewak djspiewak merged commit dfda4ae into typelevel:series/3.x Nov 26, 2024
57 of 87 checks passed
@iRevive iRevive deleted the runtime-metrics branch November 27, 2024 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants