-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: Support fetching LFC metrics (but don't use them yet) #895
Conversation
4e7edce
to
bb282a5
Compare
8f2eb8b
to
d3fa7a6
Compare
f6217dd
to
a3f9156
Compare
a3f9156
to
4c461f8
Compare
4c461f8
to
288f86e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes a little bit of time to review for me, as I am not used to this many generics usages. Usually, I would advocate against it in Go, relaying on dynamic dispatch/code duplication. But since we have Rust in the picture, maybe I should get used to it.
LMK if you come up the a way to make it less generic
pkg/agent/runner.go
Outdated
ctx context.Context, | ||
logger *zap.Logger, | ||
newMetrics func(metrics core.Metrics, withLock func()), | ||
kind string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have it as a part of an interface on C?
Also, C is metrics names, right, maybe should be Names
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have it as a part of an interface on C?
Not quite sure what you're referring to here — is it kind
?
Also, C is metrics names, right, maybe should be
Names
?
I was originally thinking we'd want some per-VM configuration for metrics for feature-gating new behavior, but upon reflection, I think implementing that properly is actually more effort than it's worth.
At the same time... I think the semantics are clearer if the interface defines it as "config" for metrics parsing, and then something external provides that "config" - even if it's always just the metric names in practice.
For now, I've added some comments to FromPrometheus
(f47097a). LMK what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what you're referring to here — is it kind?
Yes. Can it be a method returning constant?
I've added some comments to FromPrometheus
But the implementation is actually called SystemMetricNames
. IMO, either both type parameter and implementations needs to be called Names
, or both Config
, even if config only contains names.
UPD: or is it not? It seems I got myself confused in the hierarchy of types, and which ones are generic, and which are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in MetricsSourceConfig[C]
below, C
is actually metric names, as a type parameters to a metrics config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 5dc98a9 satisfies this (in addition to the changes from 4b06d09
..0bb2b59
?
Discussion from the 1-1 meeting: scraping sql_exporter from the autoscaling-agent is the right way to go. I'll add a separate sql_exporter solely for autoscaling metrics. For now, we can just scrape the current one with a reasonable interval. |
fe8737b
to
24f5c94
Compare
pkg/agent/runner.go
Outdated
ctx context.Context, | ||
logger *zap.Logger, | ||
newMetrics func(metrics core.Metrics, withLock func()), | ||
kind string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what you're referring to here — is it kind?
Yes. Can it be a method returning constant?
I've added some comments to FromPrometheus
But the implementation is actually called SystemMetricNames
. IMO, either both type parameter and implementations needs to be called Names
, or both Config
, even if config only contains names.
UPD: or is it not? It seems I got myself confused in the hierarchy of types, and which ones are generic, and which are not.
As discussed in neondatabase/autoscaling#895, we want to have a separate sql_exporter for simple metrics to avoid overload the database because the autoscaling agent needs to scrape at a higher interval. The new exporter is exposed at port 9499. I didn't do any testing for this pull request but given it's just a configuration change I assume this works. Signed-off-by: Alex Chi Z <chi@neon.tech>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mostly nits are left, we should discuss them verbally
pkg/agent/runner.go
Outdated
ctx context.Context, | ||
logger *zap.Logger, | ||
newMetrics func(metrics core.Metrics, withLock func()), | ||
kind string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in MetricsSourceConfig[C]
below, C
is actually metric names, as a type parameters to a metrics config.
pkg/agent/config.go
Outdated
ec.Add(c.Metrics.LFC.Names.Validate()) | ||
erc.Whenf(ec, c.Metrics.LFC.RequestTimeoutSeconds == 0, zeroTmpl, ".metrics.lfc.requestTimeoutSeconds") | ||
erc.Whenf(ec, c.Metrics.LFC.SecondsBetweenRequests == 0, zeroTmpl, ".metrics.lfc.secondsBetweenRequests") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to deduplicate it similar to how I did it in my billing PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in neondatabase/autoscaling#895, we want to have a separate sql_exporter for simple metrics to avoid overload the database because the autoscaling agent needs to scrape at a higher interval. The new exporter is exposed at port 9499. I didn't do any testing for this pull request but given it's just a configuration change I assume this works. Signed-off-by: Alex Chi Z <chi@neon.tech>
Discussion with @Omrigan : It's fine to just hard-code the metric names, and that makes it a lot simpler. Some discussion about whether to encapsulate the callbacks for metrics collection into a single struct with |
There's a lot of places where we have ad-hoc ways to construct pointers to thigns - either via temporary variables, referencing the first element in a slice, or by a local function to produce pointers. We can replace those with lo.ToPtr (like what the control plane already uses) to help with readability and consistency. Originally discussed here: github.com//pull/895#discussion_r1576977602
24f5c94
to
4b06d09
Compare
New changes:
edit: And the first commit has been spun off into a separate PR: #948. |
There's a lot of places where we have ad-hoc ways to construct pointers to things - either via temporary variables, referencing the first element in a slice, or by a local function to produce pointers. We can replace those with lo.ToPtr (like what the control plane already uses) to help with readability and consistency. Originally discussed here: #895 (comment)
666e11f
to
34afd60
Compare
Specifically, where we spawn background workers in (*Runner).Run(), this commit renames the inner function's context and logger: * 'c' → 'ctx2' * 'l' → 'logger2' ref #895 (comment)
ref #895 (comment) Co-authored-by: Oleg Vasilev <oleg@neon.tech>
Continuing with the changelog approach, if only because it helps me manage the rebasing:
Plus a couple more changes to #948 based on review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending comments and conflicts resolution.
Specifically, where we spawn background workers in (*Runner).Run(), this commit renames the inner function's context and logger: * 'c' → 'ctx2' * 'l' → 'logger2' ref #895 (comment)
66b9289
to
20bd2ba
Compare
Specifically, where we spawn background workers in (*Runner).Run(), this commit renames the inner function's context and logger: * 'c' → 'ctx2' * 'l' → 'logger2' ref #895 (comment)
20bd2ba
to
fb2f910
Compare
Final changes:
Waiting on some issues with latest release. Will hold off on merging for now to avoid accidentally adding much larger changes alongside a small fix. |
Also, switch from using custom parsing to the proper parser provided by the prometheus package(s). Also also, change the global config from specifying LoadPrefix to instead providing the names of all of the system metrics we need.
Another "make it generic" commit. Changes here are: 1. Add indirection in global config - instead of .Metrics.*, it's now .Metrics.System.* (referring to "system" metrics). 2. Rename pkg/agent/core.{Metrics => SystemMetrics} -- and all the associated functions that just referred to "metrics", like core.State's UpdateMetrics => UpdateSystemMetrics. 3. Make (*agent.Runner).getMetricsLoop() generic over the metrics and its configuration -- and because of that, it can't be a method.
It is conditionally enabled as part of api.ScalingConfig, defaulting to disabled with the current config. Currently the metrics don't go anywhere - this commit *just* sets up the piping so we can use it in the scaling algorithm later.
Specifically, where we spawn background workers in (*Runner).Run(), this commit renames the inner function's context and logger: * 'c' → 'ctx2' * 'l' → 'logger2' ref #895 (comment)
d0991a9
to
ca2850a
Compare
Also, switch from using custom parsing to the proper parser provided by the prometheus package(s). Also also, change the global config from specifying LoadPrefix to instead providing the names of all of the system metrics we need.
Another "make it generic" commit. Changes here are: 1. Add indirection in global config - instead of .Metrics.*, it's now .Metrics.System.* (referring to "system" metrics). 2. Rename pkg/agent/core.{Metrics => SystemMetrics} -- and all the associated functions that just referred to "metrics", like core.State's UpdateMetrics => UpdateSystemMetrics. 3. Make (*agent.Runner).getMetricsLoop() generic over the metrics and its configuration -- and because of that, it can't be a method.
It is conditionally enabled as part of api.ScalingConfig, defaulting to disabled with the current config. Currently the metrics don't go anywhere - this commit *just* sets up the piping so we can use it in the scaling algorithm later.
Probably best to review each commit separately.
Most of the commits here are just repositioning to make the final commit, which adds LFC metrics collection, as simple as possible.
Builds on #892, must not be merged before then.