Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/geth: actually enable metrics when passed via toml #30781

Closed

Conversation

lightclient
Copy link
Member

closes #28303, replaces #28101

--

When we added metrics to the config toml back in #22083, they didn't actually seem to get enabled when you start the client with the toml. That is fixed here by only starting the metrics after we have parsed the config for it.

You can verify it's working by running this snippet:

$ go run ./cmd/geth --metrics --metrics.addr 0.0.0.0 dumpconfig > conf && go run ./cmd/geth --config conf -dev 2>&1 | grep metrics
INFO [11-21|21:02:06.148] Enabling metrics collection
INFO [11-21|21:02:06.148] Enabling stand-alone metrics HTTP endpoint address=0.0.0.0:6060
INFO [11-21|21:02:06.148] Starting metrics server                  addr=http://0.0.0.0:6060/debug/metrics

@holiman
Copy link
Contributor

holiman commented Nov 21, 2024

Some additional tests

^C[user@work go-ethereum]$ go run ./cmd/geth --metrics --metrics.addr 0.0.0.0 dumpconfig > conf && go run ./cmd/geth --config conf -dev --metrics.addr 1.1.1.1 2>&1 | grep metrics
INFO [11-21|14:46:47.851] Maximum peer count                       ETH=50 total=50
INFO [11-21|14:46:47.873] Set global gas cap                       cap=50,000,000
INFO [11-21|14:46:47.873] Initializing the KZG library             backend=gokzg
INFO [11-21|14:46:51.231] Enabling metrics collection
INFO [11-21|14:46:51.231] Enabling stand-alone metrics HTTP endpoint address=1.1.1.1:6060
INFO [11-21|14:46:51.231] Starting metrics server                  addr=http://1.1.1.1:6060/debug/metrics
ERROR[11-21|14:46:51.232] Failure in running metrics server        err="listen tcp 1.1.1.1:6060: bind: cannot assign requested address"
^C
[user@work go-ethereum]$ go run ./cmd/geth --metrics --metrics.addr 0.0.0.0 dumpconfig > conf && go run ./cmd/geth --config conf -dev --metrics=false --metrics.addr 1.1.1.1 2>&1 | grep metrics
INFO [11-21|14:47:09.190] Maximum peer count                       ETH=50 total=50
INFO [11-21|14:47:09.211] Set global gas cap                       cap=50,000,000
INFO [11-21|14:47:09.211] Initializing the KZG library             backend=gokzg

👍

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@holiman holiman added this to the 1.14.13 milestone Nov 21, 2024
@holiman
Copy link
Contributor

holiman commented Nov 22, 2024

Actually, this doesn't quite work, as is. Check this:

diff --git a/metrics/metrics.go b/metrics/metrics.go
index c7fe5c7333..dae5217653 100644
--- a/metrics/metrics.go
+++ b/metrics/metrics.go
@@ -129,6 +129,7 @@ func readRuntimeStats(v *runtimeStats) {
 func CollectProcessMetrics(refresh time.Duration) {
        // Short circuit if the metrics system is disabled
        if !Enabled {
+               log.Info("Nah I don't wanna do metrics")
                return
        }
[user@work go-ethereum]$ go run ./cmd/geth --metrics --metrics.addr 0.0.0.0 dumpconfig > conf && go run ./cmd/geth --config conf -dev  2>&1 | grep metrics
INFO [11-22|08:46:03.217] Maximum peer count                       ETH=50 total=50
INFO [11-22|08:46:03.234] Set global gas cap                       cap=50,000,000
INFO [11-22|08:46:03.234] Initializing the KZG library             backend=gokzg
INFO [11-22|08:46:06.278] Enabling metrics collection
INFO [11-22|08:46:06.278] Enabling stand-alone metrics HTTP endpoint address=0.0.0.0:6060
INFO [11-22|08:46:06.278] Starting metrics server                  addr=http://0.0.0.0:6060/debug/metrics
INFO [11-22|08:46:06.278] Nah I don't wanna do metrics

The metrics.CollectProcessMetrics uses a package-global variable Enabled. It needs to be configured by external machinery.

@holiman
Copy link
Contributor

holiman commented Nov 22, 2024

So, as things work now, the --metrics flag must be set explicitly. Because that flag is parsed in the metrics package init, setting the global metrics.Enabled.

So, this PR does work in that it can set the metrics-parameters (creds, ports etc), but it cannot override the --metrics flag. So, we should detect if there's a mismatch and notify the user. Or somehow fix it so that we can enable metrics from config, but that seems difficult. Needs to be done early in the package load.

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, computer says no

@lightclient
Copy link
Member Author

lightclient commented Nov 22, 2024

Nice catch. What do you think about 3fe6879?

You can curl to see it is really putting out data:

curl http://0.0.0.0:6060/debug/metrics

Same hack that we already have except now we peek into the toml. This code path will execute very rarely and the toml lib is already a dependency. So seems reasonable?

@lightclient lightclient force-pushed the geth-metrics-from-cfg branch from 3fe6879 to 71d7b3b Compare November 22, 2024 13:53
@lightclient lightclient force-pushed the geth-metrics-from-cfg branch from 1359296 to 0e775b7 Compare November 22, 2024 13:56
@holiman
Copy link
Contributor

holiman commented Nov 22, 2024

Nice catch. What do you think about 3fe6879?

Hm. Yeah.. It's dirty but it works. I guess it's just another little slide, in the slope ;-)

metrics/metrics.go Outdated Show resolved Hide resolved
metrics/metrics.go Outdated Show resolved Hide resolved
@lightclient lightclient force-pushed the geth-metrics-from-cfg branch from 0e775b7 to dde11d6 Compare November 22, 2024 21:51
@lightclient lightclient force-pushed the geth-metrics-from-cfg branch from dde11d6 to 1635da8 Compare November 22, 2024 21:53
Comment on lines +47 to +49
// using the flags package in the first place! So instead, let's chop off the
// first element in the args list each time a parse fails. This way we will
// eventually parse every arg and get the ones we care about.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks flaky...?

Let's assume metrics is enabled in toml config. Won't these two behave differently:

  1. geth --metrics=false --conf=conf.toml will correctly set Enabled=false.
  2. geth --conf=conf.toml --metrics=false will incorrectly set Enabled=true.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this case, the order of precedence is always flags first, config second. However, the way init() is written in metrics/metrics.go, Enabled is always set to true in case either a flag or config specifies it should be. This is also how it was previously.

So for both 1) and 2) Enabled will be set to true, but we will not actually serve the metrics because that occurs later in the geth command start up. That is where the precedence comes in.

How would you prefer to deal with this? I could change the parsing in init() so that when the flag metrics is set, it always uses that value, regardless of config value. Otherwise could detect a mismatch (enabled in config, disabled explicitly by flag) during client startup and fail there instead.

Otherwise, we can leave as-is. Explicitly setting --metrics=false while actually passing a config with metrics enabled is pretty unlikely and we will not serve the metrics as expected. It will just cause geth to record the metrics internally.

I think my pref is to just fail when metrics are enabled in config, but explicitly disabled by the flag. It is simple to deal with:

diff --git a/cmd/geth/config.go b/cmd/geth/config.go
index 5f252ebf1..200b24109 100644
--- a/cmd/geth/config.go
+++ b/cmd/geth/config.go
@@ -289,6 +289,10 @@ func dumpConfig(ctx *cli.Context) error {

 func applyMetricConfig(ctx *cli.Context, cfg *gethConfig) {
        if ctx.IsSet(utils.MetricsEnabledFlag.Name) {
+               enabled := ctx.Bool(utils.MetricsEnabledFlag.Name)
+               if cfg.Metrics.Enabled && !enabled {
+                       utils.Fatalf("Flag --metrics mismatch with provided config.")
+               }
                cfg.Metrics.Enabled = ctx.Bool(utils.MetricsEnabledFlag.Name)
        }
        if ctx.IsSet(utils.MetricsEnabledExpensiveFlag.Name) {

What do you think?

@fjl
Copy link
Contributor

fjl commented Nov 26, 2024

@holiman will refactor package metrics to allow enabling at runtime.

@fjl fjl closed this Nov 26, 2024
@rjl493456442 rjl493456442 removed this from the 1.14.13 milestone Jan 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics options are ignored in config.toml
4 participants