metrics.reporting-period-seconds doesn't work #15435

Closed
vividcloudpark opened this issue Aug 1, 2024 · 2 comments
Labels: kind/bug, lifecycle/stale

Comments


vividcloudpark commented Aug 1, 2024

What version of Knative?

1.14.1

Expected Behavior

https://knative.dev/docs/serving/observability/metrics/collecting-metrics/#understanding-the-collector
https://knative.dev/docs/serving/services/service-metrics/#exposing-queue-proxy-metrics

Per these articles, I expected each metric to be reported at a 30s interval when I set metrics.reporting-period-seconds to 30 in config-observability, even with the Prometheus scrape interval set to 10s.

Actual Behavior

Prometheus values change at a 10s interval (if the config worked, they should change at a 30s interval). Even when I set the Prometheus scrape interval to 25s, the change interval becomes 25s.

I restarted both the autoscaler and activator deployments, but it doesn't work. It looks like metrics.request-metrics-reporting-period-seconds has no effect.

Steps to Reproduce the Problem

Set config-observability as below:

  metrics.reporting-period-seconds: "30"
  metrics.request-metrics-reporting-period-seconds: "30"

Set prometheus.yaml as below:

    - job_name: activator
      scrape_interval: 25s
      scrape_timeout: 10s
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app, __meta_kubernetes_pod_container_port_name]
        action: keep
        regex: knative-serving;activator;metrics
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
vividcloudpark added the kind/bug label on Aug 1, 2024
skonto (Contributor) commented Aug 2, 2024

Hi @vividcloudpark

Initially, the reporting period was fixed for the OpenTelemetry collector (push model, see #14019). I think we need to update the docs: for the Prometheus exporter this setting has no effect, even though the reporting period (metrics.request-metrics-reporting-period-seconds) is set correctly. Keep in mind that we use OpenCensus, and unfortunately the library is now archived.
Also, I am not so sure it makes sense to scrape a pod multiple times when you know the metrics are not being updated (scrape period << reporting period). Also note that metrics.request-metrics-reporting-period-seconds was meant to configure the queue-proxy (QP) only, unlike metrics.reporting-period-seconds, which is meant for all the other components.
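
To make the push/pull distinction concrete, here is a minimal standalone sketch (not Knative code; the exporter, measure, and view names are made up for illustration). With a push-style view.Exporter, OpenCensus honors SetReportingPeriod and invokes ExportView once per period, whereas the Prometheus exporter's ExportView is a no-op, as shown further below.

package main

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
)

// logExporter is a push-style exporter: OpenCensus calls ExportView
// once per reporting period for every registered view.
type logExporter struct{}

func (logExporter) ExportView(vd *view.Data) {
	log.Printf("exported view %q with %d rows", vd.View.Name, len(vd.Rows))
}

func main() {
	// Made-up measure and view, for illustration only.
	requests := stats.Int64("demo/requests", "number of requests", stats.UnitDimensionless)
	v := &view.View{
		Name:        "demo/requests",
		Measure:     requests,
		Description: "request count",
		Aggregation: view.Count(),
	}
	if err := view.Register(v); err != nil {
		log.Fatal(err)
	}

	view.RegisterExporter(logExporter{})
	// Honored by push exporters; effectively ignored by the pull-based
	// Prometheus exporter, whose ExportView does nothing.
	view.SetReportingPeriod(30 * time.Second)

	stats.Record(context.Background(), requests.M(1))
	time.Sleep(65 * time.Second) // expect roughly two ExportView calls
}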

Now, here is why the reporting period has no effect for Prometheus. When the reporting period is changed, the exporter sets it here: https://github.com/knative/serving/blob/main/vendor/go.opencensus.io/stats/view/worker_commands.go#L178-L185
I verified that part. Then, once per reporting period, the worker tries to export metrics: https://github.com/knative/serving/blob/main/vendor/go.opencensus.io/stats/view/worker.go#L296.
reportUsage calls reportView, and then exportView is called: https://github.com/knative/serving/blob/main/vendor/go.opencensus.io/stats/view/worker.go#L376.
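
For reference, the dispatch loop in the vendored worker looks roughly like this (a simplified paraphrase of worker.go, not a verbatim copy):

// Simplified sketch of go.opencensus.io/stats/view/worker.go.
func (w *worker) start() {
	for {
		select {
		case cmd := <-w.c:
			cmd.handleCommand(w)
		case <-w.timer.C:
			// Fires once per reporting period; reportUsage eventually
			// calls ExportView on every registered exporter.
			w.reportUsage()
		case <-w.quit:
			w.timer.Stop()
			close(w.done)
			return
		}
	}
}
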
Prometheus and the ocagent (used with the OpenTelemetry collector) exporters have different implementations.
The ocagent exporter does ship the metrics: https://github.com/knative/serving/blob/main/vendor/contrib.go.opencensus.io/exporter/ocagent/ocagent.go#L436, while the Prometheus one does not export anything, because of its pull-model approach:
https://github.com/knative/serving/blob/main/vendor/contrib.go.opencensus.io/exporter/prometheus/prometheus.go#L102-L110

// Deprecated: in lieu of metricexport.Reader interface.
func (e *Exporter) ExportView(vd *view.Data) {
}

Now, when we create the Prometheus exporter we do use the reader interface, but that does not do anything on its own; it is only invoked to export all registered metrics whenever an HTTP request is made, e.g. a Prometheus scrape.
Read more here: https://github.com/knative/serving/blob/main/vendor/contrib.go.opencensus.io/exporter/prometheus/prometheus.go#L137-L139.

// Collect is invoked every time a prometheus.Gatherer is run
// for example when the HTTP endpoint is invoked by Prometheus.
func (c *collector) Collect(ch chan<- prometheus.Metric) {
	me := &metricExporter{c: c, metricCh: ch}
	c.reader.ReadAndExport(me)
}

Note: there is an IntervalReader interface that calls ReadAndExport periodically (every minute by default): https://github.com/knative/serving/blob/main/vendor/go.opencensus.io/metric/metricexport/reader.go#L148. However, the Prometheus exporter provided by the OpenCensus lib does not use it (https://github.com/knative/serving/blob/main/vendor/contrib.go.opencensus.io/exporter/prometheus/prometheus.go#L148); it uses the simple Reader instead, since it relies on the HTTP call to report metrics.
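
For completeness, here is a hypothetical sketch of how a push exporter could be wired through IntervalReader (illustration only; pushExporter and startIntervalExport are made up, and this is not what the vendored Prometheus exporter does):

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/metric/metricdata"
	"go.opencensus.io/metric/metricexport"
)

// pushExporter is a made-up exporter that just logs what it would ship.
type pushExporter struct{}

func (pushExporter) ExportMetrics(ctx context.Context, data []*metricdata.Metric) error {
	log.Printf("pushing %d metrics", len(data))
	return nil
}

// startIntervalExport reads and exports all registered metrics every 30s
// (instead of the 60s default), without waiting for any HTTP scrape.
func startIntervalExport() (*metricexport.IntervalReader, error) {
	ir, err := metricexport.NewIntervalReader(metricexport.NewReader(), pushExporter{})
	if err != nil {
		return nil, err
	}
	ir.ReportingInterval = 30 * time.Second
	return ir, ir.Start()
}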


github-actions bot commented Nov 1, 2024

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label on Nov 1, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 1, 2024