Observe index gateway request count per tenant #9781

Closed
wants to merge 1 commit into from

@chaudum chaudum commented Jun 23, 2023

Warning: THIS POTENTIALLY ADDS A LOT OF SERIES (tenants * pods)


What this PR does / why we need it:

This change adds a new metric, `loki_index_gateway_client_requests_total`, which reports the total number of requests performed by clients against the index gateway.

Although the existing histogram metric `loki_index_gateway_request_duration_seconds` already counts requests, the new metric also reports the tenant ID and the status ("success" or "error"). The existing metric cannot report the tenant ID because it is implemented as part of the generic gRPC client instrumentation.

The new metric makes it possible to observe the requests per second (RPS) per tenant, so you can draw conclusions about the required index gateway shard size per tenant.

Note that the new metric is only used when the index gateway is running in ring mode.
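
To illustrate how per-tenant RPS could be read from such a counter, here is a minimal sketch that takes two scrapes of the counter and divides the increase by the scrape interval. The tenant IDs, sample values, and the `per_tenant_rps` helper are made up for illustration; only the `tenant` and `status` labels come from the description above. In practice a query-language rate function would do this, summed over all client pods.

```python
from collections import defaultdict

# Hypothetical cumulative counter values from one client pod,
# keyed by (tenant, status). All values are invented for this sketch.
scrape_t0 = {
    ("tenant-a", "success"): 1200,
    ("tenant-a", "error"): 30,
    ("tenant-b", "success"): 400,
}
scrape_t1 = {
    ("tenant-a", "success"): 1800,
    ("tenant-a", "error"): 30,
    ("tenant-b", "success"): 460,
}
INTERVAL_SECONDS = 60  # assumed time between the two scrapes


def per_tenant_rps(before, after, interval_seconds):
    """Sum counter increases over all statuses and divide by the interval."""
    rps = defaultdict(float)
    for (tenant, status), value in after.items():
        increase = value - before.get((tenant, status), 0)
        if increase < 0:
            # A counter that went backwards was reset (e.g. pod restart);
            # count everything since the reset.
            increase = value
        rps[tenant] += increase / interval_seconds
    return dict(rps)


rates = per_tenant_rps(scrape_t0, scrape_t1, INTERVAL_SECONDS)
for tenant, value in sorted(rates.items()):
    print(f"{tenant}: {value:.1f} req/s")
```

The sketch covers a single client pod; a real per-tenant figure would add up the rates of every pod that talks to the index gateway.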

Special notes for your reviewer:

Checklist

  • Reviewed the `CONTRIBUTING.md` guide (required)
  • Documentation added
  • Tests updated
  • `CHANGELOG.md` updated
    • If the change is worth mentioning in the release notes, add the `add-to-release-notes` label
  • Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
  • For Helm chart changes, bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum requested a review from a team as a code owner June 23, 2023 13:04
@chaudum chaudum force-pushed the chaudum/per-tenant-index-gw-request-count branch from 5031d87 to 806ef0d Compare June 23, 2023 13:17
@chaudum chaudum marked this pull request as draft June 23, 2023 13:44
chaudum added a commit that referenced this pull request Jun 27, 2023
This commit adds a counter metric `loki_index_gateway_requests_total` with labels `operation`, `tenant`, `status` for gRPC requests that are served by the index gateway.

**What for?**

The per-tenant RPS on the index gateway is used to derive the per-tenant shard factor.
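
The commit message does not say how the shard factor is derived from the RPS. As a rough sketch only, one plausible rule is to divide a tenant's observed RPS by the request rate a single index gateway instance is assumed to handle, round up, and clamp to the number of available instances. The `shard_factor` function, its parameters, and all numbers below are hypothetical and are not taken from Loki.

```python
import math


def shard_factor(tenant_rps, rps_per_instance, max_instances):
    """Hypothetical rule: instances needed to serve tenant_rps, clamped to [1, max_instances]."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    needed = math.ceil(tenant_rps / rps_per_instance)
    # Every tenant gets at least one instance, and never more than exist.
    return max(1, min(needed, max_instances))


# Assumed capacity of 100 req/s per index gateway and 10 instances in the ring.
print(shard_factor(250, 100, 10))
print(shard_factor(0, 100, 10))
print(shard_factor(5000, 100, 10))
```

A real derivation would probably also account for bursts and request cost per operation, which this sketch ignores.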

**Why track on the server?**

Unlike tracking index gateway RPS on the client side, tracking on the server side does not yield that many series, even in multi-tenant installations with a lot of tenants, because the number of index gateway instances is relatively small compared to the number of queriers and frontends.
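
The difference can be estimated with simple worst-case arithmetic: a labeled counter can produce one series per combination of label values, per instance that exposes it. The sketch below compares the two approaches. The tenant, pod, and operation counts are invented, and the `series_upper_bound` helper is hypothetical; only the label sets (`tenant`, `status` on the client; `operation`, `tenant`, `status` on the server) come from the two changes.

```python
def series_upper_bound(tenants, instances, statuses=2, operations=1):
    """Worst case: every tenant is seen with every status/operation on every instance."""
    return tenants * instances * statuses * operations


# Hypothetical installation sizes, chosen only to show the order of magnitude.
TENANTS = 1000
CLIENT_PODS = 200       # queriers and frontends (assumed)
INDEX_GATEWAYS = 8      # index gateway instances (assumed)
OPERATIONS = 4          # distinct gRPC operations (assumed)

client_side = series_upper_bound(TENANTS, CLIENT_PODS)
server_side = series_upper_bound(TENANTS, INDEX_GATEWAYS, operations=OPERATIONS)

print(f"client-side upper bound: {client_side}")
print(f"server-side upper bound: {server_side}")
```

These are upper bounds: if each tenant is only routed to a subset of index gateway instances, the server-side count would be lower still.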

**Special notes for your reviewer**:

The previous approach of tracking requests on the client #9781 has been abandoned.


Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
chaudum added a commit that referenced this pull request Jun 27, 2023
…#9804)

**This is a backport of #9797 to k156**

---

This commit adds a counter metric `loki_index_gateway_requests_total`
with labels `operation`, `tenant`, `status` for gRPC requests that are
served by the index gateway.

**What for?**

The per-tenant RPS on the index gateway is used to derive the per-tenant
shard factor.

**Why track on the server?**

Unlike tracking index gateway RPS on the client side, tracking on the
server side does not yield that many series, even in multi-tenant
installations with a lot of tenants, because the number of index gateway
instances is relatively small compared to the number of queriers and
frontends.

**Special notes for your reviewer**:

The previous approach of tracking requests on the client
#9781 has been abandoned.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum closed this Jun 28, 2023
@chaudum chaudum deleted the chaudum/per-tenant-index-gw-request-count branch December 20, 2024 10:16