Start emitting disk usage metrics for domains
At the moment, it is not possible to destroy instruments that send
metrics. Therefore, when a user removes a domain, metrics about it
might still be emitted. A temporary workaround is to restart the
pulpcore-api process to reload meters.

Ref: open-telemetry/opentelemetry-specification#2232

closes #4603
lubosmj committed Feb 20, 2024
1 parent 489c68d commit bfb32e6
Showing 11 changed files with 160 additions and 4 deletions.
3 changes: 3 additions & 0 deletions CHANGES/4603.feature
@@ -0,0 +1,3 @@
Started emitting metrics that report disk usage within a domain. The metrics are sent to the
collector every 60 seconds. The interval can be adjusted with the ``OTEL_METRIC_EXPORT_INTERVAL``
environment variable.
20 changes: 20 additions & 0 deletions docs/components.rst
@@ -158,6 +158,9 @@ Pulp can produce OpenTelemetry data, like the number of requests, active connect
`pulp-api` and `pulp-content` using OpenTelemetry. You can read more about
`OpenTelemetry here <https://opentelemetry.io>`_.

.. warning:: This feature is provided as a tech preview and could change in backwards incompatible
ways in the future.

If you are using `Pulp in One Container <https://pulpproject.org/pulp-in-one-container/>`_ or `Pulp Operator
<https://docs.pulpproject.org/pulp_operator/>`_ and want to enable it, you will need to set the following
environment variables:
@@ -184,3 +187,20 @@ and set the following environment variables:

You will need to run an instance of OpenTelemetry Collector. You can read more about the `OpenTelemetry
Collector here <https://opentelemetry.io/docs/collector/>`_.

**At the moment, the following data is recorded by Pulp:**

* Access to every API endpoint (an HTTP method, target URL, status code, and user agent).
* Access to every requested package (an HTTP method, target URL, status code, and user agent).
* Disk usage within a specific domain (total used disk space and the reference to the domain).

The information above is sent to the collector in the form of spans and metrics. Thus, the data is
emitted either based on the user interaction with the system or on a regular basis. Consult
`OpenTelemetry Traces <https://opentelemetry.io/docs/concepts/signals/traces/>`_ and
`OpenTelemetry Metrics <https://opentelemetry.io/docs/concepts/signals/metrics/>`_ to learn more.

.. note::
    It is highly recommended to set the `OTEL_METRIC_EXPORT_INTERVAL <https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader>`_
    environment variable to ``300000`` (5 minutes) to reduce the frequency of queries executed on
    Pulp's backend. This value is the interval, in milliseconds, between emitted metrics and
    should be set before runtime.
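
The note above gives the interval as ``300000`` for 5 minutes, so the value is expressed in milliseconds. The following is only a minimal sketch of that arithmetic, assuming a 60-second default as stated in this commit's changelog entry. The helper `export_interval_seconds` is hypothetical and is not part of Pulp or of the OpenTelemetry SDK, and the fallback on unparsable input is likewise an assumption made for illustration.

```python
import os

# 60 seconds: the default export interval mentioned in this commit's changelog.
DEFAULT_INTERVAL_MS = 60_000


def export_interval_seconds(environ=os.environ):
    """Hypothetical helper: return the metric export interval in seconds."""
    raw = environ.get("OTEL_METRIC_EXPORT_INTERVAL")
    if raw is None:
        return DEFAULT_INTERVAL_MS / 1000
    try:
        interval_ms = float(raw)
    except ValueError:
        # Assumption for this sketch: fall back to the default on bad input.
        return DEFAULT_INTERVAL_MS / 1000
    return interval_ms / 1000


# The recommended value of 300000 ms corresponds to 5 minutes.
print(export_interval_seconds({"OTEL_METRIC_EXPORT_INTERVAL": "300000"}))
```

With the recommended setting, the disk usage query runs once every 300 seconds instead of once every 60 seconds.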
1 change: 1 addition & 0 deletions docs/index.rst
@@ -69,6 +69,7 @@ Table of Contents
plugin_dev/index
rest_api
client_bindings
tech_preview
contributing/index
bugs-features
troubleshooting
2 changes: 1 addition & 1 deletion docs/release_process.rst
@@ -37,7 +37,7 @@ Some possible failures of **Step 2**, above, include:
* If release-tag is for an existing release (by accident) , the workflow won't fail until the docs-pub. Cleaning this up can be Exciting.

Active branches as of 2023-05-16:
====================================
---------------------------------
* pulpcore

* 3.23 (galaxyNG/4.7)
9 changes: 9 additions & 0 deletions docs/tech_preview.rst
@@ -0,0 +1,9 @@
Tech Previews
=============

The following features are currently being released as part of tech preview:

* Support for Open Telemetry
* Upstream replicas
* Domains - Multi-Tenancy
* Complex filtering
23 changes: 22 additions & 1 deletion pulpcore/app/models/domain.py
@@ -1,6 +1,8 @@
from opentelemetry.metrics import Observation

from django.core.files.storage import get_storage_class, default_storage
from django.db import models
from django_lifecycle import hook, BEFORE_DELETE, BEFORE_UPDATE
from django_lifecycle import hook, BEFORE_DELETE, BEFORE_UPDATE, AFTER_CREATE

from pulpcore.app.models import BaseModel, AutoAddObjPermsMixin
from pulpcore.exceptions import DomainProtectedError
@@ -60,7 +62,26 @@ def _cleanup_orphans_pre_delete(self):
            # Delete one by one to properly clean up the storage.
            artifact.delete()

    @hook(AFTER_CREATE)
    def _report_domain_disk_usage(self):
        from pulpcore.app.util import DomainMetricsEmitterBuilder

        DomainMetricsEmitterBuilder.build(self)

    class Meta:
        permissions = [
            ("manage_roles_domain", "Can manage role assignments on domain"),
        ]


def disk_usage_callback(domain):
    from pulpcore.app.models import Artifact
    from pulpcore.app.util import get_url

    options = yield  # noqa
    while True:
        distinct_artifacts = Artifact.objects.filter(pulp_domain=domain).distinct()
        total_size = distinct_artifacts.aggregate(size=models.Sum("size", default=0))["size"]
        options = yield [  # noqa
            Observation(total_size, {"pulp_href": get_url(domain), "name": domain.name})
        ]
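
The callback above is written as a generator rather than a plain function. A rough, standard-library-only sketch of how such a callback can be driven follows. It assumes the consumer primes the generator once with `next()` and then calls `send()` once per collection cycle, which is my understanding of how the OpenTelemetry Python SDK handles generator callbacks. The in-memory `sizes` list and the plain tuples are stand-ins for the `Artifact` query and for `Observation` objects.

```python
def disk_usage_callback(sizes):
    """Generator-style callback: yields a fresh list of observations per send()."""
    # The first, bare yield only primes the generator; nothing is reported yet.
    options = yield
    while True:
        total = sum(sizes)  # stand-in for the Artifact aggregate query
        options = yield [("disk_usage", total)]


sizes = [100, 250]
callback = disk_usage_callback(sizes)
next(callback)  # prime: run up to the bare `yield`

first = callback.send(None)   # one collection cycle
sizes.append(50)              # simulate a newly stored artifact
second = callback.send(None)  # the next cycle recomputes the total
print(first, second)
```

Because the total is recomputed inside the loop, every collection cycle reflects the current state rather than a value captured when the instrument was created.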
66 changes: 66 additions & 0 deletions pulpcore/app/util.py
@@ -13,14 +13,18 @@
import gnupg

from django.conf import settings
from django.db.models import Sum
from django.urls import Resolver404, resolve, reverse
from opentelemetry import metrics

from rest_framework.serializers import ValidationError

from pulpcore.app.loggers import deprecation_logger
from pulpcore.app.apps import pulp_plugin_configs
from pulpcore.app import models
from pulpcore.exceptions.validation import InvalidSignatureError


# a little cache so viewset_for_model doesn't have to iterate over every app every time
_model_viewset_cache = {}

@@ -467,3 +471,65 @@ def cache_key(base_path):
base_path = [f"{domain.name}:{path}" for path in base_path]

return base_path


class DomainMetricsEmitterBuilder:
    """A builder class that initializes an emitter for recording domain's metrics.
    If Open Telemetry is enabled, the builder configures a real emitter capable of sending data to
    the collector. Otherwise, a no-op emitter is initialized. The real emitter utilizes the global
    settings to send metrics.
    By default, the emitter sends data to the collector every 60 seconds. Adjust the environment
    variable OTEL_METRIC_EXPORT_INTERVAL accordingly if needed.
    """

    class _DomainMetricsEmitter:
        def __init__(self, domain):
            self.domain = domain
            self.meter = metrics.get_meter(f"domain.{domain.name}.meter")
            self.instrument = self._init_emitting_total_size()

        def _init_emitting_total_size(self):
            return self.meter.create_observable_gauge(
                name="disk_usage",
                description="The total disk size by domain.",
                callbacks=[self._disk_usage_callback()],
                unit="Bytes",
            )

        def _disk_usage_callback(self):
            from pulpcore.app.models import Artifact

            options = yield  # noqa

            while True:
                artifacts = Artifact.objects.filter(pulp_domain=self.domain).distinct()
                total_size = artifacts.aggregate(size=Sum("size", default=0))["size"]
                options = yield [  # noqa
                    metrics.Observation(
                        total_size, {"pulp_href": get_url(self.domain), "name": self.domain.name}
                    )
                ]

    class _NoopEmitter:
        def __call__(self, *args, **kwargs):
            return self

        def __getattr__(self, *args, **kwargs):
            return self

    @classmethod
    def build(cls, domain):
        otel_enabled = os.getenv("PULP_OTEL_ENABLED")
        if otel_enabled == "true" and settings.DOMAIN_ENABLED:
            return cls._DomainMetricsEmitter(domain)
        else:
            return cls._NoopEmitter()


def init_domain_metrics_exporter():
    from pulpcore.app.models.domain import Domain

    for domain in Domain.objects.all():
        DomainMetricsEmitterBuilder.build(domain)
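
The no-op emitter returned when telemetry is disabled relies on `__call__` and `__getattr__` both returning the object itself. The sketch below isolates that pattern to show why callers do not need to check whether telemetry is on. The class name `NoopEmitter` and the attribute names used in the chain are made up for illustration and do not correspond to any Pulp or OpenTelemetry API.

```python
class NoopEmitter:
    """Absorbs any attribute access or call and returns itself."""

    def __call__(self, *args, **kwargs):
        return self

    def __getattr__(self, name):
        # Only invoked for attributes that are not found normally.
        return self


emitter = NoopEmitter()
# Any chain of lookups and calls collapses back to the same object.
result = emitter.instrument.record(42).anything()
print(result is emitter)
```

One caveat of this design is that a typo in an attribute name is silently swallowed too, so the pattern trades error detection for simpler call sites.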
4 changes: 3 additions & 1 deletion pulpcore/app/wsgi.py
@@ -11,10 +11,12 @@
from opentelemetry.instrumentation.wsgi import OpenTelemetryMiddleware

from pulpcore.app.entrypoint import using_pulp_api_worker

from pulpcore.app.util import init_domain_metrics_exporter

if not using_pulp_api_worker.get(False):
    raise RuntimeError("This app must be executed using pulpcore-api entrypoint.")

application = get_wsgi_application()
application = OpenTelemetryMiddleware(application)

init_domain_metrics_exporter()
7 changes: 6 additions & 1 deletion pulpcore/tasking/_util.py
@@ -17,7 +17,12 @@
from django_guid.utils import generate_guid
from pulpcore.app.models import Task, TaskSchedule
from pulpcore.app.role_util import get_users_with_perms
from pulpcore.app.util import set_current_user, set_domain, configure_analytics, configure_cleanup
from pulpcore.app.util import (
    set_current_user,
    set_domain,
    configure_analytics,
    configure_cleanup,
)
from pulpcore.constants import TASK_FINAL_STATES, TASK_STATES, VAR_TMP_PULP
from pulpcore.exceptions import AdvisoryLockError
from pulpcore.tasking.tasks import dispatch, execute_task
21 changes: 21 additions & 0 deletions staging_docs/admin/learn/architecture.md
@@ -142,6 +142,10 @@ Pulp can produce OpenTelemetry data, like the number of requests, active connect
`pulp-api` and `pulp-content` using OpenTelemetry. You can read more about
[OpenTelemetry here](https://opentelemetry.io).

!!! attention
This feature is provided as a tech preview and could change in backwards incompatible
ways in the future.

If you are using [Pulp in One Container](https://pulpproject.org/pulp-in-one-container/) or [Pulp Operator](https://docs.pulpproject.org/pulp_operator/) and want to enable it, you will need to set the following
environment variables:

@@ -169,3 +173,20 @@ and set the following environment variables:

You will need to run an instance of OpenTelemetry Collector. You can read more about the [OpenTelemetry
Collector here](https://opentelemetry.io/docs/collector/).

**At the moment, the following data is recorded by Pulp:**

- Access to every API endpoint (an HTTP method, target URL, status code, and user agent).
- Access to every requested package (an HTTP method, target URL, status code, and user agent).
- Disk usage within a specific domain (total used disk space and the reference to the domain).

The information above is sent to the collector in the form of spans and metrics. Thus, the data is
emitted either based on the user interaction with the system or on a regular basis. Consult
[OpenTelemetry Traces](https://opentelemetry.io/docs/concepts/signals/traces/) and
[OpenTelemetry Metrics](https://opentelemetry.io/docs/concepts/signals/metrics/) to learn more.

!!! note
    It is highly recommended to set the [`OTEL_METRIC_EXPORT_INTERVAL`](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader)
    environment variable to `300000` (5 minutes) to reduce the frequency of queries executed on
    Pulp's backend. This value is the interval, in milliseconds, between emitted metrics and
    should be set before runtime.
8 changes: 8 additions & 0 deletions staging_docs/admin/learn/tech-preview.md
@@ -0,0 +1,8 @@
# Tech Previews

The following features are currently being released as part of tech preview:

- [Support for Open Telemetry](site:pulpcore/docs/admin/learn/architecture/#telemetry-support)
- Upstream replicas
- Domains - Multi-Tenancy
- Complex filtering
