Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated example for custom metrics and add backwards compatibility warnings and upgrade guide for metrics APIs #2516

Merged
merged 13 commits into from
Aug 24, 2023
Merged
153 changes: 121 additions & 32 deletions docs/metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
* [Custom Metrics API](#custom-metrics-api)
* [Logging custom metrics](#log-custom-metrics)
* [Metrics YAML Parsing and Metrics API example](#Metrics-YAML-File-Parsing-and-Metrics-API-Custom-Handler-Example)
* [Backwards compatibility warnings and upgrade guide](#backwards-compatibility-warnings-and-upgrade-guide)

## Introduction

Expand All @@ -28,7 +29,7 @@ Metrics are collected by default at the following locations in `log` mode:

The location of log files and metric files can be configured in the [log4j2.xml](https://github.com/pytorch/serve/blob/master/frontend/server/src/main/resources/log4j2.xml) file

In `prometheus` mode, all metrics are made available in prometheus format via the [metrics](https://github.com/pytorch/serve/blob/master/docs/metrics_api.md) API endpoint.
In `prometheus` mode, all metrics are made available in prometheus format via the [metrics API endpoint](https://github.com/pytorch/serve/blob/master/docs/metrics_api.md).

## Frontend Metrics

Expand Down Expand Up @@ -187,12 +188,17 @@ model_metrics: # backend metrics
```


Default metrics are provided in the [metrics.yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, but the user can either delete them to their liking / ignore them altogether, because these metrics will not be emitted unless they are edited.
Note that **only** the metrics defined in the **metrics configuration file** can be emitted to logs or made available via the metrics API endpoint. This is done to ensure that the metrics configuration file serves as a central inventory of all the metrics that Torchserve can emit.

Default metrics are provided in the [metrics.yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, but the user can either delete them to their liking / ignore them altogether, because these metrics will not be emitted unless they are edited.\
When adding custom `model_metrics` in the metrics configuration file, ensure to include `ModelName` and `Level` dimension names towards the end of the list of dimensions since they are included by default by the following custom metrics APIs:
[add_metric](#function-api-to-add-generic-metrics-with-default-dimensions), [add_counter](#add-counter-based-metrics),
[add_time](#add-time-based-metrics), [add_size](#add-size-based-metrics) or [add_percent](#add-percentage-based-metrics).


### How it works

Whenever torchserve starts, the [backend worker](https://github.com/pytorch/serve/blob/master/ts/model_service_worker.py) initializes `service.context.metrics` with the [MetricsCache](https://github.com/pytorch/serve/blob/master/ts/metrics/metric_cache_yaml_impl.py) object. The `model_metrics` (backend metrics) section within the specified yaml file will be parsed, and Metric objects will be created based on the parsed section and added that are added to the cache.
Whenever torchserve starts, the [backend worker](https://github.com/pytorch/serve/blob/master/ts/model_service_worker.py) initializes `service.context.metrics` with the [MetricsCache](https://github.com/pytorch/serve/blob/master/ts/metrics/metric_cache_yaml_impl.py) object. The `model_metrics` (backend metrics) section within the specified yaml file will be parsed, and Metric objects will be created based on the parsed section and added to the cache.

This is all done internally, so the user does not have to do anything other than specifying the desired yaml file.

Expand Down Expand Up @@ -243,7 +249,7 @@ When adding any metric via Metrics API, users have the ability to override the m
`metric_type=MetricTypes.[COUNTER/GAUGE/HISTOGRAM]`.

```python
metric1 = metrics.add_metric("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
metric1 = metrics.add_metric_to_cache("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
metric.add_or_update(value, dimension_values=["value1", "value2", ...])

# Backwards compatible, combines the above two method calls
Expand Down Expand Up @@ -311,31 +317,35 @@ dimN= Dimension(name_n, value_n)

One can add metrics with generic units using the following function.

Function API
#### Function API to add generic metrics without default dimensions

```python
def add_metric(self, metric_name: str, unit: str, idx=None, dimension_names: list = None,
metric_type: MetricTypes = MetricTypes.COUNTER) -> None:
def add_metric_to_cache(
self,
metric_name: str,
unit: str,
dimension_names: list = [],
metric_type: MetricTypes = MetricTypes.COUNTER,
) -> CachingMetric:
"""
Create a new metric and add into cache.
Add a metric which is generic with custom metrics
Create a new metric and add into cache. Override existing metric if already present.

Parameters
----------
metric_name: str
metric_name str
Name of metric
value: int, float
value of metric
unit: str
unit of metric
idx: int
request_id index in batch
dimensions: list
list of dimensions for the metric
metric_type: MetricTypes
Type of metric
unit str
unit can be one of ms, percent, count, MB, GB or a generic string
dimension_names list
list of dimension name strings for the metric
metric_type MetricTypes
Type of metric Counter, Gauge, Histogram
Returns
-------
newly created Metrics object
"""


def add_or_update(
self,
value: int or float,
Expand All @@ -360,10 +370,52 @@ Function API
# Add Distance as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
# Assuming batch size is 1 for example
metric = metrics.add_metric('DistanceInKM', unit='km', dimension_names=[...])
metric = metrics.add_metric_to_cache('DistanceInKM', unit='km', dimension_names=[...])
metric.add_or_update(distance, dimension_values=[...])
```

Note that calling `add_metric_to_cache` will not emit the metric, `add_or_update` will need to be called on the metric object as shown above.

#### Function API to add generic metrics with default dimensions

```python
def add_metric(
self,
name: str,
value: int or float,
unit: str,
idx: str = None,
dimensions: list = [],
metric_type: MetricTypes = MetricTypes.COUNTER,
):
"""
Add a generic metric
Default metric type is counter

Parameters
----------
name : str
metric name
value: int or float
value of the metric
unit: str
unit of metric
idx: str
request id to be associated with the metric
dimensions: list
list of Dimension objects for the metric
metric_type MetricTypes
Type of metric Counter, Gauge, Histogram
"""
```

```python
# Add Distance as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
metric = metrics.add_metric('DistanceInKM', value=10, unit='km', dimensions=[...])
```


### Add time-based metrics

**Time-based metrics are defaulted to a `GAUGE` metric type**
Expand All @@ -373,15 +425,15 @@ Add time-based by invoking the following method:
Function API

```python
def add_time(self, metric_name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
def add_time(self, name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
metric_type: MetricTypes = MetricTypes.GAUGE):
"""
Add a time based metric like latency, default unit is 'ms'
Default metric type is gauge

Parameters
----------
metric_name : str
name : str
metric name
value: int
value of metric
Expand Down Expand Up @@ -418,15 +470,15 @@ Add size-based metrics by invoking the following method:
Function API

```python
def add_size(self, metric_name: str, value: int or float, idx=None, unit: str = 'MB', dimensions: list = None,
def add_size(self, name: str, value: int or float, idx=None, unit: str = 'MB', dimensions: list = None,
metric_type: MetricTypes = MetricTypes.GAUGE):
"""
Add a size based metric
Default metric type is gauge

Parameters
----------
metric_name : str
name : str
metric name
value: int, float
value of metric
Expand Down Expand Up @@ -463,15 +515,15 @@ Percentage based metrics can be added by invoking the following method:
Function API

```python
def add_percent(self, metric_name: str, value: int or float, idx=None, dimensions: list = None,
def add_percent(self, name: str, value: int or float, idx=None, dimensions: list = None,
metric_type: MetricTypes = MetricTypes.GAUGE):
"""
Add a percentage based metric
Default metric type is gauge

Parameters
----------
metric_name : str
name : str
metric name
value: int, float
value of metric
Expand All @@ -485,6 +537,8 @@ Function API

```

**Inferred unit**: `percent`

To add custom percentage-based metrics:

```python
Expand All @@ -503,26 +557,25 @@ Counter based metrics can be added by invoking the following method
Function API

```python
def add_counter(self, metric_name: str, value: int or float, idx=None, dimensions: list = None,
metric_type: MetricTypes = MetricTypes.COUNTER):
def add_counter(self, name: str, value: int or float, idx=None, dimensions: list = None):
"""
Add a counter metric or increment an existing counter metric
Default metric type is counter
Parameters
----------
metric_name : str
name : str
metric name
value: int or float
value of metric
idx: int
request_id index in batch
dimensions: list
list of dimensions for the metric
metric_type: MetricTypes
type for defining different operations, defaulted to counter metric type for Counter metrics
"""
```

**Inferred unit**: `count`

### Getting a metric

Users can get a metric from the cache. The Metric object is returned, so the user can access the methods of the Metric: (i.e. `Metric.update(value)`, `Metric.__str__`)
Expand Down Expand Up @@ -622,3 +675,39 @@ class CustomHandlerExample:
# except this time with gauge metric type object
metrics.add_size("GaugeModelMetricNameExample", 42.5)
```

## Backwards compatibility warnings and upgrade guide
1. Starting [v0.6.1](https://github.com/pytorch/serve/releases/tag/v0.6.1), the `add_metric` API signature changed\
from: [add_metric(name, value, unit, idx=None, dimensions=None)](https://github.com/pytorch/serve/blob/61f1c4182e6e864c9ef1af99439854af3409d325/ts/metrics/metrics_store.py#L184)\
to: [add_metric(metric_name, unit, dimension_names=None, metric_type=MetricTypes.COUNTER)](https://github.com/pytorch/serve/blob/35ef00f9e62bb7fcec9cec92630ae757f9fb0db0/ts/metrics/metric_cache_abstract.py#L272).\
In versions greater than v0.8.1 the `add_metric` API signature was updated to support backwards compatibility:\
from: [add_metric(metric_name, unit, dimension_names=None, metric_type=MetricTypes.COUNTER)](https://github.com/pytorch/serve/blob/35ef00f9e62bb7fcec9cec92630ae757f9fb0db0/ts/metrics/metric_cache_abstract.py#L272)\
to: `add_metric(name, value, unit, idx=None, dimensions=[], metric_type=MetricTypes.COUNTER)`\
Usage of the new API is shown [above](#specifying-metric-types).\
**Upgrade paths**:
- **[< v0.6.1] to [v0.6.1 - v0.8.1]**\
There are two approaches available when migrating to the new custom metrics API:
- Replace the call to `add_metric` with calls to the following methods:
```python
metric1 = metrics.add_metric("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
metric1.add_or_update(value, dimension_values=["value1", "value2", ...])
```
- Replace the call to `add_metric` in versions prior to v0.6.1 with one of the suitable custom metrics APIs where applicable: [add_counter](#add-counter-based-metrics), [add_time](#add-time-based-metrics),
[add_size](#add-size-based-metrics) or [add_percent](#add-percentage-based-metrics)
- **[< v0.6.1] to [> v0.8.1]**\
The call to `add_metric` is backwards compatible but the metric type is inferred to be `COUNTER`. If the metric is of a different type, an additional argument `metric_type` will need to be provided to the `add_metric`
call shown below
```python
metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
```
- **[v0.6.1 - v0.8.1] to [> v0.8.1]**\
Replace the call to `add_metric` with `add_metric_to_cache`.
2. Starting [v0.8.0](https://github.com/pytorch/serve/releases/tag/v0.8.0), only metrics that are defined in the metrics config file(default: [metrics.yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml))
are either all logged to `ts_metrics.log` and `model_metrics.log` or made available via the [metrics API endpoint](https://github.com/pytorch/serve/blob/master/docs/metrics_api.md)
based on the `metrics_mode` configuration as described [above](#introduction).\
The default `metrics_mode` is `log` mode.\
This is unlike in previous versions where all metrics were only logged to `ts_metrics.log` and `model_metrics.log` except for `ts_inference_requests_total`, `ts_inference_latency_microseconds` and `ts_queue_latency_microseconds`
which were only available via the metrics API endpoint.\
**Upgrade paths**:
- **[< v0.8.0] to [>= v0.8.0]**\
Specify all the custom metrics added to the custom handler in the metrics configuration file as shown [above](#central-metrics-yaml-file-definition).
Loading