Can't understand how to use metrics in 0.18.0 #914

nappa85 · 2022-11-09T18:11:39Z

I kept running with 0.17.0 because I saw there have been a lot of breaking changes and didn't have the time to tackle it.

Today I found the time, but, with horror, I found out there is no metrics documentation, no metrics examples, a kitchen sink example written inside code comments (https://docs.rs/opentelemetry-otlp/latest/opentelemetry_otlp/#kitchen-sink-full-configuration) that is outdated, code comes from 0.17.0 and won't compile on 0.18.0

I'll try to cover single topics, because I understand sending two big blocks of code pretending you to find the differences isn't the correct approach.

Fist of all, I've structured my opentelemetry code in a crate shared between all my applications, this way configuration is centralized and single applications only needs to define metrics.

Configuration

With 0.17.0 I had:

static PUSH_CONTROLLER: OnceCell<sdk::metrics::PushController> = OnceCell::const_new();

fn start_metric_otlp(url: &str, attributes: Option<&HashMap<String, String>>, service: &ServiceConfig) {
    let push_controller = opentelemetry_otlp::new_pipeline()
        .metrics(tokio::spawn, delayed_interval)
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(url)
                .with_protocol(opentelemetry_otlp::Protocol::Grpc)
                .with_metadata(get_metadata(attributes)),
        )
        .with_aggregator_selector(sdk::metrics::selectors::simple::Selector::from(
            service.aggregator_selector.as_ref().unwrap_or_default(),
        ))
        .with_export_kind(sdk::export::metrics::ExportKindSelector::from(
            service.export_kind.as_ref().unwrap_or_default(),
        ))
        .with_resource(get_resources())
        .build()
        .unwrap();
    PUSH_CONTROLLER.set(push_controller).unwrap();
}

where PUSH_CONTROLLER was here only to avoid controller to be dropped

As aggregator_selector I used "exact" and as export_kind I used "delta" after discussing in #677, and it worked like a charm.

After upgrading to 0.18.0 I've end up with this code:

static PUSH_CONTROLLER: OnceCell<sdk::metrics::controllers::BasicController> = OnceCell::new();

fn start_metric_otlp(url: &str, attributes: Option<&HashMap<String, String>>, service: &ServiceConfig) {
    let controller = opentelemetry_otlp::new_pipeline()
        .metrics(AggregatorSelector, service.export_kind, runtime::Tokio)
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(url)
                .with_protocol(opentelemetry_otlp::Protocol::Grpc)
                .with_metadata(get_metadata(attributes)),
        )
        .with_resource(get_resources())
        .build()
        .unwrap();
    PUSH_CONTROLLER.set(controller).unwrap();
}

Quite similar, differences are subtle.

My ExportKind enum has become an implementor of sdk::export::metrics::aggregation::TemporalitySelector:

impl sdk::export::metrics::aggregation::TemporalitySelector for ExportKind {
    fn temporality_for(
        &self,
        descriptor: &sdk::metrics::sdk_api::Descriptor,
        kind: &sdk::export::metrics::aggregation::AggregationKind,
    ) -> sdk::export::metrics::aggregation::Temporality {
        match self {
            ExportKind::Cumulative => {
                sdk::export::metrics::aggregation::cumulative_temporality_selector().temporality_for(descriptor, kind)
            }
            ExportKind::Delta => {
                sdk::export::metrics::aggregation::delta_temporality_selector().temporality_for(descriptor, kind)
            }
            ExportKind::Stateless => {
                sdk::export::metrics::aggregation::stateless_temporality_selector().temporality_for(descriptor, kind)
            }
        }
    }
}

I've lost the config for aggregator_selector because now I need to implement trait sdk::export::metrics::AggregatorSelector and decide based on Descriptor.

I've found and example in this file and end up with this code:

#[derive(Debug, Default)]
struct AggregatorSelector;

impl sdk::export::metrics::AggregatorSelector for AggregatorSelector {
    fn aggregator_for(&self, descriptor: &Descriptor) -> Option<Arc<dyn Aggregator + Send + Sync>> {
        match descriptor.name() {
            name if name.ends_with(".hits") => Some(Arc::new(sdk::metrics::aggregators::last_value())),
            name if name.ends_with(".duration") => Some(Arc::new(sdk::metrics::aggregators::histogram(&[]))),
            _ => panic!("Invalid instrument name for test AggregatorSelector: {}", descriptor.name()),
        }
    }
}

Here, for me *.hits are metrics like how many times an API is called, and *.duration are metrics like query duration.

Metrics Definition

With 0.17.0 setting up a metric for me was like

static MYSQL_QUERY_HISTOGRAM: Lazy<ValueRecorder<f64>> = Lazy::new(|| {
    get_meter()// gets a common meter from shared crate
        .f64_value_recorder("mysql.duration")
        .with_description("MySQL query latencies")
        .with_unit(Unit::new("s"))
        .init()
}

Now with 0.18.0 it has become

static MYSQL_QUERY_HISTOGRAM: Lazy<Histogram<f64>> = Lazy::new(|| {
    get_meter()// gets a common meter from shared crate
        .f64_histogram("mysql.duration")
        .with_description("MySQL query latencies")
        .with_unit(Unit::new("s"))
        .init()
});

Metrics Usage

With 0.17.0 using a metric for me was like

    MYSQL_QUERY_HISTOGRAM.record(
        info.elapsed.as_secs_f64(),
        &[
            KeyValue::new("query", info.statement.sql.clone()),
            KeyValue::new("pool", pool),
            KeyValue::new("failed", info.failed),
        ],
    );

Now with 0.18.0 it has become

    let cx = Context::current();
    MYSQL_QUERY_HISTOGRAM.record(
        &cx,
        info.elapsed.as_secs_f64(),
        &[
            KeyValue::new("query", info.statement.sql.clone()),
            KeyValue::new("pool", pool),
            KeyValue::new("failed", info.failed),
        ],
    );

Only the Context has been added, not a big deal until calling Context::current is always the correct approach

Upgrade result

Now, summing everything up, using ExportKind::Delta doesn't seems to work anymore, I get this error on stderr:

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: transport error

I have no idea where to look to debug it.

Switching to ExportKind::Stateless, *.hits metrics, that means aggregators::last_value, works, but aggregators::histogram doesn't.

No errors on stderr this time.

I've read the code, I understand aggregators::histogram expects reference values to split values between buckets, but I don't have reference values, I have so many queries with so many different mean times...

Closing up

I've probably done something horribly wrong in this upgrade, but without proper documentation and working examples it's quite hard to understand what's wrong and what's not.

edit: added metric usage section

The text was updated successfully, but these errors were encountered:

nappa85 · 2022-11-10T08:58:23Z

Oh, I was wrong, the error

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: transport error

is still there also with ExportKind::Stateless, simply it doesn't appear immediately

jtescher · 2023-07-30T16:48:21Z

resolved in metrics rewrite in #1156

DerekStrickland mentioned this issue Mar 16, 2023

Proposal: BasicController to MeterProvider #976

Closed

cijothomas mentioned this issue May 9, 2023

Docs for Metrics - add basic and advanced examples #1060

Open

jtescher closed this as completed Jul 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't understand how to use metrics in 0.18.0 #914

Can't understand how to use metrics in 0.18.0 #914

nappa85 commented Nov 9, 2022 •

edited

Loading

nappa85 commented Nov 10, 2022

jtescher commented Jul 30, 2023

Can't understand how to use metrics in 0.18.0 #914

Can't understand how to use metrics in 0.18.0 #914

Comments

nappa85 commented Nov 9, 2022 • edited Loading

Configuration

Metrics Definition

Metrics Usage

Upgrade result

Closing up

nappa85 commented Nov 10, 2022

jtescher commented Jul 30, 2023

nappa85 commented Nov 9, 2022 •

edited

Loading