Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't understand how to use metrics in 0.18.0 #914

Closed
nappa85 opened this issue Nov 9, 2022 · 2 comments
Closed

Can't understand how to use metrics in 0.18.0 #914

nappa85 opened this issue Nov 9, 2022 · 2 comments

Comments

@nappa85
Copy link

nappa85 commented Nov 9, 2022

I kept running with 0.17.0 because I saw there have been a lot of breaking changes and didn't have the time to tackle it.

Today I found the time, but, with horror, I found out there is no metrics documentation, no metrics examples, a kitchen sink example written inside code comments (https://docs.rs/opentelemetry-otlp/latest/opentelemetry_otlp/#kitchen-sink-full-configuration) that is outdated, code comes from 0.17.0 and won't compile on 0.18.0

I'll try to cover single topics, because I understand sending two big blocks of code pretending you to find the differences isn't the correct approach.

Fist of all, I've structured my opentelemetry code in a crate shared between all my applications, this way configuration is centralized and single applications only needs to define metrics.

Configuration

With 0.17.0 I had:

static PUSH_CONTROLLER: OnceCell<sdk::metrics::PushController> = OnceCell::const_new();

fn start_metric_otlp(url: &str, attributes: Option<&HashMap<String, String>>, service: &ServiceConfig) {
    let push_controller = opentelemetry_otlp::new_pipeline()
        .metrics(tokio::spawn, delayed_interval)
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(url)
                .with_protocol(opentelemetry_otlp::Protocol::Grpc)
                .with_metadata(get_metadata(attributes)),
        )
        .with_aggregator_selector(sdk::metrics::selectors::simple::Selector::from(
            service.aggregator_selector.as_ref().unwrap_or_default(),
        ))
        .with_export_kind(sdk::export::metrics::ExportKindSelector::from(
            service.export_kind.as_ref().unwrap_or_default(),
        ))
        .with_resource(get_resources())
        .build()
        .unwrap();
    PUSH_CONTROLLER.set(push_controller).unwrap();
}

where PUSH_CONTROLLER was here only to avoid controller to be dropped

As aggregator_selector I used "exact" and as export_kind I used "delta" after discussing in #677, and it worked like a charm.

After upgrading to 0.18.0 I've end up with this code:

static PUSH_CONTROLLER: OnceCell<sdk::metrics::controllers::BasicController> = OnceCell::new();

fn start_metric_otlp(url: &str, attributes: Option<&HashMap<String, String>>, service: &ServiceConfig) {
    let controller = opentelemetry_otlp::new_pipeline()
        .metrics(AggregatorSelector, service.export_kind, runtime::Tokio)
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(url)
                .with_protocol(opentelemetry_otlp::Protocol::Grpc)
                .with_metadata(get_metadata(attributes)),
        )
        .with_resource(get_resources())
        .build()
        .unwrap();
    PUSH_CONTROLLER.set(controller).unwrap();
}

Quite similar, differences are subtle.

My ExportKind enum has become an implementor of sdk::export::metrics::aggregation::TemporalitySelector:

impl sdk::export::metrics::aggregation::TemporalitySelector for ExportKind {
    fn temporality_for(
        &self,
        descriptor: &sdk::metrics::sdk_api::Descriptor,
        kind: &sdk::export::metrics::aggregation::AggregationKind,
    ) -> sdk::export::metrics::aggregation::Temporality {
        match self {
            ExportKind::Cumulative => {
                sdk::export::metrics::aggregation::cumulative_temporality_selector().temporality_for(descriptor, kind)
            }
            ExportKind::Delta => {
                sdk::export::metrics::aggregation::delta_temporality_selector().temporality_for(descriptor, kind)
            }
            ExportKind::Stateless => {
                sdk::export::metrics::aggregation::stateless_temporality_selector().temporality_for(descriptor, kind)
            }
        }
    }
}

I've lost the config for aggregator_selector because now I need to implement trait sdk::export::metrics::AggregatorSelector and decide based on Descriptor.

I've found and example in this file and end up with this code:

#[derive(Debug, Default)]
struct AggregatorSelector;

impl sdk::export::metrics::AggregatorSelector for AggregatorSelector {
    fn aggregator_for(&self, descriptor: &Descriptor) -> Option<Arc<dyn Aggregator + Send + Sync>> {
        match descriptor.name() {
            name if name.ends_with(".hits") => Some(Arc::new(sdk::metrics::aggregators::last_value())),
            name if name.ends_with(".duration") => Some(Arc::new(sdk::metrics::aggregators::histogram(&[]))),
            _ => panic!("Invalid instrument name for test AggregatorSelector: {}", descriptor.name()),
        }
    }
}

Here, for me *.hits are metrics like how many times an API is called, and *.duration are metrics like query duration.

Metrics Definition

With 0.17.0 setting up a metric for me was like

static MYSQL_QUERY_HISTOGRAM: Lazy<ValueRecorder<f64>> = Lazy::new(|| {
    get_meter()// gets a common meter from shared crate
        .f64_value_recorder("mysql.duration")
        .with_description("MySQL query latencies")
        .with_unit(Unit::new("s"))
        .init()
}

Now with 0.18.0 it has become

static MYSQL_QUERY_HISTOGRAM: Lazy<Histogram<f64>> = Lazy::new(|| {
    get_meter()// gets a common meter from shared crate
        .f64_histogram("mysql.duration")
        .with_description("MySQL query latencies")
        .with_unit(Unit::new("s"))
        .init()
});

Metrics Usage

With 0.17.0 using a metric for me was like

    MYSQL_QUERY_HISTOGRAM.record(
        info.elapsed.as_secs_f64(),
        &[
            KeyValue::new("query", info.statement.sql.clone()),
            KeyValue::new("pool", pool),
            KeyValue::new("failed", info.failed),
        ],
    );

Now with 0.18.0 it has become

    let cx = Context::current();
    MYSQL_QUERY_HISTOGRAM.record(
        &cx,
        info.elapsed.as_secs_f64(),
        &[
            KeyValue::new("query", info.statement.sql.clone()),
            KeyValue::new("pool", pool),
            KeyValue::new("failed", info.failed),
        ],
    );

Only the Context has been added, not a big deal until calling Context::current is always the correct approach

Upgrade result

Now, summing everything up, using ExportKind::Delta doesn't seems to work anymore, I get this error on stderr:

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: transport error

I have no idea where to look to debug it.

Switching to ExportKind::Stateless, *.hits metrics, that means aggregators::last_value, works, but aggregators::histogram doesn't.

No errors on stderr this time.

I've read the code, I understand aggregators::histogram expects reference values to split values between buckets, but I don't have reference values, I have so many queries with so many different mean times...

Closing up

I've probably done something horribly wrong in this upgrade, but without proper documentation and working examples it's quite hard to understand what's wrong and what's not.

edit: added metric usage section

@nappa85
Copy link
Author

nappa85 commented Nov 10, 2022

Oh, I was wrong, the error

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: transport error

is still there also with ExportKind::Stateless, simply it doesn't appear immediately

@jtescher
Copy link
Member

resolved in metrics rewrite in #1156

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants