
feat: add metrics generator for num_calls #286

Merged: 17 commits into main on Nov 29, 2021
Conversation

@kotharironak (Contributor) commented Nov 12, 2021

Description

As part of building the metrics pipeline, and as part of this ticket (hypertrace/hypertrace#294), this PR:

  • adds a metrics generator component for call count
  • extracts the metrics from ingested traces by reading data from raw-service-view-events
  • adds configs for emit and aggregation windowing (see the config sketch after this list)
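For illustration only, a minimal sketch of how windowing configs like these might be read with Typesafe Config; the key names below are hypothetical placeholders, not the actual keys added in this PR:

```java
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;
import java.time.Duration;

public class MetricsGeneratorConfigSketch {
  // Hypothetical key names; the real config paths in this PR may differ.
  private static final String AGGREGATION_WINDOW_MS = "metrics.generator.aggregationWindowMs";
  private static final String EMIT_WAIT_TIME_MS = "metrics.generator.emitWaitTimeMs";

  public static void main(String[] args) {
    Config config = ConfigFactory.load();
    // Window over which call counts are aggregated before a point is produced.
    Duration aggregationWindow = Duration.ofMillis(config.getLong(AGGREGATION_WINDOW_MS));
    // Delay before an aggregated point is emitted downstream.
    Duration emitWait = Duration.ofMillis(config.getLong(EMIT_WAIT_TIME_MS));
    System.out.println("aggregation window: " + aggregationWindow + ", emit wait: " + emitWait);
  }
}
```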

Testing

  • added a basic topology test (general shape sketched below)
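For reference, the general shape of such a topology test using Kafka Streams' TopologyTestDriver; the topic name, serdes, and record types below are placeholders and not the actual test in this PR:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

class MetricsGeneratorTopologyTestSketch {

  void runOnce(Topology topology) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metrics-generator-test");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092");

    try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
      // Pipe a raw-service-view event into the input topic of the topology under test.
      TestInputTopic<String, String> input =
          driver.createInputTopic(
              "raw-service-view-events",
              Serdes.String().serializer(),
              Serdes.String().serializer());
      input.pipeInput("tenant-1", "serialized-view-event");
      // driver.createOutputTopic(...) would then be used to read and assert on
      // the generated metric records.
    }
  }
}
```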

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@kotharironak (Contributor, Author):

The build fails because of compatibility checks on the new Avro files; I will remove the Avro check.

@kotharironak (Contributor, Author):

@JBAhire Can you help with disabling the Avro compatibility check for this project? See the diff. I guess we will have to merge a separate PR for pull_request_target first to disable that, right?

@kotharironak kotharironak marked this pull request as ready for review November 18, 2021 10:12
@kotharironak kotharironak requested a review from a team November 18, 2021 10:12
@JBAhire (Member) commented Nov 18, 2021

> @JBAhire Can you help with disabling the Avro compatibility check for this project? See the diff. I guess we will have to merge a separate PR for pull_request_target first to disable that, right?

Yup. We will need a separate PR for that one.

@aaron-steinfeld (Contributor):

> @JBAhire Can you help with disabling the Avro compatibility check for this project? See the diff. I guess we will have to merge a separate PR for pull_request_target first to disable that, right?
>
> Yup. We will need a separate PR for that one.

I'd recommend pulling the compatibility checks into a separate job altogether. Require it in the same way, but that way we can override it if needed without code changes - basically what we do with proto.

@aaron-steinfeld (Contributor) left a review comment:

I don't really have the context, but here are a couple of high-level comments:

@@ -0,0 +1,7 @@
@namespace("org.hypertrace.metrics.generator.api")
Contributor:

Why are we still adding Avro messages? I thought we had fully switched to proto for new things, even across Kafka.

Spacing also looks off.

@kotharironak (Contributor, Author) commented Nov 22, 2021:

> Why are we still adding Avro messages?

Are you referring to the whole new pipeline (alerting and metrics), or something else?

I will check with @laxman-traceable if I can directly go with Java-based serialization for the StateStore and get away with Avro messages here.

Contributor:

My understanding - could be wrong - was that we initially used Avro over Kafka because that's what it had support for, and proto everywhere else. Newer versions have equal levels of proto support, so rather than using two different serialization formats, unifying on just proto for all new messages would be more desirable.

@kotharironak (Contributor, Author):

Talked with @laxman-traceable. As per him, Avro goes better with Kafka overall, and, as you said, the newer versions have schema registry support for proto too. But since proto is the main transport layer in this metrics pipeline, he also suggested going with proto for consistency. Changing it to proto.

@kotharironak (Contributor, Author):

Done

protocol MetricIdentityProtocol {
record MetricIdentity {
long timestamp_millis = 0;
union {null, string} metric_key = null;
Contributor:

I don't have much context, but what is a metric key? Where's the tenant ID?

@kotharironak (Contributor, Author):

tenant_id is part of the metric attributes.

@kotharironak (Contributor, Author):

metric_key is a UUID derived from metric_name and the set of the metric's attribute name:value pairs. This is for reducing the serde cost of updating the state store on every input message.
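As an illustration, a minimal sketch of how such a deterministic key could be derived; the helper name and exact key layout are assumptions for the example, not necessarily what this PR implements:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public final class MetricKeySketch {

  // Builds a deterministic UUID from the metric name plus its attribute
  // name:value pairs; sorting the attributes makes the key order-independent.
  public static UUID metricKey(String metricName, Map<String, String> attributes) {
    StringBuilder builder = new StringBuilder(metricName);
    new TreeMap<>(attributes)
        .forEach((name, value) -> builder.append('|').append(name).append(':').append(value));
    return UUID.nameUUIDFromBytes(builder.toString().getBytes(StandardCharsets.UTF_8));
  }
}
```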

Contributor:

Still trying to understand: would this mean there are no actual attributes available for querying, and that everything needs to be queried by ID only? I couldn't find, for example, all metrics that have attribute A. How would things like filtering work?

@kotharironak (Contributor, Author):

Such global filtering queries are possible in the PromQL language.
{} refers to the filtering expression - https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors

As an example, the query below returns all metrics having the attribute (or label) pod with the value raw-spans-grouper: {pod="raw-spans-grouper"}

A filtering expression with a specific metric_name will filter within the group of those series - metric_name{<label filtering>}.
So, if we are looking for num_calls (a metric) for a given service (an attribute), we can express it as sum(num_calls{service="1234"}).

I had listed out a few mappings here - hypertrace/hypertrace#295 (comment)

Can you give one more example of what you are trying to refer to?

@namespace("org.hypertrace.metrics.generator.api")
protocol MetricProtocol {
record Metric {
union {null, string} name = null;
Contributor:

Is this different than the key? If we're emitting these for every timestamp, wouldn't we want to put as little in the value message as possible? Metadata like name, description, and unit could be stored elsewhere.

Are the attributes holding the numbers? If so, why are they typed string?

@kotharironak (Contributor, Author):

> Is this different than the key?

From this, <metric_name> and the attributes are used to form the key.

So, for a given timestamp window, I store the Metric only once, and it is used when we emit the metric. The rest of the time, for each incoming message, to save serde cost, I use metric_key for aggregating the value in a different store.

> Are the attributes holding the numbers? If so, why are they typed string?

This represents the Prometheus metric name and its labels - https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

And they are strings.

Contributor:

> So, for a given timestamp window, I store the Metric only once, and it is used when we emit the metric. The rest of the time, for each incoming message, to save serde cost, I use metric_key for aggregating the value in a different store.

Here I meant that we're storing the attributes and name in every stored metric time window, as I understand it. Is that to support querying by attribute and name, not just the ID, as I was asking about in the other comment thread?

> This represents the Prometheus metric name and its labels - https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

OK, that makes sense, that's what I expected - but where is the actual metric value itself in this message? Or is it elsewhere? The only numeric-typed thing I see is the timestamp.

@kotharironak (Contributor, Author):

As part of this PR, I am aggregating num_calls metrics over a given time window (say 15s). During this aggregation, I am storing the intermediate result in a state store of <MetricIdentity, Value>; while punctuating, I emit this value after converting it to OTLP metrics.
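As a rough illustration of that flow, here is a minimal Kafka Streams punctuator sketch; the store, key and value types, and the toOtlpMetric conversion are placeholders rather than the actual code in this PR:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

class MetricEmitSketch {
  // Hypothetical store holding the running aggregate per metric identity.
  private KeyValueStore<String, Long> aggregateStore;

  void schedule(ProcessorContext context) {
    // Periodically flush the aggregated values (e.g., every 15s window).
    context.schedule(
        Duration.ofSeconds(15),
        PunctuationType.WALL_CLOCK_TIME,
        timestamp -> {
          try (KeyValueIterator<String, Long> it = aggregateStore.all()) {
            while (it.hasNext()) {
              KeyValue<String, Long> entry = it.next();
              // Convert the aggregated count to an OTLP metric and forward it downstream.
              context.forward(entry.key, toOtlpMetric(entry.key, entry.value, timestamp));
              aggregateStore.delete(entry.key);
            }
          }
        });
  }

  private Object toOtlpMetric(String metricKey, long count, long emitTimestampMillis) {
    // Placeholder for building an OTLP ResourceMetrics data point.
    return count;
  }
}
```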

return new OtlpMetricsSerde.De();
}

public static class Ser implements Serializer<ResourceMetrics> {
Contributor:

Should use real class names if making these public (and make them private if possible)

@kotharironak (Contributor, Author):

will do this.

@kotharironak (Contributor, Author) commented Nov 26, 2021:
Done

@kotharironak (Contributor, Author):

> @JBAhire Can you help with disabling the Avro compatibility check for this project? See the diff. I guess we will have to merge a separate PR for pull_request_target first to disable that, right?
>
> Yup. We will need a separate PR for that one.
>
> I'd recommend pulling the compatibility checks into a separate job altogether. Require it in the same way, but that way we can override it if needed without code changes - basically what we do with proto.

Created this PR - #289 - for this.


@codecov (bot) commented Nov 22, 2021

Codecov Report

Merging #286 (b7b4a84) into main (fcd88a5) will increase coverage by 0.29%.
The diff coverage is 88.30%.


@@             Coverage Diff              @@
##               main     #286      +/-   ##
============================================
+ Coverage     79.58%   79.87%   +0.29%     
- Complexity     1265     1285      +20     
============================================
  Files           112      116       +4     
  Lines          4937     5108     +171     
  Branches        451      458       +7     
============================================
+ Hits           3929     4080     +151     
- Misses          806      820      +14     
- Partials        202      208       +6     
Flag Coverage Δ
unit 79.87% <88.30%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...hypertrace/metrics/generator/MetricsConstants.java 0.00% <0.00%> (ø)
...hypertrace/metrics/generator/MetricsGenerator.java 74.35% <74.35%> (ø)
...hypertrace/metrics/generator/MetricsProcessor.java 91.30% <91.30%> (ø)
...rtrace/metrics/generator/MetricEmitPunctuator.java 95.16% <95.16%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fcd88a5...b7b4a84.

@kotharironak (Contributor, Author) commented Nov 22, 2021

Will look into the build-and-test-with-mongo and Snyk failures.


attributes.put(SERVICE_ID_ATTR, value.getServiceId());
attributes.put(SERVICE_NAME_ATTR, value.getServiceName());
attributes.put(API_ID, value.getApiId());
attributes.put(API_NAME, value.getApiName());
@kotharironak (Contributor, Author):
Adding status_code and protocol - all the string-related attributes other than IDs.

@kotharironak (Contributor, Author) commented Nov 26, 2021:
Done


import org.apache.kafka.common.serialization.Serializer;
import org.hypertrace.metrics.generator.api.v1.Metric;

public class MetricSerde implements Serde<Metric> {
@kotharironak (Contributor, Author):

For this PR, I have added custom protobuf serdes for Metric, MetricIdentity, and OtlpMetrics. I am facing some issues while using the Kafka Streams protobuf serde - I have logged a ticket: hypertrace/hypertrace#312

Working on this as a follow-up; it will remove these custom classes.
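For context, a minimal sketch of what a hand-rolled protobuf Serde for such a message can look like; the class name is illustrative and error handling is simplified:

```java
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;
import org.hypertrace.metrics.generator.api.v1.Metric;

public class MetricProtoSerdeSketch implements Serde<Metric> {

  @Override
  public Serializer<Metric> serializer() {
    // Protobuf messages serialize directly to their wire format.
    return (topic, metric) -> metric == null ? null : metric.toByteArray();
  }

  @Override
  public Deserializer<Metric> deserializer() {
    return (topic, bytes) -> {
      if (bytes == null) {
        return null;
      }
      try {
        return Metric.parseFrom(bytes);
      } catch (Exception e) {
        throw new RuntimeException("Failed to deserialize Metric", e);
      }
    };
  }
}
```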

@Override
public KeyValue<byte[], ResourceMetrics> transform(String key, RawServiceView value) {
// construct metric attributes & metric
Map<String, String> attributes = new HashMap<>();
Contributor:

As discussed, we are doing application-level aggregation rather than relying on Prometheus's built-in aggregation, since Prometheus polls metrics at a fixed interval instead of consuming them in real time.
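A minimal sketch of that application-level aggregation step, assuming one counter per metric key and a fixed window; all names and the window size are illustrative:

```java
import org.apache.kafka.streams.state.KeyValueStore;

class CallCountAggregatorSketch {
  private static final long WINDOW_MILLIS = 15_000L;
  private final KeyValueStore<String, Long> aggregateStore;

  CallCountAggregatorSketch(KeyValueStore<String, Long> aggregateStore) {
    this.aggregateStore = aggregateStore;
  }

  // Increment the per-window call count for this metric key on every input event.
  void onEvent(String metricKey, long eventTimestampMillis) {
    long windowStart = (eventTimestampMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
    String storeKey = metricKey + "|" + windowStart;
    Long current = aggregateStore.get(storeKey);
    aggregateStore.put(storeKey, current == null ? 1L : current + 1L);
  }
}
```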

@findingrish (Contributor) left a review comment:

Approving, as this is a blocker for subsequent changes; any further changes can be taken up as a follow-up.

@kotharironak (Contributor, Author):

@aaron-steinfeld Please let me know if you have any other comments; I will take care of them in a follow-up.

@kotharironak kotharironak merged commit 503b369 into main Nov 29, 2021
@kotharironak kotharironak deleted the metrics-extractor branch November 29, 2021 07:46
@github-actions (bot)

Unit Test Results

75 files (+1)   75 suites (+1)   1m 9s ⏱️ (+8s)
393 tests (+1)   393 ✔️ (+1)   0 💤 (±0)   0 ❌ (±0)

Results for commit 503b369. Comparison against base commit fcd88a5.
