feat: add metrics generator for num_calls #286
Conversation
Fails because of new avro files due to compatibility checks; will remove the avro check.
@JBAhire Can you help in disabling the avro compatibility check for this project - see the diff. I guess we will have to merge a separate PR for
Yup. We will need a separate PR for that one.
I'd recommend pulling the compatibility checks into a separate job altogether. Require it in the same way, but that way we can override it if needed without code changes - basically what we do with proto.
Don't really have the context, but a couple high level comments
@@ -0,0 +1,7 @@
@namespace("org.hypertrace.metrics.generator.api")
Why are we still adding avro messages? I thought we had fully switched to proto for new things, even across kafka
Spacing also looks off.
Why are we still adding Avro messages?
Are you referring to the whole new pipeline - alerting and metrics - or something else?
I will check with @laxman-traceable if I can directly go with Java-based serialization with the StateStore and get away from avro messages here.
My understanding - could be wrong - was that we initially used avro over kafka because that's what it had support for, and proto everywhere else. Newer versions have equal levels of proto support, so rather than using two different serialization formats, unifying on just proto for all new messages would be more desirable.
Talked with @laxman-traceable. As per him, avro goes better with kafka overall, though, as you said, the newer versions have schema registry support for proto too. But since proto is the main transport layer in this metrics pipeline, he also suggested going with proto for consistency. Changing it to proto.
Done
protocol MetricIdentityProtocol {
record MetricIdentity {
long timestamp_millis = 0;
union {null, string} metric_key = null;
I don't have much context, but what is a metric key? where's the tenant id?
tenant_id is part of the metric attributes -
Line 77 in 9f6ce53
attributes.put(TENANT_ID_ATTR, value.getTenantId());
metric_key is a UUID derived from metric_name and the set of the metric's attribute name:value pairs. This is to reduce the serde cost of updating the state store on every input message.
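To make the idea concrete, here is a minimal, dependency-free sketch of how such a key could be derived. The class and method names (MetricKeys, metricKey) are hypothetical illustrations, not code from this PR; the key point is that sorting the attributes makes the UUID stable regardless of map iteration order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class MetricKeys {
  // Derive a stable UUID from the metric name plus its attribute
  // name:value pairs. TreeMap sorts the attributes so the key does
  // not depend on the order the attributes were inserted in.
  public static UUID metricKey(String metricName, Map<String, String> attributes) {
    StringBuilder sb = new StringBuilder(metricName);
    for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
      sb.append('|').append(e.getKey()).append('=').append(e.getValue());
    }
    // Name-based (type 3) UUID: deterministic for identical input.
    return UUID.nameUUIDFromBytes(sb.toString().getBytes(StandardCharsets.UTF_8));
  }
}
```

Two messages with the same name and attributes then map to the same compact key, which is what allows the aggregation store to avoid serializing the full attribute set on every update.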
Still trying to understand - would this mean there are no actual attributes available for querying, and that everything needs to be queried by ID only? I couldn't, for example, find all metrics that have attribute A. How would things like filtering work?
Such global query filtering is possible in the PromQL language. {} refers to a filtering expression - https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors
As an example, the query below returns all metrics having the attribute (or label) pod with the value raw-spans-grouper: {pod="raw-spans-grouper"}
However, a filtering expression with a specific metric_name will filter within the group of those series - metric_name{<label filtering>}.
So, if we are looking for num_calls (a metric) for a given service (an attribute), we can express it as sum(num_calls{service="1234"})
I had listed out a few mappings here - hypertrace/hypertrace#295 (comment)
Can you give another example of what you are referring to?
@namespace("org.hypertrace.metrics.generator.api")
protocol MetricProtocol {
record Metric {
union {null, string} name = null;
Is this different than the key? If we're emitting these for every timestamp, wouldn't we want to put as little in the value message as possible - metadata like name, description, and unit could be stored elsewhere.
Are the attributes holding the numbers? If so, why are they typed string?
is this different than the key?
The <metric_name> and attributes are used to form the key. So, for a given timestamp window, I store Metric only once, and it is used at the time we emit the metric. For the rest of the time, i.e. for each incoming message, to save serde cost, I use metric_key for aggregating the value in a different store.
Are attributes holding the numbers? If so, why are they typed string?
This represents the Prometheus metric name and its labels - https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
And, they are strings.
So, for a given timestamp window, I store Metric only once, and it is used at the time we emit the metric. For the rest of the time, for each incoming message, to save serde cost, I use metric_key for aggregating the value in a different store.
Here I meant, that we're storing the attributes and name in every stored metric timewindow, as I understand it. Is that to support querying by attribute and name, not just the id as I was asking about in the other comment thread?
This represents the Prometheus metric name and its labels - https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
OK, that makes sense - that's what I expected. But where is the actual metric value itself in this message? Or is it elsewhere? The only numeric-typed thing I see is the timestamp.
As part of this PR, I am aggregating num_calls metrics over a time window (say 15s). During this aggregation, I store the intermediate result in the state store as <MetricIdentity, Value>; while punctuating, I emit this value after converting it to OTLP metrics.
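As an illustration of this aggregate-then-emit-on-punctuate flow, here is a dependency-free sketch with no Kafka Streams types; WindowedCounter and its methods are hypothetical stand-ins for the transformer's state store and punctuator, and the string store key stands in for MetricIdentity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the aggregation the transformer performs: counts are
// accumulated per (metric key, window start) in a state map, and a
// periodic "punctuate" drains the aggregated windows downstream.
public class WindowedCounter {
  private final long windowMillis;
  private final Map<String, Long> store = new HashMap<>();

  public WindowedCounter(long windowMillis) {
    this.windowMillis = windowMillis;
  }

  // Align an event timestamp to the start of its window (e.g. 15s buckets).
  long windowStart(long timestampMillis) {
    return timestampMillis - (timestampMillis % windowMillis);
  }

  // Called once per incoming view record: increment the count for
  // this metric key in the window the event falls into.
  public void add(String metricKey, long timestampMillis) {
    String storeKey = metricKey + "@" + windowStart(timestampMillis);
    store.merge(storeKey, 1L, Long::sum);
  }

  // Called from the punctuator: return the aggregated counts and
  // clear the store (the real code would convert these to OTLP
  // metrics before forwarding them).
  public Map<String, Long> flush() {
    Map<String, Long> out = new HashMap<>(store);
    store.clear();
    return out;
  }
}
```

The real transformer does the same bookkeeping against a Kafka Streams state store, so each input record costs one keyed update rather than a full serialize/deserialize of the metric and its attributes.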
return new OtlpMetricsSerde.De();
}

public static class Ser implements Serializer<ResourceMetrics> {
Should use real class names if making these public (and make them private if possible)
will do this.
Done
Created this PR - #289 - for this.
Codecov Report
@@ Coverage Diff @@
## main #286 +/- ##
============================================
+ Coverage 79.58% 79.87% +0.29%
- Complexity 1265 1285 +20
============================================
Files 112 116 +4
Lines 4937 5108 +171
Branches 451 458 +7
============================================
+ Hits 3929 4080 +151
- Misses 806 820 +14
- Partials 202 208 +6
will look into
attributes.put(SERVICE_ID_ATTR, value.getServiceId());
attributes.put(SERVICE_NAME_ATTR, value.getServiceName());
attributes.put(API_ID, value.getApiId());
attributes.put(API_NAME, value.getApiName());
We are skipping some columns from the view, like status_code etc. - https://github.com/hypertrace/hypertrace-ingester/blob/main/hypertrace-view-generator/hypertrace-view-creator/src/main/resources/configs/raw-service-view/application.conf#L21
How will filtering work on those attributes?
Adding status_code and protocol - all string-valued attributes other than IDs.
@rish691 I had included all the attributes which are mapped in the SERVICE and API entities - https://github.com/hypertrace/query-service/blob/main/query-service/src/main/resources/configs/common/application.conf#L97
However, protocol and status_code make sense, so I extended it.
Done
import org.apache.kafka.common.serialization.Serializer;
import org.hypertrace.metrics.generator.api.v1.Metric;

public class MetricSerde implements Serde<Metric> {
For this PR, I have added custom protobuf serdes for Metric, MetricIdentity, and OtlpMetrics. I am facing some issues while using the kafka streams protobuf serde - logged the ticket - hypertrace/hypertrace#312
Working on this as a follow-up, which will remove these custom objects.
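For readers without the diff, here is a minimal sketch of the serialize/deserialize round trip such a custom serde performs. To keep it self-contained it avoids the Kafka and protobuf dependencies - a real implementation would implement org.apache.kafka.common.serialization.Serializer and Deserializer and delegate to the generated message's toByteArray()/parseFrom() - so the tiny Metric record and manual encoding below are stand-ins, not the PR's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MetricSerdeSketch {
  // Stand-in for the generated protobuf Metric message: just a name
  // and a timestamp, encoded by hand so the sketch has no dependencies.
  public record Metric(String name, long timestampMillis) {}

  // Serializer side: length-prefixed UTF-8 name followed by the timestamp.
  public static byte[] serialize(Metric metric) {
    byte[] name = metric.name().getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + name.length + 8)
        .putInt(name.length)
        .put(name)
        .putLong(metric.timestampMillis())
        .array();
  }

  // Deserializer side: reverse of the above.
  public static Metric deserialize(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    byte[] name = new byte[buf.getInt()];
    buf.get(name);
    return new Metric(new String(name, StandardCharsets.UTF_8), buf.getLong());
  }
}
```

The serde pair is what Kafka Streams asks for when wiring state stores and topics, which is why each message type (Metric, MetricIdentity, OtlpMetrics) needs its own pair until the schema-registry protobuf serde issue is resolved.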
@Override
public KeyValue<byte[], ResourceMetrics> transform(String key, RawServiceView value) {
  // construct metric attributes & metric
  Map<String, String> attributes = new HashMap<>();
As discussed, we are doing application-level aggregation rather than using Prometheus's built-in aggregation, since Prometheus polls metrics at a fixed interval instead of consuming them in real time.
Approving as this is a blocker for subsequent changes; any further changes can be taken up as a follow-up.
@aaron-steinfeld Please let me know if you have any other comments; I will take care of them in a follow-up.
Description
As part of building the metrics pipeline, and as part of this ticket (hypertrace/hypertrace#294), this PR adds a metrics generator that consumes
raw-service-view-events
Testing
Checklist: