service metrics: add numeric outcome #5243

dgieselaar · 2021-05-09T16:20:41Z

Add a numeric outcome to

transaction events (mapped as byte)
transaction metrics (mapped as byte as it has an event.outcome dimension)
service metrics (mapped as aggregate_metric_double, as it has no event.outcome dimension).

All fields should be mapped as doc-value only fields.

The concrete field name needs to be defined. The working title is event.outcome_numeric.

This will help with reducing dimensions while keeping the structure of transaction metrics and service metrics similar.

Mapping of event.outcome to event.outcome_numeric

`event.outcome`	`event.outcome_numeric`
`unknown`	`null`
`failure`	0
`success`	1

For the aggregate_metric_double, the following sub-fields should be populated:

sum: the number of success events
value_count: the number of success + error events

We can then simply do an average aggregation over the event.outcome_numeric to get the success rate. We need to validate that the avg aggregation works correctly on both the byte and aggregate_metric_double mapped fields.

Original request:
Right now we use a terms aggregation or multiple filter aggregations on event.outcome to calculate transaction error rate. If we store failure as 0 and success in 1 in a single field (and not populate it for unknown), we can get the percentage by doing an avg aggregation on that single field. This would simplify our queries and the post processing that we are doing, will probably be a small performance improvement, and would make it easier for users to build custom dashboards.

The text was updated successfully, but these errors were encountered:

axw · 2021-05-10T01:04:11Z

event.outcome is an ECS field with a prescribed set of allowed values, so I don't think we should be recording its value as a number. We could provide a separate field.

dgieselaar · 2021-05-10T01:16:15Z

@axw yes, sorry, that is the idea. I don't really have a good suggestion for the name of the field though 😀

simitt · 2022-11-23T07:19:43Z

@dgieselaar I updated the original description to what we have planned now for 8.7.

axw · 2022-11-23T08:14:15Z

@felixbarny @dgieselaar would either of you be able to help clarify the definition a bit please? What values do we populate, and when?

It might help if we knew how the fields would be aggregated, for each of transaction events, transaction metrics, and service metrics.

felixbarny · 2022-11-23T13:48:13Z

done, let me know if something is missing or unclear

carsonip · 2022-12-07T17:42:04Z

For service metrics, are we using the same field event.outcome_numeric as transaction events and transaction metrics? Does it mean that the field will have multiple types or will we have a separate field for this aggregate_metric_double type field?

felixbarny · 2022-12-07T18:03:03Z

For service metrics, are we using the same field event.outcome_numeric as transaction events and transaction metrics?

correct

Does it mean that the field will have multiple types

That's also correct. This lets us re-use the same query across all three. That works because the avg aggregation can work both on on numeric fields as well as aggregate metric fields.

But please validate that assumption by trying out the avg aggregation on event.outcome_numeric and contrast the results with the current way of calculating the error/success rate.

carsonip · 2022-12-08T10:45:41Z

@axw mentioned that since transaction metrics and service metrics are sharing the same internal_metrics data stream, the field event.outcome_numeric cannot be of type byte and aggregate_metric_double at the same time. Therefore, this work will involve splitting the service metrics into a separate data stream to make it possible.

carsonip · 2022-12-08T16:21:10Z

@felixbarny A question about the Count in aggregate_metric_double:

For the aggregate_metric_double, the following sub-fields should be populated:

sum: the number of success events
value_count: the number of success + error events

Should value_count be number of success + failure, or success + failure + unknown? I assume it is the former (which is what you wrote) but just wanted to confirm.

felixbarny · 2022-12-08T16:48:24Z

Correct, it's the former. See https://github.com/elastic/apm/blob/main/specs/agents/tracing-transactions.md#error-rate for more details.

dgieselaar · 2022-12-09T08:29:35Z

@carsonip @axw is there a reason why we are not also making event.outcome_numeric an aggregate_metric_double for transaction metrics? are there significant storage concerns around doing so? can we quantify that?

axw · 2022-12-11T09:40:23Z

@dgieselaar IIANM the main reason for storing as a byte is that transaction metrics have an event.outcome field. So you can group by event.outcome.

I would strongly prefer that service metrics remain a subset of transaction metrics so that we could later derive them in Elasticsearch by just dropping dimensions. So I'm +1 on storing an aggregate_metric_double for transaction metrics too.

WDYT @felixbarny?

felixbarny · 2022-12-12T07:01:42Z

I would strongly prefer that service metrics remain a subset of transaction metrics so that we could later derive them in Elasticsearch by just dropping dimensions.

I definitely agree with this statement. I just wasn't sure if it's feasible to use aggregate_metric_double for transaction metrics because it also has a event.outcome dimension.

I played around with the mapping a bit and it seems like it is feasible and actually better than mapping it to byte. We can map it like this:

`event.outcome`	`event.outcome_numeric.sum`	`event.outcome_numeric.value_count`
`unknown`	`null`	`null`
`failure`	0	`doc_count`
`success`	`doc_count`	`doc_count`

Click to expand example

DELETE felixbarny-test-outcome
PUT felixbarny-test-outcome
{
  "mappings": {
    "properties": {
      "event.outcome": {
        "type": "keyword"
      },
      "event.outcome_numeric": {
        "type": "aggregate_metric_double",
        "metrics": ["value_count", "sum"],
        "default_metric": "sum"
      }
    }
  }
}

POST felixbarny-test-outcome/_doc?refresh
{
  "_doc_count": 10,
  "event.outcome": "success",
  "event.outcome_int": 1,
  "event.outcome_numeric": {
    "value_count": 10,
    "sum": 10
  }
}

POST felixbarny-test-outcome/_doc?refresh
{
  "_doc_count": 5,
  "event.outcome": "failure",
  "event.outcome_int": 0,
  "event.outcome_numeric": {
    "value_count": 5,
    "sum": 0
  }
}


POST felixbarny-test-outcome/_doc?refresh
{
  "_doc_count": 5,
  "event.outcome": "unknown"
}


POST felixbarny-test-outcome/_search?size=0
{
  "aggs": {
    "success_rate_aggregate_metric_double": {
      "avg": {
        "field": "event.outcome_numeric"
      }
    }
  }
}

As the metrics will be so similar in structure, we need to take care that the UI isn't accidentally querying both transaction metrics and service metrics at the same time. Did you already think about what's the best way to do that? Just a query for the metricset.name? Or should service metrics better be in a different data stream? Either way, we need to make sure the UI selects the correct set of metrics.

axw · 2022-12-12T07:45:45Z

Thanks @felixbarny, I think that looks good.

The other thing that's been making me feel uncomfortable is the field name. It feels a bit too abstract, and implies that it is equivalent to event.outcome -- but that only holds for the event documents. So maybe that could be renamed too.

Seems similar to transaction.duration.summary, so maybe transaction.successes? The sum is the number of successful transactions, and the value_count is the total number of transactions with a known outcome.

`event.outcome`	`transaction.successes.sum`	`transaction.successes.value_count`
`unknown`	`null`	`null`
`failure`	0	`doc_count`
`success`	`doc_count`	`doc_count`

WDYT? Not my favourite name, but I think a little better...

As the metrics will be so similar in structure, we need to take care that the UI isn't accidentally querying both transaction metrics and service metrics at the same time. Did you already think about what's the best way to do that? Just a query for the metricset.name? Or should service metrics better be in a different data stream? Either way, we need to make sure the UI selects the correct set of metrics.

I would prefer to split them out, but I would query on metricset.name either way - so it's not a requirement.

dgieselaar · 2022-12-12T07:53:39Z

@axw @felixbarny we will likely use metricset.name but it'd be great if that can be a constant keyword.

axw · 2022-12-12T08:56:27Z

@felixbarny and I just had a chat about the metrics. We're thinking that we should skip adding them to transaction events and transaction metrics for now, and only add them to service metrics. That's because we would still need a fallback for old documents, and we don't yet have a solution for schema evolution. @dgieselaar does that sound reasonable?

Regarding the name:

the name should be the same across transaction events, transaction metrics, and service metrics
I identified part of the source of my discomfort with event.outcome_numeric: that name implies it is a simple mapping from event.outcome (success=1, failed=0), which only holds true for transaction events and not for the metrics. OTOH, if the field describes "successful transactions", then the meaning of the field doesn't change depending on whether it's an event or a metric.

So for a concrete suggestion, I'll amend the above from transaction.successes to transaction.success_count. This will represent the number of transactions that are successful. For events it'll always be 0 (failed), 1 (success), or omitted (unknown). For metrics, sum will be success and value_count will be all the transactions that were considered (success+failed).

carsonip · 2022-12-12T12:34:34Z

@axw There's already a success_count (as well as a failure_count) under transaction which is an int: https://github.com/elastic/apm-server/blob/main/apmpackage/apm/data_stream/internal_metrics/fields/fields.yml#L132

Do we need a new name or a migration for the existing field? Am I missing anything here?

axw · 2022-12-12T12:50:12Z

@carsonip that field is only used by the existing service metrics implementation, which is technical preview and off by default. We can make a breaking change to it if we decide on that name. (If we don't decide on that name, then we would just remove the field definition.)

carsonip · 2022-12-12T13:02:55Z

Thanks for the explanation. I'll proceed to implement it with transaction.success_count. In that case, do we want to remove transaction.failure_count? It will be weird to see 2 similarly named fields with different types.

axw · 2022-12-12T23:15:41Z

@carsonip yes, we would remove transaction.failure_count since the information will be captured in the value_count sub-field of transaction.success_count.

felixbarny · 2022-12-13T11:46:31Z

@felixbarny and I just had a chat about the metrics. We're thinking that we should skip adding them to transaction events and transaction metrics for now, and only add them to service metrics. That's because we would still need a fallback for old documents, and we don't yet have a solution for schema evolution. @dgieselaar does that sound reasonable?

I just had a chat with @sqren about this and we concluded that it might be better to add the new fields consistently now, even if we don't have a good solution for schema evolution. The sooner we add the changes, the sooner we're able to deprecate the old schema. I think there is a potential path for schema evolution with runtime fields, however, aggregate_metric_double is currently not supported for runtime fields.

carsonip · 2022-12-13T16:01:16Z

Do we have enough information to do the same for transaction events and transaction metrics now? Will we also use an aggregate_metric_double transaction.success_count for both transaction events and transaction metrics, or a byte event.outcome_numeric?

simitt · 2022-12-13T16:39:35Z

My understanding is that everybody was fine with using transaction.success_count with aggregate_metric_double.

felixbarny · 2022-12-13T16:39:36Z

The name should be the same for transaction events, transaction metrics, and service metrics. So it would be transaction.success_count. Metrics use the type aggregate_metric_double and transaction events use the type byte.

I identified part of the source of my discomfort with event.outcome_numeric: that name implies it is a simple mapping from event.outcome (success=1, failed=0), which only holds true for transaction events and not for the metrics. OTOH, if the field describes "successful transactions", then the meaning of the field doesn't change depending on whether it's an event or a metric.

I think the name should be optimized for the usage on individual events. IMHO, it's fine if the name is a little weird in the context of a metric that represent multiple individual events. When you configure rollups in ES, it would also use the aggregate_metric_double field to summarize multiple events into a single doc, using the same field name.

Sorry for the back-and-forth with the naming, Carson. For now, I'd say stick to transaction.success_count but we might want to change the name between now and the release. This should hopefully not be a lot of work.

carsonip · 2022-12-13T17:09:19Z

Thanks @simitt @felixbarny. No problem. It should be straightforward to implement given that the rest of it is similar to #9791.

carsonip · 2022-12-14T13:25:06Z

As pointed out by @axw, index is not a property for aggregate_metric_double: https://www.elastic.co/guide/en/elasticsearch/reference/current/aggregate-metric-double.html#aggregate-metric-double-params
Will proceed with making the byte doc-value only but not aggregate_metric_double.

carsonip · 2022-12-15T15:01:43Z

During APM server weekly, @felixbarny and the team made the decision that we will be using event.success_count for transaction events (as byte), transaction metrics (as aggregate_metric_double) and service metrics (as aggregate_metric_double) as opposed to the proposed names event.outcome_numeric and transaction.success_count. The following PR will rename the field changed in #9791. Please correct me if I missed anything.

Add `event.success_count` to transaction event and transaction metrics. Change `transaction.success_count` to `event.success_count` for service metrics. closes #5243

dgieselaar added the enhancement label May 9, 2021

zube bot added the [zube]: Backlog label Jul 1, 2021

simitt removed the [zube]: Backlog label Dec 31, 2021

axw mentioned this issue Aug 9, 2022

Service Metrics Aggregator #8607

Merged

3 tasks

simitt added this to the 8.7 milestone Nov 23, 2022

simitt changed the title ~~Store event.outcome as 0/1~~ service metrics: add numeric outcome Nov 23, 2022

simitt assigned kruskall and unassigned kruskall Nov 23, 2022

simitt assigned carsonip Nov 23, 2022

carsonip mentioned this issue Dec 7, 2022

Add event.outcome_numeric #9764

Closed

3 tasks

axw mentioned this issue Dec 8, 2022

[APM] Enable service metrics elastic/kibana#146734

Closed

3 tasks

carsonip mentioned this issue Dec 12, 2022

Store outcome values in servicemetrics transaction.success_count #9791

Merged

3 tasks

carsonip mentioned this issue Dec 15, 2022

Service metrics txn events txn metrics #9819

Merged

3 tasks

felixbarny mentioned this issue Dec 21, 2022

service metrics: extend to non-transaction events and add zero counts #9654

Closed

simitt closed this as completed in #9819 Dec 27, 2022

simitt pushed a commit that referenced this issue Dec 27, 2022

Change Service metrics txn events txn metrics (#9819)

b470c3b

Add `event.success_count` to transaction event and transaction metrics. Change `transaction.success_count` to `event.success_count` for service metrics. closes #5243

kpatticha mentioned this issue Jul 19, 2024

[EEM] Undefined failedTransactionRate metric from service entity definition elastic/kibana#188765

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

service metrics: add numeric outcome #5243

service metrics: add numeric outcome #5243

dgieselaar commented May 9, 2021 •

edited by felixbarny

Loading

axw commented May 10, 2021

dgieselaar commented May 10, 2021

simitt commented Nov 23, 2022

axw commented Nov 23, 2022 •

edited

Loading

felixbarny commented Nov 23, 2022

carsonip commented Dec 7, 2022

felixbarny commented Dec 7, 2022

carsonip commented Dec 8, 2022 •

edited

Loading

carsonip commented Dec 8, 2022

felixbarny commented Dec 8, 2022

dgieselaar commented Dec 9, 2022

axw commented Dec 11, 2022

felixbarny commented Dec 12, 2022

axw commented Dec 12, 2022 •

edited

Loading

dgieselaar commented Dec 12, 2022

axw commented Dec 12, 2022

carsonip commented Dec 12, 2022

axw commented Dec 12, 2022

carsonip commented Dec 12, 2022

axw commented Dec 12, 2022

felixbarny commented Dec 13, 2022

carsonip commented Dec 13, 2022

simitt commented Dec 13, 2022

felixbarny commented Dec 13, 2022

carsonip commented Dec 13, 2022

carsonip commented Dec 14, 2022

carsonip commented Dec 15, 2022

service metrics: add numeric outcome #5243

service metrics: add numeric outcome #5243

Comments

dgieselaar commented May 9, 2021 • edited by felixbarny Loading

axw commented May 10, 2021

dgieselaar commented May 10, 2021

simitt commented Nov 23, 2022

axw commented Nov 23, 2022 • edited Loading

felixbarny commented Nov 23, 2022

carsonip commented Dec 7, 2022

felixbarny commented Dec 7, 2022

carsonip commented Dec 8, 2022 • edited Loading

carsonip commented Dec 8, 2022

felixbarny commented Dec 8, 2022

dgieselaar commented Dec 9, 2022

axw commented Dec 11, 2022

felixbarny commented Dec 12, 2022

axw commented Dec 12, 2022 • edited Loading

dgieselaar commented Dec 12, 2022

axw commented Dec 12, 2022

carsonip commented Dec 12, 2022

axw commented Dec 12, 2022

carsonip commented Dec 12, 2022

axw commented Dec 12, 2022

felixbarny commented Dec 13, 2022

carsonip commented Dec 13, 2022

simitt commented Dec 13, 2022

felixbarny commented Dec 13, 2022

carsonip commented Dec 13, 2022

carsonip commented Dec 14, 2022

carsonip commented Dec 15, 2022

dgieselaar commented May 9, 2021 •

edited by felixbarny

Loading

axw commented Nov 23, 2022 •

edited

Loading

carsonip commented Dec 8, 2022 •

edited

Loading

axw commented Dec 12, 2022 •

edited

Loading