This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Stop storing derived metrics #183

Merged
merged 1 commit into elastic:master from the no-derived-metrics branch on Jul 30, 2020

Conversation

@axw (Member) commented Jul 30, 2020

We can derive them in Elasticsearch instead; delete the "numbers" package and use inline arithmetic.

Also fixed a bug with the "total events sent - success %" metric reported to stdout, which mistakenly used an errors count instead of an events count.

@apmmachine
💚 Build Succeeded



Build stats

  • Build Cause: [Pull request #183 opened]

  • Start Time: 2020-07-30T05:44:09.738+0000

  • Duration: 4 min 44 sec

Test stats 🧪

  • Failed: 0
  • Passed: 9
  • Skipped: 0
  • Total: 9

@axw axw merged commit 799c1df into elastic:master Jul 30, 2020
@axw axw deleted the no-derived-metrics branch July 30, 2020 06:03
@jalvz (Contributor) commented Aug 4, 2020

Thinking it through, I think I disagree with this change.
It is true that we can get all of that from Elasticsearch; that was actually considered initially. But given the (IMO negligible) complexity of indexing those fields and their storage cost, it pays off at query time.
A trivial _search query would give me all the data I want without having to craft those calculations in a query body (accounting for optional values, zero divisions, etc).
Now if I need to dig beyond the dashboards, I need to write arbitrarily complicated queries instead.
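To make this concrete, here is a minimal sketch, not taken from the hey-apm code base, of the kind of per-document query that becomes necessary once the derived fields are no longer stored. The index name, field names (report_id, events_sent, events_indexed), and report ID are all hypothetical placeholders; it recomputes an "index success %" in a script field while guarding against missing values and division by zero:

```go
// Hypothetical sketch: fetch one benchmark document and recompute a derived
// percentage in a script field. Index and field names are invented.
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	elasticsearch "github.com/elastic/go-elasticsearch/v7"
)

// _search body: filter to a single report and derive "index success %",
// handling absent fields and a zero denominator explicitly.
const body = `{
  "query": { "term": { "report_id": "example-report-id" } },
  "_source": ["events_sent", "events_indexed"],
  "script_fields": {
    "index_success_pct": {
      "script": {
        "lang": "painless",
        "source": "long sent = doc['events_sent'].size() > 0 ? doc['events_sent'].value : 0; long indexed = doc['events_indexed'].size() > 0 ? doc['events_indexed'].value : 0; return sent > 0 ? 100.0 * indexed / sent : 0;"
      }
    }
  }
}`

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}
	res, err := es.Search(
		es.Search.WithContext(context.Background()),
		es.Search.WithIndex("hey-apm"),
		es.Search.WithBody(strings.NewReader(body)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	fmt.Println(res.Status())
}
```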

@simitt (Contributor) commented Aug 4, 2020

I might not be the heaviest user of the hey-apm benchmarking, but I have barely used the derived metrics; when I did, I had to look up how they were calculated, as it wasn't necessarily clear to me what the difference/time unit etc. for the metrics was.

@jalvz could you share where/how you were using these values?

@jalvz (Contributor) commented Aug 4, 2020

Any time I queried the actual docs instead of using the dashboards. That is, every time I want granular data or am looking for something specific. The last time was during the recent spike caused by enabling the instrumentation.

@axw (Member, Author) commented Aug 4, 2020

A trivial _search query would give me all the data I want without having to craft those calculations in a query body (accounting for optional values, zero divisions, etc).
...
Any time I queried the actual docs instead of using the dashboards. That is, every time I want granular data or am looking for something specific. The last time was during the recent spike caused by enabling the instrumentation.

I've been thinking of these metrics as only being useful for comparisons, rather than as something to look at in isolation. Can you describe how you were using them in isolation? Or, if you were using them for comparison, where the dashboards fell short and looking at the raw docs helped.

Just trying to understand why we can't use dashboards + kuery bar.

@jalvz (Contributor) commented Aug 4, 2020

how you were using them in isolation?

If I have a single run from a PR I can't just use the dashboards because results are averaged. So I'd query individual fields from an individual document (I grab a reportId from the log to filter by) and then compare it with the latest average from the related dashboard.

why we can't use dashboards + kuery bar

I think we can do that the same as before this change. The difference is that now we can't query individual documents (or rather we can, but it is quite a bit more difficult).

I also don't understand what specific problem this is solving, besides doing the same thing in a different way (and with fewer lines of code, but lines of code are not a problem per se).

@axw (Member, Author) commented Aug 5, 2020

If I have a single run from a PR I can't just use the dashboards because results are averaged.
...
I think we can do that the same as before this change. The difference is that now we can't query individual documents (or rather we can, but it is quite a bit more difficult).

We can't do it the exact same way (which implies no change at all), but we can do this with a data table visualisation and scripted fields (until Lens supports bucket_script - elastic/kibana#67405). I've added one to the dashboard, let me know if this does what you need or what's missing.
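For context, a bucket_script pipeline aggregation is the aggregation-side way to recompute such a ratio per bucket, which is what the referenced Lens limitation (elastic/kibana#67405) is about. The sketch below uses invented index and field names, not the actual hey-apm mappings, and only illustrates the shape of such a query:

```go
// Hypothetical sketch: recompute "index success %" per report bucket with a
// bucket_script pipeline aggregation. Field names are invented.
package main

import "fmt"

// Body for POST /<index>/_search.
const derivedMetricAgg = `{
  "size": 0,
  "aggs": {
    "per_report": {
      "terms": { "field": "report_id" },
      "aggs": {
        "sent":    { "sum": { "field": "events_sent" } },
        "indexed": { "sum": { "field": "events_indexed" } },
        "index_success_pct": {
          "bucket_script": {
            "buckets_path": { "sent": "sent", "indexed": "indexed" },
            "script": "params.sent > 0 ? 100.0 * params.indexed / params.sent : 0"
          }
        }
      }
    }
  }
}`

func main() {
	fmt.Println(derivedMetricAgg)
}
```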

I also don't understand what specific problem this is solving, besides doing the same thing in a different way (and with fewer lines of code, but lines of code are not a problem per se).

There were a couple of things that set this off:

  • Non-idiomatic code, e.g. func Div(i1, i2 interface{}) *float64, and unnecessary code, e.g. func Sum(xs ...uint64) uint64 (see the sketch after this list)
  • Many metrics with no clear value, making it more difficult (for me) to identify the ones that provide the greatest signal
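As a hedged illustration of the first point, here is a short Go sketch of what replacing such helpers with inline arithmetic looks like; the function and field names are invented for illustration and are not taken from hey-apm:

```go
// Invented example: instead of a nil-tolerant numbers.Div(i1, i2 interface{}) *float64,
// compute the ratio inline with concrete types and an explicit zero guard.
package main

import "fmt"

func indexSuccessPct(eventsIndexed, eventsSent uint64) float64 {
	if eventsSent == 0 {
		return 0
	}
	return 100 * float64(eventsIndexed) / float64(eventsSent)
}

func main() {
	fmt.Printf("index success: %.2f%%\n", indexSuccessPct(950, 1000))
}
```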

If I knew you were using the derived metrics at the individual report level (code comments explaining why we're not using ES aggs would have helped) then I wouldn't have gone and removed them; I would have just cleaned up the way they were calculated.

Anyway, as above, please let me know what's missing. We can add them back in if really needed.

@jalvz (Contributor) commented Aug 5, 2020

we can do this with a data table visualisation and scripted fields

I think that will work - and if not, we will figure something else out. Thanks.

v1v added a commit to v1v/hey-apm that referenced this pull request Dec 17, 2021
* upstream/main: (37 commits)
  Update to assume use of data streams (elastic#198)
  JJBB: disable periodic trigger (elastic#195)
  es: only count workload events (elastic#193)
  Use specific docker compose context (elastic#191)
  define failure threshould from average instead of max (elastic#190)
  update go agent version (elastic#189)
  Add destination service to spans (elastic#188)
  worker: reinstate Events{Generated,Sent,Indexed} (elastic#187)
  Remove types, conv, strcoll packages (elastic#185)
  agent/tracer cleanup (elastic#184)
  Stop storing derived metrics (elastic#183)
  Wait for active output events before proceeding (elastic#181)
  worker: send spans in a separate goroutine (elastic#180)
  benchmark: drop duplicate benchmark (elastic#179)
  Refactoring, context propagation, graceful stop (elastic#177)
  Keep self instrumentation (elastic#174)
  Disable apm-server instrumentation (elastic#173)
  Upgrade go version and dependencies (elastic#172)
  Allow configuring multiple worker instances (elastic#171)
  Adds a nil check before querying libbeat metrics. (elastic#169)
  ...