Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add usage collection for savedObject tagging #83160
Add usage collection for savedObject tagging #83160
Changes from 5 commits
f62ea18
a6ae4a9
59716f8
af1135f
70d58d3
93ae79d
4042358
0235025
7b6d76e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we collect tags usage data for more than just these two saved object types. I think we should try to avoid having this kind of dynamic fields in our schema and it will also increase the field count on our telemetry cluster.
What about:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a
nested
orarray
type in that case?That would work. However I would be sure that is would be alright to the way telemetry consumes / uses the data.
When I look at the other collectors, none seems to be used nested fields, which is why I'm asking:
kibana/src/plugins/kibana_usage_collection/server/collectors/application_usage/schema.ts
Lines 52 to 59 in d67a421
note that this could only be a design flaw when the collector was implemented. @elastic/kibana-telemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! This is a legacy issue that no longer applies: in the past, before having the schema implementation, the indexer in the remote service indexed the data in the same format it received it from Kibana. This meant that we had to keep the mappings in mind when developing the collectors (and
nested
was never an option because Kibana does not natively support viz with that type yet).New collectors/fields shouldn't have those contraints anymore because the new indexer actually reshapes the payloads and splits the info into multiple indices based on the different needs to consume the information by our PMs and Analysts.
If we foresee the number of keys growing dynamically, I think the
array
approach would be best.cc @mindbat so he can share his thoughts from the indexer-maintainer POV 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afharo: We're not sending Kibana data to Telemetry Next yet and the legacy indexer doesn't reshape data by default. We would still need to update the indexer to consume the new data, reshape it and index it into a custom extended index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should we stick with the static, per type, mappings for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we just index data as-is, can you elaborate why we couldn't use an array field type for the mappings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed @afharo's comment and I assume this applies to array fields too #58175
It makes sense that we want to do some processing inside Telemetry Next, but I think it would be much more flexible if Kibana had more control around which documents are created. If telemetry is uploaded as NDJSON then usage collectors could return several documents instead of always having a single large payload that needs to be handled and split by the server. This is probably a big effort, but maintaining a static schema for dynamic data like applications and saved object types that change every release feels like a maintenance nightmare.
So I guess for this PR we just need to add all the known saved object types to the mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to add types that currently supports the tagging feature. That would just be dashboard, visualizations and maps. Will add maps. Further additions should only be done when the app/type implements the tagging feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7b6d76e: added
map
to the schema, and updated the readme to state that developers should update the schema when adding tagging support to additional SO types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rudolf
Without any transformations on the remote service indexer, we'd end up with data mapped as an array in elasticsearch and face the usual problem of not being able to meaningfully query or visualize that data. It's a situation we're trying to fix with
ui_metric
right now, for example.