-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Fix duplicate data streams being shown in UI #89812
Conversation
…s expected to fail due to reliance on number of backing indices
…and only query against backing indices afterwards
Pinging @elastic/fleet (Team:Fleet) |
Pinging @elastic/fleet (Feature:Fleet) |
const dataStreamsInfoByName = keyBy(dataStreamsInfo, 'name'); | ||
|
||
// Get data stream stats | ||
const { data_streams: dataStreamStats } = (await callCluster('transport.request', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we can do the callCluster
calls in parallel instead of serially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated! I tossed the packaged SO retrieval in there too
metric: ['store'], | ||
}); | ||
// Get matching data streams | ||
const { data_streams: dataStreamsInfo } = (await callCluster('transport.request', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem better defined in a service than a http handler.
|
||
// Return final data streams objects sorted by last activity, decending | ||
// After filtering out data streams that are missing dataset/namespace/type fields | ||
body.data_streams = (await Promise.all(dataStreamPromises)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handler has more logic/statements than most others. What can we move to the service layer?
I think we can move lines 53-177 into a service allowing something like
body.data_streams = (await Promise.all(dataStreamPromises)) | |
body.data_streams = await service.method() |
Perhaps it's more than that, but I think we should move as much of this code as possible into a service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this code was written originally, it was an intentional decision to not create a data stream service yet and just leave the code in the handler because 1) there's only 1 data stream API (list), 2) no other place in the plugins needs access to this logic, and 3) no downstream plugins needs access either. I'd like to keep this simple for now until one of these situations change. Is this blocking for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not a blocker. Service or not, I think that code should be pulled into functions to keep the handler easier to read/test, but that can be done later
{ | ||
exists: { | ||
field: 'data_stream.namespace', | ||
// Query backing indices to extract data stream dataset, namespace, and type values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could find a better way here for doing this. Instead of checking all the indices for it we could check what the values are in the mapping of the write index? Or if we run the query, could we ensure it only touches one index (most recent one) and not all of them? But then, what if this index was rollover but has no data inside yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if this index was rollover but has no data inside yet
At some point I wrote the index name to be the last backing index (write index), but changed it for this exact reason :) I couldn't think of a better way to get these values but am open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would tend to trust ES here to be efficient in its aggregations, and leave it like this, until we know that it's a problem. Probing several indices until we find the newest one that has data is probably slower. Do we have a system with lots of rolled-over backing indices we could test with?
@elasticmachine merge upstream |
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]Async chunks
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this locally with these steps:
- set up Fleet & Agents
- use the default policy, install & run an agent
- verify data streams are created and data is coming in
- roll over one data stream manually
- on
master
, I can reproduce the bug: the rolled-over data stream appears twice in the list of data streams, and the number of data streams on the overview page increases - with this change, I can verify the bug is fixed: every data stream only appears once in the list. Also, the number of data streams on the overview page stays constant.
One comment inline about the aggregation to find dataset, namespace and type -- I think ES is in general more efficient than we can be by trying to be clever, so I'd leave it like this until we know it becomes a problem.
During a recent customer call it was pointed out that the data stream view is slow. It's probably worth investigating, but rather in a separate issue than in this PR.
Thanks @skh for the thorough testing and review! |
* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> # Conflicts: # x-pack/test/fleet_api_integration/apis/index.js
…-ml-jobs * 'master' of github.com:elastic/kibana: (254 commits) [Security Solution] [Detections] Remove allow_no_indices to prevent error being thrown in response of field capabilities (elastic#89927) Skip test for cloud (elastic#89450) [Fleet] Fix duplicate data streams being shown in UI (elastic#89812) Bump package dependencies (elastic#90034) [App Search] DRY helper for encoding/decoding routes that can have special characters in params (elastic#89811) TypeScript project references for Observability plugin (elastic#89320) [SearchSource] Combine sort and parent fields when serializing (elastic#89808) Made imports static (elastic#89935) [ml] migrate file_data_visualizer/import route to file_upload plugin (elastic#89640) [Discover] Adapt default column behavior (elastic#89826) Round start and end values (elastic#89030) Rename getProxyAgents to getCustomAgents (elastic#89813) [Form lib] UseField `onError` listener (elastic#89895) [APM] use latency sum instead of avg for impact (elastic#89990) migrate more core-owned plugins to tsproject ref (elastic#89975) [Logs UI] Load <LogStream> entries via async searches (elastic#86899) [APM] Abort browser requests when appropriate (elastic#89557) [Alerting] Allow user to select existing connector of same type when fixing broken connector (elastic#89062) [Data Table] Use shared CSV export mechanism (elastic#89702) chore(NA): improve logic check when installing Bazel tools (elastic#89634) ...
* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> # Conflicts: # x-pack/test/fleet_api_integration/apis/index.js
Summary
Resolves #87082. Duplicate data streams were being shown in UI once there are multiple backing indices per stream (backing indices are created due to rollover, either as a natural result of the ILM policy over time, or when packages are upgraded). This PR fixes that by:
_index
fielddataset
,namespace
, andtype
fields_meta
info, rather than coercing it from the dataset or index nameScreenshots
Response showing multiple backing indices for some data streams:
Correct data streams show in Fleet, no duplication per backing index: