[Fleet] Fix duplicate data streams being shown in UI #89812

jen-huang · 2021-01-30T00:46:18Z

Summary

Resolves #87082. Duplicate data streams were being shown in UI once there are multiple backing indices per stream (backing indices are created due to rollover, either as a natural result of the ILM policy over time, or when packages are upgraded). This PR fixes that by:

Querying against ES data stream APIs as source of truth for list of data streams, rather than querying backing indices and aggregating on _index field
Retaining the query against backing indices to figure out the data stream dataset, namespace, and type fields
Get package information based on package name stored in the data stream's _meta info, rather than coercing it from the dataset or index name
Adding an API integration tests for the data stream list endpoint that includes a multiple backing indices scenario

Screenshots

Response showing multiple backing indices for some data streams:

Correct data streams show in Fleet, no duplication per backing index:

…s expected to fail due to reliance on number of backing indices

…and only query against backing indices afterwards

elasticmachine · 2021-01-30T00:46:20Z

Pinging @elastic/fleet (Team:Fleet)

elasticmachine · 2021-01-30T00:46:20Z

Pinging @elastic/fleet (Feature:Fleet)

…eams

jfsiii · 2021-01-31T14:35:20Z

x-pack/plugins/fleet/server/routes/data_streams/handlers.ts

+    const dataStreamsInfoByName = keyBy(dataStreamsInfo, 'name');
+
+    // Get data stream stats
+    const { data_streams: dataStreamStats } = (await callCluster('transport.request', {


It seems we can do the callCluster calls in parallel instead of serially.

updated! I tossed the packaged SO retrieval in there too

jfsiii · 2021-01-31T14:46:34Z

x-pack/plugins/fleet/server/routes/data_streams/handlers.ts

-      metric: ['store'],
-    });
+    // Get matching data streams
+    const { data_streams: dataStreamsInfo } = (await callCluster('transport.request', {


These seem better defined in a service than a http handler.

jfsiii · 2021-01-31T15:04:12Z

x-pack/plugins/fleet/server/routes/data_streams/handlers.ts

-
+    // Return final data streams objects sorted by last activity, decending
+    // After filtering out data streams that are missing dataset/namespace/type fields
+    body.data_streams = (await Promise.all(dataStreamPromises))


This handler has more logic/statements than most others. What can we move to the service layer?

I think we can move lines 53-177 into a service allowing something like

Suggested change

body.data_streams = (await Promise.all(dataStreamPromises))

body.data_streams = await service.method()

Perhaps it's more than that, but I think we should move as much of this code as possible into a service.

When this code was written originally, it was an intentional decision to not create a data stream service yet and just leave the code in the handler because 1) there's only 1 data stream API (list), 2) no other place in the plugins needs access to this logic, and 3) no downstream plugins needs access either. I'd like to keep this simple for now until one of these situations change. Is this blocking for you?

No, it's not a blocker. Service or not, I think that code should be pulled into functions to keep the handler easier to read/test, but that can be done later

ruflin · 2021-02-01T11:59:39Z

x-pack/plugins/fleet/server/routes/data_streams/handlers.ts

-              {
-                exists: {
-                  field: 'data_stream.namespace',
+      // Query backing indices to extract data stream dataset, namespace, and type values


I wonder if we could find a better way here for doing this. Instead of checking all the indices for it we could check what the values are in the mapping of the write index? Or if we run the query, could we ensure it only touches one index (most recent one) and not all of them? But then, what if this index was rollover but has no data inside yet?

what if this index was rollover but has no data inside yet

At some point I wrote the index name to be the last backing index (write index), but changed it for this exact reason :) I couldn't think of a better way to get these values but am open to suggestions.

I would tend to trust ES here to be efficient in its aggregations, and leave it like this, until we know that it's a problem. Probing several indices until we find the newest one that has data is probably slower. Do we have a system with lots of rolled-over backing indices we could test with?

…eams

jen-huang · 2021-02-01T16:36:12Z

@elasticmachine merge upstream

jen-huang · 2021-02-01T18:28:52Z

@elasticmachine merge upstream

kibanamachine · 2021-02-01T20:34:30Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: c3d876e

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`fleet`	752.1KB	752.1KB	+3.0B

History

💔 Build #102631 failed 36c152e
💔 Build #102608 failed 3364d72
💚 Build #102435 succeeded a61c218
💔 Build #102410 failed b0c955d

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

skh

I've tested this locally with these steps:

set up Fleet & Agents
use the default policy, install & run an agent
verify data streams are created and data is coming in
roll over one data stream manually
on master, I can reproduce the bug: the rolled-over data stream appears twice in the list of data streams, and the number of data streams on the overview page increases
with this change, I can verify the bug is fixed: every data stream only appears once in the list. Also, the number of data streams on the overview page stays constant.

One comment inline about the aggregation to find dataset, namespace and type -- I think ES is in general more efficient than we can be by trying to be clever, so I'd leave it like this until we know it becomes a problem.

During a recent customer call it was pointed out that the data stream view is slow. It's probably worth investigating, but rather in a separate issue than in this PR.

jen-huang · 2021-02-02T18:11:10Z

Thanks @skh for the thorough testing and review!

* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> # Conflicts: # x-pack/test/fleet_api_integration/apis/index.js

…-ml-jobs * 'master' of github.com:elastic/kibana: (254 commits) [Security Solution] [Detections] Remove allow_no_indices to prevent error being thrown in response of field capabilities (elastic#89927) Skip test for cloud (elastic#89450) [Fleet] Fix duplicate data streams being shown in UI (elastic#89812) Bump package dependencies (elastic#90034) [App Search] DRY helper for encoding/decoding routes that can have special characters in params (elastic#89811) TypeScript project references for Observability plugin (elastic#89320) [SearchSource] Combine sort and parent fields when serializing (elastic#89808) Made imports static (elastic#89935) [ml] migrate file_data_visualizer/import route to file_upload plugin (elastic#89640) [Discover] Adapt default column behavior (elastic#89826) Round start and end values (elastic#89030) Rename getProxyAgents to getCustomAgents (elastic#89813) [Form lib] UseField `onError` listener (elastic#89895) [APM] use latency sum instead of avg for impact (elastic#89990) migrate more core-owned plugins to tsproject ref (elastic#89975) [Logs UI] Load <LogStream> entries via async searches (elastic#86899) [APM] Abort browser requests when appropriate (elastic#89557) [Alerting] Allow user to select existing connector of same type when fixing broken connector (elastic#89062) [Data Table] Use shared CSV export mechanism (elastic#89702) chore(NA): improve logic check when installing Bazel tools (elastic#89634) ...

* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

* Add API integration tests for data streams list, including one that is expected to fail due to reliance on number of backing indices * Use ES data streams API as source of truth for list of data streams, and only query against backing indices afterwards * Get package name from data stream meta info * Increate retry timeout * Move initial info requests inside Promise.all Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> # Conflicts: # x-pack/test/fleet_api_integration/apis/index.js

jen-huang added 3 commits January 29, 2021 14:13

Add API integration tests for data streams list, including one that i…

c7c82c5

…s expected to fail due to reliance on number of backing indices

Use ES data streams API as source of truth for list of data streams, …

6af9670

…and only query against backing indices afterwards

Get package name from data stream meta info

b0c955d

jen-huang added bug Fixes for quality problems that affect the customer experience release_note:fix v8.0.0 Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team v7.11.0 v7.12.0 labels Jan 30, 2021

jen-huang self-assigned this Jan 30, 2021

jen-huang requested review from a team and removed request for a team January 30, 2021 00:58

jen-huang added 2 commits January 30, 2021 12:15

Increate retry timeout

40d1d79

Merge remote-tracking branch 'upstream/master' into fix/dupe-data-str…

a61c218

…eams

jfsiii reviewed Jan 31, 2021

View reviewed changes

ruflin reviewed Feb 1, 2021

View reviewed changes

jen-huang added 2 commits February 1, 2021 07:20

Merge remote-tracking branch 'upstream/master' into fix/dupe-data-str…

6422511

…eams

Move initial info requests inside Promise.all

3364d72

Merge branch 'master' into fix/dupe-data-streams

36c152e

jen-huang requested a review from skh February 1, 2021 17:35

Merge branch 'master' into fix/dupe-data-streams

c3d876e

skh approved these changes Feb 2, 2021

View reviewed changes

jen-huang merged commit 19effe2 into elastic:master Feb 2, 2021

jen-huang deleted the fix/dupe-data-streams branch February 2, 2021 18:11

jen-huang added the v7.10.3 label Feb 2, 2021

This was referenced Feb 2, 2021

[7.x] [Fleet] Fix duplicate data streams being shown in UI (#89812) #90059

Merged

[7.11] [Fleet] Fix duplicate data streams being shown in UI (#89812) #90063

Merged

jen-huang removed the v7.10.3 label Feb 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fleet] Fix duplicate data streams being shown in UI #89812

[Fleet] Fix duplicate data streams being shown in UI #89812

jen-huang commented Jan 30, 2021 •

edited

Loading

elasticmachine commented Jan 30, 2021

elasticmachine commented Jan 30, 2021

jfsiii Jan 31, 2021

jen-huang Feb 1, 2021 •

edited

Loading

jfsiii Jan 31, 2021

jfsiii Jan 31, 2021

jen-huang Feb 1, 2021

jfsiii Feb 1, 2021

ruflin Feb 1, 2021

jen-huang Feb 1, 2021

skh Feb 2, 2021

jen-huang commented Feb 1, 2021

jen-huang commented Feb 1, 2021

kibanamachine commented Feb 1, 2021

skh left a comment

jen-huang commented Feb 2, 2021

	body.data_streams = (await Promise.all(dataStreamPromises))
	body.data_streams = await service.method()

[Fleet] Fix duplicate data streams being shown in UI #89812

[Fleet] Fix duplicate data streams being shown in UI #89812

Conversation

jen-huang commented Jan 30, 2021 • edited Loading

Summary

Screenshots

elasticmachine commented Jan 30, 2021

elasticmachine commented Jan 30, 2021

Choose a reason for hiding this comment

jen-huang Feb 1, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jen-huang commented Feb 1, 2021

jen-huang commented Feb 1, 2021

kibanamachine commented Feb 1, 2021

💚 Build Succeeded

Metrics [docs]

Async chunks

History

skh left a comment

Choose a reason for hiding this comment

jen-huang commented Feb 2, 2021

jen-huang commented Jan 30, 2021 •

edited

Loading

jen-huang Feb 1, 2021 •

edited

Loading