Search Sessions Stabilization Stage I #134983

Dosant · 2022-06-23T09:14:35Z

Summary

This PR changes the search service and search session infrastructure to improve performance, stability, and resiliency by ensuring that search sessions don’t add load on a cluster when the feature is not used.

Details on the motivation and implementation are in the RFC.

Comparison of the old and new implementations (more details in the RFC):

Aspect	Old implementation	New implementation
Number of task manager tasks	3	0
Initial search keep_alive	1 week	1 minute, extended while polling
Search cancelation	Task manager task	Searches are deleted automatically due to the short keep_alive; in some cases, cancelation is expedited from the browser
Search-session saved object creation	For every search, even if the user doesn’t save it	After the user saves a session
Search-session status calculation	Task manager task with a 10-second interval; status is calculated and saved in the saved object	Status is calculated on the fly when the user retrieves search-session objects
Search-session clean-up	Sessions not saved by the user are cleaned up by task manager	None, because saved objects are created only for sessions the user saves
Saving a session after search requests complete	5-minute limit, because task manager might have already deleted the searches and the session object	5-minute artificial client-side limit, to avoid keeping searches alive from "forgotten" tabs. The client continues polling at infrequent intervals to keep the searches alive while the user stays on the screen
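The keep_alive rows of the table can be read as a small client-side policy. The sketch below is one interpretation, assuming the defaults stated in this PR (1 minute initially, 1 week once the session is saved, a 5-minute save window after completion, and no extension polling when sessions are disabled). `SearchState` and `nextKeepAlive` are hypothetical names, not Kibana's actual API.

```typescript
// Hypothetical model of the keep_alive rules described in the comparison table.
// None of these names are Kibana's actual API.
type KeepAlive = string; // Elasticsearch time value, e.g. "1m" or "7d"

interface SearchState {
  isRunning: boolean;
  completedAt?: number; // epoch ms when the search finished (unset while running)
  isSessionSaved: boolean; // the user saved the session this search belongs to
  sessionsEnabled: boolean; // false when data.search.session.enabled is false
}

const INITIAL_KEEP_ALIVE: KeepAlive = "1m";
const SAVED_KEEP_ALIVE: KeepAlive = "7d"; // 1 week, the stated default
const SAVE_WINDOW_MS = 5 * 60 * 1000; // 5-minute client-side limit after completion

// Returns the keep_alive to send on the next poll, or undefined to stop extending.
function nextKeepAlive(s: SearchState, nowMs: number): KeepAlive | undefined {
  if (s.isSessionSaved) return SAVED_KEEP_ALIVE;
  if (s.isRunning) return INITIAL_KEEP_ALIVE; // short keep_alive, extended while polling
  if (!s.sessionsEnabled) return undefined; // no polling to extend completed searches
  // Completed but unsaved: keep it alive only while the user can still save it,
  // so searches from "forgotten" tabs expire on their own.
  if (s.completedAt !== undefined && nowMs - s.completedAt < SAVE_WINDOW_MS) {
    return INITIAL_KEEP_ALIVE;
  }
  return undefined;
}
```

With this shape, a tab left idle past the 5-minute window simply stops extending its searches, and Elasticsearch drops them within about a minute without any server-side task.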

Advantages of the new implementation:

No task manager usage
- Less Kibana server load
Short default keep_alive
- Less risk of ending up with unintentionally long-running searches
Saved objects created only when the user saves a session
- Less cluster load and lower storage needs
No need for the “touched” property in the search-session saved object
- No more object updates on every search poll
On-the-fly calculated search-session status
- No need for task manager to keep the state in sync
- Fewer synchronization delays and potential bugs
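To illustrate what "calculated on the fly" means, here is a minimal sketch that derives a session status from the statuses of its searches at read time, so nothing has to be written back to the saved object. The status names and the precedence (error over in-progress over complete) are assumptions for illustration, not necessarily the exact rules Kibana uses.

```typescript
// Illustrative only: compute a session status from its searches when requested,
// instead of storing it in the saved object and keeping it in sync.
type SearchStatus = "in_progress" | "complete" | "error";

function deriveSessionStatus(searches: SearchStatus[]): SearchStatus {
  // Assumed precedence: any failed search marks the whole session as errored.
  if (searches.some((s) => s === "error")) return "error";
  // Otherwise the session is still running while any search is running.
  if (searches.some((s) => s === "in_progress")) return "in_progress";
  // All searches finished successfully (an empty list is treated as complete here).
  return "complete";
}
```

Because the result is recomputed on every read, there is no stored status that can drift out of sync; the trade-off is a more expensive list fetch.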

Performance benchmark
The trend will be more visible in nightly performance tests after we merge, but I ran the kibana-load-test dashboard scenario a couple of times, and I believe the difference is more than just a margin of error:

	Total	OK	KO	% KO	Cnt/s	Min	50th pct	75th pct	95th pct	99th pct	Max	Mean	Std Dev
Main	52088	52088	0	0%	108.066	9	1775	2384	6201	8895	11273	1968	1738
Branch	56899	56899	0	0%	123.693	9	1192	1972	4089	7057	8886	1414	1337

In this scenario, the branch performs better because there are no search-session monitoring tasks and no redundant search-session saved object creations and updates.
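The relative differences between the Main and Branch rows can be computed directly from the numbers in the table. A small sketch covering a few of the columns (the field names are ours, and units cancel out in a relative comparison):

```typescript
// Relative change from the "Main" row to the "Branch" row of the benchmark table.
interface Run {
  cntPerSec: number; // "Cnt/s" column
  p50: number; // "50th pct" column
  p95: number; // "95th pct" column
  mean: number; // "Mean" column
}

const main: Run = { cntPerSec: 108.066, p50: 1775, p95: 6201, mean: 1968 };
const branch: Run = { cntPerSec: 123.693, p50: 1192, p95: 4089, mean: 1414 };

// Percentage change from `before` to `after` (negative means lower on the branch).
function pctChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

for (const key of Object.keys(main) as (keyof Run)[]) {
  console.log(`${key}: ${pctChange(main[key], branch[key]).toFixed(1)}%`);
}
```

It prints `cntPerSec: 14.5%`, `p50: -32.8%`, `p95: -34.1%`, and `mean: -28.2%`: roughly 14% more throughput and about a third lower median and 95th-percentile response times on the branch.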

Minor user-facing changes:

Management screen:
Sessions list auto-refresh is disabled by default because the list is now more expensive to fetch, as we calculate session status on the fly.
For the same reason, we fetch only the last 100 sessions of the current user.
Saving the session is disabled when we continue the session from Lens (or another editor). This should be revisited with Separate architecture for client side cache sharing #121543.
An errored session will be in the "expired" status if the current time is past its expiration time. We do this to shortcut search status checks when the session has expired.
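The "expired" shortcut amounts to checking the expiration time before doing any per-search status lookups. A minimal sketch, where the hypothetical `statusWithExpiryShortcut` wraps whatever computes the live status; the callback is kept synchronous for brevity, although a real lookup against Elasticsearch would be asynchronous.

```typescript
// Sketch of the expiration shortcut; names are hypothetical, not Kibana's API.
type SessionStatus = "in_progress" | "complete" | "error" | "expired";

function statusWithExpiryShortcut(
  expiresAt: Date,
  now: Date,
  computeLiveStatus: () => Exclude<SessionStatus, "expired">
): SessionStatus {
  // Past expiration: report "expired" without checking individual searches,
  // even if the session had errored.
  if (now.getTime() > expiresAt.getTime()) return "expired";
  // Otherwise fall back to the (more expensive) live status calculation.
  return computeLiveStatus();
}
```

The point of the design is that expired sessions in the management list cost nothing beyond a date comparison.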

Nice-to-have follow-ups (out of scope):

ES should add completion_time to the search status so that we can bring back completion time when restoring a session: Add completion_time time field to async_search get or status response elasticsearch#88640
IsRestore / IsStored / isStore search options are confusing. Make them clearer and consider splitting what is session-related from what is search-related.
We can simplify the updateOrCreate logic server-side because we now create and update separately. This will improve performance and reduce the number of update retries. TODO: create an improvement issue
New searches added to a session when restoring (other bucket) would use the default expiration, so they wouldn't match the session's expiration.

Progress tracking

Testing Scenarios

Some advanced scenarios I've gone through, to give you an idea of what to look for:

Extending from Management works and extends both sessions and searches
Deleting a saved session also deletes/cancels related async searches
When restoring a session, searches aren't extended further
When restoring a session and new searches are added, they are added to the session and a warning is shown
Searches start with a 1-minute keep_alive, but after the session is saved, they are extended to 1 week (by default)
Everything works in a different space
Everything works when the user doesn't have permission to save a session
Searches work when the session feature is disabled (data.search.session.enabled:false). The keep_alive is 1 minute, and there is no client-side polling to extend completed searches
Migrations: old non-persisted sessions are dropped; persisted are migrated
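The migration rule can be modeled as a filter plus a field cleanup. This is a standalone sketch, not Kibana's saved-object migration API, and the attribute names (`persisted`, `touched`, `status`) are assumptions about the legacy search-session shape, based on the descriptions earlier in this PR.

```typescript
// Standalone model of the migration rule; NOT Kibana's saved-object migration API.
// Attribute names are assumptions about the legacy search-session shape.
interface SessionDoc {
  id: string;
  attributes: Record<string, unknown>;
}

// Assumed obsolete once status is computed on the fly and only
// user-saved sessions are stored.
const OBSOLETE_FIELDS = ["persisted", "touched", "status"];

function migrateSessions(docs: SessionDoc[]): SessionDoc[] {
  return docs
    // Old non-persisted sessions are dropped...
    .filter((doc) => doc.attributes.persisted === true)
    // ...and persisted ones are kept, with obsolete fields removed.
    .map((doc) => {
      const attributes = { ...doc.attributes };
      for (const field of OBSOLETE_FIELDS) delete attributes[field];
      return { ...doc, attributes };
    });
}
```

When testing the real migration, the same two checks apply: unsaved legacy sessions disappear, and saved ones keep their user-facing attributes.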

Testing tips

Get search sessions in dev tools

GET .kibana*/_search
{
  "query": {
    "match": {
      "type": "search-session"
    }
  }
}

Check async search status

Look at the idMapping object in the search session attributes. Each value has an id, which is the async_search_id from Elasticsearch.

Use the status API to check a search's expiration_time and completion status:

GET /_async_search/status/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
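Both steps can be scripted. The sketch below assumes each `idMapping` value is an object with an `id` field holding the async search ID (other fields omitted), and that the status response includes `is_running`, `expiration_time_in_millis`, and, once the search has finished, `completion_status`, as described in the Elasticsearch async search status API documentation. `statusRequests` and `summarize` are hypothetical helpers.

```typescript
// Hypothetical helpers for inspecting a search session's async searches.
interface SearchSessionAttributes {
  // Each value carries the Elasticsearch async search ID (other fields omitted).
  idMapping: Record<string, { id: string }>;
}

// Build one Dev Tools status request per search in the session.
function statusRequests(attrs: SearchSessionAttributes): string[] {
  return Object.keys(attrs.idMapping).map(
    (key) => `GET /_async_search/status/${attrs.idMapping[key].id}`
  );
}

// Subset of the async search status response used here.
interface AsyncSearchStatus {
  id: string;
  is_running: boolean;
  expiration_time_in_millis: number;
  completion_status?: number; // HTTP status of the finished search
}

// One-line summary: running/completed state and time left before expiration.
function summarize(status: AsyncSearchStatus, nowMs: number): string {
  const minutesLeft = Math.round((status.expiration_time_in_millis - nowMs) / 60000);
  const state = status.is_running
    ? "running"
    : `completed (${status.completion_status ?? "unknown"})`;
  return `${status.id}: ${state}, expires in ~${minutesLeft} min`;
}
```

Comparing the remaining time with the expected keep_alive (about a minute for unsaved searches, about a week once the session is saved) is a quick way to verify the extension behavior.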

Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
This was checked for cross-browser compatibility

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk	Probability	Severity	Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space.	Low	High	Worth testing in multiple spaces
Search session migrations	Medium	High	Old non-persisted sessions should be dropped; persisted ones should be migrated

For maintainers

This was checked for breaking API changes and was labeled appropriately

Dosant · 2022-06-23T11:00:56Z

@elasticmachine merge upstream

Dosant · 2022-06-24T14:05:07Z

@elasticmachine merge upstream

Dosant · 2022-06-27T09:08:04Z

@elasticmachine merge upstream


Dosant · 2022-07-06T13:38:56Z

@elasticmachine merge upstream

Dosant · 2022-07-11T12:28:45Z

@elasticmachine merge upstream


Dosant · 2022-07-19T11:40:11Z

@elasticmachine merge upstream

Dosant · 2022-07-25T10:53:19Z

@elasticmachine merge upstream


Dosant · 2022-07-26T13:59:00Z

@elasticmachine merge upstream

Dosant · 2022-07-27T11:19:09Z

@elasticmachine merge upstream

elasticmachine · 2022-09-22T13:33:01Z

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

Dosant · 2022-09-26T12:32:29Z

@elasticmachine merge upstream

watson

Kibana Platform Security changes LGTM 👍

Dosant · 2022-09-27T09:57:12Z

@elasticmachine merge upstream

ymao1

Response Ops changes LGTM. Saw the tasks get marked as "unrecognized".

ThomThomson

Functional test changes LGTM! Code review only.

Dosant · 2022-09-28T09:48:24Z

@elasticmachine merge upstream

flash1293 · 2022-09-29T11:18:10Z

I checked the timelion integration and just making sure I'm getting this correctly - if the session is saved, then a new request is kicked off so the timelion server can store it, but as long as the user stays on the dashboard, it's still "waiting" for the original request which isn't cancelled, right? If that's the case, then LGTM

ppisljar

lgtm

Dosant · 2022-10-05T08:22:59Z

@elasticmachine merge upstream

Dosant · 2022-10-05T08:25:26Z

I checked the timelion integration and just making sure I'm getting this correctly - if the session is saved, then a new request is kicked off so the timelion server can store it, but as long as the user stays on the dashboard, it's still "waiting" for the original request which isn't cancelled, right?

Right. This is expected.
I think this is OK because the implementation is simpler, which means a smaller bug surface. Also, users may potentially get the on-screen results earlier (without waiting for the second request to finish).

kibana-ci · 2022-10-05T09:50:43Z

💚 Build Succeeded

Buildkite Build
Commit: e77c66f

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`data`	521	520	-1

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`data`	2509	2513	+4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`data`	52.7KB	52.4KB	-291.0B
`inspector`	15.9KB	16.0KB	+55.0B
`visTypeTimelion`	109.1KB	109.2KB	+92.0B
total			-144.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`data`	23	24	+1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`data`	433.2KB	437.0KB	+3.9KB
`visTypeTimeseries`	19.4KB	19.5KB	+97.0B
total			+4.0KB

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id	before	after	diff
`search-session`	15	12	-3

Unknown metric groups

API count

id	before	after	diff
`data`	3213	3221	+8

ESLint disabled line counts

id	before	after	diff
`data`	46	52	+6

Total ESLint disabled count

id	before	after	diff
`data`	48	54	+6

History

💔 Build #77822 failed 46366c5
💚 Build #76384 succeeded 6941f6f
💚 Build #75944 succeeded 280428f
💚 Build #75603 succeeded afe8dfb
💛 Build #74793 was flaky ba240d3

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Changes search service/search session infrastructure to improve performance, stability, and resiliency by ensuring that search sessions don’t add additional load on a cluster when the feature is not used

Search Sessions Stabilization Stage 1 – Initial PR (#132823)

84137c7

Dosant added Feature:Search Querying infrastructure in Kibana Team:AppServicesSv Feature:Search Sessions labels Jun 23, 2022

Dosant changed the title ~~Search Sessions Stabilization Stage 1 – Initial PR (#132823)~~ [WIP] Search Sessions Stabilization Stage I Jun 23, 2022

Merge branch 'main' into search-sessions-stabilization-stage-1

af21b68

Dosant mentioned this pull request Jun 24, 2022

fix disableSaveAfterSessionCompleteTimedOut$, add back unit tests #135129

Merged


Merge branch 'main' into search-sessions-stabilization-stage-1

f745c27

kibanamachine and others added 2 commits June 27, 2022 05:08

Merge branch 'main' into search-sessions-stabilization-stage-1

2a89ac6

fix disableSaveAfterSessionCompleteTimedOut$, add back unit tests (#135129)

548e4d1

Merge branch 'main' into search-sessions-stabilization-stage-1

ac10c56

kibanamachine and others added 4 commits July 11, 2022 08:28

Merge branch 'main' into search-sessions-stabilization-stage-1

3729617

use isSearchStored param instead of server-side checkId (#135036)

a985fb9

Merge branch 'main' of github.com:elastic/kibana into search-sessions-stabilization-stage-1

f05fba2

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

7dc02e7

Merge branch 'main' into search-sessions-stabilization-stage-1

4503157

Dosant mentioned this pull request Jul 20, 2022

[TSVB][Timelion] Restart request on saved session #132406

Closed

kibanamachine and others added 2 commits July 25, 2022 06:53

Merge branch 'main' into search-sessions-stabilization-stage-1

c7c2ff3

Refactor search session status on-the-fly calculation and other fixes (#136296)

810428c

Merge branch 'main' into search-sessions-stabilization-stage-1

ec8da3e

Merge branch 'main' into search-sessions-stabilization-stage-1

0730295

Dosant requested review from a team as code owners September 22, 2022 13:32

Merge branch 'main' into search-sessions-stabilization-stage-1

afe8dfb

Dosant added the release_note:skip Skip the PR/issue when compiling release notes label Sep 26, 2022

watson approved these changes Sep 26, 2022


Merge branch 'main' into search-sessions-stabilization-stage-1

280428f

ymao1 approved these changes Sep 27, 2022


ThomThomson approved these changes Sep 27, 2022


Merge branch 'main' into search-sessions-stabilization-stage-1

6941f6f

flash1293 approved these changes Sep 29, 2022


ppisljar approved these changes Oct 4, 2022


Merge branch 'main' into search-sessions-stabilization-stage-1

46366c5

Merge branch 'main' into search-sessions-stabilization-stage-1

e77c66f

Dosant merged commit 5f3d439 into main Oct 5, 2022

Dosant deleted the search-sessions-stabilization-stage-1 branch October 5, 2022 09:52

kibanamachine added v8.6.0 backport:skip This commit does not require backporting labels Oct 5, 2022

sakurai-youhei mentioned this pull request Apr 19, 2023

Error while updating search session <guid> Saved object <id> conflict #101310

Closed

lukasolson mentioned this pull request Aug 14, 2023

[Search Session] Saved object is created when changing time range or filters in Lens #104378

Closed
