[dev/server] improve startup time and prevent OOMs on shutdown #78710

@spalger spalger commented Sep 29, 2020

returned to draft

I'm hoping that #79052, #79235, and #79358 will make this unnecessary.


In development, the kibana CLI morphs into a different tool that is designed to be helpful when working on Kibana. Its current responsibilities include:

  • start the Kibana server in a child process
  • watch for changes to the source code for the server and restart the Kibana server process on any changes
  • start @kbn/optimizer in the same process, which launches webpack workers in child processes
  • observe the status of the optimizer and Kibana server, and proxy HTTP/HTTPS requests to the Kibana server as long as the server is running and the optimizer isn't currently building assets

These responsibilities are managed by the ClusterManager class, except for the proxy server, which is managed by the core so that the HTTP interface mimics the Kibana server as closely as possible even while all requests are actually being proxied.
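The proxying rule in the last bullet can be sketched as a small pure function. This is only an illustration: the `ServerStatus` and `OptimizerPhase` types and the `decideProxy` name are hypothetical and are not taken from ClusterManager, core, or @kbn/optimizer.

```typescript
// Hypothetical, simplified states; the real server and optimizer report richer status.
type ServerStatus = 'starting' | 'listening' | 'crashed';
type OptimizerPhase = 'building' | 'ready';

type ProxyDecision =
  | { action: 'forward' }
  | { action: 'hold'; reason: string };

// Forward a request only when the server is up and no asset build is in progress.
function decideProxy(server: ServerStatus, optimizer: OptimizerPhase): ProxyDecision {
  if (server !== 'listening') {
    return { action: 'hold', reason: `server is ${server}` };
  }
  if (optimizer === 'building') {
    return { action: 'hold', reason: 'optimizer is building assets' };
  }
  return { action: 'forward' };
}

// Example: a request arriving mid-build is held rather than forwarded.
const midBuildDecision = decideProxy('listening', 'building');
```

What "hold" means in practice (queueing, delaying, or returning an error) is left open here, since the description above does not say.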

Over the course of migrating to the Kibana Platform (KP) and deprecating the legacy process, the amount of work happening in the parent process has grown substantially and led to a serious slowdown of the server. Additionally, since we use @babel/register and the code base has grown substantially, stopping the dev CLI often triggers an OOM unless you set the NODE_OPTIONS environment variable to increase the max-old-space-size of the Node.js process. This happens because @babel/register maintains a cache of all transpiled code and, at shutdown, synchronously serializes that object to JSON and writes it to the file system. The massive amount of code we're asking it to transpile on the fly simply doesn't fit in memory with the default max-old-space-size of the Kibana process.
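To make the shutdown cost concrete, here is a toy model of a transpile cache that is serialized in one synchronous step. It is not @babel/register's implementation; `TranspileCache`, `serializeCache`, and the file names are made up for illustration.

```typescript
// Toy stand-in for a transpile cache: file path -> transpiled output.
// NOT @babel/register's real data structure, only a model of the behavior described above.
type TranspileCache = Record<string, string>;

// Models the shutdown step: the entire cache becomes a single JSON string
// in memory before anything can be written to disk.
function serializeCache(cache: TranspileCache): string {
  return JSON.stringify(cache);
}

// Total size of the cached outputs, in characters.
function totalCachedChars(cache: TranspileCache): number {
  let total = 0;
  for (const file of Object.keys(cache)) {
    total += cache[file].length;
  }
  return total;
}

const demoCache: TranspileCache = {};
for (let i = 0; i < 100; i++) {
  demoCache[`src/plugin_${i}/index.js`] = `module.exports = ${i};`;
}

// The serialized string is at least as long as all cached outputs combined,
// so the object and its JSON copy are both held in memory at the same moment.
const serializedDemo = serializeCache(demoCache);
```

In this model, peak memory at shutdown is roughly the cache plus a full JSON copy of it, which is consistent with the OOMs described above once the cache grows large.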

In order to improve the dev CLI experience I've made three changes:

  • Prevent the Kibana Platform from discovering any plugins in the parent process
  • Disable the babel/register cache in the parent process
  • Default to setting the NODE_OPTIONS to "--max-old-space-size=4096" in the environment of the Kibana server child process

These changes greatly reduce the memory requirements of the parent process, prevent the babel/register cache from being read or written in the parent process (which saves a surprising amount of time now that we don't need the cache and only load the core platform code), and increase the server process's ability to write the babel/register cache without OOMing.
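As a rough sketch of the third change, the function below builds the environment for the server child process and fills in NODE_OPTIONS only when the developer has not set it. The `buildServerEnv` name and the rule that an existing NODE_OPTIONS value is left untouched are assumptions, not necessarily what this PR implements.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_NODE_OPTIONS = '--max-old-space-size=4096';

// Returns a copy of the parent environment for the server child process.
// Assumption: a NODE_OPTIONS value the developer already set takes precedence.
function buildServerEnv(parentEnv: Env): Env {
  const existing = parentEnv['NODE_OPTIONS'];
  const hasExisting = existing !== undefined && existing.trim() !== '';
  return {
    ...parentEnv,
    NODE_OPTIONS: hasExisting ? existing : DEFAULT_NODE_OPTIONS,
  };
}

// Example: nothing set by the developer, so the 4 GB default is applied.
const exampleServerEnv = buildServerEnv({ PATH: '/usr/bin' });
```

In a real CLI, the returned object would be passed as the `env` option when spawning the child process, so the parent process's own heap limit stays unchanged.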

Here are some stats that I collected on my computer. Stats on other machines will surely vary, and I'd love to see what you get.

This PR:

  • parent process idle memory usage: ~200MB
  • server process idle memory usage: ~1.3GB
  • time until server process is listening with primed optimizer cache: ~30 seconds

master:

  • parent process idle memory usage: ~700MB
  • server process idle memory usage: ~900MB
  • time until server process is listening with primed optimizer cache: ~55 seconds
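For comparing the two sets of numbers, the relative changes can be worked out as below. The figures are the approximate ones listed above, with ~1.3GB treated as 1300MB (an assumption about rounding); the `Stats` shape and `percentChange` helper are only for illustration.

```typescript
interface Stats {
  parentMemoryMb: number;
  serverMemoryMb: number;
  startupSeconds: number;
}

// Approximate figures from the lists above; ~1.3GB is treated as 1300MB.
const thisPr: Stats = { parentMemoryMb: 200, serverMemoryMb: 1300, startupSeconds: 30 };
const master: Stats = { parentMemoryMb: 700, serverMemoryMb: 900, startupSeconds: 55 };

// Relative change from `before` to `after`, rounded to a whole percent.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const parentMemoryChange = percentChange(master.parentMemoryMb, thisPr.parentMemoryMb); // -71
const serverMemoryChange = percentChange(master.serverMemoryMb, thisPr.serverMemoryMb); // 44
const startupChange = percentChange(master.startupSeconds, thisPr.startupSeconds); // -45

// Combined idle memory of both processes: 1600MB on master, 1500MB with this PR.
const combinedMaster = master.parentMemoryMb + master.serverMemoryMb;
const combinedThisPr = thisPr.parentMemoryMb + thisPr.serverMemoryMb;
```

On these numbers, the parent process shrinks by about 71% and startup is about 45% faster, while the server process idles about 44% higher, so combined idle memory drops only slightly.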

@spalger spalger added the Team:Operations, v8.0.0, release_note:skip, and v7.10.0 labels Sep 29, 2020
@spalger spalger force-pushed the implement/default-kibana-memory-limit-dev branch from 1f46c37 to 2ea87d5 September 30, 2020 19:27
spalger commented Sep 30, 2020

@elasticmachine merge upstream

@spalger spalger force-pushed the implement/default-kibana-memory-limit-dev branch from 87082f8 to 5766350 September 30, 2020 23:17
@spalger spalger changed the title from "[dev/server] default the max-old-space-size to 4gb" to "[dev/server] improve startup time and prevent OOMs on shutdown" Sep 30, 2020
@spalger spalger marked this pull request as ready for review September 30, 2020 23:44
@spalger spalger requested review from a team as code owners September 30, 2020 23:44
@elasticmachine

Pinging @elastic/kibana-operations (Team:Operations)

@kibanamachine

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream
