-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KQL expression parsing is slow #76811
Comments
We could evaluate other alternatives as a last resort. There is an interesting benchmark available https://sap.github.io/chevrotain/performance/ |
Don't have much experience in this area but we could probably try to unblock event loop or actually speed up parsing: Client:
Node:
|
Before implementing either web-workers or worker-threads, I think we should see if we can optimize the implementation itself. Even if it's not blocking the main event-loop, it's still consuming CPU. It's common for Kibana to be deployed to servers with only one or two CPUs that are shared for all operations, so we're still consuming resources that are used for other operations. |
@elastic/kibana-app-arch are there any short-term improvements planned for KQL parsing? If there aren't any improvements we can make, it's unsuitable for use in a lot of scenarios. At a minimum, we'll likely want to remove the ability for consumers of the |
We intend to dive further into this, but there are not currently plans to take this on in the short-term (7.11). If this is something that is blocking you on a project that needs to happen in a 7.11 timeframe, please let us know -- otherwise we will follow up as soon as we have more concrete info on what to expect. |
## Summary Resolves #143335. Some history: A similar issue was reported a few years back (#76811). The solution (#93319) was to use the `--cache` PEG.js [parameter](https://pegjs.org/documentation#generating-a-parser) when generating the parser. Back when this was added, we were still manually building the parser on demand when it was changed. Eventually we added support for dynamically building the parser during the build process (#145615). I'm not sure where along the process the `cache` parameter got lost but it didn't appear to be used when we switched. This PR re-adds this parameter which increases performance considerably (metrics shown in ops/sec): ``` Before using cache: ● kuery AST API › fromKueryExpression › performance › with simple expression Received: 7110.68990544415 ● kuery AST API › fromKueryExpression › performance › with complex expression Received: 40.51361746242248 ● kuery AST API › fromKueryExpression › performance › with many subqueries Received: 17.071767133068473 After using cache: ● kuery AST API › fromKueryExpression › performance › with simple expression Received: 8275.49109867502 ● kuery AST API › fromKueryExpression › performance › with complex expression Received: 447.0459218892934 ● kuery AST API › fromKueryExpression › performance › with many subqueries Received: 115852.43643466769 ``` ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Parsing even a simple KQL expression string is slow and a complex KQL expression is disastrously slow:
@elastic/kibana-app-arch it appears that a majority of the slowness is coming directly from the code which pegjs is generating. Have you all investigated the performance of KQL previously? It also looks like we're on version
0.9.0
of pegjs, is there any chance that an upgrade to0.10.0
would improve the performance? There's also a--cache
option which has fixed performance issues for at least one person; however, this contradicts the documentation:If we're unable to drastically improve the performance of KQL, it'll be unsuitable to use in a lot of scenarios. When parsing KQL expressions server-side, this can lead to the event-loop being blocked and preventing all other operations from running. The Fleet team has had to switch from using a KQL string to building the KueryNode manually and we can potentially adapt this approach elsewhere. However, we can't use this everywhere if we want end-users to be able to specify the KQL strings.
CC'ing a few people...
@elastic/kibana-platform because
SavedObjectsClient#find
supports a KQL expression string@elastic/kibana-alerting-services because your authorization is building a complex KQL expression string here
@elastic/siem because you all are using KQL to query Elasticsearch data-indices
Related: #69649, #89473
The text was updated successfully, but these errors were encountered: