
KQL expression parsing is slow #76811

Closed · kobelb opened this issue Sep 4, 2020 · 5 comments · Fixed by #93319

Labels: discuss, Feature:KQL, KQL, performance, PR sent, SharedUX/fix-it-week

kobelb commented Sep 4, 2020

Parsing even a simple KQL expression string is slow and a complex KQL expression is disastrously slow:

```js
// Benchmark.js is the "benchmark" package from npm; esKuery is Kibana's KQL
// helper module (the exact import path depends on the plugin, so it's omitted here).
const Benchmark = require('benchmark');

const suite = new Benchmark.Suite();

suite
  .add('parse simple KQL', function () {
    esKuery.fromKueryExpression(
      'not fleet-agent-actions.attributes.sent_at: * and fleet-agent-actions.attributes.agent_id:1234567'
    );
  })
  .add('parse complex KQL', function () {
    esKuery.fromKueryExpression(
      `((alert.attributes.alertTypeId:.index-threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:siem.signals and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:siem.notifications and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:metrics.alert.threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:metrics.alert.inventory.threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:logs.alert.document.count and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_cluster_health and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_license_expiration and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_cpu_usage and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_nodes_changed and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_logstash_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_kibana_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_elasticsearch_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.transaction_duration and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.transaction_duration_anomaly and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.error_rate and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.monitorStatus and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.tls and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.durationAnomaly and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)))`
    );
  })
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({ async: true, minSamples: 10 });
```

```
parse simple KQL x 999 ops/sec ±2.00% (74 runs sampled)
parse complex KQL x 4.08 ops/sec ±2.22% (15 runs sampled)
```

@elastic/kibana-app-arch it appears that the majority of the slowness comes directly from the code that pegjs generates. Have you all investigated the performance of KQL previously? It also looks like we're on version 0.9.0 of pegjs; is there any chance that an upgrade to 0.10.0 would improve performance? There's also a `--cache` option which has fixed performance issues for at least one person; however, this contradicts the documentation:

> `--cache` — makes the parser cache results, avoiding exponential parsing time in pathological cases but making the parser slower
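
For reference, here's a minimal sketch of enabling that option through the PEG.js API rather than the CLI; the grammar and output file names are placeholders, not Kibana's actual build step:

```js
// Regenerate the KQL parser with result caching enabled. pegjs exposes
// generate(grammar, options); cache: true is the programmatic equivalent of --cache.
const fs = require('fs');
const peg = require('pegjs');

const grammar = fs.readFileSync('kuery.peg', 'utf8'); // placeholder grammar file
const parserSource = peg.generate(grammar, {
  cache: true,        // memoize intermediate results instead of re-parsing them
  output: 'source',   // emit parser source code rather than a parser object
  format: 'commonjs', // so the generated file can be required directly
});

fs.writeFileSync('kuery_parser.js', parserSource); // placeholder output file
```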

If we're unable to drastically improve the performance of KQL, it'll be unsuitable for use in a lot of scenarios. When parsing KQL expressions server-side, this can block the event loop and prevent all other operations from running. The Fleet team has had to switch from using a KQL string to building the KueryNode manually, and we can potentially adapt this approach elsewhere. However, we can't use it everywhere if we want end-users to be able to specify KQL strings themselves.
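
For illustration, here's a rough sketch of the manual-construction approach, assuming the `nodeTypes` helpers that Kibana exports alongside `esKuery` (the exact helper names and import path are assumptions):

```js
// Hypothetical: build the KueryNode directly instead of parsing a KQL string,
// mirroring the simple expression from the benchmark above.
const agentIdFilter = nodeTypes.function.buildNode(
  'is',
  'fleet-agent-actions.attributes.agent_id',
  '1234567'
);
const notSentAt = nodeTypes.function.buildNode(
  'not',
  nodeTypes.function.buildNode('is', 'fleet-agent-actions.attributes.sent_at', '*')
);

// Equivalent to `not ...sent_at: * and ...agent_id:1234567`, with no PEG parsing involved.
const filter = nodeTypes.function.buildNode('and', [notSentAt, agentIdFilter]);
```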

CC'ing a few people...

@elastic/kibana-platform because SavedObjectsClient#find supports a KQL expression string
@elastic/kibana-alerting-services because your authorization is building a complex KQL expression string here
@elastic/siem because you all are using KQL to query Elasticsearch data-indices

Related: #69649, #89473

mshustov commented Sep 6, 2020

We could evaluate other alternatives as a last resort. There's an interesting benchmark available at https://sap.github.io/chevrotain/performance/

Dosant commented Sep 8, 2020

I don't have much experience in this area, but we could probably either unblock the event loop or actually speed up the parsing:

Client:

  1. use the JS implementation in a web worker?
  2. or take a C++/Rust PEG implementation and compile it to a WASM module?

Node:

  1. unblock the loop with a worker thread? (experimental in Node 10, but stable in 12 :( ) See the sketch after this list.
  2. a native C++/Rust Node.js module with a PEG implementation? It might be faster.
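
Here's a minimal sketch of the worker-thread option, assuming Node 12+ `worker_threads`; the worker file path and the esKuery import inside it are hypothetical:

```js
// Main thread: offload KQL parsing so the event loop stays responsive while
// the parser burns CPU.
const { Worker } = require('worker_threads');

function parseKqlInWorker(expression) {
  return new Promise((resolve, reject) => {
    // kql_worker.js is a hypothetical script that calls
    // esKuery.fromKueryExpression(workerData) and posts the result back, e.g.:
    //   const { parentPort, workerData } = require('worker_threads');
    //   parentPort.postMessage(esKuery.fromKueryExpression(workerData));
    const worker = new Worker('./kql_worker.js', { workerData: expression });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Spawning a worker per call has its own startup cost, so in practice a long-lived worker (or a small pool) receiving expressions over `postMessage` would make more sense.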

kobelb commented Sep 8, 2020

Before implementing either web-workers or worker-threads, I think we should see if we can optimize the implementation itself. Even if it's not blocking the main event-loop, it's still consuming CPU. It's common for Kibana to be deployed to servers with only one or two CPUs that are shared for all operations, so we're still consuming resources that are used for other operations.

kobelb commented Sep 23, 2020

@elastic/kibana-app-arch are there any short-term improvements planned for KQL parsing?

If there aren't any improvements we can make, KQL is unsuitable for use in a lot of scenarios. At a minimum, we'll likely want to remove the ability for consumers of SavedObjectsClient::find to specify a KQL string and instead require a KueryNode, to reduce the likelihood of someone else tripping on this hazard.
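
For illustration, a rough sketch of what a KueryNode-only `find` call might look like for a consumer, assuming the `filter` option accepts a pre-built node (the saved-object type and field are made up):

```js
// Hypothetical caller: hand SavedObjectsClient.find a pre-built KueryNode via
// the filter option, so no KQL parsing happens on the request path.
const filter = nodeTypes.function.buildNode('is', 'dashboard.attributes.title', 'Sales');

const { saved_objects: dashboards } = await savedObjectsClient.find({
  type: 'dashboard',
  filter,
});
```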

@lukeelmers commented:

We intend to dive further into this, but there are not currently plans to take this on in the short-term (7.11).

If this is something that is blocking you on a project that needs to happen in a 7.11 timeframe, please let us know -- otherwise we will follow up as soon as we have more concrete info on what to expect.

@Dosant added the SharedUX/fix-it-week label on Mar 1, 2021
lukasolson added a commit that referenced this issue Apr 27, 2024
## Summary

Resolves #143335.

Some history: A similar issue was reported a few years back
(#76811). The solution
(#93319) was to use the `--cache`
PEG.js [parameter](https://pegjs.org/documentation#generating-a-parser)
when generating the parser. Back when this was added, we were still
manually building the parser on demand when it was changed. Eventually
we added support for dynamically building the parser during the build
process (#145615). I'm not sure
where along the way the `cache` parameter got lost, but it didn't
appear to be used when we switched.

This PR re-adds the parameter, which increases performance considerably
(metrics shown in ops/sec):

```
Before using cache:

  ● kuery AST API › fromKueryExpression › performance › with simple expression
    Received:   7110.68990544415

  ● kuery AST API › fromKueryExpression › performance › with complex expression
    Received:   40.51361746242248

  ● kuery AST API › fromKueryExpression › performance › with many subqueries
    Received:   17.071767133068473

After using cache:

  ● kuery AST API › fromKueryExpression › performance › with simple expression
    Received:   8275.49109867502

  ● kuery AST API › fromKueryExpression › performance › with complex expression
    Received:   447.0459218892934

  ● kuery AST API › fromKueryExpression › performance › with many subqueries
    Received:   115852.43643466769
```
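
The numbers above come from Jest performance tests around `fromKueryExpression`. For context, here's a rough sketch of how such an ops/sec measurement could be asserted with Benchmark.js; the threshold and import path are illustrative, not the actual test:

```js
// Hypothetical Jest test guarding against parser performance regressions.
const Benchmark = require('benchmark');
const { fromKueryExpression } = require('./ast'); // illustrative import path

it('parses a simple expression above a minimum ops/sec threshold', (done) => {
  new Benchmark('simple KQL', () => {
    fromKueryExpression('response:200 and extension:php');
  })
    .on('complete', function () {
      expect(this.hz).toBeGreaterThan(1000); // illustrative threshold
      done();
    })
    .run();
});
```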

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios