KQL parser generates deeply nested boolean clauses #89473

pmuellr · 2021-01-27T19:31:33Z

PR elastic/elasticsearch#66204 was recently backported to 7.x - it changes the limit on nested boolean clauses in ES queries. The new default max depth is 20.

This affects alerting, which creates fairly elaborate queries with deeply nested booleans - and we opened PR #89345 to fix this. We assumed at the time this was just a problem with the node_builder code we use to programmatically add to a KQL AST, to add our own filtering on an ES query.

The problem with node_builder was that it took some KQL like a:(B or C or D), but then treated it like a:B or (a:C or a:D), in terms of the ES query generated. Specifically, it took something that was "linear", and returned it as an equivalent, but nested form. This is specifically happening with and and or boolean causes. We fixed node_builder to linearize this, so it would be treated as (a:B or a:C or a:D), flattening out the "recursive" aspect of the expression interpretation.

That worked, and seemed to fix the deeply nested queries, but then some tests failed that were comparing output generated with node_builder with a pure KQL string parsed by the KQL PEG parser. And at that point, I realized that the KQL PEG parser is also generating these deeply nested "recursive" AST itself.

Here's the test where I figured that out:

kibana/x-pack/plugins/alerts/server/authorization/alerts_authorization.test.ts

Lines 630 to 634 in ba1e795

    
           expect((await alertAuthorization.getFindAuthorizationFilter()).filter).toEqual( 
        
             esKuery.fromKueryExpression( 
        
               `((alert.attributes.alertTypeId:myAppAlertType and alert.attributes.consumer:(alerts or myApp or myOtherApp or myAppWithSubFeature)) or (alert.attributes.alertTypeId:myOtherAppAlertType and alert.attributes.consumer:(alerts or myApp or myOtherApp or myAppWithSubFeature)) or (alert.attributes.alertTypeId:mySecondAppAlertType and alert.attributes.consumer:(alerts or myApp or myOtherApp or myAppWithSubFeature)))` 
        
             ) 
        
           );

The argument to expect() is the new output of the "fixed" node_builder, and esKuery.fromKueryExppression() is the output from the KQL PEG parser. The output is no longer "equal", with the KQL PEG parser output looking like the pre-fixed node_builder output. Some further experimenting with esKuery.fromKueryExppression() made it clear that the parser also generates the deeply nested booleans, just like the pre-fixed node_builder code. In fact, it seems likely to me that node_builder was probably structured to return the same shapes as the KQL parser, even though it didn't really have to.

I'm not sure how bad this is - but presumably a customer with KQL query a:(B or C or D or ... Z) (25 or'd clauses), they'll hit the nesting limit. Seems like something we need to fix.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-01-27T19:31:35Z

Pinging @elastic/kibana-app-services (Team:AppServices)

pmuellr · 2021-01-27T19:42:35Z

Thinking about this last night, one possible "fix" would be to perform an optimization on the KQL AST before it's turned into query DSL. Basically, look for nested or clauses that can be combined. And the same for and clauses. Seems fairly straight-forward. One question is where would we do this though. Could be as a back-end to the parser, where we "optimize" the AST before returning it from the caller. Alternatively, it could be on the front-end of the query DSL generation.

Related: while "fixing" the node_builder code, I happened to make another optimization: changing the code that gets generated like this today:

{ bool: { should: [{single-object-here}], minimum_should_match: 1 } }

to this:

{ bool: { must: {single-object-here} } }

That broke ~100 tests - some using jest snapshots, which were easy to fix :-), but some where doing jest equal tests of the AST to literal JSON objects in the test cases. These required quite a bit more surgery. I'm not sure how many tests check generated query DSL from KQL, but there are definitely quite a few checking the KQL AST output. Hence the suggestion that it might be better to "fix" this in the query DSL generation, if there would be fewer test changes required.

pmuellr · 2021-01-29T14:00:21Z

If we end up making changes to the parser, we should look to see if we can change some of the tests that were affected when we changed node_builder, in #89345 . That assumes the parser would end up generating the same shapes as node_builder - but I think there are some additional optimizations possible, so that may require some additional changes to node_builder to keep it in sync.

lukasolson · 2021-01-29T18:54:03Z

Related: #69649

pmuellr added bug Fixes for quality problems that affect the customer experience Team:AppServices labels Jan 27, 2021

pmuellr mentioned this issue Jan 29, 2021

[data] change KQL node builder to not generate recursive and/or clauses #89345

Merged

9 tasks

lukasolson mentioned this issue Jan 29, 2021

KQL expression parsing is slow #76811

Closed

lizozom added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort triaged labels Mar 1, 2021

lukasolson self-assigned this Mar 1, 2021

lukasolson added the SharedUX/fix-it-week Bugs that have been groomed and queued up for the team's next fix it week label Mar 1, 2021

lukasolson mentioned this issue Mar 3, 2021

[KQL] Do not generate nested AST/Elasticsearch queries for same-level AND/OR clauses #93506

Merged

2 tasks

lukasolson added the PR sent label Mar 3, 2021

lukasolson closed this as completed in #93506 Mar 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KQL parser generates deeply nested boolean clauses #89473

KQL parser generates deeply nested boolean clauses #89473

pmuellr commented Jan 27, 2021

elasticmachine commented Jan 27, 2021

pmuellr commented Jan 27, 2021

pmuellr commented Jan 29, 2021

lukasolson commented Jan 29, 2021

KQL parser generates deeply nested boolean clauses #89473

KQL parser generates deeply nested boolean clauses #89473

Comments

pmuellr commented Jan 27, 2021

elasticmachine commented Jan 27, 2021

pmuellr commented Jan 27, 2021

pmuellr commented Jan 29, 2021

lukasolson commented Jan 29, 2021