ESQL: add conditional runtime parsing through semantic predicates #111995

costin · 2024-08-20T00:22:58Z

~~WIP~~ branch for introducing conditional parsing (and versioning) in the grammar (both lexer and parser).

That is, make sure that parts of the underlying grammar get parsed only if certain conditions are met, such as:

- in-development code is picked up only in the snapshot branch
- new commands are not available on older versions (clusters with old nodes)
- deprecated grammar is no longer allowed once removed

The goal of this branch is to test the concept and see whether it works on both the backend (Java) and the front-end (JavaScript), since the underlying predicates (aka conditions) are language specific.

costin · 2024-08-20T00:25:49Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+@header {
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+}


Unrelated improvement to add the copyright in the generated code.

costin · 2024-08-20T00:31:13Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+}
+
+options {
+  superClass=LexerConfig;


The main option that matters here is superClass, which is used to hook up the conditional code: the generated lexer and parser invoke this code, which is defined in LexerConfig (for the lexer) and ParserConfig (for the parser).
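
A rough sketch of how the superClass hook plays out, written as plain Java with no ANTLR dependency so it runs on its own. `LexerConfigStandIn` and `GeneratedLexerStandIn` are hypothetical stand-ins, and the one-argument `sempred` is a simplification of the dispatch method ANTLR generates; treat it as an illustration of the idea rather than the generated code itself.

```java
// Entry point first, so the file can be launched with `java SuperClassHookDemo.java`.
class SuperClassHookDemo {
    public static void main(String[] args) {
        GeneratedLexerStandIn lexer = new GeneratedLexerStandIn();
        System.out.println("release build, predicate 0: " + lexer.sempred(0));
        lexer.setDevVersion(true);
        System.out.println("snapshot build, predicate 0: " + lexer.sempred(0));
    }
}

// Hypothetical stand-in for the hand-written base class named by `superClass`.
// It owns the state and exposes the methods that predicates in the grammar call.
abstract class LexerConfigStandIn {
    private boolean devVersion;

    void setDevVersion(boolean dev) {
        this.devVersion = dev;
    }

    boolean devVersion() {
        return devVersion;
    }
}

// Hypothetical stand-in for the class ANTLR generates from the grammar.
// The predicate text from the grammar ends up verbatim in a method like this one,
// which is why the method it calls has to exist on the superclass.
class GeneratedLexerStandIn extends LexerConfigStandIn {
    boolean sempred(int predIndex) {
        switch (predIndex) {
            case 0:
                return devVersion(); // from `{devVersion()}?` in the grammar
            default:
                return true; // no predicate attached
        }
    }
}
```

Because the grammar only contains the call, a JavaScript target would need an equivalent superclass with a method of the same name.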

costin · 2024-08-20T00:33:15Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

-    : [A-Za-z]
+    : [a-z]


The file is case-insensitive, no need to define upper case (doing that throws an error which is a nice benefit).

Might be worth adding this as a comment.

costin · 2024-08-20T00:33:23Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

@@ -80,7 +105,7 @@ fragment UNESCAPED_CHARS
    ;

 fragment EXPONENT
-    : [Ee] [+-]? DIGIT+
+    : [e] [+-]? DIGIT+


Same here - the file is case-insensitive, no need to define upper case (doing that throws an error which is a nice benefit).

costin · 2024-08-20T00:35:12Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.tokens

The tokens are auto-generated, which means changing their declaration order changes their assigned number.
Right now we don't use these numbers; however, I won't be surprised if different tools pick them up in the future, which is why I propose to own this file and not rely on the auto-generation.

A nice side effect is that in-development commands can be assigned their own range (e.g. 999_xxx), making them easy to skip inside the parser.
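
To make the range idea concrete, here is a minimal sketch of the check a consumer could do if dev tokens lived in a reserved range. The constant `DEV_RANGE_START` and the token numbers are made up for illustration; nothing here reflects the actual contents of the .tokens file or confirms that the tooling accepts hand-assigned numbers.

```java
class DevTokenRangeDemo {
    // Hypothetical start of the range reserved for in-development token types.
    static final int DEV_RANGE_START = 999_000;

    // A token is treated as in-development purely based on its numeric type.
    static boolean isDevToken(int tokenType) {
        return tokenType >= DEV_RANGE_START;
    }

    public static void main(String[] args) {
        int releasedToken = 12;       // hypothetical number for a released keyword
        int devToken = 999_002;       // hypothetical number for a dev keyword
        System.out.println("released is dev? " + isDevToken(releasedToken));
        System.out.println("dev is dev? " + isDevToken(devToken));
    }
}
```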

costin · 2024-08-20T00:35:39Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4

    | showCommand
-    | metaCommand
+    // in development
+    | {devVersion()}? metricsCommand


Gate the metrics command with a semantic predicate.
Note this can be extended to include language versioning in the future as well, such as {devVersion() && hasFeature("inlinestats")}?.
There's a question on whether using hasFeature with strings is manageable versus adding some kind of versioning only for the language (whether that gets tied to an internal counter, ES public versioning or something else is a separate discussion):
{devVersion() && languageVersion() >= 12345}?

Since this declaration is used verbatim in the generated parser/lexer code, the declaration should be portable and minimal.

Small problem - some things are gated behind feature flags and those flags can be enabled by setting a JVM parameter OR using a snapshot build. I don't remember precisely what is gated, but it's pretty close to this list.

Doing it with a feature flag is a little more convenient than using the SNAPSHOT version directly because, on a release, folks can set the JVM parameter and test against that.

With the mechanism in place, it's easy to extend this to include feature flags - the main issue is handling this outside the server (e.g. Kibana).
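
One possible shape for that extension, as a sketch only: the dev flag becomes "snapshot build OR JVM flag". The property name `esql.dev_grammar.enabled` is invented for this example, and the snapshot state is passed in as a parameter instead of being read from `Build.current().isSnapshot()` so the snippet stays self-contained.

```java
class DevFlagDemo {
    // Hypothetical system property name, e.g. -Desql.dev_grammar.enabled=true
    static final String FLAG_PROPERTY = "esql.dev_grammar.enabled";

    // Dev grammar is on for snapshot builds, or when the JVM flag is set to "true".
    static boolean devVersion(boolean isSnapshotBuild) {
        return isSnapshotBuild || Boolean.getBoolean(FLAG_PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("release, flag as passed to the JVM: " + devVersion(false));
        System.out.println("snapshot: " + devVersion(true));
    }
}
```

This only covers the server side; a client such as Kibana would still need some way to learn the value.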

costin · 2024-08-20T00:35:55Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4

+    // in development
+    | {devVersion()}? inlinestatsCommand
+    | {devVersion()}? lookupCommand
+    | {devVersion()}? matchCommand


Also gate the in-development commands.

Why do these need to be gated in the lexer and parser? Wouldn't one of the two suffice?

costin · 2024-08-20T00:36:24Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4

-inlinestatsCommand
-    : INLINESTATS stats=fields (BY grouping=fields)?
-    ;
-
-


Move them to the end under the "// in development" section

costin · 2024-08-20T00:36:37Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.tokens

Same comment as above regarding the token file.

costin · 2024-08-20T00:37:14Z

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParserListener.java

+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */


And now the copyright appears.

costin · 2024-08-20T00:38:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlConfig.java

+class EsqlConfig {
+
+    // versioning information
+    boolean devVersion = Build.current().isSnapshot();
+
+    public void setDevVersion(boolean dev) {
+        this.devVersion = dev;
+    }
+
+    // not great because other grammar parser (Kibana) need the implement this method
+    boolean hasFeature(String featureName) {
+        return false;
+    }


The point of this class is to hook the dev boolean flag and programmatically set it for testing.
I've also added a potential hasFeature method to check whether the underlying cluster supports a certain command and disable it accordingly.

Errr - do we expose feature flags through an API? How would Kibana determine if a feature flag is set on a given cluster?

costin · 2024-08-20T00:38:58Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LexerConfig.java

+/**
+ * Base class for hooking versioning information into the ANTLR parser.
+ */
+public abstract class LexerConfig extends Lexer {


Base class used now by the generated lexer.

costin · 2024-08-20T00:39:54Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LexerConfig.java

+    boolean devVersion() {
+        return config.devVersion;
+    }


The Java method called at runtime during lexing - it's just checking a boolean that is passed on through the EsqlConfig (to make it easier to share updates of the primitives across classes).
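
A small sketch of why a shared mutable config object is convenient here: flipping the flag once is visible to both the lexer and the parser. All class names are hypothetical stand-ins, and the flag defaults to `false` instead of reading `Build.current().isSnapshot()` so the example has no Elasticsearch dependency.

```java
// Entry point first, so the file can be launched with `java SharedConfigDemo.java`.
class SharedConfigDemo {
    public static void main(String[] args) {
        ConfigStandIn config = new ConfigStandIn();
        LexerStandIn lexer = new LexerStandIn(config);
        ParserStandIn parser = new ParserStandIn(config);

        System.out.println("before: " + lexer.devVersion() + " / " + parser.devVersion());
        config.setDevVersion(true); // one update, seen by both consumers
        System.out.println("after: " + lexer.devVersion() + " / " + parser.devVersion());
    }
}

// Hypothetical stand-in for EsqlConfig; mutable on purpose so tests can toggle it.
class ConfigStandIn {
    boolean devVersion = false;

    void setDevVersion(boolean dev) {
        this.devVersion = dev;
    }
}

// Hypothetical stand-in for the lexer base class.
class LexerStandIn {
    private final ConfigStandIn config;

    LexerStandIn(ConfigStandIn config) {
        this.config = config;
    }

    boolean devVersion() {
        return config.devVersion;
    }
}

// Hypothetical stand-in for the parser base class; reads the same config instance.
class ParserStandIn {
    private final ConfigStandIn config;

    ParserStandIn(ConfigStandIn config) {
        this.config = config;
    }

    boolean devVersion() {
        return config.devVersion;
    }
}
```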

costin · 2024-08-20T00:40:33Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LexerConfig.java

+    boolean releaseVersion() {
+        return devVersion() == false;
+    }


Alternative method used to keep the semantic predicate inside the grammar to a minimum.

costin · 2024-08-20T00:40:46Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/ParserConfig.java

+import org.antlr.v4.runtime.Parser;
+import org.antlr.v4.runtime.TokenStream;
+
+public abstract class ParserConfig extends Parser {


Equivalent base class for the parser.

costin · 2024-08-20T00:41:36Z

.../esql/src/test/java/org/elasticsearch/xpack/esql/parser/GrammarInDevelopmentParsingTest.java

+
+import static org.hamcrest.Matchers.containsString;
+
+public class GrammarInDevelopmentParsingTest extends ESTestCase {


Test that checks that the in-development commands fail if devVersion is set to false, that is, if the build is not a snapshot.

costin · 2024-08-20T00:42:13Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlParser.java

@@ -50,11 +60,14 @@ private <T> T invokeParser(
        BiFunction<AstBuilder, ParserRuleContext, T> result
    ) {
        try {
-            EsqlBaseLexer lexer = new EsqlBaseLexer(new CaseChangingCharStream(CharStreams.fromString(query)));


No need to wrap the input stream in a CaseChangingCharStream - ANTLR already handles the case insensitivity for us.

costin · 2024-08-27T00:25:20Z

@alex-spies

Why do these need to be gated in the lexer and parser? Wouldn't one of the two suffice?

If there's no gating in the lexer, any new keyword introduced can and will clash with other parts of the grammar such as field names. See for example the error message that reports 'match' and the rest of the dev commands in case of an error.
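
To illustrate the clash, here is a toy whitespace tokenizer (not ANTLR, and not the ESQL lexer) in which a dev keyword is only recognized as a keyword when the flag is on; otherwise the same word stays available as an identifier such as a field name. The keyword sets and the token labels are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class GatedKeywordDemo {
    // Hypothetical keyword sets, chosen only for this example.
    static final Set<String> RELEASED = Set.of("from", "where", "limit");
    static final Set<String> IN_DEVELOPMENT = Set.of("match", "lookup");

    // Classify each whitespace-separated word; dev keywords count only when devVersion is true.
    static List<String> tokenize(String query, boolean devVersion) {
        List<String> tokens = new ArrayList<>();
        for (String word : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            boolean keyword = RELEASED.contains(word) || (devVersion && IN_DEVELOPMENT.contains(word));
            tokens.add((keyword ? "KEYWORD(" : "IDENTIFIER(") + word + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "from idx where match";
        System.out.println("release:  " + tokenize(query, false));
        System.out.println("snapshot: " + tokenize(query, true));
    }
}
```

Without the gate, `match` would always be lexed as a keyword, so a field called `match` would stop working on release builds even though the command itself is unavailable there.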

costin · 2024-08-27T00:28:16Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+/*
+ * Before modifying this file, please read the section above as changes here
+ * have significant impact in the ANTLR generated code and its consumption upstream
+ * (including Kibana).
+ *
+ * A. To add a new (production-ready) token
+ *
+ * Since tokens types (numbers) are generated by ANTLR in a continuous fashion,
+ * it is desirable to avoid changing these values.
+ * However the use of lexing modes prevents this since any addition to a mode
+ * (regardless where it occurs) shifts all the declarations that follow in other modes.
+ *
+ * B. To add a development token (only available behind in snapshot/dev builds)
+ *
+ * Since the tokens/modes are in development, simply define them under the
+ * "// in development section" and follow the section comments in that section.
+ * That is use the DEV_ prefix and use the {isDevVersion()}? conditional.
+ * Make sure to remove the prefix and conditional before promoting the tokens in
+ * production.
+ * They are defined at the end of the file, to minimize the impact on the existing
+ * token types.
+ *
+ * C. Renaming a token
+ *
+ * Avoid renaming the token. But if you really have to, please check with the
+ * Kibana team as they might be using consuming the dictionary.
+ *
+ * D. To remove a token
+ *
+ * If the tokens haven't made it to production (and make sure to double check),
+ * simply remove them from the grammar.
+ * If the tokens have made it, check with the Kibana team the impact the new
+ * tokens on it.
+ */


Added a section on how to evolve the file moving forward - it introduces certain conventions in order to drop the DEV_ tokens from error messages (since the conditional cannot prevent the declaration itself).

costin · 2024-08-27T00:28:39Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+DEV_INLINESTATS : {isDevVersion()}? 'inlinestats'   -> pushMode(EXPRESSION_MODE);
+DEV_LOOKUP :      {isDevVersion()}? 'lookup'        -> pushMode(LOOKUP_MODE);
+DEV_MATCH :       {isDevVersion()}? 'match'         -> pushMode(EXPRESSION_MODE);
+DEV_METRICS :     {isDevVersion()}? 'metrics'       -> pushMode(METRICS_MODE);


The DEV_ prefix is used to drop the token in a non-dev environment.

costin · 2024-08-27T00:29:56Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

-//
-// Explain
-//
-mode EXPLAIN_MODE;
-EXPLAIN_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET), pushMode(DEFAULT_MODE);
-EXPLAIN_PIPE : PIPE -> type(PIPE), popMode;
-EXPLAIN_WS : WS -> channel(HIDDEN);
-EXPLAIN_LINE_COMMENT : LINE_COMMENT -> channel(HIDDEN);
-EXPLAIN_MULTILINE_COMMENT : MULTILINE_COMMENT -> channel(HIDDEN);


I've moved this down since EXPLAIN mode is still in limbo - no need to have it high in the file, before the EXPRESSION mode.
The modes were initially ordered alphabetically; however, I think it's best to order them chronologically.

costin · 2024-08-27T00:31:04Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+// move it in the main section if the feature gets promoted
+DEV_MATCH_OP : {isDevVersion()}? DEV_MATCH -> type(DEV_MATCH);
+


@ioanatia @carlosdelest identical tokens should be aliased as mentioned above (through the type() function) instead of being repeated.

costin · 2024-08-27T00:33:21Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4

-    : qualifiedName MATCH_OPERATOR queryString=string
+    : valueExpression DEV_MATCH queryString=string


@ioanatia try to stick with the existing grammar pattern; use valueExpression instead of qualifiedName. Perform validation if necessary in the parser.

costin · 2024-08-27T00:34:42Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/DelegatingTokenSource.java

+/**
+ * Utility class for filtering/processing TokenSource.
+ */
+abstract class DelegatingTokenSource implements TokenSource {


Extracted this base class since I had another TokenSource implementation - I've since removed that implementation but kept the base class for future reuse (or maybe ANTLR will offer one).

costin · 2024-08-27T00:35:13Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlConfig.java

+
+import org.elasticsearch.Build;
+
+class EsqlConfig {


Simplified the base class - no check for features.

costin · 2024-08-27T00:37:56Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlParser.java

+            if (recognizer instanceof EsqlBaseParser parser && parser.isDevVersion() == false) {
+                Matcher m = REPLACE_DEV.matcher(message);
+                message = m.replaceAll(StringUtils.EMPTY);
+            }
+


Hacky way to remove the DEV_ tokens - if anybody has better ideas, let me know.
FTR, I've tried removing the tokens from the stream and moving them to a hidden channel; however, neither works since the error is generated by the recognizer, which inspects the ATN (the transitions between states), so the DEV_ tokens still show up.
The code above is much easier than manipulating the ATN, which gets generated by ANTLR - though it's probably quite fragile.
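
For readers who want to see the string scrubbing in isolation, here is a minimal sketch. The regular expression is an assumption written for this example (the definition of REPLACE_DEV is not shown in this thread), and the error message is a made-up sample in the general shape of an ANTLR "expecting {...}" message.

```java
import java.util.regex.Pattern;

class DevTokenScrubDemo {
    // Hypothetical pattern: removes a DEV_ name together with one adjacent comma,
    // whether the name sits at the start, middle or end of the list.
    static final Pattern DEV_TOKEN = Pattern.compile("DEV_\\w+,\\s*|,\\s*DEV_\\w+|DEV_\\w+");

    // Only scrub on non-dev builds; dev builds keep the full message.
    static String scrub(String message, boolean devVersion) {
        return devVersion ? message : DEV_TOKEN.matcher(message).replaceAll("");
    }

    public static void main(String[] args) {
        String message = "mismatched input 'x' expecting {DEV_INLINESTATS, 'from', DEV_LOOKUP, 'row'}";
        System.out.println(scrub(message, false));
        System.out.println(scrub(message, true));
    }
}
```

The fragility mentioned above shows up here too: any user text in the message that happens to start with DEV_ would be removed as well.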

elasticsearchmachine · 2024-08-27T00:39:03Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-08-27T00:39:03Z

Pinging @elastic/kibana-esql (ES|QL-ui)

costin · 2024-08-27T00:40:35Z

@drewdaemon I've updated the PR with the flags and added some code to drop the DEV_ tokens from syntax error messages when running on the stable branch ("... expected one of {'from', 'eval' ...DEV_INLINE...").

alex-spies · 2024-08-27T12:28:05Z

@alex-spies

Why do these need to be gated in the lexer and parser? Wouldn't one of the two suffice?

If there's no gating in the lexer, any new keyword introduced can and will clash with other parts of the grammar such as field names. See for example the error message that reports 'match' and the rest of the dev commands in case of an error.

I understand the gating in the lexer, but wonder about the gating in the parser.

I just removed the isDevVersion predicate for inlinestats from the parser locally and ran an inlinestats query; it still resulted in the expected parsing exception:

{"error":{"root_cause":[{"type":"parsing_exception","reason":"line 1:20: mismatched input 'inlinestats' expecting {'dissect', 'drop', 'enrich', 'eval', 'grok', 'keep', 'limit', 'mv_expand', 'rename', 'sort', 'stats', 'where'}"}],"type":"parsing_exception","reason":"line 1:20: mismatched input 'inlinestats' expecting {'dissect', 'drop', 'enrich', 'eval', 'grok', 'keep', 'limit', 'mv_expand', 'rename', 'sort', 'stats', 'where'}","caused_by":{"type":"input_mismatch_exception","reason":null}},"status":400}%

If the lexer doesn't accept the token, there's no reason for additional special treatment in the parser, no?

astefan · 2024-08-27T14:54:58Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+ * C. Renaming a token
+ *
+ * Avoid renaming the token. But if you really have to, please check with the
+ * Kibana team as they might be using consuming the dictionary.


Minor typo: might be using consuming ....

astefan · 2024-08-27T14:57:55Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+ * If the tokens haven't made it to production (and make sure to double check),
+ * simply remove them from the grammar.
+ * If the tokens have made it, check with the Kibana team the impact the new
+ * tokens on it.


This sentence doesn't sound right, I think it's missing a verb...
check with the Kibana team the impact the new tokens have on it

astefan · 2024-08-27T14:59:40Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

+// DEV_MYCOMMAND : {isDevVersion()}? 'mycommand' -> ...
+//
+// Once the command has been stabilized, remove the DEV_ prefix and the {}? conditional and move the command to the
+// main section while preserving alphabetical order.


Since you added an example with DEV_MYCOMMAND I think it's worth adding another example for the situation when the command is promoted to production.

costin · 2024-08-27T15:34:02Z

I understand the gating in the lexer, but wonder about the gating in the parser.

The conditionals make sense for consistency - the rules can use 'released' tokens in which case they'll match. While in this PR, the tokens and rules go hand in hand, the goal is to provide a template that works in all cases (such as introducing a new rule that gets applied for existing tokens or that enables a feature only from a certain version onwards).

bpintea

👍

bpintea · 2024-08-27T15:59:01Z

x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4

-    : [A-Za-z]
+    : [a-z]


Might be worth adding this as a comment.

bpintea · 2024-08-27T16:02:30Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlConfig.java

+
+import org.elasticsearch.Build;
+
+class EsqlConfig {


Could it be a record?

No because records are immutable.

Might be worth adding this as a comment.

No need, if one tries to define upper case characters, the grammar will complain.

astefan

LGTM

## Summary

Pulls in the changes ES team introduced in the ES|QL grammar in elastic/elasticsearch#111995. Nothing should change as far as our functionality except that commands the ES team marks with the `DEV_` prefix will no longer show up in our validation errors.

**Before**: [screenshot]

**After**: [screenshot]

Successful ES|QL grammar sync run: https://buildkite.com/elastic/kibana-es-ql-grammar-sync/builds/53

costin added WIP :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI labels Aug 20, 2024

costin requested review from drewdaemon, astefan, bpintea, alex-spies and fang-xing-esql August 20, 2024 00:22

elasticsearchmachine added the v8.16.0 label Aug 20, 2024

wip

c0b3864

costin force-pushed the esql/semantic-predicates branch from 840662e to c0b3864 Compare August 20, 2024 00:46

costin added 3 commits August 26, 2024 15:28

Polish

947e0fb

Merge branch 'main' into esql/semantic-predicates

045a6ed

Fix parser error

fea9824

costin removed the WIP label Aug 27, 2024

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 27, 2024

costin added the >refactoring label Aug 27, 2024

astefan reviewed Aug 27, 2024

bpintea approved these changes Aug 27, 2024

astefan approved these changes Aug 27, 2024

Fix typos

7078108

costin merged commit fff5c8f into elastic:main Aug 27, 2024
2 of 15 checks passed

costin deleted the esql/semantic-predicates branch August 27, 2024 20:51

drewdaemon mentioned this pull request Sep 3, 2024

[ES|QL] adapt to dev mode grammar gating elastic/kibana#192027

Merged

astefan mentioned this pull request Sep 12, 2024

ES|QL: Fix error messages for non-snapshot tests #112824

Merged

