-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: add conditional runtime parsing through semantic predicates #111995
Conversation
@header { | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated improvement to add the copyright in the generated code.
} | ||
|
||
options { | ||
superClass=LexerConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main option that matters here is the superClass which is used to hook up the conditional code - the generated parser and lexer will invoke this code which is defined in the LexerConfig (for the lexer) and ParserConfig for the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
: [A-Za-z] | ||
: [a-z] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file is case-insensitive, no need to define upper case (doing that throws an error which is a nice benefit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding this as a comment.
@@ -80,7 +105,7 @@ fragment UNESCAPED_CHARS | |||
; | |||
|
|||
fragment EXPONENT | |||
: [Ee] [+-]? DIGIT+ | |||
: [e] [+-]? DIGIT+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here - the file is case-insensitive, no need to define upper case (doing that throws an error which is a nice benefit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tokens are auto-generated which means changing their declaration order changes their assigned number.
Right now we don't use these numbers however I won't be surprised if in the future, different tools pick this up hence why I propose to own this file and not rely on the auto-generation.
A nice side-effect of that is in-development commands can be assigned their own range (e.g. 999_xxx) making them easy to skip inside the parser.
| showCommand | ||
| metaCommand | ||
// in development | ||
| {devVersion()}? metricsCommand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gate metrics command by a semantic predicate.
Note this can extended to include language versioning in the future as well:
such as {devVersion() && hasFeature("inlinestats")
.
There's a question on whether using hasFeature with strings is manageable versus adding some kind of versioning only for the language (whether that gets tied to an internal counter, ES public versioning or something else is a separate discussion):
{devVersion() && languageVersion() >= 12345}
Since this declaration is used verbatim in the generated parser/lexer code, the declaration should be portable and minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small problem - some things are gated behind feature flags and those flags can be enabled by setting a jvm parameter OR using a snapshot build. I don't remember precisely what is gated, but it's pretty close to this list.
Doing it with a feature flag is a little more convenient that the SNAPSHOT version directly because, on a release, folks can set the jvm parameter and test against that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the mechanism in place, it's easy to extend this to include feature flags - the main issue is handling this outside the server (e.g. Kibana).
// in development | ||
| {devVersion()}? inlinestatsCommand | ||
| {devVersion()}? lookupCommand | ||
| {devVersion()}? matchCommand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gate also the in-development commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these need to be gated in the lexer and parser? Wouldn't one of the two suffice?
inlinestatsCommand | ||
: INLINESTATS stats=fields (BY grouping=fields)? | ||
; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move them to the end under the "// in development" section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above regarding the token file.
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And now the copyright appears.
class EsqlConfig { | ||
|
||
// versioning information | ||
boolean devVersion = Build.current().isSnapshot(); | ||
|
||
public void setDevVersion(boolean dev) { | ||
this.devVersion = dev; | ||
} | ||
|
||
// not great because other grammar parser (Kibana) need the implement this method | ||
boolean hasFeature(String featureName) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this class is to hook the dev boolean flag and programatically set it for testing.
I've also added a potential hasFeature method to check if the underlying cluster supports certain command or not and disable it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errr - do we expose feature flags through an API? How would Kibana determine if a feature flag is set on a given cluster?
/** | ||
* Base class for hooking versioning information into the ANTLR parser. | ||
*/ | ||
public abstract class LexerConfig extends Lexer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Base class used now by the generated lexer.
boolean devVersion() { | ||
return config.devVersion; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Java method called at runtime during lexing - it's just checking a boolean that is passed on through the EsqlConfig (to make it easier to share updates of the primitives across classes).
boolean releaseVersion() { | ||
return devVersion() == false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative method used to keep the semantic predicate inside the grammar to a minimum.
import org.antlr.v4.runtime.Parser; | ||
import org.antlr.v4.runtime.TokenStream; | ||
|
||
public abstract class ParserConfig extends Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Equivalent base class for the parser .
|
||
import static org.hamcrest.Matchers.containsString; | ||
|
||
public class GrammarInDevelopmentParsingTest extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test that checks the in-development commands fail, if the devVersion is set to false - that is, if the build is not a snapshot.
@@ -50,11 +60,14 @@ private <T> T invokeParser( | |||
BiFunction<AstBuilder, ParserRuleContext, T> result | |||
) { | |||
try { | |||
EsqlBaseLexer lexer = new EsqlBaseLexer(new CaseChangingCharStream(CharStreams.fromString(query))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to wrap the inputstream to CaseChangingCharStream
- ANTLR already handles the case insensitivity for us.
840662e
to
c0b3864
Compare
If there's no gating in the lexer, any new keyword introduced can and will clash with other parts of the grammar such as field name. See for example the error message that reports 'match' and the rest of the dev commands in case of an error. |
/* | ||
* Before modifying this file, please read the section above as changes here | ||
* have significant impact in the ANTLR generated code and its consumption upstream | ||
* (including Kibana). | ||
* | ||
* A. To add a new (production-ready) token | ||
* | ||
* Since tokens types (numbers) are generated by ANTLR in a continuous fashion, | ||
* it is desirable to avoid changing these values. | ||
* However the use of lexing modes prevents this since any addition to a mode | ||
* (regardless where it occurs) shifts all the declarations that follow in other modes. | ||
* | ||
* B. To add a development token (only available behind in snapshot/dev builds) | ||
* | ||
* Since the tokens/modes are in development, simply define them under the | ||
* "// in development section" and follow the section comments in that section. | ||
* That is use the DEV_ prefix and use the {isDevVersion()}? conditional. | ||
* Make sure to remove the prefix and conditional before promoting the tokens in | ||
* production. | ||
* They are defined at the end of the file, to minimize the impact on the existing | ||
* token types. | ||
* | ||
* C. Renaming a token | ||
* | ||
* Avoid renaming the token. But if you really have to, please check with the | ||
* Kibana team as they might be using consuming the dictionary. | ||
* | ||
* D. To remove a token | ||
* | ||
* If the tokens haven't made it to production (and make sure to double check), | ||
* simply remove them from the grammar. | ||
* If the tokens have made it, check with the Kibana team the impact the new | ||
* tokens on it. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added section on how to evolve the file moving forward - it introduces certain conventions in order to drop the DEV_ tokens from error messages (since the conditional cannot prevent the declaration itself).
DEV_INLINESTATS : {isDevVersion()}? 'inlinestats' -> pushMode(EXPRESSION_MODE); | ||
DEV_LOOKUP : {isDevVersion()}? 'lookup' -> pushMode(LOOKUP_MODE); | ||
DEV_MATCH : {isDevVersion()}? 'match' -> pushMode(EXPRESSION_MODE); | ||
DEV_METRICS : {isDevVersion()}? 'metrics' -> pushMode(METRICS_MODE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DEV_ prefix is used to drop the token in a non-dev environment.
// | ||
// Explain | ||
// | ||
mode EXPLAIN_MODE; | ||
EXPLAIN_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET), pushMode(DEFAULT_MODE); | ||
EXPLAIN_PIPE : PIPE -> type(PIPE), popMode; | ||
EXPLAIN_WS : WS -> channel(HIDDEN); | ||
EXPLAIN_LINE_COMMENT : LINE_COMMENT -> channel(HIDDEN); | ||
EXPLAIN_MULTILINE_COMMENT : MULTILINE_COMMENT -> channel(HIDDEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this down since EXPLAIN mode is still in limbo - no need to have it high in the file, before the EXPRESSION mode.
The modes were initially done in alphabetical order however I think it's best to do them in chronological form.
// move it in the main section if the feature gets promoted | ||
DEV_MATCH_OP : {isDevVersion()}? DEV_MATCH -> type(DEV_MATCH); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ioanatia @carlosdelest identical tokens should be aliased as mentioned above (through the type() function) instead of being repeated.
: qualifiedName MATCH_OPERATOR queryString=string | ||
: valueExpression DEV_MATCH queryString=string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ioanatia try to stick with the existing grammar pattern; use valueExpression instead of qualifiedName. Perform validation if necessary in the parser .
/** | ||
* Utility class for filtering/processing TokenSource. | ||
*/ | ||
abstract class DelegatingTokenSource implements TokenSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted this base class since I had another TokenSource implementation - I've removed it but kept it for future reuse (or maybe ANTLR will offer it).
|
||
import org.elasticsearch.Build; | ||
|
||
class EsqlConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified the base class - no check for features.
if (recognizer instanceof EsqlBaseParser parser && parser.isDevVersion() == false) { | ||
Matcher m = REPLACE_DEV.matcher(message); | ||
message = m.replaceAll(StringUtils.EMPTY); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hacky way to remove the DEV_ tokens - if anybody else has better ideas, let me know.
FTR, I've tried removing the token from the stream, moving them to a hidden channel however it doesn't work since the error is generated by the recognizer who's inspecting the ATN (the transition between states) and the DEV_ token show up.
The code above is much easier than manipulating the ATN, which gets generated by ANTLR - though it's probably quite fragile.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
@drewdaemon I've updated the PR with the flags and added some code on how to drop the DEV_ tokens when running on the stable branch when encountering a syntax error ("... expected on of {'from', 'eval' ...DEV_INLINE..."). |
I understand the gating in the lexer, but wonder about the gating in the parser. I just removed the
If the lexer doesn't accept the token, there's no reason for additional special treatment in the parser, no? |
* C. Renaming a token | ||
* | ||
* Avoid renaming the token. But if you really have to, please check with the | ||
* Kibana team as they might be using consuming the dictionary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo: might be using consuming ....
* If the tokens haven't made it to production (and make sure to double check), | ||
* simply remove them from the grammar. | ||
* If the tokens have made it, check with the Kibana team the impact the new | ||
* tokens on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence doesn't sound right, I think it's missing a verb...
check with the Kibana team the impact the new tokens have on it
// DEV_MYCOMMAND : {isDevVersion()}? 'mycommand' -> ... | ||
// | ||
// Once the command has been stabilized, remove the DEV_ prefix and the {}? conditional and move the command to the | ||
// main section while preserving alphabetical order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you added an example with DEV_MYCOMMAND
I think it's worth adding another example for the situation when the command is promoted to production.
The conditionals make sense for consistency - the rules can use 'released' tokens in which case they'll match. While in this PR, the tokens and rules go hand in hand, the goal is to provide a template that works in all cases (such as introducing a new rule that gets applied for existing tokens or that enables a feature only from a certain version onwards). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
: [A-Za-z] | ||
: [a-z] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding this as a comment.
|
||
import org.elasticsearch.Build; | ||
|
||
class EsqlConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be a record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because records are immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding this as a comment.
No need, if one tries to define upper case characters, the grammar will complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
## Summary Pulls in the changes ES team introduced in the ES|QL grammar in elastic/elasticsearch#111995. Nothing should change as far as our functionality except that commands the ES team marks with the `DEV_` prefix will no longer show up in our validation errors. **Before** <img width="859" alt="Screenshot 2024-09-03 at 3 11 21 PM" src="https://github.com/user-attachments/assets/69dee5f1-dd26-4d85-b83b-a0b4689a3c09"> **After** <img width="848" alt="Screenshot 2024-09-03 at 3 10 35 PM" src="https://github.com/user-attachments/assets/31c07a0a-4e59-4e11-af72-a1eb7b7f1235"> Successful ES|QL grammar sync run: https://buildkite.com/elastic/kibana-es-ql-grammar-sync/builds/53
…astic#111995) Introducing condition parsing (and versioning) in the grammar (both lexer and parser). That is make sure that the underlying grammar gets parsed only if certain conditions are met such as : - in development code is picked up only in the snapshot branch - new commands are not available on older versions (clusters with old nodes) - deprecated grammar is no longer allowed once removed The goal of this branch is to test the concept and see whether it work both on the backend (Java) and front-end (Javascript) since the underlying predicates(aka conditions) are language specific.
WIPbranch for introducing condition parsing (and versioning) in the grammar (both lexer and parser).That is make sure that the underlying grammar gets parsed only if certain conditions are met such as :
The goal of this branch is to test the concept and see whether it work both on the backend (Java) and front-end (Javascript) since the underlying predicates(aka conditions) are language specific.