-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QL: Refactor DataType for pluggability #51328
Conversation
Pinging @elastic/es-search (:Search/EQL) |
Change DataType from enum to class Break DataType enums into QL (default) and SQL types Make data type conversion pluggable so that new types can be introduced As part of the process: - static type conversion in QL package (such as Literal) has been removed - several utility classes have been broken into base (QL) and extended (SQL) parts based on type awareness - operators (+,-,/,*) are - due to extensibility, serialization of arithmetic operation has been slightly changed and pushed down to the operator executor itself
0cb5288
to
8bff48b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first set of feedback.
(left == DataType.NULL || right == DataType.NULL) | ||
|| (left.isString() && right.isString()) | ||
(left == NULL || right == NULL) | ||
|| (isString(left) && isString(right)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this mix of styles? isSomething(DataType)
vs. dataType.isSomething()
Is it possible to use only one?
Also, is not obvious to me why isNumeric()
remains on the DataType while isString()
is moved to DataTypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataType
contains only methods that can work on its properties (such as whether it's numeric or not) while everything else is on DataTypes
.
From the usage point of view it would be cleaner to have all methods inside one place, namely DataTypes
(as new types can have new properties such as interval
) but in case of numeric properties these are parts of the DataType
itself.
This can be revisited in the future so that these properties are fully externalized (and essentially hard-coded just like isString
).
SUB | ||
public interface NumericArithmetic extends BiFunction<Number, Number, Number> { | ||
default Object wrap(Object l, Object r) { | ||
if (!(l instanceof Number)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== false
instead of !
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the code as it was without doing any style touches. I'd rather do that separately (similar to organize imports) to avoid the extra noise.
I'm sure there are multiple places where the !=
style is being used.
case TEXT: | ||
DataType esType = typeRegistry.fromEs(typeName); | ||
|
||
if (esType == TEXT) { | ||
return new TextEsField(fieldName, props, false, isAlias); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation issue.
return new UnsupportedEsField(fieldName, typeName, null, props); | ||
default: | ||
} | ||
return new EsField(fieldName, esType, props, isAggregateable, isAlias); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still an indentation issue.
|
||
DataType fromJava(Object value); | ||
|
||
boolean isUnsupported(DataType type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlDataTypeRegistry has DataTypes.isUnsupported(type);
for this method implementation, DefaultDataTypeRegistry has return type == UNSUPPORTED;
. Can this interface provide a default implementation and remove the specific ones (that seem to be the same) from the implementing classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A default implementation would tie the contract (the interface) to an DataType
instance (UNSUPPORTED
).
Delegating to *DataTypes
is cleaner and makes each interface implementation clearer in design - it also makes them fully independent (if needed).
I replaced the ==
check with a call to DataTypes.isUnsupported
to maintain the delegation pattern.
public int getPrecision() { | ||
return esDataType.defaultPrecision; | ||
} | ||
// public int getPrecision() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method now available only for Keywords?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's used only there. Having it all over does not provide any value.
Same as DateEsField
which initially was suppose to maintain the date format however that's not the case so essentially it's just another EsField
.
These APIs can be cleaned as not all properties are needed (including precision) and can be gathered into the module that needs them.
this.precision = precision; | ||
this.normalized = normalized; | ||
} | ||
|
||
@Override | ||
public int getPrecision() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the precision
mean in the context of a generic QL keyword field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being a string, it represents the number of characters - namely ignore_above
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left few minor comments.
@@ -81,7 +79,7 @@ public void testDateField() { | |||
EsField field = mapping.get("date"); | |||
assertThat(field.getDataType(), is(DATETIME)); | |||
assertThat(field.isAggregatable(), is(true)); | |||
assertThat(field.getPrecision(), is(3)); | |||
//assertThat(field.getPrecision(), is(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover comment.
return new UnsupportedEsField(fieldName, typeName, null, props); | ||
default: | ||
} | ||
return new EsField(fieldName, esType, props, isAggregateable, isAlias); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still an indentation issue.
return new TextEsField(fieldName, props, false, isAlias); | ||
} | ||
if (esType == KEYWORD) { | ||
int length = Short.MAX_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove length
and use its value directly in the KeywordEsField consructor.
I would say that normalized
can also be removed, and still keep the TODO
in there?
//FIXME: Taken from sql-proto. | ||
//Ideally it should be shared but the dependencies across projects and and SQL-client make it tricky. | ||
//Maybe a gradle task would fix that... | ||
//NB: Taken from sql-proto. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have it as information on where the code comes from (though it's not big enough to require some sort of automatic copying over).
case KEYWORD: | ||
int length = intSetting(content.get("ignore_above"), esDataType.defaultPrecision); | ||
} else if (esDataType == KEYWORD) { | ||
int length = intSetting(content.get("ignore_above"), Short.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation issues.
private static final Map<String, DataType> SQL_TO_ES; | ||
|
||
static { | ||
Map<String, DataType> sqlToEs = new HashMap<>(mapSize(45)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the use of mapSize()
method? For situations where a new type is added and the Map initialization is forgotten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To allocate a map high enough so that, while adding items, it doesn't require reallocation.
In this case the map is a constant and we know before hand its size hence the 'optimization' to use from the start a size high enough that everything fits yet small enough to not waste space.
@@ -1010,7 +1010,7 @@ private Expression propagate(And and) { | |||
if (comp != null) { | |||
// var cannot be equal to two different values at the same time | |||
if (comp != 0) { | |||
return new Literal(and.source(), Boolean.FALSE, DataType.BOOLEAN); | |||
return new Literal(and.source(), Boolean.FALSE, DataTypes.BOOLEAN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation issue. Also, check lines L1030-L1039, L1091, L1006, and L1111.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Nice work! Left some minor mostly formatting related comments.
.map(t -> asList(t.toString(), | ||
t.sqlType().getVendorTypeNumber(), | ||
DataTypes.precision(t), | ||
sqlType(t).getVendorTypeNumber(), precision(t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would leave one arg per line as it was.
//no fixed precision scale SQL_FALSE | ||
Boolean.FALSE, | ||
// not auto-incremented | ||
Boolean.FALSE, | ||
null, | ||
DataTypes.metaSqlMinimumScale(t), | ||
DataTypes.metaSqlMaximumScale(t), | ||
metaSqlMinimumScale(t), metaSqlMaximumScale(t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
// SQL_DATA_TYPE - ODBC wants this to be not null | ||
DataTypes.metaSqlDataType(t), | ||
DataTypes.metaSqlDateTimeSub(t), | ||
metaSqlDataType(t), metaSqlDateTimeSub(t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
// | ||
// Script generation | ||
// | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: extra empty line
throw new QlIllegalArgumentException("Cannot evaluate script for expression {}", exp); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: extra empty line
|
||
public class SqlTypesTests extends ESTestCase { | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: extra empty line
EsField gs = mapping.get("site"); | ||
assertThat(gs.getDataType().typeName(), is("geo_shape")); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: extra empty line
return new Tuple<>(cmd, session); | ||
} | ||
|
||
public void testSysTypes() { | ||
Command cmd = sql("SYS TYPES").v1(); | ||
|
||
List<String> names = asList("BYTE", "LONG", "BINARY", "NULL", "INTEGER", "SHORT", "HALF_FLOAT", "FLOAT", "DOUBLE", "SCALED_FLOAT", | ||
"KEYWORD", "TEXT", "IP", "BOOLEAN", "DATE", "TIME", "DATETIME", | ||
"IP", "KEYWORD", "TEXT", "BOOLEAN", "DATE", "TIME", "DATETIME", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please share the logic in this re-arrangement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment - reproducing here for the record:
ordered per SQL type id (numeric) and then (for the same id) alphabetically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I reviewed the PR over a couple of days and I forgot some stuff.
@@ -110,7 +108,7 @@ public void testDocValueField() { | |||
assertThat(mapping.size(), is(1)); | |||
EsField field = mapping.get("session_id"); | |||
assertThat(field, instanceOf(KeywordEsField.class)); | |||
assertThat(field.getPrecision(), is(15)); | |||
//assertThat(field.getPrecision(), is(15)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -88,7 +89,7 @@ public Expression replaceChildren(List<Expression> newChildren) { | |||
@Override | |||
public ScriptTemplate asScript() { | |||
ScriptTemplate leftScript = asScript(left); | |||
ScriptTemplate rightScript = asOptionalScript(right); | |||
ScriptTemplate rightScript = asScript(right == null ? Literal.NULL : right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference: I kinda liked more the method asOptionaScript
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it since it was not used anywhere outside the base class.
The script generation needs improvement anyway as it has evolved beyond the initial requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left comment
* Introduce reusable QL plugin for SQL and EQL (#50815) Extract reusable functionality from SQL into its own dedicated project QL. Implemented as a plugin, it provides common components across SQL and the upcoming EQL. While this commit is fairly large, for the most part it's just a big file move from sql package to the newly introduced ql. (cherry picked from commit ec1ac0d) * SQL: Fix incomplete registration of geo NamedWritables (cherry picked from commit e295763) * QL: Extend NodeSubclass to read classes from jars (#50866) As the test classes are spread across more than one project, the Gradle classpath contains not just folders but also jars. This commit allows the test class to explore the archive content and load matching classes from said source. (cherry picked from commit 25ad749) * QL: Remove implicit conversion inside Literal (#50962) Literal constructor makes an implicit conversion for each value given which turns out has some subtle side-effects. Improve MathProcessors to preserve numeric type where possible Fix bug on issue compatibility between date and intervals Preserve the source when folding inside the Optimizer (cherry picked from commit 9b73e22) * QL: Refactor DataType for pluggability (#51328) Change DataType from enum to class Break DataType enums into QL (default) and SQL types Make data type conversion pluggable so that new types can be introduced As part of the process: - static type conversion in QL package (such as Literal) has been removed - several utility classes have been broken into base (QL) and extended (SQL) parts based on type awareness - operators (+,-,/,*) are - due to extensibility, serialization of arithmetic operation has been slightly changed and pushed down to the operator executor itself (cherry picked from commit aebda81) * Compilation fixes for 7.x
Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced
As part of the process:
removed
(SQL) parts based on type awareness
slightly changed and pushed down to the operator executor itself