Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QL: Refactor DataType for pluggability #51328

Merged
merged 5 commits into from
Jan 27, 2020

Conversation

costin
Copy link
Member

@costin costin commented Jan 22, 2020

Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced

As part of the process:

  • static type conversion in QL package (such as Literal) has been
    removed
  • several utility classes have been broken into base (QL) and extended
    (SQL) parts based on type awareness
  • operators (+,-,/,*) are
  • due to extensibility, serialization of arithmetic operation has been
    slightly changed and pushed down to the operator executor itself

@costin costin added :Analytics/SQL SQL querying :Analytics/EQL EQL querying labels Jan 22, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/EQL)

Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced

As part of the process:
- static type conversion in QL package (such as Literal) has been
removed
- several utility classes have been broken into base (QL) and extended
(SQL) parts based on type awareness
- operators (+,-,/,*) are
- due to extensibility, serialization of arithmetic operation has been
slightly changed and pushed down to the operator executor itself
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first set of feedback.

(left == DataType.NULL || right == DataType.NULL)
|| (left.isString() && right.isString())
(left == NULL || right == NULL)
|| (isString(left) && isString(right))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this mix of styles? isSomething(DataType) vs. dataType.isSomething()
Is it possible to use only one?
Also, is not obvious to me why isNumeric() remains on the DataType while isString() is moved to DataTypes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataType contains only methods that can work on its properties (such as whether it's numeric or not) while everything else is on DataTypes.
From the usage point of view it would be cleaner to have all methods inside one place, namely DataTypes (as new types can have new properties such as interval) but in case of numeric properties these are parts of the DataType itself.
This can be revisited in the future so that these properties are fully externalized (and essentially hard-coded just like isString).

SUB
public interface NumericArithmetic extends BiFunction<Number, Number, Number> {
default Object wrap(Object l, Object r) {
if (!(l instanceof Number)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== false instead of ! ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the code as it was without doing any style touches. I'd rather do that separately (similar to organize imports) to avoid the extra noise.
I'm sure there are multiple places where the != style is being used.

case TEXT:
DataType esType = typeRegistry.fromEs(typeName);

if (esType == TEXT) {
return new TextEsField(fieldName, props, false, isAlias);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issue.

return new UnsupportedEsField(fieldName, typeName, null, props);
default:
}
return new EsField(fieldName, esType, props, isAggregateable, isAlias);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still an indentation issue.


DataType fromJava(Object value);

boolean isUnsupported(DataType type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SqlDataTypeRegistry has DataTypes.isUnsupported(type); for this method implementation, DefaultDataTypeRegistry has return type == UNSUPPORTED;. Can this interface provide a default implementation and remove the specific ones (that seem to be the same) from the implementing classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A default implementation would tie the contract (the interface) to an DataType instance (UNSUPPORTED).
Delegating to *DataTypes is cleaner and makes each interface implementation clearer in design - it also makes them fully independent (if needed).
I replaced the == check with a call to DataTypes.isUnsupported to maintain the delegation pattern.

public int getPrecision() {
return esDataType.defaultPrecision;
}
// public int getPrecision() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method now available only for Keywords?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's used only there. Having it all over does not provide any value.
Same as DateEsField which initially was suppose to maintain the date format however that's not the case so essentially it's just another EsField.

These APIs can be cleaned as not all properties are needed (including precision) and can be gathered into the module that needs them.

this.precision = precision;
this.normalized = normalized;
}

@Override
public int getPrecision() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the precision mean in the context of a generic QL keyword field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a string, it represents the number of characters - namely ignore_above.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left few minor comments.

@@ -81,7 +79,7 @@ public void testDateField() {
EsField field = mapping.get("date");
assertThat(field.getDataType(), is(DATETIME));
assertThat(field.isAggregatable(), is(true));
assertThat(field.getPrecision(), is(3));
//assertThat(field.getPrecision(), is(3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover comment.

return new UnsupportedEsField(fieldName, typeName, null, props);
default:
}
return new EsField(fieldName, esType, props, isAggregateable, isAlias);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still an indentation issue.

return new TextEsField(fieldName, props, false, isAlias);
}
if (esType == KEYWORD) {
int length = Short.MAX_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove length and use its value directly in the KeywordEsField consructor.
I would say that normalized can also be removed, and still keep the TODO in there?

//FIXME: Taken from sql-proto.
//Ideally it should be shared but the dependencies across projects and and SQL-client make it tricky.
//Maybe a gradle task would fix that...
//NB: Taken from sql-proto.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have it as information on where the code comes from (though it's not big enough to require some sort of automatic copying over).

case KEYWORD:
int length = intSetting(content.get("ignore_above"), esDataType.defaultPrecision);
} else if (esDataType == KEYWORD) {
int length = intSetting(content.get("ignore_above"), Short.MAX_VALUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issues.

private static final Map<String, DataType> SQL_TO_ES;

static {
Map<String, DataType> sqlToEs = new HashMap<>(mapSize(45));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the use of mapSize() method? For situations where a new type is added and the Map initialization is forgotten?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allocate a map high enough so that, while adding items, it doesn't require reallocation.
In this case the map is a constant and we know before hand its size hence the 'optimization' to use from the start a size high enough that everything fits yet small enough to not waste space.

@@ -1010,7 +1010,7 @@ private Expression propagate(And and) {
if (comp != null) {
// var cannot be equal to two different values at the same time
if (comp != 0) {
return new Literal(and.source(), Boolean.FALSE, DataType.BOOLEAN);
return new Literal(and.source(), Boolean.FALSE, DataTypes.BOOLEAN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issue. Also, check lines L1030-L1039, L1091, L1006, and L1111.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Nice work! Left some minor mostly formatting related comments.

.map(t -> asList(t.toString(),
t.sqlType().getVendorTypeNumber(),
DataTypes.precision(t),
sqlType(t).getVendorTypeNumber(), precision(t),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would leave one arg per line as it was.

//no fixed precision scale SQL_FALSE
Boolean.FALSE,
// not auto-incremented
Boolean.FALSE,
null,
DataTypes.metaSqlMinimumScale(t),
DataTypes.metaSqlMaximumScale(t),
metaSqlMinimumScale(t), metaSqlMaximumScale(t),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

// SQL_DATA_TYPE - ODBC wants this to be not null
DataTypes.metaSqlDataType(t),
DataTypes.metaSqlDateTimeSub(t),
metaSqlDataType(t), metaSqlDateTimeSub(t),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

//
// Script generation
//

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: extra empty line

throw new QlIllegalArgumentException("Cannot evaluate script for expression {}", exp);
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: extra empty line


public class SqlTypesTests extends ESTestCase {


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: extra empty line

EsField gs = mapping.get("site");
assertThat(gs.getDataType().typeName(), is("geo_shape"));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: extra empty line

return new Tuple<>(cmd, session);
}

public void testSysTypes() {
Command cmd = sql("SYS TYPES").v1();

List<String> names = asList("BYTE", "LONG", "BINARY", "NULL", "INTEGER", "SHORT", "HALF_FLOAT", "FLOAT", "DOUBLE", "SCALED_FLOAT",
"KEYWORD", "TEXT", "IP", "BOOLEAN", "DATE", "TIME", "DATETIME",
"IP", "KEYWORD", "TEXT", "BOOLEAN", "DATE", "TIME", "DATETIME",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please share the logic in this re-arrangement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment - reproducing here for the record:

ordered per SQL type id (numeric) and then (for the same id) alphabetically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I reviewed the PR over a couple of days and I forgot some stuff.

@@ -110,7 +108,7 @@ public void testDocValueField() {
assertThat(mapping.size(), is(1));
EsField field = mapping.get("session_id");
assertThat(field, instanceOf(KeywordEsField.class));
assertThat(field.getPrecision(), is(15));
//assertThat(field.getPrecision(), is(15));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -88,7 +89,7 @@ public Expression replaceChildren(List<Expression> newChildren) {
@Override
public ScriptTemplate asScript() {
ScriptTemplate leftScript = asScript(left);
ScriptTemplate rightScript = asOptionalScript(right);
ScriptTemplate rightScript = asScript(right == null ? Literal.NULL : right);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference: I kinda liked more the method asOptionaScript

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it since it was not used anywhere outside the base class.
The script generation needs improvement anyway as it has evolved beyond the initial requirements.

Copy link
Member Author

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comment

@costin costin merged commit aebda81 into elastic:master Jan 27, 2020
@costin costin deleted the ql/datatype-refactor branch January 27, 2020 15:33
costin added a commit that referenced this pull request Jan 27, 2020
* Introduce reusable QL plugin for SQL and EQL (#50815)

Extract reusable functionality from SQL into its own dedicated project QL.
Implemented as a plugin, it provides common components across SQL and the upcoming EQL.

While this commit is fairly large, for the most part it's just a big file move from sql package to the newly introduced ql.

(cherry picked from commit ec1ac0d)

* SQL: Fix incomplete registration of geo NamedWritables

(cherry picked from commit e295763)

* QL: Extend NodeSubclass to read classes from jars (#50866)

As the test classes are spread across more than one project, the Gradle
classpath contains not just folders but also jars.
This commit allows the test class to explore the archive content and
load matching classes from said source.

(cherry picked from commit 25ad749)

* QL: Remove implicit conversion inside Literal (#50962)

Literal constructor makes an implicit conversion for each value given
which turns out has some subtle side-effects.
Improve MathProcessors to preserve numeric type where possible
Fix bug on issue compatibility between date and intervals
Preserve the source when folding inside the Optimizer

(cherry picked from commit 9b73e22)

* QL: Refactor DataType for pluggability (#51328)

Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced

As part of the process:
- static type conversion in QL package (such as Literal) has been
removed
- several utility classes have been broken into base (QL) and extended
(SQL) parts based on type awareness
- operators (+,-,/,*) are
- due to extensibility, serialization of arithmetic operation has been
slightly changed and pushed down to the operator executor itself

(cherry picked from commit aebda81)

* Compilation fixes for 7.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants