-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'mvt' field type format to geo fields #75367
Conversation
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the point of view of the general approach of getting access to the Geo formatter I think this PR is good. I have not reviewed the change holistically though, just from that aspect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just scanning this PR coming from #74476 which this seems to superseed and was just wondering about the two copyright notices here and if one of them was maybe added by accident. Sorry for the noise if this was done on purpose, I'd be interested in the policy we have around it in that case.
server/src/main/java/org/elasticsearch/common/geo/GeoFormatterFactory.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/geo/SimpleFeatureFactory.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/geo/SphericalMercatorUtils.java
Outdated
Show resolved
Hide resolved
In this iteration we changed how the modules depends between them, so repeating the main changes of this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. There are a couple of follow ups that I think we should do to make it less brittle and remove some vector-tile specific abstractions from the formatter.
@@ -84,7 +89,7 @@ protected XPackLicenseState getLicenseState() { | |||
Map<String, Mapper.TypeParser> mappers = new HashMap<>(super.getMappers()); | |||
mappers.put(ShapeFieldMapper.CONTENT_TYPE, ShapeFieldMapper.PARSER); | |||
mappers.put(PointFieldMapper.CONTENT_TYPE, PointFieldMapper.PARSER); | |||
mappers.put(GeoShapeWithDocValuesFieldMapper.CONTENT_TYPE, GeoShapeWithDocValuesFieldMapper.PARSER); | |||
mappers.put(GeoShapeWithDocValuesFieldMapper.CONTENT_TYPE, new GeoShapeWithDocValuesFieldMapper.TypeParser(vectorTileExtension)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way elasticsearch bootstrap system is setup the loadExtensions()
should be always called before getMappers() is called. The Plugin class and bootstrap system could have been designed better to avoid temporal coupling, but there is a lot of legacy here that we cannot deal with yet. I don't think should propagate this temporal coupling beyond the Plugin classes to the rest of the system. I think we can just add an assertion here that loadExtensions was indeed called before getMappers() is called just to be sure, and then use VectorTileExtension from here on instead of carrying SetOnce into the mappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I introduced a boolean variable to make sure the extensions are loaded before calling the mappers. I did have the same feeling that SetOnce should not be in the mappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to remove the assertion as some test has custom code to load mappers. I think integration test should still caught the case when extensions are not loaded before mappers so we are good.
* Get the vector tile engine. This is called when user ask for the MVT format on the field API. | ||
* We are only expecting one instance of a vector tile engine coming from the vector tile module. | ||
*/ | ||
GeoFormatterFactory.VectorTileEngine<Geometry> getVectorTileEngine(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are leaking way too much here. I think the right abstraction would be to just extend on the GeometryFromatter layer and hide the rest inside the tile class, but I think we can address it as another iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean here but let's leave it for next iteration
@Override | ||
public void loadExtensions(ExtensionLoader loader) { | ||
// we only expect one vector tile extension that comes from the vector tile module. | ||
loader.loadExtensions(VectorTileExtension.class).forEach(vectorTileExtension::set); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That essentially moves the dependency from the compile time into runtime. It is not explicit, but the whole thing will break if spatial module is present but vector-tile module is not or if we have another alternative implementation for that. We should probably address this as a follow up as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope here is that if the vector-tile module is not present everything should just work. You just end up with an error if you request geometries in mvt format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I stand corrected, it looks like it throws "vector tile format is not supported" exception in this case, which is hardcoded in the spatial module. But I think there is still too much knowledge to vector tile implementation inside spatial, let me try to take a shot at confine more of it into the vector tile module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be awesome. I ma going to push it as it is and we can iterate in a follow up.
In this commit we extend the Fields API formats for geo data in order to produce vector tiles features directly on the data nodes. That helps the vector tile API to reduce the size of the data that needs to pull in order to create the answer.
In this commit we extend the Fields API formats for geo data in order to produce vector tiles features directly on the data nodes. That helps the vector tile API to reduce the size of the data that needs to pull in order to create the answer.
GeoFormatterFactory doesn't need to know that the extension points were created specifically for the purpose of generating vector tiles. We can make it support an arbitrary formats by moving all MVT-specific logic into formatter itself. Follow up for elastic#75367
GeoFormatterFactory doesn't need to know that the extension points were created specifically for the purpose of generating vector tiles. We can make it support an arbitrary formats by moving all MVT-specific logic into formatter itself. Follow up for #75367
GeoFormatterFactory doesn't need to know that the extension points were created specifically for the purpose of generating vector tiles. We can make it support an arbitrary formats by moving all MVT-specific logic into formatter itself. Follow up for #75367
* [DOCS] Document `_mvt` API Documents the `_mvt` API endpoint added with #73872. Relates to #75242. * Reword * Rename API * Fix doc.url in JSON spec * Reword * Reword * Add content type to JSON spec * Edits * Fix typo * Reword * Update docs after meeting * Fix typos * Fix `size` default * Updates for #75522 * Fixes * Clean up JSON spec * Fix extent tag * [DOCS] Add `<field>` constraints * Minor clarification * Update for #75697 * Reword * Update for #75621 * Reword default sort * Update for #75367 * Remove unneeded whitespace * Add experimental admon and if flags * [DOCS] Remove ifdefs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Document `_mvt` API Documents the `_mvt` API endpoint added with elastic#73872. Relates to elastic#75242. * Reword * Rename API * Fix doc.url in JSON spec * Reword * Reword * Add content type to JSON spec * Edits * Fix typo * Reword * Update docs after meeting * Fix typos * Fix `size` default * Updates for elastic#75522 * Fixes * Clean up JSON spec * Fix extent tag * [DOCS] Add `<field>` constraints * Minor clarification * Update for elastic#75697 * Reword * Update for elastic#75621 * Reword default sort * Update for elastic#75367 * Remove unneeded whitespace * Add experimental admon and if flags * [DOCS] Remove ifdefs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Document `_mvt` API Documents the `_mvt` API endpoint added with #73872. Relates to #75242. * Reword * Rename API * Fix doc.url in JSON spec * Reword * Reword * Add content type to JSON spec * Edits * Fix typo * Reword * Update docs after meeting * Fix typos * Fix `size` default * Updates for #75522 * Fixes * Clean up JSON spec * Fix extent tag * [DOCS] Add `<field>` constraints * Minor clarification * Update for #75697 * Reword * Update for #75621 * Reword default sort * Update for #75367 * Remove unneeded whitespace * Add experimental admon and if flags * [DOCS] Remove ifdefs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In #73872 we have introduced a new end point that produces vector tiles. One of the actions on this new end point is to transform geo data expressed in WKT or geojson into vector tiles features. It currently works by sung the fields API to get the geo data in geojson format and then transform it to a vector tile feature while serialising the result of the query into a vector tile.
This is currently not very efficient; We need to parse twice the shape, the first time the fields API reads the shape from source and serialise it into geojson. Then we parse it again when building the vector tile and transform it to a vector tile feature. Hence we are parsing twice the same geometry.
In this PR we propose to extend the Fields API formats for geo data in order to produce vector tiles features directly, instead of using geojson as intermediate format. It is easy to show that this bring much better performance overall and in particular is very beneficial because the compression of complex geometries when using mvt features (up to 400+ times smaller).
The main changes in this PR are:
Move The mvt factory for points to server so it can be used by geo_point fields. This factory has no dependencies with external libraries.
Add a dependency between the spatial module and the vector tile module so we can access the mvt factory for geometries for geo_shape fields.
mvt needs some extra information in order to encode the geometry. That is the current tile and the extent (number of pixels) of the tile. The fields API does not allow to add this information easily so we take the following approach. In order to to support extra information, we allow to declare mvt formats with the following formats: mvt(z/x/y@extent) or mvt(z/x/y) (in the last one, extent is considered the default one: 4096). The logic to parse the new format is in GeoFormatterFactory.
Some results of this PR.
Using a data set that contains few big polygons with thousand of points each, we have run the following query for geojson and mvt(0/0/0) formats.
When running with geojson, the resulting json output has a size of 438.5Mb. In addition I had to increase the heap size of the one node cluster to 4GB in order not to OOM.
When running with mvt(0/0/0), the resulting json output has a size of 1.1MB. It was run with a heap of 1GB and there was no signs of GC pressure.
When running similar queries through the _mvt endpoint, we can observe that with this change the request is twice as fast as the current one (5 min vs 2.5 min). In addition it takes much less memory to complete.
relates to #74476