
Add 'mvt' field type format to geo fields #75367

Merged: @iverase merged 15 commits into elastic:master on Jul 28, 2021


@iverase (Contributor) commented Jul 15, 2021

In #73872 we introduced a new endpoint that produces vector tiles. One of the tasks of this new endpoint is to transform geo data expressed in WKT or GeoJSON into vector tile features. It currently works by using the fields API to get the geo data in GeoJSON format and then transforming it into a vector tile feature while serialising the query result into a vector tile.

This is currently not very efficient because the same geometry is parsed twice: first the fields API reads the shape from source and serialises it into GeoJSON, and then we parse that GeoJSON again when building the vector tile in order to transform it into a vector tile feature.

In this PR we propose extending the fields API formats for geo data so that they produce vector tile features directly, instead of using GeoJSON as an intermediate format. This brings much better performance overall, and it is particularly beneficial for complex geometries because MVT features are far more compact (roughly 400 times smaller in the example below).

The main changes in this PR are:

  • Move the MVT factory for points to server so that geo_point fields can use it. This factory has no dependencies on external libraries.

  • Add a dependency between the spatial module and the vector tile module so that geo_shape fields can access the MVT factory for geometries.

  • MVT needs some extra information to encode a geometry: the current tile and the tile's extent (its size in pixels). The fields API does not make it easy to pass this information, so we take the following approach: MVT formats can be declared as mvt(z/x/y@extent) or mvt(z/x/y), where the latter uses the default extent of 4096. The logic to parse the new format lives in GeoFormatterFactory.
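For illustration, here is a minimal Java sketch of how a string in the mvt(z/x/y@extent) or mvt(z/x/y) form could be parsed, with the extent defaulting to 4096. It is not the GeoFormatterFactory code from this PR: the class name MvtFormatSketch, the TileFormat record, the regular expression and the validation rules (including the assumed maximum zoom of 29) are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for the "mvt(z/x/y@extent)" and "mvt(z/x/y)" field formats.
 * Illustrative only; this is not the GeoFormatterFactory implementation.
 */
public final class MvtFormatSketch {

    /** Extent (pixels per tile side) used when "@extent" is omitted. */
    static final int DEFAULT_EXTENT = 4096;

    /** Assumed upper bound on zoom, which also keeps 1 << z from overflowing. */
    static final int MAX_ZOOM = 29;

    // Matches mvt(<z>/<x>/<y>) with an optional "@<extent>" before the closing parenthesis.
    private static final Pattern MVT = Pattern.compile("mvt\\((\\d+)/(\\d+)/(\\d+)(?:@(\\d+))?\\)");

    /** The tile address plus the extent the geometry should be encoded with. */
    record TileFormat(int z, int x, int y, int extent) {}

    static TileFormat parse(String format) {
        Matcher m = MVT.matcher(format);
        if (m.matches() == false) {
            throw new IllegalArgumentException("invalid mvt format [" + format + "]");
        }
        int z = Integer.parseInt(m.group(1));
        int x = Integer.parseInt(m.group(2));
        int y = Integer.parseInt(m.group(3));
        int extent = m.group(4) == null ? DEFAULT_EXTENT : Integer.parseInt(m.group(4));
        if (z > MAX_ZOOM) {
            throw new IllegalArgumentException("zoom [" + z + "] is greater than " + MAX_ZOOM);
        }
        int tilesPerSide = 1 << z; // zoom z has 2^z tiles along each axis
        if (x >= tilesPerSide || y >= tilesPerSide) {
            throw new IllegalArgumentException("tile [" + x + "/" + y + "] does not exist at zoom " + z);
        }
        if (extent <= 0) {
            throw new IllegalArgumentException("extent must be positive, got " + extent);
        }
        return new TileFormat(z, x, y, extent);
    }

    public static void main(String[] args) {
        TileFormat t = parse("mvt(0/0/0)");
        System.out.println(t.z() + "/" + t.x() + "/" + t.y() + "@" + t.extent()); // 0/0/0@4096
        TileFormat u = parse("mvt(3/2/5@256)");
        System.out.println(u.z() + "/" + u.x() + "/" + u.y() + "@" + u.extent()); // 3/2/5@256
    }
}
```

Packing the tile and extent into the format string keeps the fields API request shape unchanged, at the cost of a small parsing step per request.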

Some results of this PR:

Using a data set that contains a few large polygons with thousands of points each, we ran the following query with the geojson and mvt(0/0/0) formats substituted for {format}:

curl -X GET "localhost:9200/example/_search" -H 'Content-Type: application/json' -d '
{
  "size" : 10000,
  "_source" : false,
  "fields" : [{"field" : "location", "format" : "{format}"}],
  "runtime_mappings" : {
    "width" : {
      "script" : "emit(doc[\"location\"].getBoundingBox().bottomRight().getLon() - doc[\"location\"].getBoundingBox().topLeft().getLon())",
      "type" : "double"
    }
  },
  "sort" : [
    {
      "width" : {"order" : "desc"}
    }
  ]
}'

When running with geojson, the resulting JSON output has a size of 438.5 MB. In addition, I had to increase the heap size of the single-node cluster to 4 GB to avoid an out-of-memory error (OOM).
When running with mvt(0/0/0), the resulting JSON output has a size of 1.1 MB. It ran with a 1 GB heap and showed no signs of GC pressure.

When running similar queries through the _mvt endpoint, the request is twice as fast with this change (2.5 min vs. 5 min before), and it takes much less memory to complete.

Relates to #74476

@iverase added the >enhancement, :Analytics/Geo, :Core/Infra/Plugins, v8.0.0 and v7.15.0 labels on Jul 15, 2021
@iverase requested review from @colings86 and @imotov on Jul 15, 2021 09:07
@elasticmachine added the Team:Analytics and Team:Core/Infra labels on Jul 15, 2021
@elasticmachine (Collaborator): Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticmachine (Collaborator): Pinging @elastic/es-analytics-geo (Team:Analytics)

@colings86 (Contributor) left a comment:

From the point of view of the general approach to getting access to the geo formatter, I think this PR is good. I have not reviewed the change holistically, though, only from that aspect.

x-pack/plugin/spatial/build.gradle (outdated, resolved)
@cbuescher (Member) left a comment:

I was just scanning this PR, coming from #74476, which this seems to supersede, and was wondering about the two copyright notices here and whether one of them was added by accident. Sorry for the noise if this was done on purpose; in that case I'd be interested in the policy we have around it.

@breskeby self-requested a review on Jul 19, 2021 12:31
@iverase (Contributor, Author) commented Jul 27, 2021

In this iteration we changed how the modules depend on each other, so here are the main changes of this PR again:

  • Move the MVT factory for points to server so that geo_point fields can use it. This factory has no dependencies on external libraries.

  • Add a dependency between the spatial module and the vector tile module so that geo_shape fields can access the MVT factory for geometries. In particular, the vector tile module now extends the spatial module and, through the extensible plugin mechanism, registers its vector tile engine with the spatial module (only one vector tile engine may be registered). We changed the TypeParser for geo_shape so that it can take the provided vector tile engine.

  • MVT needs some extra information to encode a geometry: the current tile and the tile's extent (its size in pixels). The fields API does not make it easy to pass this information, so we take the following approach: MVT formats can be declared as mvt(z/x/y@extent) or mvt(z/x/y), where the latter uses the default extent of 4096. The logic to parse the new format lives in GeoFormatterFactory.
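To see why the tile and extent are needed, the following Java sketch projects a longitude/latitude pair into the integer pixel grid of a given tile using the standard Web Mercator tiling scheme. MVT geometries are encoded relative to the tile's top-left corner, so the same point gets different coordinates in different tiles and at different extents. The class and method names are hypothetical, and the floor rounding and latitude clamp are assumptions; this is not the Elasticsearch encoder, and a real encoder would typically also clip geometries to the tile.

```java
/**
 * Hypothetical sketch: projecting (lon, lat) into the pixel grid of tile z/x/y.
 * Not the Elasticsearch vector tile encoder.
 */
public final class TileProjectionSketch {

    // Web Mercator cannot represent the poles; this is its conventional latitude bound.
    static final double MAX_LAT = 85.05112878;

    /** Returns {px, py}, measured from the tile's top-left corner, in units of 1/extent of a tile. */
    static int[] toTilePixels(double lon, double lat, int z, int x, int y, int extent) {
        double clampedLat = Math.max(-MAX_LAT, Math.min(MAX_LAT, lat));
        double tilesPerSide = Math.pow(2, z);
        double latRad = Math.toRadians(clampedLat);
        // Fractional "world tile" coordinates: x grows eastwards, y grows southwards.
        double worldX = (lon + 180.0) / 360.0 * tilesPerSide;
        double worldY = (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * tilesPerSide;
        // Shift to the requested tile, then scale to its extent x extent grid (assumed floor rounding).
        int px = (int) Math.floor((worldX - x) * extent);
        int py = (int) Math.floor((worldY - y) * extent);
        return new int[] {px, py};
    }

    public static void main(String[] args) {
        int[] p = toTilePixels(0.0, 0.0, 0, 0, 0, 4096);
        System.out.println(p[0] + "," + p[1]); // 2048,2048
        int[] q = toTilePixels(0.0, 0.0, 0, 0, 0, 256);
        System.out.println(q[0] + "," + q[1]); // 128,128
    }
}
```

Points outside the requested tile yield coordinates outside [0, extent), which is one reason the formatter has to know the tile before it can emit a feature.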

@imotov (Contributor) left a comment:

LGTM in general. There are a couple of follow-ups that I think we should do to make this less brittle and to remove some vector-tile-specific abstractions from the formatter.

@@ -84,7 +89,7 @@ protected XPackLicenseState getLicenseState() {
 Map<String, Mapper.TypeParser> mappers = new HashMap<>(super.getMappers());
 mappers.put(ShapeFieldMapper.CONTENT_TYPE, ShapeFieldMapper.PARSER);
 mappers.put(PointFieldMapper.CONTENT_TYPE, PointFieldMapper.PARSER);
-mappers.put(GeoShapeWithDocValuesFieldMapper.CONTENT_TYPE, GeoShapeWithDocValuesFieldMapper.PARSER);
+mappers.put(GeoShapeWithDocValuesFieldMapper.CONTENT_TYPE, new GeoShapeWithDocValuesFieldMapper.TypeParser(vectorTileExtension));
Contributor:

The way the Elasticsearch bootstrap system is set up, loadExtensions() should always be called before getMappers(). The Plugin class and the bootstrap system could have been designed better to avoid this temporal coupling, but there is a lot of legacy here that we cannot deal with yet. I don't think we should propagate this temporal coupling beyond the Plugin classes to the rest of the system. We can just add an assertion here that loadExtensions() was indeed called before getMappers(), to be sure, and then use VectorTileExtension from here on instead of carrying SetOnce into the mappers.

@iverase (Contributor, Author), Jul 27, 2021:

OK, I introduced a boolean variable to make sure the extensions are loaded before the mappers are requested. I had the same feeling that SetOnce should not be in the mappers.

Contributor (Author):

I need to remove the assertion because some tests have custom code to load mappers. I think integration tests should still catch the case where extensions are not loaded before mappers, so we are good.
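The ordering concern in this thread can be sketched in plain Java: the plugin resolves the (at most one) extension in loadExtensions, records that it did so, and getMappers hands the resolved extension, or null, to the type parser instead of a SetOnce holder. All types here are hypothetical stand-ins, not the Elasticsearch Plugin, ExtensionLoader or mapper APIs, and the check that stands in for the suggested assertion is exactly what was later dropped in this PR because some tests load mappers with custom code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of confining load-order coupling to the plugin class. */
public final class ExtensionOrderSketch {

    /** Stand-in for the vector tile extension contributed by another module. */
    interface VectorTileExtension {
        String name();
    }

    /** Stand-in for a geo_shape TypeParser: it receives the resolved extension, possibly null. */
    record GeoShapeTypeParser(VectorTileExtension vectorTileExtension) {}

    /** Stand-in for the spatial plugin. */
    static final class SpatialPluginSketch {
        private VectorTileExtension vectorTileExtension; // stays null if no vector tile module is installed
        private boolean extensionsLoaded = false;

        void loadExtensions(List<VectorTileExtension> discovered) {
            // Only one vector tile engine may be registered.
            if (discovered.size() > 1) {
                throw new IllegalStateException("expected at most one vector tile extension, found " + discovered.size());
            }
            vectorTileExtension = discovered.isEmpty() ? null : discovered.get(0);
            extensionsLoaded = true;
        }

        Map<String, GeoShapeTypeParser> getMappers() {
            // Stands in for the suggested assertion: extensions must be loaded first.
            if (extensionsLoaded == false) {
                throw new IllegalStateException("loadExtensions() must be called before getMappers()");
            }
            Map<String, GeoShapeTypeParser> mappers = new HashMap<>();
            // The mapper gets the plain extension (or null), not a SetOnce-style holder.
            mappers.put("geo_shape", new GeoShapeTypeParser(vectorTileExtension));
            return mappers;
        }
    }

    public static void main(String[] args) {
        SpatialPluginSketch plugin = new SpatialPluginSketch();
        VectorTileExtension ext = () -> "vector-tile";
        plugin.loadExtensions(List.of(ext));
        System.out.println(plugin.getMappers().get("geo_shape").vectorTileExtension().name()); // vector-tile
    }
}
```

Keeping the coupling inside the plugin means the mapper only ever sees a finished value; the trade-off is that a mis-ordered call surfaces at getMappers() rather than at compile time.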

/**
 * Get the vector tile engine. This is called when a user asks for the MVT format in the fields API.
 * We are only expecting one instance of a vector tile engine, coming from the vector tile module.
 */
GeoFormatterFactory.VectorTileEngine<Geometry> getVectorTileEngine();
Contributor:

We are leaking way too much here. I think the right abstraction would be to extend only the GeometryFormatter layer and hide the rest inside the tile class, but we can address that in another iteration.

Contributor (Author):

I'm not sure what you mean here, but let's leave it for the next iteration.

@Override
public void loadExtensions(ExtensionLoader loader) {
// we only expect one vector tile extension that comes from the vector tile module.
loader.loadExtensions(VectorTileExtension.class).forEach(vectorTileExtension::set);
Contributor:

That essentially moves the dependency from compile time to runtime. It is not explicit, but the whole thing will break if the spatial module is present but the vector-tile module is not, or if we have another alternative implementation. We should probably address this as a follow-up as well.

Contributor (Author):

My hope here is that if the vector-tile module is not present, everything should just work; you only get an error if you request geometries in mvt format.

Contributor:

Yes, I stand corrected: it looks like it throws a "vector tile format is not supported" exception in this case, which is hardcoded in the spatial module. But I think there is still too much knowledge of the vector tile implementation inside spatial; let me take a shot at confining more of it to the vector tile module.

Contributor (Author):

That would be awesome. I'm going to push it as it is, and we can iterate in a follow-up.

@iverase merged commit 368e2e0 into elastic:master on Jul 28, 2021
@iverase deleted the mvtWithDependency branch on Jul 28, 2021 07:51
iverase added a commit that referenced this pull request Jul 28, 2021
In this commit we extend the Fields API formats for geo data in order to produce vector tile features directly
on the data nodes. That helps the vector tile API reduce the amount of data it needs to pull in order to
create the response.
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 30, 2021
imotov added a commit to imotov/elasticsearch that referenced this pull request Aug 3, 2021
GeoFormatterFactory doesn't need to know that the extension points were created
specifically for the purpose of generating vector tiles. We can make it support
arbitrary formats by moving all MVT-specific logic into the formatter itself.

Follow up for elastic#75367
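A minimal sketch of the follow-up's idea, assuming a registry keyed by format name: the factory only splits name(params) and delegates, so an extension such as the vector tile module can own all parsing of z/x/y@extent. The registry, the GeometryListFormatter interface and the error message are hypothetical, with strings standing in for geometries; this is not the code from the follow-up commits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical format-agnostic registry: it splits "name(params)" and delegates to
 * whatever factory was registered under "name". It knows nothing about vector tiles.
 */
public final class FormatterRegistrySketch {

    /** Turns geometries (plain strings in this sketch) into formatted values. */
    interface GeometryListFormatter {
        List<Object> format(List<String> geometries);
    }

    private final Map<String, Function<String, GeometryListFormatter>> factories = new HashMap<>();

    /** Called by core code for built-in formats, or by an extension for its own formats. */
    void register(String name, Function<String, GeometryListFormatter> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("format [" + name + "] is already registered");
        }
    }

    GeometryListFormatter getFormatter(String format) {
        String name = format;
        String params = null; // null when the format has no "(...)" part
        int open = format.indexOf('(');
        if (open >= 0 && format.endsWith(")")) {
            name = format.substring(0, open);
            params = format.substring(open + 1, format.length() - 1);
        }
        Function<String, GeometryListFormatter> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unsupported geometry format [" + format + "]");
        }
        // The factory, not the registry, decides what the params mean.
        return factory.apply(params);
    }

    public static void main(String[] args) {
        FormatterRegistrySketch registry = new FormatterRegistrySketch();
        registry.register("geojson", params -> geometries -> List.copyOf(geometries));
        // A vector tile extension would parse "z/x/y@extent" here; this stand-in just echoes it.
        registry.register("mvt", params -> geometries -> List.of("tile " + params + ": " + geometries.size() + " feature(s)"));
        System.out.println(registry.getFormatter("mvt(0/0/0@4096)").format(List.of("POINT (0 0)")));
        // prints [tile 0/0/0@4096: 1 feature(s)]
    }
}
```

If no extension registers "mvt", the lookup fails with a generic unsupported-format error, so the spatial side no longer needs a hardcoded vector-tile message.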
imotov added a commit that referenced this pull request Aug 5, 2021
imotov added a commit that referenced this pull request Aug 5, 2021
elasticsearchmachine pushed a commit that referenced this pull request Aug 5, 2021
* [DOCS] Document `_mvt` API

Documents the `_mvt` API endpoint added with #73872.

Relates to #75242.

* Reword

* Rename API

* Fix doc.url in JSON spec

* Reword

* Reword

* Add content type to JSON spec

* Edits

* Fix typo

* Reword

* Update docs after meeting

* Fix typos

* Fix `size` default

* Updates for #75522

* Fixes

* Clean up JSON spec

* Fix extent tag

* [DOCS] Add `<field>` constraints

* Minor clarification

* Update for #75697

* Reword

* Update for #75621

* Reword default sort

* Update for #75367

* Remove unneeded whitespace

* Add experimental admon and if flags

* [DOCS] Remove ifdefs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Aug 5, 2021
elasticsearchmachine added a commit that referenced this pull request Aug 5, 2021
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@joegallo removed the :Core/Infra/Plugins label on Sep 22, 2021
@elasticmachine removed the Team:Core/Infra label on Sep 22, 2021
Labels: :Analytics/Geo, >enhancement, Team:Analytics, v7.15.0, v8.0.0-alpha1

9 participants