Aggregate/query by geometry-type with geo_shape fields #49569

thomasneirynck · 2019-11-25T21:16:45Z

for geometries indexed into the geo_shape field, it would be helpful to be able to aggregate on the type of geometry.

Example use cases:

(1) for UX-applications that need to present a different UX based on the type-of geometries stored in the index.

e.g. for styling,
- show an icon-editor for indices that have points in geo_shape
- show a fill/outline-editor for indices that have polygons/multipolygons stored.

(2) count/unique counts are especially relevant, but could be appropriate for all aggregations

(3) Similarly, it would be great to be able to specify filters on the data based on geometry-type
- e.g. only query for points for POI-type data.

This would be similar to the ST_GeometryType function in SQL.

elasticmachine · 2019-11-25T23:37:59Z

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

thomasneirynck · 2020-03-25T18:15:03Z

This enhancement would also be useful for vector tiling (elastic/kibana#58519). When a geo_shape field contains only points, we can run geotile_grid and geo_centroid aggs. When the grids are small enough, this combination will give good approximate results of the location of the points, especially when zoomed out.
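As a rough illustration of the combination described above, the following Python sketch builds a search body that nests a geo_centroid metric under a geotile_grid bucket aggregation. The field name `location`, the aggregation names `grid` and `centroid`, and the helper `build_grid_centroid_request` are assumptions made for the example; only the two aggregation types come from the comment.

```python
import json


def build_grid_centroid_request(field: str, precision: int) -> dict:
    """Build a search body with one bucket per map tile and a centroid per bucket."""
    # geotile_grid accepts zoom-level precisions from 0 to 29.
    if not 0 <= precision <= 29:
        raise ValueError("geotile_grid precision must be between 0 and 29")
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "aggs": {
            "grid": {
                "geotile_grid": {"field": field, "precision": precision},
                "aggs": {
                    # one centroid per tile approximates where the points sit
                    "centroid": {"geo_centroid": {"field": field}}
                },
            }
        },
    }


if __name__ == "__main__":
    # "location" is a hypothetical geo_shape field that holds only points.
    print(json.dumps(build_grid_centroid_request("location", 8), indent=2))
```

The resulting body could be sent to the `_search` endpoint of the index. A smaller grid (higher precision) gives centroids that sit closer to the underlying points.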

thomasneirynck · 2020-06-08T16:27:05Z

Also useful to determine if Maps can construct point-2-point layers elastic/kibana#68540

nknize · 2020-06-08T18:03:35Z

/cc @talevy @iverase @jpountz

Let's make sure we capture the "ask". I don't know many customers that have specifically requested this capability (even though it is a standard function in Oracle and PostGIS), so let's make sure we document whether this either (a) gives a performance boost or (b) enables "power play" functionality in Maps.

Technically we can split this into two issues:

geometry_type aggregation - (license: Gold) the geo doc value records the geometry type so I think we can support this relatively easily in aggregations. We should decide the "class" of aggregation (bucket or metric). I think it makes most sense as a bucket agg with bucket key being the geometry_type. e.g.,

        {
            "buckets": [
                {
                    "key": "POLYGON",
                    "doc_count": 3
                },
                {
                    "key": "LINE",
                    "doc_count": 2
                },
                {
                    "key": "POINT",
                    "doc_count": 100
                }
            ]
        }

This way we could nest other metric aggs and run interesting analysis (e.g., attribute stats by geometry type). Note that we won't be able to support Multi types since those are split into multi value documents with individual geometries (can you confirm @iverase?).

either new geometry_type query or geometry_type parameter on geo_shape query - (license: Gold) This one is more complicated as we "lose" geometry type in the lucene index due to tessellation. We have an open Lucene PR for adding triangle type to the ShapeField encoding that will help by identifying lines, points, or triangles but we need to figure out the best way to expose this as a query (if we think it's worth it). I wonder if it's worth considering a "system" index for Maps that records some of these global attributes (e.g., if a geo_shape index contains points only). As a side note, we had a points_only mapping parameter for geo_shape types that optimized the geo_shape index for points only but this went away in favor of strongly encouraging geo_point and having feature parity between geo_point and geo_shape fields for scenarios where users only expect point geometries.

iverase · 2020-06-09T08:07:40Z

Note that we won't be able to support Multi types since those are split into multi value documents with individual geometries.

That is right, we currently do not store information in the index / doc values about how a shape was defined. Note that the following shapes consisting of two points are equivalent for us:

// as a multi-point
MULTIPOINT(0 0, 1 1)
// as a geometry collection
GEOMETRYCOLLECTION(POINT(0 0), POINT(1 1))
// as an array
[POINT(0 0), POINT(1 1)]

On the other hand, we do have information about the shape's topological dimensionality as part of the centroid calculation (dim=0 -> point; dim=1 -> line; dim=2 -> polygon). I think exploiting this information can provide most of the functionality required.

geometry_type aggregation

It would be straightforward to provide an aggregation by topological dimension. I would rename the agg accordingly.

either new geometry_type query or geometry_type parameter on geo_shape query

As above, we can provide some filter capabilities wrt the topological dimensionality. For a standalone query, we would have to implement the query on top of the doc values, as the BKD index is only efficient if we provide a spatial constraint.

jpountz · 2020-06-09T09:54:53Z

for UX-applications that need to present a different UX based on the type-of geometries stored in the index

I wonder if this is something we should resist doing. Getting this information would be very costly, and couldn't be cached since a polygon could be added to a field that only stored points so far at any time.

thomasneirynck · 2020-06-09T16:29:59Z

and couldn't be cached

thx @jpountz. The Maps-app would not need to cache this information anywhere. It would request it when bootstrapping the UX for a layer.

To give some context for this request:

Clients of Elasticsearch-API have no efficient way of determining the types of the shapes stored in the geo_shape field (points, lines, or polygons) without actually pulling all of them.

This affects general purpose visualization tools like Kibana Maps.

Not knowing up front which geometries are actually stored in an index cascades into the UX. It results in a UX that has a "grab-bag" look & feel.

Consider:

We are showing all 3 options because we don't know the geometry-types of all the documents in that index. There could be millions of documents; the screenshot example has 600k building footprints, too many to pull out for web-apps.

For end-users, this grab-bag UX is less than optimal. Especially since most users will store their data "thematically". e.g. rivers (lines) in one index, building footprints (polygons) in another, points-of-interest (points) in another. This implicit knowledge can be used in the UX.

e.g. Maps could simplify its UX by having knowledge of the geometry-type:

- styling editors (see screenshot)
- legends (e.g. show line-icons instead of fill-icons for rivers)

As for (3), this is more hypothetical since Maps does not do this today (although we do want Kibana to handle display of large datasets better elastic/kibana#58519). Not being able to filter on geometry-type, makes it harder to build maps where the display of documents is scale-dependent (ie. based on the zoom-level of the map, data gets filtered/simplified). Point-data should be handled differently than lines and polygons. E.g. building footprints should be filtered-out when zoomed-out (since they are invisible at that scale), but points-of-interest should be retained (because points have no size).

jpountz · 2020-06-11T08:45:34Z

For end-users, this grab-bag UX is less than optimal. Especially since most users will store their data "thematically". e.g. rivers (lines) in one index, building footprints (polygons) in another, points-of-interest (points) in another. This implicit knowledge can be used in the UX.

Could we get half-way there by looking at field caps to know whether the field is mapped as a geo_point or as a geo_shape?
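A minimal sketch of that field caps check, assuming the client already has the JSON body returned by `GET /<index>/_field_caps?fields=location`. The helper name `mapped_geo_types`, the field name `location`, the index names, and the sample response are made up for the example; the response is hand-written in the documented shape, not captured from a cluster.

```python
def mapped_geo_types(field_caps_response: dict, field: str) -> set:
    """Return the geo mapping types reported for `field` across the queried indices."""
    # The response maps field name -> mapping type -> capabilities.
    per_type = field_caps_response.get("fields", {}).get(field, {})
    return {t for t in per_type if t in ("geo_point", "geo_shape")}


# Hand-written sample: "location" is a geo_point in one index and a geo_shape in another.
sample = {
    "indices": ["pois", "buildings"],
    "fields": {
        "location": {
            "geo_point": {
                "type": "geo_point",
                "searchable": True,
                "aggregatable": True,
                "indices": ["pois"],
            },
            "geo_shape": {
                "type": "geo_shape",
                "searchable": True,
                "aggregatable": True,
                "indices": ["buildings"],
            },
        }
    },
}

if __name__ == "__main__":
    print(sorted(mapped_geo_types(sample, "location")))
```

This only tells the client how the field is mapped. For a geo_shape mapping it says nothing about whether the stored shapes are points, lines, or polygons.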

I'd really like to avoid making the UI block waiting for the result of an aggregation to know how it should specialize for the type of geometries that are stored in the index. This is something that would work with small amounts of data but would start giving users a bad experience as they start having non-negligible amounts of data and using our slow features (e.g. schema-on-read, searchable snapshots).

To be clear I'm not against adding this aggregation, which can be useful, I'm opposed to making UI loading depend on the result of this aggregation.

thomasneirynck · 2020-06-11T13:48:20Z

Could we get half-way there by looking at field caps to know whether the field is mapped as a geo_point or as a geo_shape?

Yes, Maps is already doing this right now. The "gap" is that for geo_shape itself, a client cannot determine what exactly is being stored (without actually pulling all the documents).

To be clear I'm not against adding this aggregation, which can be useful, I'm opposed to making UI loading depend on the result of this aggregation.

I don't think Maps would "block" the UI. Rather, knowledge about geometry-types would be used to fine-tune some of the presentation in an async-operation.

The potential for Kibana to run an agg on all the data is a generic issue in Kibana (e.g. date-histograms in Discover). A couple of examples in the Maps-application where this potentially occurs (absent any filter-context constraints, like time-range etc.):

- If a user adds a cluster layer and zooms out to the entire world, the geotile_grid-agg runs
- Retrieving the data-bounds, so users can zoom to the location of their data, uses geo_bounds

I do agree that for enormous data-sets, this would result in a poor experience. But then likely Kibana is not the right tool to build a map-visualization on top of that data.

Just in general, it is really helpful for a web-app like Kibana to be able to determine relevant meta-data before actually having to query all the documents.

Also, maybe the ask was worded the wrong way. Rather than asking for "can we add an agg that gives us geometry-types", maybe the ask should be more along the lines of "How would clients get useful meta-data about geometries stored in ES, without actually pulling the entire dataset?" (e.g. the bounds of the data, the geometry-type of the shapes, the size of the shapes, ...).

jpountz · 2020-06-11T15:08:09Z

I do agree that for enormous data-sets, this would result in a poor experience. But then likely Kibana is not the right tool to build a map-visualization on top of that data.

I was seeing scale as a competitive advantage, so I would be disappointed if we dropped the objective of making Maps usable with large amounts of data.

Yes, Maps is already doing this right now. The "gap" is that for geo_shape itself, a client cannot determine what exactly is being stored (without actually pulling all the documents).

So maybe we should recommend more strongly to use geo_point for point-only fields to get a better experience in Maps?

A middle ground that would be better than aggregating on the geometry type would be to enhance geo_shape to index the geometry type as a sub keyword field automatically, introduce a new query that allows filtering geo_points and geo_shapes by geometry type, and finally make Maps fire one filter per geometry type with a terminate_after equal to 1 to check whether there is any point, line and polygon in the index without needing to scan all documents.
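The probing idea above could look roughly like the following Python sketch. Everything specific here is hypothetical: the proposal does not name the sub-field or the query, so the sketch stands in a plain `term` query on an assumed keyword sub-field `location.geometry_type`, and the type names and helper functions are invented for illustration. The responses are hand-written, not real output.

```python
GEOMETRY_TYPES = ("POINT", "LINE", "POLYGON")  # assumed keyword values


def build_probes(type_field: str) -> dict:
    """Build one cheap existence probe per geometry type."""
    return {
        geometry_type: {
            "size": 0,  # no hits needed, only whether anything matched
            "terminate_after": 1,  # each shard stops after its first match
            "query": {"term": {type_field: geometry_type}},
        }
        for geometry_type in GEOMETRY_TYPES
    }


def present_types(responses: dict) -> set:
    """Given one search response per geometry type, return the types that matched."""
    return {
        geometry_type
        for geometry_type, response in responses.items()
        if response["hits"]["total"]["value"] > 0
    }


if __name__ == "__main__":
    probes = build_probes("location.geometry_type")  # hypothetical sub-field
    # Hand-written responses standing in for the three search calls.
    responses = {
        "POINT": {"hits": {"total": {"value": 1, "relation": "eq"}}},
        "LINE": {"hits": {"total": {"value": 0, "relation": "eq"}}},
        "POLYGON": {"hits": {"total": {"value": 1, "relation": "eq"}}},
    }
    print(sorted(present_types(responses)))
```

Because `terminate_after` applies per shard, the reported total can exceed 1 on a multi-shard index, which is why the check is "greater than zero" rather than "equal to one".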

thomasneirynck · 2020-06-11T15:38:36Z

I would be disappointed if we dropped the objective of making Maps usable with large amounts of data.

Me too. A lot of the focus in Maps is on working with ES-data at any scale. Blended layers (merged), aggs on geo_shape (merged), vector-tiling (future) are all efforts to display ES geo-data on a map at any scale (in two senses: whether there are few or many documents, but also whether the user is zoomed-out or zoomed-in). Every once in a while, feature requests will trickle down to the ES-level to help Kibana achieve that ;)

So maybe we should recommend more strongly to use geo_point for point-only fields to get a better experience in Maps?

++ can do. I also understand that there is a performance benefit for using geo_point over geo_shape.

new query that allows filtering geo_points and geo_shapes by geometry type

Filtering-by-type would be very useful. Many other geo-tools allow querying geometries by type because the type impacts the styling. So it would be really useful for end-users to be able to structure their layers based on type.

thomasneirynck · 2021-02-25T16:19:37Z

This is the corresponding issue on the Kibana-side, which is blocked by not being able to determine the geometry-type (or dimensionality) of the shapes. elastic/kibana#92672 (comment)

thomasneirynck · 2021-03-02T21:42:19Z

So it seems like this already exists in SQL? https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-geo.html#sql-functions-geo-st-geometrytype

talevy · 2021-03-02T21:51:39Z

not exactly, this SQL function operates on the source, it is not an aggregation

thomasneirynck · 2021-03-03T15:09:28Z

Would it work in a GROUP BY statement? cc @imotov

imotov · 2021-03-03T18:36:58Z

@thomasneirynck you are correct, the function exists and it works with shapes. Unfortunately, as @talevy also correctly pointed out it can only extract the shape type from the shape source, which means it is available only in the contexts where source is available, which basically means we cannot use it for filtering (WHERE clause) nor in aggregations (GROUP BY clause).

iverase · 2021-05-27T07:19:54Z

With the introduction of painless support for geo_shape fields on #72886, this can now be achieved by using runtime fields. For example:

GET /example/_search
{
  "size": 0,
  "runtime_mappings": {
    "type": {
      "script": """
         int type = doc['location'].getDimensionalType();
         if (type == 0) {
           emit('POINT');
         } else if (type == 1) {
           emit('LINE');
         } else if (type == 2) {
           emit('POLYGON');
         }
       """,
      "type": "keyword"
    }
  },
  "aggs" : {
    "type" : {
      "terms": {
        "field": "type"
      }
    }
  }
}

would that fulfil the need?
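On the client side, the terms buckets returned by a request like the one above could be reduced to a simple lookup. This is a small sketch, assuming the aggregation is named `type` as in the example; the helper `geometry_type_counts` and the sample response are hypothetical, with doc counts borrowed from the bucket illustration earlier in the thread rather than from a real cluster.

```python
def geometry_type_counts(search_response: dict, agg_name: str = "type") -> dict:
    """Map each terms-aggregation bucket key to its document count."""
    buckets = search_response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written response fragment in the shape of a terms aggregation result.
sample_response = {
    "aggregations": {
        "type": {
            "buckets": [
                {"key": "POINT", "doc_count": 100},
                {"key": "POLYGON", "doc_count": 3},
                {"key": "LINE", "doc_count": 2},
            ]
        }
    }
}

if __name__ == "__main__":
    counts = geometry_type_counts(sample_response)
    # A client such as Maps could use the key set to pick which style editors to show.
    print(sorted(counts))
```

A geometry type that is absent from the index simply has no bucket, so checking membership in the returned dict is enough to decide whether to offer, say, a fill/outline editor.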

thomasneirynck · 2021-05-27T13:17:45Z

@iverase - yes I think using the runtime field satisfies the use-case. To confirm, this function is available starting 7.14?

iverase · 2021-05-27T13:25:48Z

yes, 7.14.

thomasneirynck changed the title ~~Aggregate/query by geometry-type in geo_shape fields~~ Aggregate/query by geometry-type with geo_shape fields Nov 25, 2019

jtibshirani added :Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement labels Nov 25, 2019

iverase added the team-discuss label Nov 27, 2019

talevy mentioned this issue Dec 12, 2019

add shape-type metadata to geo_shape's doc-value #50104

Merged

rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020

thomasneirynck mentioned this issue Jun 8, 2020

[Maps] Enable point-2-point sources for geo_shape fields elastic/kibana#68540

Closed

thomasneirynck mentioned this issue Jun 25, 2020

[Maps] Add styling and tooltip support to mapbox mvt vector tile sources elastic/kibana#64488

Merged

4 tasks

thomasneirynck mentioned this issue Aug 18, 2020

[Maps] Add mvt format for ES-doc sources elastic/kibana#74319

Closed

7 tasks

thomasneirynck mentioned this issue Sep 2, 2020

[Maps] Show legend preview icon for vector tile scaling elastic/kibana#76549

Closed

thomasneirynck mentioned this issue Oct 7, 2020

[Maps] [Meta] .mvt feature completeness elastic/kibana#79868

Closed

thomasneirynck mentioned this issue Feb 25, 2021

[Maps] Vector tile layer with only line features shows Polygon styling in Layer Style UI elastic/kibana#92672

Closed

iverase removed the team-discuss label May 27, 2021

iverase closed this as completed Sep 7, 2021
