[Docs] Problems with using geocentroid for multi-valued geo_points #49189

Closed
EmilBode opened this issue Nov 15, 2019 · 4 comments · Fixed by #50038
Labels: :Analytics/Geo (Indexing, search aggregations of geo points and shapes), >docs (General docs changes)

EmilBode commented Nov 15, 2019

Describe the feature:
Some Elasticsearch behaviour bit me, and I think a warning in the documentation would be fair:

When running a geohash_grid aggregation, Elasticsearch puts whole documents into buckets, not individual geo_points.
This means that a bucket can contain a document with points both inside and outside that bucket. When the geo_centroid is then calculated, all points of the document are considered, including those outside the bucket boundary.

Consider this example:

PUT temp
{
  "mappings": {
    "properties": {
      "places": {
        "type": "geo_point"
      }
    }
  }
}

PUT temp/_doc/1
{
  "places": [
    [0,0],
    [90,0]
  ]
}

GET temp/_search
{
  "aggs": {
    "2": {
      "geohash_grid": {
        "field": "places",
        "precision": 2
      },
      "aggs": {
        "3": {
          "geo_centroid": {
            "field": "places"
          }
        }
      }
    }
  }
}

The result contains two buckets, both with the same geo_centroid at longitude 45, which lies inside neither bucket.
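To make the mechanism concrete, here is a minimal Python sketch of what seems to be happening. It is not Elasticsearch code: the `cell` helper is a hypothetical stand-in that uses a plain 32 x 32 grid (the same partition as geohash precision 2, but keyed by column/row instead of a geohash string), and the centroid is assumed to be a simple arithmetic mean of the coordinates.

```python
from collections import defaultdict
from math import floor


def cell(lon, lat):
    """Hypothetical stand-in for a precision-2 geohash cell: 32 x 32 grid indices."""
    col = min(floor((lon + 180) / 11.25), 31)
    row = min(floor((lat + 90) / 5.625), 31)
    return (col, row)


def centroid(points):
    """Arithmetic mean of (lon, lat) pairs; assumed here, not taken from the ES source."""
    lons, lats = zip(*points)
    return (sum(lons) / len(lons), sum(lats) / len(lats))


# One document with two [lon, lat] values, mirroring PUT temp/_doc/1.
docs = [{"places": [(0, 0), (90, 0)]}]

# Bucketing is per document: a document joins every cell that any of its points hits.
buckets = defaultdict(list)
for doc in docs:
    for key in {cell(lon, lat) for lon, lat in doc["places"]}:
        buckets[key].append(doc)

# The sub-aggregation sees whole documents, so it averages all of their points,
# including the ones that fall outside the bucket's own cell.
centroids = {
    key: centroid([p for doc in members for p in doc["places"]])
    for key, members in buckets.items()
}
print(centroids)
```

Under these assumptions both cells end up with the same centroid, and `cell(45.0, 0.0)` is neither of the two bucket keys.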

The same happens when making a coordinate map in Kibana, although luckily Kibana is smart enough not to draw the point outside of the box. However, it is clearly visible that our (0, 0) point is drawn eastwards and our (0, 90) point is dragged westwards.

A workaround could be to make the geo_points nested documents, and use a nested aggregation, but that doesn't work with Kibana.

The very best solution would obviously be to have the geo_centroid only consider those points that are actually inside the bucket, but I don't think that's feasible.
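For comparison, this is roughly what the point-level behaviour would look like, again as a Python sketch rather than anything Elasticsearch does today. The `cell` helper is a hypothetical 32 x 32 grid standing in for precision-2 geohash cells, and the centroid is assumed to be a plain average.

```python
from collections import defaultdict
from math import floor


def cell(lon, lat):
    """Hypothetical stand-in for a precision-2 geohash cell: 32 x 32 grid indices."""
    col = min(floor((lon + 180) / 11.25), 31)
    row = min(floor((lat + 90) / 5.625), 31)
    return (col, row)


# The two [lon, lat] values of the example document.
points = [(0, 0), (90, 0)]

# Bucket individual points instead of whole documents.
per_cell = defaultdict(list)
for lon, lat in points:
    per_cell[cell(lon, lat)].append((lon, lat))

# Each centroid now averages only the points that fall inside its own cell.
point_level_centroids = {
    key: (
        sum(lon for lon, _ in pts) / len(pts),
        sum(lat for _, lat in pts) / len(pts),
    )
    for key, pts in per_cell.items()
}
print(point_level_centroids)
```

With one point per cell, each centroid simply coincides with its point, which is what I would intuitively expect from the aggregation.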

But for now, I think the documentation about this could be clearer.
My suggestion: we could add a warning on https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-aggregations-metrics-geocentroid-aggregation.html (and the pages for other versions), like this:

Warning: When you have multi-valued geo_point fields, geo_centroid calculates the centroid of all values of those fields in the selected documents. This means that using geo_centroid as a sub-aggregation of a geohash_grid aggregation can place the centroid (far) outside the boundaries of your bucket.

Elasticsearch version: 7.3.1

JVM version: 1.8.0_231

OS version: Windows 10

I've also filed an issue at Kibana, elastic/kibana#50799

@cbuescher added the :Analytics/Geo and >docs labels Nov 19, 2019

@elasticmachine (Collaborator):

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@elasticmachine (Collaborator):

Pinging @elastic/es-docs (>docs)

@cbuescher added the help wanted adoptme label Nov 19, 2019

@cbuescher (Member):

@EmilBode adding some warning to the docs looks like a good first step. Since you already added a suggestion, maybe you are interested in opening a PR for it? The team can then discuss any changes they'd like there. Let us know if you are interested and need directions. Otherwise no problem, thanks for raising the issue anyway.

EmilBode commented Nov 22, 2019 via email
