-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce clusterProperties option for aggregated cluster properties #7584
Conversation
Looks awesome, I hope you'll have time to push this feature through :) |
Maybe off topic, but can i change geojson clusterRadius property without creating new geojson source. |
7c55489
to
f1fc585
Compare
f1fc585
to
6032e41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mourner - In the event that one of the map-reduce expressions are invalid, it is no longer possible to report that in style validation. I think it would be useful to move the validation from the geojson worker to the style-spec module and make it part of geojson source validation.
src/source/geojson_worker_source.js
Outdated
|
||
const initialExpressionParsed = createExpression(initialExpression); | ||
const mapExpressionParsed = createExpression(mapExpression); | ||
const reduceExpressionParsed = createExpression([operator, ['accumulated'], ['get', key]]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the map/reduce expressions need to support the feature-state operator as well? Given that these expressions are evaluated just once it probably doesn't make sense to support that operator - it can change through interactivity on the map. An additional isStateConstant
check would be required.
If we want to support feature-state
, it would require re-computing the map/reduce on every change of a feature's state, or at least the clusters affected by a feature. Not sure that there is a need for it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think we shouldn't support feature-state
, at least initially, because it would add a lot of complexity for a very obscure use case. I'll add the isStateConstant
check in the validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asheemmamoowala @mourner I'm just wondering if you're looking to support feature-state
on clusterProperties
in the near future?
I've got a use case where I need to highlight the marker cluster for a given feature when a reference of it is selected in a list view outside the map canvas.
This works nicely for non-clustered points, but I need a work around for this situation:
map.addSource('foos', {
type: 'geojson',
data,
cluster: true,
clusterMaxZoom: 14,
clusterRadius: 50,
clusterProperties: {
'has_hovered_foo': ['==', ['feature-state', 'hover'], true]
}
})
map.addLayer({
id: 'clusters',
type: 'circle',
source: 'foos',
filter: ['has', 'point_count'],
paint: {
'circle-color': [
'case',
['get', 'has_hovered_foo'],
'#000',
'#FFF'
]
}
})
"clusterProperties": { | ||
"type": "*", | ||
"doc": "An object defining custom properties on the generated clusters if clustering is enabled, aggregating values from clustered points. Has the form `{\"property_name\": [operator, initial_expression, map_expression]}`. `operator` is any expression function that accepts at least 2 operands (e.g. `\"+\"` or `\"max\"`) — it accumulates the property value from clusters/points the cluster contains; `initial_expression` evaluates the initial value of the property before accummulating other points/clusters; `map_expression` produces the value of a single point.\n\nExample: `{\"sum\": [\"+\", 0, [\"get\", \"scalerank\"]]}`." | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be useful to add an expression support block for each of the expressions - initial,map, reduce ?Something similar to what is in the the layer properties.
mapbox-gl-js/src/style-spec/reference/v8.json
Lines 3500 to 3508 in 1208cfc
"expression": { | |
"interpolated": true, | |
"parameters": [ | |
"zoom", | |
"feature", | |
"feature-state" | |
] | |
}, | |
"property-type": "data-driven" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to separate those into separate properties. There are two expressions (map, initial) and an operator that accepts them as parameters. Map expression would be just ["feature"]
, and initial expression would be []
.
Given the non-standard syntax and that support block is dictated more by property aggregation semantics rather than technical limitations (like in case of many properties), perhaps we should leave that to validation, and maybe add a note clarifying this in the doc string.
return properties; | ||
}; | ||
superclusterOptions.map = (pointProperties) => { | ||
feature.properties = pointProperties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this (and in reduce
below) using a shared feature
and overriding the properties
of it instead of defining a new const feature = { properties: pointProperties }
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map function and expression evaluation is synchronous, meaning there are no side effects from using mutable state here (which is reset at the beginning). At the same time, we avoid constantly creating new feature objects, improving performance and lots of garbage collection passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK that makes sense. I thought it might have something to do with performance optimization.
superclusterOptions.reduce = (accumulated, clusterProperties) => { | ||
feature.properties = clusterProperties; | ||
for (const key of propertyNames) { | ||
globals.accumulated = accumulated[key]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean a reducer could access the accumulated
property if it wanted to, and do something weird like the following?
"total": [
["case",
[">", ["get", "accumulated"], 100],
"too many",
["+", ["get", "accumulated"], ["get", "total"]
],
0,
["get", "total"]
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently not, but I think this could be added (check if operator
is string and if it's not, consider it as a reduce expression). Would add quite a bit of complexity to the docs, but add some flexibility as well. Not sure how to go here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, I see - the reduce expression has to be something that fits into [operator, ['accumulated'], ['get', key]]
. So that's why max
and any
work.
👍
Thanks so much for taking this on! I'm looking forward to having native mapbox support for cluster agg! I left a couple comments regarding some of the code, but overall it looks great and I think this is easier to read than what I had come up with in #7004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I'm looking forward to seeing this in release!
@asheemmamoowala moved the validation logic to style-spec and added the feature-state check — ready for another review. @redbmk also added support custom expressions (referencing |
@@ -199,6 +199,7 @@ for (const name in CompoundExpression.definitions) { | |||
} | |||
|
|||
delete types['error']; | |||
delete types['accumulated']; // skip documenting `accumulated` since it is internal use only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still be deleted, given that it is now usable as part of a custom reduce expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, just readded it.
I'm thinking that maybe I made a mistake of requiring an initial value, since for most cases, we could make it optional — see mapbox/supercluster#114. If it's merged, we could simplify the minimal syntax to: "clusterProperties": {
"max": ["max", ["get", "scalerank"]],
"sum": ["+", ["get", "scalerank"]],
"has_island": ["any", ["==", ["get", "featureclass"], "island"]]
} And make the initial value the third optional argument. We could still make that change if we hurry to get this in before the next GL JS beta. |
Just realized that it would mean we can define or omit initial values on a per-property basis, which doesn't fit the implementation in Supercluster (which is all or nothing). So our options are to either leave things as is, or ditch initial values completely (which I'm increasingly leaning towards). |
I'm inclined to leave |
My motivation behind removing it completely is reducing complexity as much as possible. Given technical details of how Supercluster implements map/reduce, we can either always require the |
when i'm comparing it with point_count the my_point_count property seems to different on some cluster
why ["get", "sum"], "accumulated" need more information on this, an analogy with javascript reduce fn would be great, why not Also, how to calculate average from a field lets say salary. |
Sorry to revive this old thread, I have another interesting reduce problem where the reduce-syntax does not work like I expected it to. Basically, all my markers have a custom property which is a string. Let's stay the property is What I am doing right now, is the following:
Basically in pseudo code:
Basically, add all the ethnicities in the cluster into a String, but do not repeat an ethnicity if it is already in the string. I want to use this to generate a fitting icon for the cluster. However, it appears only the first How can I reduce to properties without repeating? |
Closes #2412. An alternative to #7004.
Finally got my hands on exploring this feature, building on an extremely helpful implementation by @redbmk (sorry for not jumping in earlier!). Here I'm proposing a different API, which is very terse and easy to use, although it might look weird at first:
The syntax is
property: [operator, initialExpression, mapExpression]
. The array is itself a valid expression, though using it for cluster properties adds additional limitations.initialExpression
mapExpression
(that can reference feature properties)accumulated = evaluate([operator, accumulated, value])
.To implement this, I had to introduce an internal-use only
["accumulated"]
expression. An alternative would be to expose it in the syntax, making it more verbose but also maybe more self-descriptive:Regarding
Infinity
and-Infinity
, a point raised in #7004 (it producesnull
because JSON format doesn't supportInfinity
), — I think it would make sense to introduce an["infinity"]
expression, similar to how we have["pi"]
and["e"]
.Regarding access to
["zoom"]
in the accumulating expression — I think this wouldn't be useful because of how map/reduce works: imagine that out of 10 points, 5 merge into one cluster on z16, and 5 remaining get merged into that cluster on z15. The accumulated value in such cluster would depend on multiple zoom values (depending on where it acquired new points), which doesn't semantically makes much sense.Opening up the PR to start a discussion, but it still needs some work before a proper review:
TODO
Launch Checklist
post benchmark scoreswe don't benchmark clustering