Unify group, bin, and reduce. #272
Conversation
Still missing:
- normalize
- automatic scale label (Frequency ↑)
I've fixed (sometimes half-fixed) a few tests; here's what I've noticed:
Thanks for the help.

Oops, I don’t think this will work, e.g.,

I think we also want to change the design of the bin transform to support reducers and be consistent with the group transform.

Pretty close now, but I don’t like the way that normalize is implemented. It feels like it should be tied to the reducer, not a separate option, so that some reducers can be normalized while others are not. I’m not sure yet how to express this.

This should be ready for review now! 👏 I think there’s still room for optimization with the bin transform: we compute the bins across the entire data, and then use them to subset the groups, but I suspect there’s a better data structure we could be using to make this faster. (Maybe sorted indexes?) I don’t think this optimization is necessary to land this PR; I’m just making a note for future work. I also want to land #271 but I think we can do that separately. I will work on that after landing this. (Or you can work on it and I can help!)
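The optimization noted above could look roughly like this. A hypothetical helper (all names here are mine, not Plot's) assigns each datum its bin index in a single pass via bisection over ascending thresholds, so each group's indexes can then be bucketed without re-testing every bin boundary:

```js
// Hypothetical sketch of the bin-subsetting idea: compute each datum's bin
// once by bisecting ascending thresholds, then bucket indexes by bin.
function binIndexes(I, X, thresholds) {
  const bisect = x => {
    let lo = 0, hi = thresholds.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (thresholds[mid] <= x) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  };
  const bins = new Map(); // bin index -> indexes of data falling in that bin
  for (const i of I) {
    const b = bisect(X[i]);
    let bucket = bins.get(b);
    if (bucket === undefined) bins.set(b, bucket = []);
    bucket.push(i);
  }
  return bins;
}
```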
(force-pushed 159ceb8 to 779fa07)
I added a
Fantastic work!
I suggest a few changes (but I can send them as a PR if you prefer?):
- rename context to basis
- test defined for selectFirst and selectLast
```js
const value = maybeInput(name, inputs);
const reducer = maybeReduce(reduce, value);
const [output, setOutput] = lazyChannel(labelof(value, reducer.label));
let V, O, context;
```
```diff
-let V, O, context;
+let V, O, basis;
```
rename for consistency with https://github.com/observablehq/plot/pull/272/files#diff-a376dffc7b64d2f0f302cb41caadd7b4d800823b63f90a2a5b16927dfea785daR211 ?
I intentionally gave this a more generic name than basis because it’s up to the reducer to decide what it means. In all the current implementations, it’s a basis for normalization, but it could be anything. It’s an internal-only name though so it shouldn’t matter too much what the name is.
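To make the third argument concrete, here is a minimal sketch (assumed shapes, not Plot's actual code) of a reducer whose scope is "data": the transform first reduces over the whole dataset to produce the context, then passes it back when reducing each group.

```js
// Sketch of a scoped reducer using the third argument; names and shapes
// are illustrative only. With scope "data", the transform computes the
// context once over all indexes, then hands it back per group.
const reduceProportion = {
  scope: "data",
  reduce(I, V, context = 1) {
    let sum = 0;
    for (const i of I) sum += V === undefined ? 1 : +V[i]; // count if no channel
    return sum / context;
  }
};

// Toy driver standing in for the transform machinery:
const V = [1, 2, 3, 4];
const context = reduceProportion.reduce([0, 1, 2, 3], V); // 10, over all data
reduceProportion.reduce([0, 1], V, context); // 0.3, this group's share
```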
```js
V = valueof(data, value);
O = setOutput([]);
if (reducer.scope === "data") {
  context = reducer.reduce(range(data), V);
```
```diff
-context = reducer.reduce(range(data), V);
+basis = reducer.reduce(range(data), V);
```
```js
},
scope(scope, I) {
  if (reducer.scope === scope) {
    context = reducer.reduce(I, V);
```
```diff
-context = reducer.reduce(I, V);
+basis = reducer.reduce(I, V);
```
```js
  }
},
reduce(I) {
  O.push(reducer.reduce(I, V, context));
```
```diff
-O.push(reducer.reduce(I, V, context));
+O.push(reducer.reduce(I, V, basis));
```
```js
const reduceFirst = {
  reduce(I, X) {
    return X[I[0]];
```
```diff
-return X[I[0]];
+return X[I.find(i => defined(X[i]))];
```
There is a similar issue in selectFirst/selectMinX, which can return a data point with null x.
Yeah, there’s a TODO there (line 28 in a217e1d):

```js
// TODO If the value (for some required channel) is undefined, scan forward?
```
The challenge with the selectFirst and selectLast transforms is that there isn’t an associated channel (or channels), so it’s unclear which channel you should be testing for defined-ness. We could test for defined-ness here, but that would be inconsistent with the automatic behavior for z, fill, and stroke, and also inconsistent with selectFirst. So I think we should punt on this and solve it separately.
```js
const reduceLast = {
  reduce(I, X) {
    return X[I[I.length - 1]];
```
```diff
-return X[I[I.length - 1]];
+return X[I.slice().reverse().find(i => defined(X[i]))];
```
In addition to the issues mentioned above, this could be made faster. 🙂
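For instance (a hypothetical variant, not part of this PR), the reversed copy could be avoided by scanning backwards in place:

```js
// Sketch: scan I backwards for the last defined value without allocating
// a reversed copy. The defined() test is assumed to be the usual
// "non-null and non-NaN" check.
function lastDefined(I, X) {
  for (let k = I.length - 1; k >= 0; --k) {
    const x = X[I[k]];
    if (x != null && !Number.isNaN(x)) return x;
  }
}
```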
I have a doubt about the "proportion-group" reduction, which doesn't correspond to my intuition. I've got a complex example here: https://next.observablehq.com/d/61ca1967e419b882#HELP_NEEDED EDIT: Here's a simpler example: https://observablehq.com/d/9727c8201871b9cb
Z is category in that example (because of the fill channel).
Re. https://observablehq.com/d/9727c8201871b9cb, there’s no way to get what you want (other than faceting). The proportion-group scope only applies to the {z, fill, stroke} dimension (here fill); see lines 83 to 84 in f08e17f. The proportion you want is relative to the y dimension (line 85 in f08e17f).

It so happens that we group y and then x, so it would be possible to support “proportion-y”, but we’d have to invert the order of the loops to support “proportion-x”. Which… would be possible with some variable name dancing (remapping x to y when invoking groupn), but would be a little tricky to get right. This is part of the reason I was thinking that we should eliminate the secondary group dimension in favor of faceting (but that would also mean that you can’t have additional grouping within the facet).

I guess we could get rid of “proportion-group” and just require people to use faceting and “proportion-facet”? Another option would be to rename it to “proportion-{channel}” so that it’s more explicit and predictable, but then we’ll have to do some fancy dynamic ordering of the loops to ensure that the proportion channel is the outermost.

Also, proportion-group has a different meaning for the bin transform, which makes it extra confusing: proportion-group groups by {z, fill, stroke} + y for binX, whereas for groupX it only groups by {z, fill, stroke} (lines 119 to 120 in f08e17f).
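A toy illustration in plain JavaScript (outside Plot; the data and names are made up) of why the denominator’s scope matters: normalizing the same count by its z series versus by its x position gives different proportions.

```js
// Made-up data: counts grouped by x, with a series channel z.
const rows = [
  {x: "a", z: "u"}, {x: "a", z: "u"},
  {x: "a", z: "v"}, {x: "a", z: "v"},
  {x: "b", z: "u"}
];

const byZ = new Map(); // totals per z series ("proportion-group" for groupX)
const byX = new Map(); // totals per x position (a hypothetical "proportion-x")
for (const d of rows) {
  byZ.set(d.z, (byZ.get(d.z) ?? 0) + 1);
  byX.set(d.x, (byX.get(d.x) ?? 0) + 1);
}

// The {x: "a", z: "u"} group has count 2; its proportion depends on scope:
const withinZ = 2 / byZ.get("u"); // 2/3, relative to series u
const withinX = 2 / byX.get("a"); // 2/4, relative to position a
```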
Yes, I was reaching the same conclusion as your comment. For proportion-group the relevant group should be y for groupY, x for groupX, and z for groupZ; or we should be explicit about the group (proportion-x, proportion-y…). When/if we want to do that, we’ll have to loop in a different order for each case. Thank you for the clarification!
TODO
- (sum + normalize) (proportion with an input channel)
- Weighted percent
- (normalize: "facet") (proportion-facet)
- (normalize: "z") (proportion-z)
- (proportion-group)
- Percent (normalize to [0, 100])
- Respect the scale’s domain (will land fix group domain #271)