Unify group, bin, and reduce. #272

mbostock · 2021-03-25T02:30:59Z

TODO

still missing:
- normalize
- automatic scale label (Frequency ↑)

Fil · 2021-03-25T08:06:33Z

I've fixed (sometimes half-fixed) a few tests; here's what I've noticed:

the automatic label “Frequency ↑” has disappeared. Should we add it to the axis, or can we keep it in the transform? (mobyDickLetterFrequency, mobyDickFaceted, wordLengthMobyDick)
penguinSpeciesGroup and wordLengthMobyDick need normalize: true to work.
penguinSpeciesIsland needs to specify z: "island" (it is not automatically taken from fill: "island") for grouping to work properly.
could we have defaults that give the same results as now? E.g., groupX => {y: "count"} if y is not specified, and {y: "sum"} if it is? (mobyDickFaceted, mobyDickLetterPosition, penguinSpeciesGroup, penguinSpeciesIsland, seattleTemperatureCell, wordLengthMobyDick)

mbostock · 2021-03-25T14:52:53Z

Thanks for the help.

I think we could add some implicit labels from the reducer but in some cases we may not want it because it’s better to propagate the reduced dimension’s label, e.g., if the y-position is computed as the grouped sum of units, then we probably want “↑ units” as the implicit label. If the source dimension does not have a label then I think it could make sense to fall back to the reducer name, say “↑ Frequency” (or “↑ Count”?) or “↑ Frequency (%)”.
Yep, plan to implement normalization, probably by extending the reducer interface somehow so that it understands scopes (facet or z).
Oops, that’s an oversight I re-introduced in a81ac42. Need to bring back the firstof to compute the group dimension as the first of {z, fill, stroke}.
~~Yep, plan to add an arguments.length === 1 check to allow default outputs like you said.~~
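A minimal sketch of the label fallback and the {z, fill, stroke} lookup described in the replies above, under assumptions: `implicitLabel` is a hypothetical name for the fallback, and the body of `firstof` is a guess at the helper the reply mentions, not its actual source.

```javascript
// Hypothetical helpers sketching the two ideas above; not Plot's actual code.

// Prefer the reduced dimension's own label; fall back to the reducer's name.
function implicitLabel(sourceLabel, reducerLabel) {
  const base = sourceLabel ?? reducerLabel;
  return base == null ? undefined : `↑ ${base}`;
}

// Return the first argument that is not undefined, so an explicit z wins
// over fill, which in turn wins over stroke.
function firstof(...values) {
  for (const value of values) {
    if (value !== undefined) return value;
  }
}

// Usage: the grouping dimension for {fill: "island"} without an explicit z.
const options = {fill: "island"};
const z = firstof(options.z, options.fill, options.stroke); // "island"
```

With this ordering, a grouped sum of a labeled “units” column would read “↑ units”, while an unlabeled count would read “↑ Frequency”.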

mbostock · 2021-03-25T16:52:54Z

> Yep, plan to add an arguments.length === 1 check to allow default outputs like you said.

Oops, I don’t think this will work: e.g., Plot.groupX({y: "count"}) is a single argument, but it represents outputs rather than options. I think it’s better to keep the outputs explicitly required, for clarity.
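To see why an arity check is ambiguous, here is a hypothetical transform (`groupXWithDefaults` is an illustrative name, not Plot's API) that treats a lone argument as options and fills in default outputs. A caller who passes outputs alone gets them silently misread.

```javascript
// Hypothetical arity-based defaults, shown only to illustrate the ambiguity.
function groupXWithDefaults(...args) {
  // One argument: assume it is options and supply default outputs.
  if (args.length === 1) return {outputs: {y: "count"}, options: args[0]};
  const [outputs, options = {}] = args;
  return {outputs, options};
}

// The caller means {fill: "count"} as outputs (count rows into fill)...
const call = groupXWithDefaults({fill: "count"});
// ...but the check treats it as options (a fill channel named "count"),
// and the requested output is replaced by the default {y: "count"}.
```

Requiring outputs explicitly removes the guesswork: the first argument is always outputs.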

mbostock · 2021-03-25T17:26:45Z

I think we also want to change the design of the bin transform to support reducers and be consistent with the group transform.

mbostock · 2021-03-25T23:28:26Z

Pretty close now, but I don’t like the way that normalize is implemented. It feels like it should be tied to the reducer, not a separate option, so that some reducers can be normalized while others are not. I’m not sure yet how to express this.
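One way to tie normalization to the reducer rather than to a separate option, sketched under assumptions: each reducer object carries an optional `scope` and accepts a third `basis` argument. This mirrors the shape that appears later in the review (`reducer.scope`, `reducer.reduce(I, V, context)`), but the objects below are illustrative, not Plot's actual reducers.

```javascript
// Hypothetical reducer objects: each declares whether (and over what scope)
// it needs a normalization basis, instead of a separate normalize option.
const reduceCount = {
  label: "Frequency",
  reduce(I) {
    return I.length;
  }
};

const reduceProportion = {
  label: "Frequency",
  scope: "data", // normalize against the whole dataset
  reduce(I, X, basis = 1) {
    // Called once without a basis to compute the total, then per group.
    return I.length / basis;
  }
};

// Usage: two groups out of five rows.
const groups = [[0, 1, 2], [3, 4]];
const total = reduceProportion.reduce([0, 1, 2, 3, 4]); // 5
const counts = groups.map(I => reduceCount.reduce(I)); // [3, 2]
const proportions = groups.map(I => reduceProportion.reduce(I, undefined, total)); // [0.6, 0.4]
```

Because the scope lives on the reducer, one output can be normalized while another output of the same transform is not.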

mbostock · 2021-03-26T23:49:46Z

This should be ready for review now! 👏

I think there’s still room for optimization with the bin transform: we compute the bins across the entire data, and then use them to subset the groups, but I suspect there’s a better data structure we could be using to make this faster. (Maybe sorted indexes?) I don’t think this optimization is necessary to land this PR; I’m just making a note for future work.
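A rough sketch of the “sorted indexes” idea, under stated assumptions (numeric values, half-open bins [x0, x1), values outside the thresholds dropped); `binSorted` and `bisectLeft` are hypothetical names and this is not the bin transform's code. Sorting the index once makes every bin a contiguous slice located by binary search.

```javascript
// Index of the first position in I whose value X[I[pos]] is >= x.
function bisectLeft(I, X, x) {
  let lo = 0;
  let hi = I.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (X[I[mid]] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function binSorted(X, thresholds) {
  // Keep only finite numbers, then sort indexes by their value.
  const I = Array.from(X.keys()).filter(i => Number.isFinite(X[i]));
  I.sort((i, j) => X[i] - X[j]);
  const bins = [];
  for (let k = 0; k < thresholds.length - 1; ++k) {
    const x0 = thresholds[k];
    const x1 = thresholds[k + 1];
    // Each bin is the slice of the sorted index between its two edges.
    bins.push({x0, x1, I: I.slice(bisectLeft(I, X, x0), bisectLeft(I, X, x1))});
  }
  return bins;
}

// Usage
const bins = binSorted([5, 1, 3, 7, 2], [0, 2, 4, 6, 8]);
// bins.map(b => b.I) is [[1], [4, 2], [0], [3]]
```

The same slicing could be applied per group by sorting each group's index instead of the whole dataset; whether that beats subsetting global bins is untested here.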

I also want to land #271 but I think we can do that separately. I will work on that after landing this. (Or you can work on it and I can help!)

mbostock · 2021-03-27T03:35:50Z

I added a percent: true option to scales that applies a transform: x => x * 100 and appends “(%)” to implicit axis labels.
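A small sketch of the behavior described above: multiply by 100, and suffix only an implicit label. `applyPercent` is a hypothetical helper, since Plot's scale code is not shown in this thread.

```javascript
// Hypothetical helper; not Plot's scale implementation.
function applyPercent({percent = false, transform, label}, implicitLabel) {
  if (!percent) return {transform, label: label ?? implicitLabel};
  return {
    transform: x => x * 100, // proportions in [0, 1] become percentages
    // Only the implicit label gets the suffix; an explicit label is kept as-is.
    label: label ?? (implicitLabel == null ? undefined : `${implicitLabel} (%)`)
  };
}

const y = applyPercent({percent: true}, "↑ Frequency");
const tick = y.transform(0.25); // 25
// y.label is "↑ Frequency (%)"
```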

Fil

Fantastic work!

I suggest a few changes (but I can send them as a PR if you prefer?):

- rename context to basis
- test defined for selectFirst and selectLast

Fil · 2021-03-27T10:10:52Z

src/transforms/group.js

+    const value = maybeInput(name, inputs);
+    const reducer = maybeReduce(reduce, value);
+    const [output, setOutput] = lazyChannel(labelof(value, reducer.label));
+    let V, O, context;


Suggested change

let V, O, context;

let V, O, basis;

rename for consistency with https://github.com/observablehq/plot/pull/272/files#diff-a376dffc7b64d2f0f302cb41caadd7b4d800823b63f90a2a5b16927dfea785daR211 ?

I intentionally gave this a more generic name than basis because it’s up to the reducer to decide what it means. In all the current implementations, it’s a basis for normalization, but it could be anything. It’s an internal-only name, though, so it shouldn’t matter too much what it’s called.

Fil · 2021-03-27T10:11:30Z

src/transforms/group.js

+        V = valueof(data, value);
+        O = setOutput([]);
+        if (reducer.scope === "data") {
+          context = reducer.reduce(range(data), V);


Suggested change

context = reducer.reduce(range(data), V);

basis = reducer.reduce(range(data), V);

Fil · 2021-03-27T10:11:42Z

src/transforms/group.js

+      },
+      scope(scope, I) {
+        if (reducer.scope === scope) {
+          context = reducer.reduce(I, V);


Suggested change

context = reducer.reduce(I, V);

basis = reducer.reduce(I, V);

Fil · 2021-03-27T10:11:50Z

src/transforms/group.js

+        }
+      },
+      reduce(I) {
+        O.push(reducer.reduce(I, V, context));


Suggested change

O.push(reducer.reduce(I, V, context));

O.push(reducer.reduce(I, V, basis));
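The fragments above fit together roughly as follows. This is a self-contained sketch, not the file's actual code: `makeOutput` and `proportionOf` are hypothetical names, and `valueof`, `lazyChannel`, `labelof` and the grouping loops are omitted. It keeps the name `context`, since the reducer decides what it means.

```javascript
function range(data) {
  return Array.from(data, (_, i) => i);
}

function makeOutput(reducer, V) {
  const O = []; // one reduced value per group, in visit order
  let context; // opaque to the driver; the reducer decides what it means
  return {
    O,
    initialize(data) {
      // A data-scoped reducer computes its context once, over every row.
      if (reducer.scope === "data") context = reducer.reduce(range(data), V);
    },
    scope(scope, I) {
      // Called as each facet or {z, fill, stroke} group is entered.
      if (reducer.scope === scope) context = reducer.reduce(I, V);
    },
    reduce(I) {
      O.push(reducer.reduce(I, V, context));
    }
  };
}

// Hypothetical proportion reducer: without a basis it returns the count.
const proportionOf = scope => ({
  scope,
  reduce: (I, X, basis = 1) => I.length / basis
});

const data = ["a", "a", "b", "a"];
const out = makeOutput(proportionOf("data"), data);
out.initialize(data); // context = 4
out.reduce([0, 1, 3]);
out.reduce([2]);
// out.O is [0.75, 0.25]
```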

Fil · 2021-03-27T10:23:56Z

src/transforms/group.js

+
+const reduceFirst = {
+  reduce(I, X) {
+    return X[I[0]];


Suggested change

return X[I[0]];

return X[I.find(i => defined(X[i]))];

There is a similar issue in selectFirst/selectMinX, which can return a data point with null x.
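A self-contained version of the suggested reduceFirst, for reference. The definition of `defined` here (null, undefined and NaN count as missing) is an assumption about the helper the suggestion calls.

```javascript
// Assumed definition of defined(): null, undefined and NaN are missing.
function defined(x) {
  return x != null && !Number.isNaN(x);
}

const reduceFirst = {
  reduce(I, X) {
    // Skip leading missing values; returns undefined if every value is missing.
    return X[I.find(i => defined(X[i]))];
  }
};

const first = reduceFirst.reduce([0, 1, 2, 3], [null, NaN, 3, 4]); // 3
```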

Yeah, there’s a TODO there:

plot/src/transforms/select.js

Line 28 in a217e1d

// TODO If the value (for some required channel) is undefined, scan forward?

The challenge with the selectFirst and selectLast transforms is that there isn’t an associated channel (or channels), so it’s unclear which channel you should be testing for defined-ness. We could test for defined-ness here, but that would be inconsistent with the automatic behavior for z, fill, and stroke, and also inconsistent with selectFirst. So I think we should punt on this and solve it separately.

Fil · 2021-03-27T10:24:52Z

src/transforms/group.js

+
+const reduceLast = {
+  reduce(I, X) {
+    return X[I[I.length - 1]];


Suggested change

return X[I[I.length - 1]];

return X[I.slice().reverse().find(i => defined(X[i]))];

In addition to the issues mentioned above, this could be made faster. 🙂
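One way to make it faster, sketched under the same assumed definition of `defined`: walk the index backward in place instead of allocating a copy with slice() and reversing it.

```javascript
// Assumed definition of defined(): null, undefined and NaN are missing.
function defined(x) {
  return x != null && !Number.isNaN(x);
}

const reduceLast = {
  reduce(I, X) {
    // Walk backward in place instead of copying and reversing I.
    for (let j = I.length - 1; j >= 0; --j) {
      const x = X[I[j]];
      if (defined(x)) return x;
    }
    // Falls through to undefined when every value is missing.
  }
};

const last = reduceLast.reduce([0, 1, 2, 3], [1, 2, null, NaN]); // 2
```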


Fil · 2021-03-27T17:06:50Z

I have a doubt about the "proportion-group" reduction, which doesn't correspond to my intuition. I've got a complex example here https://next.observablehq.com/d/61ca1967e419b882#HELP_NEEDED

EDIT: Here's a simpler example. https://observablehq.com/d/9727c8201871b9cb

mbostock · 2021-03-27T22:05:34Z

Z is category in that example (because of the fill channel).

mbostock · 2021-03-28T00:25:25Z

Re. https://observablehq.com/d/9727c8201871b9cb, there’s no way to get what you want (other than faceting). The proportion-group scope only applies to the {z, fill, stroke} dimension (here fill):

plot/src/transforms/group.js

Lines 83 to 84 in f08e17f

for (const [, I] of maybeGroup(facet, G)) {
  for (const o of outputs) o.scope("group", I);

The proportion you want is relative to the y dimension.

plot/src/transforms/group.js

Line 85 in f08e17f

for (const [y, gg] of maybeGroup(I, Y)) {

It so happens that we group y and then x, so it would be possible to support “proportion-y”, but we’d have to invert the order of the loops to support “proportion-x”. Which… would be possible with some variable name dancing (remapping x to y when invoking groupn), but would be a little tricky to get right. This is part of the reason I was thinking that we should eliminate the secondary group dimension in favor of faceting (but that would also mean that you can’t have additional grouping within the facet).

mbostock · 2021-03-28T00:29:19Z

I guess we could get rid of “proportion-group”, and just require people to use faceting and “proportion-facet”? Another option would be to rename it to “proportion-{channel}” so that it’s more explicit and predictable, but then we’ll have to do some fancy dynamic ordering of the loops to ensure that the proportion channel is the outermost.
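To make the loop-order point concrete, here is a hypothetical sketch (not Plot's groupn): the channel named by a “proportion-{channel}” reducer goes in the outer loop, so each proportion is relative to that outer group. Supporting x instead of y is just a swap of which array is outer, the “variable name dancing” mentioned above.

```javascript
// Group an index by the values of V, preserving first-seen key order.
function groupIndex(I, V) {
  const groups = new Map();
  for (const i of I) {
    const key = V[i];
    const group = groups.get(key);
    if (group) group.push(i);
    else groups.set(key, [i]);
  }
  return groups;
}

function groupProportions(I, X, Y, channel) {
  // Put the proportion channel outermost; default to y.
  const [Outer, Inner] = channel === "x" ? [X, Y] : [Y, X];
  const results = [];
  for (const [outer, outerIndex] of groupIndex(I, Outer)) {
    for (const [inner, innerIndex] of groupIndex(outerIndex, Inner)) {
      results.push({outer, inner, proportion: innerIndex.length / outerIndex.length});
    }
  }
  return results;
}

// Usage: four rows, x = category, y = group.
const X = ["a", "b", "a", "a"];
const Y = ["g1", "g1", "g2", "g2"];
const byY = groupProportions([0, 1, 2, 3], X, Y, "y");
// byY is [{outer: "g1", inner: "a", proportion: 0.5},
//         {outer: "g1", inner: "b", proportion: 0.5},
//         {outer: "g2", inner: "a", proportion: 1}]
```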

mbostock · 2021-03-28T00:31:01Z

Also, proportion-group has a different meaning for the bin transform, which makes it extra confusing. proportion-group groups by {z, fill, stroke} + y for binX, whereas for groupX it only groups by {z, fill, stroke}.

plot/src/transforms/bin.js

Lines 119 to 120 in f08e17f

for (const [k, g] of maybeGroup(I, K)) {
  for (const o of outputs) o.scope("group", g);

Fil · 2021-03-28T09:02:02Z

Yes I was reaching the same conclusion as your comment. For proportion-group the relevant group should be y for groupY, x for groupX, and z for groupZ, or we should be explicit about the group (proportion-x, proportion-y…). When/if we want to do that we'll have to loop in a different order for each case. Thank you for the clarification!

mbostock and others added 12 commits March 24, 2021 17:47

checkpoint new group

8ae6d9b

group boxplot!

82f9457

check for value-less reduce

4dad4a0

update example

73fcb93

require reduce

e578c54

don’t compute z twice

a81ac42

reorder

75e0b87

missing imports

c2f3af6

bin groups

b80380e

fix reducer test

f0dff40

half-fix plots that need default {x: count} or {fill: count}

1978ee7

still missing: - normalize - automatic scale label (Frequency ↑)

fix plots (reduce => group)

99de814

mbostock added 2 commits March 25, 2021 10:00

firstof {z, fill, stroke}

43fbf3a

reduce proportion

73fcb56

mbostock added 11 commits March 25, 2021 10:30

maybeGroup

ffd3b47

reduce percent

7203487

fix value-less reducers

6cf3fc4

remove unused import

4bcfb86

update tests

16ed1e0

more tests

b66c361

smarter facet axis position

d7dc200

percent-z, percent-facet

c12160f

generalize normalize

cfd94fa

add test output

e66e141

normalize

6353dd2

mbostock changed the title ~~Unify group and reduce.~~ Unify group, bin, and reduce. Mar 26, 2021

apply maybeReduce to data channel

779fa07

mbostock force-pushed the mbostock/group-reduce branch from 159ceb8 to 779fa07 March 27, 2021 02:41

mbostock added 6 commits March 26, 2021 19:46

remove noop

aa785ba

shorten

4bcabf0

binned replaces input

1f70b72

don’t subgroup if output

0e144f8

restore test

7e52ab6

scale percent

f08e17f

Fil approved these changes Mar 27, 2021


Fil added a commit that referenced this pull request Mar 27, 2021

compute the scaled coordinates of bars (and cell) before filtering, allowing NaN, null and undefined as (ordinal) classes
groups respect the domain option
fixes #52
fixes #45
fixes #255
supersedes #272

d4733f3

Fil mentioned this pull request Mar 27, 2021

Fil/group reduce group domain #275

Closed

mbostock added 2 commits March 27, 2021 17:38

rm proportion-group

b3fc587

update snapshot

5251f50

mbostock merged commit b5b9471 into main Mar 28, 2021

mbostock deleted the mbostock/group-reduce branch March 28, 2021 01:04
