A transform to consolidate ordinal values outside the top n into an “other” category, perhaps in conjunction with the group transform. #144

Open
mbostock opened this issue Feb 24, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@mbostock
Member

mbostock commented Feb 24, 2021

e.g., https://next.observablehq.com/d/0e0c0dcb66d6714e

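// Returns an accessor that maps any value outside the given domain to unknown.
function other(valueof, domain, unknown) {
  // Accept either an accessor function or a field name.
  if (typeof valueof !== "function") valueof = field(valueof);
  // A Set gives constant-time membership tests.
  domain = new Set(domain);
  return (d, i, data) => {
    const value = valueof(d, i, data);
    return domain.has(value) ? value : unknown;
  };
}

// Returns an accessor for the named field.
function field(x) {
  return d => d[x];
}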
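
A minimal usage sketch of the `other` accessor, for readers without access to the notebook. The sample data, the `browser` field name, and the `"Other"` label are assumptions made for illustration; the two functions are repeated only so the snippet runs on its own.

```javascript
// Repeated from the comment above so the example is self-contained.
function field(x) {
  return d => d[x];
}

function other(valueof, domain, unknown) {
  if (typeof valueof !== "function") valueof = field(valueof);
  domain = new Set(domain);
  return (d, i, data) => {
    const value = valueof(d, i, data);
    return domain.has(value) ? value : unknown;
  };
}

// Hypothetical sample data; the "browser" field is an assumption.
const data = [
  {browser: "Chrome"},
  {browser: "Firefox"},
  {browser: "Lynx"},
  {browser: "Chrome"}
];

// Keep Chrome and Firefox; every other value becomes "Other".
const accessor = other("browser", ["Chrome", "Firefox"], "Other");
const lumped = data.map(accessor);
console.log(lumped); // ["Chrome", "Firefox", "Other", "Chrome"]
```

Because the result is an ordinary accessor, it could presumably be passed wherever a channel accepts a function, so the lumping happens before any grouping.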
@mbostock mbostock added the enhancement New feature or request label Feb 24, 2021
@mbostock mbostock added this to the Friends Preview milestone Feb 24, 2021
@mbostock
Member Author

I wonder if this should be something you specify as a scale transform rather than a mark transform? Seems handy…

@Fil
Contributor

Fil commented Feb 25, 2021

(I don't have access to https://observablehq.com/d/0e0c0dcb66d6714e)

Doing it on the scale would just be like specifying .unknown("Others")? Could be interesting for individual marks (like dot), but we need it as a data transform, I think, for aggregate operations (bars).

@mbostock
Member Author

Sorry, that’s an internal dashboard. But I can try to make another example for you that uses the “other” transform.

@Fil
Contributor

Fil commented Feb 25, 2021

My own use case for nominal "Others" is detailed in this “modalities” notebook.

@Fil Fil self-assigned this Mar 3, 2021
@mbostock mbostock removed this from the Friends Preview milestone Mar 10, 2021
@Fil
Contributor

Fil commented Mar 25, 2021

I've made some progress on this idea; it seems to work with facets: https://observablehq.com/d/0bca2cad63c75fe1

@Fil
Contributor

Fil commented Mar 26, 2021

> something you specify as a scale transform rather than a mark transform

I've tried a few things to achieve this, by passing the domain to the scale transform in plot.js#38, but my conclusion is that it's a dead end. The scale transform is invoked too late: it runs after the grouping, when the aggregation (count) is already done. So even if we map all the individual groups to the same place on the screen, they will not be aggregated. For counts, we could maybe recount (sum the sums in the aggregated channel, but which one is it?), but this would not work for other types of aggregation.
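
To make the ordering problem concrete, here is a small sketch in plain JavaScript, independent of Plot's internals. The sample values, the `lump` function, and the `count` helper are all hypothetical; the point is only that mapping keys after counting leaves several separate "Others" groups, while mapping before counting merges them.

```javascript
// Hypothetical ordinal values; "a" and "b" are treated as the top categories.
const values = ["a", "a", "a", "b", "b", "c", "d"];
const top = new Set(["a", "b"]);
const lump = v => (top.has(v) ? v : "Others");

// Count occurrences of each key, preserving first-seen order.
function count(keys) {
  const counts = new Map();
  for (const key of keys) counts.set(key, (counts.get(key) ?? 0) + 1);
  return counts;
}

// Scale-level (too late): group first, then relabel each group's key.
// "c" and "d" end up with the same label but remain two separate groups.
const late = [...count(values)].map(([key, n]) => [lump(key), n]);

// Data-level: relabel first, then group, so "c" and "d" are counted together.
const early = [...count(values.map(lump))];

console.log(late);  // [["a", 3], ["b", 2], ["Others", 1], ["Others", 1]]
console.log(early); // [["a", 3], ["b", 2], ["Others", 2]]
```

Summing the two "Others" rows would repair the count case, but a reducer such as a median or a max cannot be recombined from the already aggregated values in general.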

@Fil
Contributor

Fil commented Mar 26, 2021

This solution works on X, with others and k as options of the group reducer.

--- a/src/transforms/group.js
+++ b/src/transforms/group.js
@@ -67,7 +67,7 @@ function groupn(
   // The z, fill, and stroke channels (if channels and not constants) are
   // greedily materialized by the transform so that we can reference them for
   // subdividing groups without having to compute them more than once.
-  const {z, fill, stroke, ...options} = inputs;
+  const {z, fill, stroke, others, k = 10, ...options} = inputs;
   const [BZ, setBZ] = maybeLazyChannel(z);
   const [vfill] = maybeColor(fill);
   const [vstroke] = maybeColor(stroke);
@@ -84,6 +84,15 @@ function groupn(
     ...Object.fromEntries(outputs.map(({name, output}) => [name, output])),
     transform: maybeTransform(options, (data, facets) => {
       const X = valueof(data, x);
+      if (others && X) {
+        const domain0 = sort(grouper(X, d => d), ([,{length}]) => -length);
+        if (domain0.length > k + 1) {
+          const domain = new Set(domain0.slice(0, k).map(d => d[0]));
+          for (let i = 0; i < X.length; i++) {
+            if (!domain.has(X[i])) X[i] = others;
+          }
+        }
+      }
       const Y = valueof(data, y);

[Screenshot 2021-03-26 at 15:59:19]

EDIT: I don't think we should pursue this direction, since the modalities function defined in this notebook returns both the channel and a domain that we can use in the scale definition. This is enough for the purpose and in line with #271 (comment).
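
For illustration only, a helper that returns both a channel and a domain might look like the sketch below. This is not the notebook's code: the name `modalities` is borrowed from the comment above, while the signature, the defaults, and the return shape are assumptions. The threshold `k + 1` mirrors the diff, so lumping is skipped when it would only rename a single category.

```javascript
// Hypothetical helper; signature and return shape are assumptions.
function modalities(values, k = 10, others = "Others") {
  // Count occurrences of each value, preserving first-seen order.
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);

  // Rank values by descending count; the sort is stable, so ties keep
  // their first-seen order.
  const ranked = [...counts].sort((a, b) => b[1] - a[1]).map(([v]) => v);

  // As in the diff, lump only when more than k + 1 distinct values exist.
  if (ranked.length <= k + 1) return {channel: [...values], domain: ranked};

  const keep = new Set(ranked.slice(0, k));
  return {
    channel: values.map(v => (keep.has(v) ? v : others)),
    domain: [...ranked.slice(0, k), others]
  };
}

const {channel, domain} = modalities(["a", "a", "a", "b", "b", "c", "d"], 2);
console.log(channel); // ["a", "a", "a", "b", "b", "Others", "Others"]
console.log(domain);  // ["a", "b", "Others"]
```

The returned channel would feed the mark, and the returned domain would go into the scale definition so that the lumped category sorts last.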

@Fil Fil removed their assignment Apr 5, 2021
@Fil
Contributor

Fil commented Aug 19, 2021

We now have sort: {fx: {value: …, limit}} in #442; the only thing missing is "others".

@tophtucker
Contributor

Some more pairing on this, led by Fil: https://observablehq.com/d/f3aac7d647ef1c9e


@Fil
Contributor

Fil commented Jul 20, 2023

A more advanced experiment is here: https://observablehq.com/@observablehq/plot-stacking-others-144


3 participants