decimate transforms #1966

Draft
wants to merge 3 commits into
base: main

@Fil (Contributor) commented Jan 2, 2024

A data decimation transform can be used to simplify dense line charts by removing many of the points that don't add visual information to a line path.

The decimation strategy is inspired by M4 [1]: cluster the values by grouping them on the main axis (say, x = date for time series) for each given pixel, and in each cluster retain the points that give the minimum and maximum x and y values.
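To make the M4 idea concrete, here is a minimal sketch in plain JavaScript. It is not the code of this PR: the function name `m4` is hypothetical, and it assumes that `X` and `Y` are arrays of already-scaled pixel coordinates and that a cluster is one whole pixel column, obtained with `Math.floor`.

```javascript
// Minimal M4-style sketch (an illustration, not this PR's implementation).
// X and Y are assumed to hold pixel coordinates; m4 is a hypothetical name.
function m4(X, Y) {
  const buckets = new Map(); // pixel column -> indices of the four extremes
  for (let i = 0; i < X.length; ++i) {
    const k = Math.floor(X[i]); // assumed clustering: one cluster per pixel
    const b = buckets.get(k);
    if (b === undefined) {
      buckets.set(k, {minX: i, maxX: i, minY: i, maxY: i});
      continue;
    }
    if (X[i] < X[b.minX]) b.minX = i;
    if (X[i] > X[b.maxX]) b.maxX = i;
    if (Y[i] < Y[b.minY]) b.minY = i;
    if (Y[i] > Y[b.maxY]) b.maxY = i;
  }
  // Collect the retained indices, deduplicated, and return them sorted.
  const keep = new Set();
  for (const {minX, maxX, minY, maxY} of buckets.values()) {
    keep.add(minX).add(maxX).add(minY).add(maxY);
  }
  return Array.from(keep).sort((a, b) => a - b);
}
```

A point that is neither the leftmost, rightmost, lowest, nor highest of its pixel column is dropped; every other point survives, so at most four indices are returned per column.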

This implementation goes a bit further: it does not assume that the points are ordered along x, and it supports curves (such as catmull-rom) that might need more control points than these four inside a given cluster. So we retain not only argminX, argmaxX, argminY, and argmaxY (this is M4), but also the first and last points and, for some curves, the second and next-to-last points. The retained points are kept in the order in which they appear in the index.

This extension of M4 raises the maximum number of points per pixel from 4 to 6 for regular (monotone) curves, and to 8 for irregular (quadratic, etc.) curves. This seems like a modest price to pay for a generic transform that we can apply systematically.
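The retention rule and the resulting bounds of 6 and 8 points can be illustrated with a short sketch. The helper name `retain` and the `extended` flag are hypothetical, not names from this PR; the sketch assumes that a cluster is given as an array of indices in index order and that "first" and "last" refer to that order.

```javascript
// Hypothetical helper (an illustration, not this PR's code): given the
// indices of one cluster in index order, return the retained indices,
// still in index order.
function retain(cluster, X, Y, extended = false) {
  const n = cluster.length;
  if (n === 0) return [];
  let minX = cluster[0], maxX = minX, minY = minX, maxY = minX;
  for (const i of cluster) {
    if (X[i] < X[minX]) minX = i;
    if (X[i] > X[maxX]) maxX = i;
    if (Y[i] < Y[minY]) minY = i;
    if (Y[i] > Y[maxY]) maxY = i;
  }
  // The four M4 extremes, plus the first and last points of the cluster.
  const keep = new Set([minX, maxX, minY, maxY, cluster[0], cluster[n - 1]]);
  // Assumed behavior for curves that need extra control points.
  if (extended && n > 1) keep.add(cluster[1]).add(cluster[n - 2]);
  // Filtering the cluster preserves the original index order.
  return cluster.filter((i) => keep.has(i));
}
```

Because the retained indices live in a set, the result never exceeds 6 entries (or 8 with `extended`), and it is smaller whenever one point plays several roles, for example when the first point is also the minimum in x.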

The areaY, lineY, and differenceY marks now transparently call decimateX. The areaX, lineX (and differenceX in the future, cf. #1920) marks now transparently call decimateY.

The only supported option is pixelSize, which gives the step of the quantization on x (in pixels), and defaults to 0.5. Setting this option to 0 makes the transform return early, effectively neutralizing it.
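As a rough illustration of what a quantization step of `pixelSize` means, the sketch below assigns each scaled x value to a cluster. The function name `clusterKeys` is hypothetical, and the use of `Math.floor` is an assumption; the PR only states that pixelSize is the step, that it defaults to 0.5, and that 0 disables the transform.

```javascript
// Sketch of the quantization step (an illustration, not this PR's code).
// X is assumed to hold scaled x positions in pixels.
function clusterKeys(X, pixelSize = 0.5) {
  // A pixelSize of 0 means no decimation: signal the caller to keep all points.
  if (!(pixelSize > 0)) return null;
  // Assumed rule: points whose x falls in the same step share a cluster key.
  return Array.from(X, (x) => Math.floor(x / pixelSize));
}
```

With the default step of 0.5, each pixel is split into two clusters, so a smaller pixelSize retains more points and a larger one decimates more aggressively.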

I would also recommend calling the decimate transform on the tip mark for very heavy datasets, to make it faster. It would not be a good idea to do this systematically, though, since the user might be interested in all the intermediate points that fall on the same x pixel.

todo:

  • documentation
  • maybe replace the automatic selection of the main channel x (vs x2 or x1) by explicit function names such as decimateX2 etc.?

closes #1707

[1] https://www.vldb.org/pvldb/vol7/p797-jugel.pdf ; see also @jheer’s notebook https://observablehq.com/@uwdata/m4-scalable-time-series-visualization for a nice walk-through and implementation of M4 with Plot.

@Fil requested a review from mbostock on January 2, 2024, 15:56
Fil added 2 commits January 2, 2024 17:15
…the midpoint of x2 and x1, and might be rendered null if x1 is defined as -x2.

Successfully merging this pull request may close these issues.

data decimation transform