Not sure what the best way to implement this is. There are a few difficult things to handle:
The binning: we would probably have to do super-fine binning (hundreds of bins), then aggregate dynamically in the vega interface using the bin transform. That might be slow. We could also do tensorboard-style exponential binning, but this could be a little tricky, since for things like grad norm it's not obvious that we want many more bins around zero... maybe center them around the median tensor value? As an alternative, we could just do the tensorboard histogram view, which isn't zoomable.
On reflection, I think exponential binning of some kind would be necessary. Usually you want to "zoom in" when there are very tightly clustered values around zero with a few outliers, and plain linear binning wouldn't cut it.
We could no longer use the preformatted string min/max values on the histograms; they would have to be dynamic, which means uglier formatting.
We would have to give up the "correctness" of the binning interpolation when going from the exponential to the linear scale.
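For concreteness, here's a rough sketch of what exponential binning centered around zero (rather than the median) might look like. The helper name, edge range, and bins-per-decade value are just placeholders, not a committed design:

```python
import numpy as np

def symmetric_exp_bin_edges(
    min_exp: int = -8, max_exp: int = 2, bins_per_decade: int = 4
) -> np.ndarray:
    """Hypothetical helper: exponentially spaced bin edges, mirrored around zero.

    Produces edges like [..., -1e-7, -1e-8, 0, 1e-8, 1e-7, ...] so that values
    tightly clustered near zero still land in distinct bins.
    """
    # Positive edges from 10**min_exp to 10**max_exp, several bins per decade.
    pos = np.logspace(min_exp, max_exp, (max_exp - min_exp) * bins_per_decade + 1)
    # Mirror for the negative side and include zero as a central edge.
    return np.concatenate([-pos[::-1], [0.0], pos])

# Example: a tensor with most mass near zero plus a couple of outliers.
values = np.concatenate([np.random.normal(0, 1e-5, 10_000), [3.0, -2.0]])
edges = symmetric_exp_bin_edges()
counts, _ = np.histogram(values, bins=edges)
```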
On further reflection, this is probably the ideal way:
Exponential binning in the backend, centered around the median, with no outlier rejection. Serialize the bin bounds as well for the frontend. -> Actually, we should maybe center around zero instead, since even nonzero quantities tend to go to zero pretty often? (A rough sketch follows after the lists below.)
In the frontend, have a slider that controls the outlier rejection rate, with all the plots updating dynamically.
Benefits:
Prevents the user from "zooming in" on areas with little data
Issues:
Most users are going to do outlier rejection, resulting in ugly vega-formatted floats...
If something changes dramatically over training, the exponential binning is going to really hurt.
Not sure how to "rebin". I'm tempted not to do so at all and just keep adding bins as necessary. That shouldn't be too bad if they're exponentially scaled.
Not sure if we should linearize the bins in the vega interface...
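Here's the sketch mentioned above: roughly how the backend could serialize the bin bounds alongside the counts, and how it could extend the bins instead of rebinning. Function names and the growth factor are hypothetical, and it assumes the edges already span a negative-to-positive range around zero (as in the earlier sketch):

```python
import json
import numpy as np

def extend_edges(edges: np.ndarray, value: float, growth: float = 2.0) -> np.ndarray:
    """Instead of rebinning, append exponentially growing edges until `value`
    falls inside the covered range. Assumes edges run from negative to positive
    (symmetric around zero), so multiplying the outermost edge pushes it outward."""
    edges = edges.copy()
    while value > edges[-1]:
        edges = np.append(edges, edges[-1] * growth)
    while value < edges[0]:
        edges = np.insert(edges, 0, edges[0] * growth)
    return edges

def serialize_histogram(counts: np.ndarray, edges: np.ndarray) -> str:
    """Serialize counts together with the bin bounds so the frontend knows where
    each (exponentially scaled) bin sits and can optionally linearize them."""
    return json.dumps({"counts": counts.tolist(), "edges": edges.tolist()})
```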
Unfortunately, "smooth" histograms ended up performing too poorly. So I think that any kind of client-side outlier rejection is out of the question.
I think the only plausible way to do this now is to pre-compute a few levels of outlier rejection in the backend and add a drop-down box in the frontend to select between them. The rejection_outlier_proportion parameter will take a Union of a float and a list of floats. For each float, a new histogram will be concatenated onto the histogram strings, with an added text field containing the outlier rejection percentage. The data will first be split at a high level, and then a new dataset will parse out all the valid outlier percentages.
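A minimal sketch of that backend path, assuming rejection_outlier_proportion is normalized to a list and each pre-computed level is tagged with its percentage. The function name, delimiter, and field names are placeholders:

```python
import json
from typing import Union
import numpy as np

def histograms_with_outlier_levels(
    values: np.ndarray,
    rejection_outlier_proportion: Union[float, list[float]] = 0.0,
    bins: int = 64,
) -> str:
    """Hypothetical backend sketch: one histogram per rejection level,
    concatenated into a single string the frontend can split apart."""
    if isinstance(rejection_outlier_proportion, (int, float)):
        rejection_outlier_proportion = [float(rejection_outlier_proportion)]

    parts = []
    for p in rejection_outlier_proportion:
        # Reject the most extreme p fraction of values (p/2 from each tail).
        lo, hi = np.quantile(values, [p / 2, 1 - p / 2])
        kept = values[(values >= lo) & (values <= hi)]
        counts, edges = np.histogram(kept, bins=bins)
        parts.append(json.dumps({
            "outlier_pct": 100 * p,  # parsed back out by the frontend dataset
            "counts": counts.tolist(),
            "edges": edges.tolist(),
        }))
    # Split at a "high level" first using a simple delimiter.
    return ";".join(parts)
```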
The outlier percentages will be rendered at the bottom of the explorer panel, essentially as a slider with a few discrete nodes. There will have to be a maximum of, say, four nodes for this to work, since vega doesn't handle dynamic GUI elements very well.
The resulting signal goes back to the data, where the histograms are filtered before further processing.
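For reference, the vega side might look something like the fragment below (written here as Python dicts). The signal name, the outlier_pct field, and the slider bounds are assumptions carried over from the backend sketch, not the final spec:

```python
# A signal bound to a coarse-stepped range input approximates a slider with a
# few discrete nodes; a "select" input would instead give a drop-down box.
outlier_signal = {
    "name": "rejection_pct",
    "value": 0,
    "bind": {"input": "range", "min": 0, "max": 10, "step": 5},
}

# Dataset that keeps only the histograms pre-computed at the selected level,
# before any further processing.
filtered_dataset = {
    "name": "filtered_histograms",
    "source": "histograms",
    "transform": [
        {"type": "filter", "expr": "datum.outlier_pct == rejection_pct"}
    ],
}
```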