Not sure what the best way to implement this is. There are a few difficult things to handle:
The binning: we would probably have to do super-fine binning (hundreds of bins), then aggregate dynamically in the vega interface using the bin transform. That might be slow. We could also do tensorboard-style exponential binning, but this could be a little tricky, since for things like grad norm it's not obvious that we want many more bins around zero... maybe center them around the median tensor value? As an alternative, we could just do the tensorboard histogram view, which isn't zoomable.
On reflection, I think exponential binning of some kind would be necessary. Usually you want to "zoom in" when there are very tightly clustered values around zero with a few outliers, and plain linear binning wouldn't cut it.
We could no longer use the preformatted string min/max values on the histograms; they would have to be dynamic, which means uglier formatting.
We would have to give up the "correctness" of the binning interpolation when going from the exponential to the linear scale.
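For concreteness, here's a rough sketch of what exponential binning centered around zero (rather than the median) might look like. The helper name, edge range, and bins-per-decade value are just placeholders, not a committed design:

```python
import numpy as np

def symmetric_exp_bin_edges(
    min_exp: int = -8, max_exp: int = 2, bins_per_decade: int = 4
) -> np.ndarray:
    """Hypothetical helper: exponentially spaced bin edges, mirrored around zero.

    Produces edges like [..., -1e-7, -1e-8, 0, 1e-8, 1e-7, ...] so that values
    tightly clustered near zero still land in distinct bins.
    """
    # Positive edges from 10**min_exp to 10**max_exp, several bins per decade.
    pos = np.logspace(min_exp, max_exp, (max_exp - min_exp) * bins_per_decade + 1)
    # Mirror for the negative side and include zero as a central edge.
    return np.concatenate([-pos[::-1], [0.0], pos])

# Example: a tensor with most mass near zero plus a couple of outliers.
values = np.concatenate([np.random.normal(0, 1e-5, 10_000), [3.0, -2.0]])
edges = symmetric_exp_bin_edges()
counts, _ = np.histogram(values, bins=edges)
```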
On further reflection, this is probably the ideal way:
Exponential binning in the backend, centered around the median, with no outlier rejection. Serialize the bin bounds as well for the frontend. -> Actually, we should maybe center around zero instead, since even nonzero quantities tend to go to zero pretty often? (A rough sketch follows after the lists below.)
In the frontend, have a slider that controls the outlier rejection rate, with all the plots updating dynamically.
Benefits:
Prevents the user from "zooming in" on areas with little data
Issues:
Most users are going to do outlier rejection, resulting in ugly vega-formatted floats...
If something changes dramatically over training, the exponential binning is going to really hurt.
Not sure how to "rebin". I'm tempted not to do so at all and just keep adding bins as necessary. That shouldn't be too bad if they're exponentially scaled.
Not sure if we should linearize the bins in the vega interface...
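Here's the sketch mentioned above: roughly how the backend could serialize the bin bounds alongside the counts, and how it could extend the bins instead of rebinning. Function names and the growth factor are hypothetical, and it assumes the edges already span a negative-to-positive range around zero (as in the earlier sketch):

```python
import json
import numpy as np

def extend_edges(edges: np.ndarray, value: float, growth: float = 2.0) -> np.ndarray:
    """Instead of rebinning, append exponentially growing edges until `value`
    falls inside the covered range. Assumes edges run from negative to positive
    (symmetric around zero), so multiplying the outermost edge pushes it outward."""
    edges = edges.copy()
    while value > edges[-1]:
        edges = np.append(edges, edges[-1] * growth)
    while value < edges[0]:
        edges = np.insert(edges, 0, edges[0] * growth)
    return edges

def serialize_histogram(counts: np.ndarray, edges: np.ndarray) -> str:
    """Serialize counts together with the bin bounds so the frontend knows where
    each (exponentially scaled) bin sits and can optionally linearize them."""
    return json.dumps({"counts": counts.tolist(), "edges": edges.tolist()})
```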
Unfortunately, "smooth" histograms ended up performing too poorly. So I think that any kind of client-side outlier rejection is out of the question.
I think the only plausible way to do this now is to pre-compute a few levels of outlier rejection in the backend and add a drop-down box in the frontend to select between them. The rejection_outlier_proportion parameter will take a Union of a float and a list of floats. For each float, a new histogram will be concatenated onto the histogram strings, with an added text field containing the outlier rejection percentage. The data will first be split at a high level, and then a new dataset will parse out all the valid outlier percentages.
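A minimal sketch of that backend path, assuming rejection_outlier_proportion is normalized to a list and each pre-computed level is tagged with its percentage. The function name, delimiter, and field names are placeholders:

```python
import json
from typing import Union
import numpy as np

def histograms_with_outlier_levels(
    values: np.ndarray,
    rejection_outlier_proportion: Union[float, list[float]] = 0.0,
    bins: int = 64,
) -> str:
    """Hypothetical backend sketch: one histogram per rejection level,
    concatenated into a single string the frontend can split apart."""
    if isinstance(rejection_outlier_proportion, (int, float)):
        rejection_outlier_proportion = [float(rejection_outlier_proportion)]

    parts = []
    for p in rejection_outlier_proportion:
        # Reject the most extreme p fraction of values (p/2 from each tail).
        lo, hi = np.quantile(values, [p / 2, 1 - p / 2])
        kept = values[(values >= lo) & (values <= hi)]
        counts, edges = np.histogram(kept, bins=bins)
        parts.append(json.dumps({
            "outlier_pct": 100 * p,  # parsed back out by the frontend dataset
            "counts": counts.tolist(),
            "edges": edges.tolist(),
        }))
    # Split at a "high level" first using a simple delimiter.
    return ";".join(parts)
```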
The outlier percentages will be rendered at the bottom of the explorer panel, essentially as a slider with a few discrete nodes. There will have to be a maximum of, say, four nodes for this to work, since vega doesn't handle dynamic GUI elements very well.
The resulting signal goes back to the data, where the histograms are filtered before further processing.
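For reference, the vega side might look something like the fragment below (written here as Python dicts). The signal name, the outlier_pct field, and the slider bounds are assumptions carried over from the backend sketch, not the final spec:

```python
# A signal bound to a coarse-stepped range input approximates a slider with a
# few discrete nodes; a "select" input would instead give a drop-down box.
outlier_signal = {
    "name": "rejection_pct",
    "value": 0,
    "bind": {"input": "range", "min": 0, "max": 10, "step": 5},
}

# Dataset that keeps only the histograms pre-computed at the selected level,
# before any further processing.
filtered_dataset = {
    "name": "filtered_histograms",
    "source": "histograms",
    "transform": [
        {"type": "filter", "expr": "datum.outlier_pct == rejection_pct"}
    ],
}
```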