-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow archetypes to clear and/or update existing components #3381
Comments
If this proves to be more complex than anticipated, then we'll push it to 0.10 |
Needs a follow-up where we hide empty components from the streams view |
We need to think through how |
This API should also handle specifying the InstanceKeys for the collection that's being updated. |
Instance keys and num_instances are not a problem anymore (or won't be, soon). |
@teh-cmc we were talking through this with respect to Transforms I realized this approach has the potential to cause some annoying performance issues for chunks. In particular, every unused optional component will still require growing the offset array for each of the optional components by one element (in many cases, just a repeated 0). This is similar to the overhead of growing sparse unions (growing each child array), and strictly worse than the overhead of sparse unions (just growing one child array). This suggests to me we're going to need some kind of chunk-specific optimization. This also has an annoying ripple-effect on datafusion / dataframe UI of showing a bunch of empty / cleared (not nulled) columns all over the place. |
### What * Closes #6909 * First steps on the road to #3381 * Creates a new issue: #7117 Best reviewed ignoring whitespace changes. The serializers for `Transform3D` and `LeafTransform3D` will now always write every component, effectively clearing all previously logged values. `Transform3D` uses `Option<C>` for each component, treating it as a single-element list component list where `None` is empty. If we ever want to support updating some components using the transform (as outlined in #3381) we would need to turn this into a tristate, i.e. something like `NoneOrZeroOrOne<C>`. `LeafTransform3D` uses `Option<Vec<C>>` for its components, but has no interface yet for logging an update. Both `Transform3D` and `LeafTransform3D` has a new `::clear()` constructor in Rust, that replaced both `::new()` and `::default()`. This is for added clarity, and to make the transition easier if we ever add an `::update()` constructor, as per #3381 ![image](https://github.com/user-attachments/assets/588224b8-daec-463d-b238-8263f578ae19) ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested the web demo (if applicable): * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7111?manifest_url=https://app.rerun.io/version/main/examples_manifest.json) * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7111?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json) * [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG * [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)! * [x] If have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide - [PR Build Summary](https://build.rerun.io/pr/7111) - [Recent benchmark results](https://build.rerun.io/graphs/crates.html) - [Wasm size tracking](https://build.rerun.io/graphs/sizes.html) To run all checks from `main`, comment on the PR with `@rerun-bot full-check`. --------- Co-authored-by: Andreas Reich <andreas@rerun.io>
Some notes on why I think (parts of) this is needed in order to unblock tagged components. Context: partial updates in a tagged environmentWe need an ergonomic way to work with raw components (as opposed to archetypes) that still allows for proper tagging. As of today, as soon as we re-enable tags, all user-level logging code (including our own code that does a bunch of runtime writes for all things E.g.: rr.set_time_sequence("frame", 42)
rr.log("points", rr.Points3D([[0, 0, 0], [1, 1, 1]], colors=[255, 0, 0]))
rr.set_time_sequence("frame", 42)
rr.log("points", [rr.components.ColorBatch([0, 0, 255])])
rr.set_time_sequence("frame", 43)
rr.log("points", [rr.components.ColorBatch([0, 255, 0])])
# Undefined behaviour, there are three columns in the database now:
# * `points@Points3D:Position3D#positions`
# * `points@Points3D:Color#colors`
# * `points@Color`
#
# What colors should the visualizer use for frame #42? `points@Points3D:Color#colors`, `points@Color`, something else entirely?
# What colors should the visualizer use for frame #43? `points@Points3D:Color#colors`, `points@Color`, something else entirely?
# What colors should the visualizer use for frame #100? `points@Points3D:Color#colors`, `points@Color`, something else entirely? The correct thing to do, as of today, is something like this: rr.set_time_sequence("frame", 42)
rr.log("points", rr.Points3D([[0, 0, 0], [1, 1, 1]], colors=[255, 0, 0]))
rr.set_time_sequence("frame", 42)
rr.log("points", [rr.components.ColorBatch([0, 0, 255]).with_descriptor_overrides(
archetype_name = "rerun.archetypes.Points3D",
archetype_field_name = "colors"
)])
rr.set_time_sequence("frame", 43)
rr.log("points", [rr.components.ColorBatch([0, 255, 0]).with_descriptor_overrides(
archetype_name = "rerun.archetypes.Points3D",
archetype_field_name = "colors"
)])
# There are only two columns in the database:
# * `points@Points3D:Position3D#positions`
# * `points@Points3D:Color#colors`
#
# The visualizer knows what to pick. which obviously is a pretty hard thing to sell to end users as-is 🫠. Path forward 1: make it ergonomicThe first possible path forward is very related to what is already proposed in this issue: make these things actually ergonomic! rr.set_time_sequence("frame", 42)
rr.log("points", rr.Points3D([[0, 0, 0], [1, 1, 1]], colors=[255, 0, 0]))
rr.set_time_sequence("frame", 42)
rr.log("points", [rr.Points3D.update(colors=[0, 0, 255])]) # gets properly tagged
rr.set_time_sequence("frame", 43)
rr.log("points", [rr.Points3D.update(colors=[0, 255, 0])]) # gets properly tagged
# There are only two columns in the database:
# * `points@Points3D:Position3D#positions`
# * `points@Points3D:Color#colors`
#
# The visualizer knows what to pick. Upsides:
Downsides:
Also keep in mind that we don't necessarily need all the features of the original proposal, just enough to get tags rolling (whether that is at all possible, I don't know). Path forward 2: new query semanticsThe other path forward is to introduce new query semantics. One set of semantics that comes fairly naturally to mind is "most-specific at timestamp wins" (whether it's actually a good idea in practice, who knows). I.e. with the code from earlier: rr.set_time_sequence("frame", 42)
rr.log("points", rr.Points3D([[0, 0, 0], [1, 1, 1]], colors=[255, 0, 0]))
rr.set_time_sequence("frame", 42)
rr.log("points", [rr.components.ColorBatch([0, 0, 255])])
rr.set_time_sequence("frame", 43)
rr.log("points", [rr.components.ColorBatch([0, 255, 0])])
# There are three columns in the database now:
# * `points@Points3D:Position3D#positions`
# * `points@Points3D:Color#colors`
# * `points@Color`
#
# What colors should the visualizer use for frame #42? `points@Points3D:Color#colors`, because it is the most specific one available at this timestamp.
# What colors should the visualizer use for frame #43? `points@Color`, because it is the most specific one available at this timestamp.
# What colors should the visualizer use for frame #100? `points@Color`, because it is the most specific one available at the closest timestamp. Upsides:
Downsides:
ConclusionUltimately, I do not think this is an either/or situation: we will need both better logging APIs and improved query semantics, eventually. It's mostly a matter of which should we get to first in order to unblock tagged components the fastest? The former seems more approachable to me, especially if we restrain ourselves to only whatever subset is needed for tagging and partial updates (if that's at all possible) -- but I can be convinced otherwise. |
Thanks for the write up! The motivation for The concept of "most-specific at timestamp wins" is already how I was imagining things would work, but I agree it's still nice to give users a way to avoid the ambiguity in the first place. |
My suggestion is to call it |
Backstory
empty()
method on Rust archetype implementations to log empty #3279What should the following code do:
Should the new positions inherit the colors of the previous log call?
That would be quite non-intuitive.
Related to this: how does one update just the colors of a point cloud?
Design
We've all converged on the following API:
Python:
Necessary changes
Options
, even the required onesComponent lists implications
A component list (e.g.
radii
) can have one of the four special values:null
components aren't logged, but empty components are.When querying the store, missing components are treated the same as empty components.
Related issues:
The text was updated successfully, but these errors were encountered: