Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return iterators rather than vectors when deserializing in Rust backend #2523

Closed
3 tasks
teh-cmc opened this issue Jun 26, 2023 · 7 comments
Closed
3 tasks
Assignees
Labels
codegen/idl 🚀 performance Optimization, memory use, etc 🦀 Rust API Rust logging API

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Jun 26, 2023

TODO:

  • Support for end-to-end iterators (dyn)
  • Integrate fallible iterators
  • Generate fully typed associated iterator types

The deserialization routines for components and datatypes in the Rust backend currently return full-on Vecs, while there is in fact nothing preventing us from returning lazy iterators.

Current:

pub trait Datatype {
    // [...]

    fn from_arrow_opt(data: &dyn ::arrow2::array::Array) -> Vec<Option<Self>>
    where
        Self: Sized;

    // [...]
}

pub trait Component {
    // [...]

    fn from_arrow_opt(data: &dyn ::arrow2::array::Array) -> Vec<Option<Self>>
    where
        Self: Sized;

    // [...]
}

impl trait is still unstable when used in a return position in traits, but it wouldn't be too much work to add an associated type, especially now that we have GATs! e.g.:

pub trait Datatype {
    type Iterator<T> = /* ... */

    // [...]

    fn from_arrow_opt(data: &dyn ::arrow2::array::Array) -> Self::Iterator<Option<Self>>
    where
        Self: Sized;

    // [...]
}

pub trait Component {
    // [...]

    fn from_arrow_opt(data: &dyn ::arrow2::array::Array) -> Self::Iterator<Option<Self>>
    where
        Self: Sized;

    // [...]
}

Of course we could also settle for a boxed trait object (Box<dyn Iterator<Item = Self>>) instead, but why do a half arsed job when it's all codegen'd anyway 🙃

@teh-cmc
Copy link
Member Author

teh-cmc commented Jul 24, 2023

Update: we now know for a fact that Box<dyn Iterator> is not viable.

@jleibs
Copy link
Member

jleibs commented Jul 24, 2023

This will be a blocker for 0.9 but doesn't need to block landing into main once we've cut 0.8.

@teh-cmc
Copy link
Member Author

teh-cmc commented Jul 24, 2023

Also, we need to take care of fallibility, e.g. by using something like https://docs.rs/fallible-iterator/latest/fallible_iterator/index.html

jleibs added a commit that referenced this issue Jul 25, 2023
…-convert (#2769)

### What
Returning `Vec<Loggable>`s creates a sizable performance hit based on
our usage patterns.

This PR introduces a new pair of APIs:
 - `try_from_arrow_iter_item`
 - `convert_item_to_self`
 
This enables a user of `Loggable` to iterate over elements using a
pattern like:
```
C::try_from_arrow_iter_item(arr)?.map(C::convert_item_to_self)
```

Note that even using a Boxed `dyn Iterator` was insufficient to get the
desired performance improvements. A concrete compilation-known type was
necessary for the compiler to work the requisite magic.

One of the major challenges leading to this design was the specification
of the associated types for the `Loggable` trait in conjunction with
arrow2-convert, which iterates over native-arrow types and then converts
them separately such that no type-definition exists for a native
iterator. Once arrow2-convert has been deprecated we may be able to
investigate removing `convert_item_to_self`, although the separation of
responsibilities allowed by the pattern may prove to simplify some of
the code-gen for us as well.

The basic technique appears to resolve the perf issues and even gives us
a moderate speedup in the mono-points bench relative to main.


![image](https://github.com/rerun-io/rerun/assets/3312232/3e186569-79ea-4364-8fbe-d6b53e4c6d8a)


Part of #2523 

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2769) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/2769)
- [Docs
preview](https://rerun.io/preview/pr%3Ajleibs%2Fiterator_passthrough/docs)
- [Examples
preview](https://rerun.io/preview/pr%3Ajleibs%2Fiterator_passthrough/examples)
@nikolausWest nikolausWest added this to the 0.9 milestone Jul 31, 2023
@emilk
Copy link
Member

emilk commented Aug 8, 2023

Probably related: just py-build --release && ./examples/python/open_photogrammetry_format/main.py is ~100x slower than it should be (10s/frame)

@emilk
Copy link
Member

emilk commented Aug 9, 2023

If this helps as much as we hope, we can then revert the central commit of #2950 (the caching added in 4e6df7e)

@nikolausWest
Copy link
Member

Let's discuss if we even need this at all. Most of the performance problems have already been fixed.

@emilk
Copy link
Member

emilk commented Aug 25, 2023

I think we should do this instead:

@emilk emilk closed this as completed Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codegen/idl 🚀 performance Optimization, memory use, etc 🦀 Rust API Rust logging API
Projects
None yet
Development

No branches or pull requests

4 participants