Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary sources of overhead in uniffi #244

Open
thomcc opened this issue Aug 14, 2020 · 5 comments
Open

Unnecessary sources of overhead in uniffi #244

thomcc opened this issue Aug 14, 2020 · 5 comments

Comments

@thomcc
Copy link

thomcc commented Aug 14, 2020

(Uh, this was way more than I intended to write about this 😓... I think my thoughts on the locking situation probably were worth writing down, since in general it helps explain what I believe the problem space to be, what lead me to the HandleMap approach, why locking on the other langs is tricky, etc. This was probably important to get down somewhere, so I'm glad I did)


I've shared this crate with a few people, but in general they respond negatively when they realize how much overhead uniffi introduces. (Filing this because a friend of mine finally got back to me about uniffi today, but I've heard it 3 or so times)

There are two main causes of overhead:

  1. Serialization across the boundary.
  2. Using ConcurrentHandleMap, when ConcurrentHandleMap requires a locking a std::sync::RwLock, a std::sync::Mutex, and performing a number of checks on the handle for validity.

Serialization

The 1st could be avoided for e.g. #[repr(C)] structs and enums with defined repr (C, iN, C+iN. note that this well defined for way more cases than you'd expect, thanks to the needs of cbindgen and webrender).

Passing and returning Vec<T> for ffi-safe T or String can be done by converting to a repr-c struct containing (pointer, length, capacity).

Hell, right now we use nul-terminated C-strings (I think), and really there's no real reason to do that over e.g. #[repr(C)] struct Str(*const c_char, usize);.

Locking

Various options.

  1. Supporting the use non-std::sync locking (probably parking_lot) in ffi_support::ConcurrentHandleMap. This is essentially trivial.

  2. A ffi_support::ConcurrentHandleMap which is really a RwLock<HandleMap<RwLock>> or RwLock<HandleMap<AtomicRefcell>> instead of a Mutex. Note that RwLock has higher over head than Mutex, so it would really only be good if we found a way to opt-out of the locking that happens on the non-rust side of the FFI. OTOH, AtomicRefcell is really only viable because we do all the locking on the other side.

  3. There exist algorithms for writing a lock-free HandleMap-alikes*. This is not a good choice for uniffi, as we're using handles because they're more "obviously correct" than pointers, and lock-free data structures are almost never "obviously correct".

  4. Finally, and what I think is a better safety vs complexity trade-off: just use pointers. (Note that you need to enforce T: Send for any type you do this with, and will be relying on locking happening in the caller. More on that in a sec). Continued below.

Using Pointers

This is what we used to do, but stopped because it was to hard to be sure about correctness, e.g. double-free and such, especially in the face of manually written bindings. Sanitizers help here (issue forthcoming), but it's also just way less of an issue if you can be sure mistakes don't happen becuase you're generating the code.

For anything super tricky, like shared ownership (e.g. cases where you'd have a handle owned by multiple types) https://github.com/Manishearth/triomphe/ is stellar, and contains ffi-safe Arc utilities extracted from code used in servo and firefox. That said I don't think we've ever needed or wanted this, it's better to just have the owner of the handle itself be shared.

Additionally, for T: Send + Sync, you still can never give a &mut T out, but you also don't need to. These types are thread safe, and don't need any locking. We even use this in a-s for the interrupt handles. This capability seems important, even if you don't care about the overhead.

Note that you do need to ensure that these pointers are only freed once though. Atomic types can help there though, and triomphe can for the hard cases. See how the destruciton of the interrupt handle was done.

Ensuring non-Rust locks are sufficient

So, locking. Since you'd be relying on the locks on the non-rust side of the FFI. You need be sure that either you lock is sufficiently powerful, or you prevent it being used in a way that would cause an issue. This can be tricky.

Lets look at Kotlin/Java here, since it's a good example and is illuminating:

@Synchronized in java/kotlin is not equivalent to Mutex, but is a recursive/reentrant mutex. As far as I can tell, there are no non-reentrant locks in java's stdlib, probably because if you don't care about "unique access" as a core part of your safety requirements... a reentrant mutex is just a mutex which deadlocks less.

Anyway, Rust cares, and so there's a big difference between a recursive mutex and a non-recursive one: While &Mutex<T> => &mut T is sound, a &RecursiveMutex<T> can only soundly give out a &T. This might seem like it's pointless, but RecursiveMutex<T> still upgrades T: Send to RecursiveMutex<T>: Send + Sync. You could give a Arc<RecursiveMutex<RefCell<T>>> to other threads, although in Rust doing so would seem very confused.

Anyway, While @Synchronized seems non-viable, java has a recursive mutex in it's concurrent.utils.locks package with enough functionality to easily build what we need: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html. This offers an isHeldByCurrentThread, which we just need to check before locking. That is, we'd do so if (lock.isHeldByCurrentThread()) { throw SomeException(...); } lock.lock() for all locking, and then we can soundly hand out &mut T` to Rust.

That said, we could build this on top of @Synchronized too by testing a flag after we acquired the lock. (Or, equivalently, reimplementing only the writer tracking of RefCell<T>)

For other targets... Swift has NSLock which is enough, and it can use pthread_mutex_t as well. No clue about Python but I always heard it didn't even have threads? No idea. If that's truly the case and you can't get a system mutex or anything, then you can do it with a flag or a refcell-alike.


Footnote: lock-free handlemaps

* Okay, so, as I mentioned it's a bad fit for uniffi, but I think lock-free concurrent handlemaps are a really cool data structure so I can't resist giving some links:

Tokio has one in here https://docs.rs/tokio/0.2.22/src/tokio/util/slab/mod.rs.html, but they very recently switched away from it for an implementation that's lower performance but can return memory back to the OS rather than always keeping it at the high-water-mark. They haven't cut a release with the new version, so maybe they're still going to modify it to make it lock free? No clue. Maybe it's not worth it.

Tokio's slab itself is a version of the algorithm described in https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf modified.

At my last job, we had one of these in our game engine based on the code in here: https://cbloomrants.blogspot.com/2013/05/05-08-13-lock-free-weak-reference-table.html (note that it looks like it's talking about a different thing but it's actually just a slightly more powerful version of our HandleMap/Handles, in that it's also reference counted — a lot of the implementation is in the source zip file). That said, as-is the algorithm requires an AtomicU128 on 64-bit platforms which rust doesn't expose even where it's available, like x86_64, (although there's a trick to work around this that works on most 64 bit architectures). Also, it looks like c++ game code, which some people find hard to read.

Anyway, yeah, I very strongly doubt that uniffi is interested in being the owner of a tricky concurrent data structure, no matter how interesting it may be.

┆Issue is synchronized with this Jira Task
┆Issue Number: UNIFFI-16

@rfk
Copy link
Collaborator

rfk commented Aug 18, 2020

Thanks for writing this up! Partly inspired by this issue, over in #248 I've tried to write up some ADRs to explain the current set of tradeoffs that have lead to these sources of overhead, and highlight places where we might revisit those choices over time.

@rfk
Copy link
Collaborator

rfk commented Dec 20, 2020

Hell, right now we use nul-terminated C-strings (I think), and really there's no real reason to do that over
e.g. #[repr(C)] struct Str(*const c_char, usize);.

FWIW, we have now switched away from nul-terminated C strings and are doing something similar to the above ptr+len-in-a-struct approach. There's still a lot more copying than there needs to be, but it's a step in the right direction.

@rfk
Copy link
Collaborator

rfk commented Mar 18, 2021

Additionally, for T: Send + Sync, you still can never give a &mut T out, but you also don't need to. These types are thread safe,
and don't need any locking. We even use this in a-s for the interrupt handles. This capability seems important, even if you
don't care about the overhead.

Oh hey, we did this! Or at least something vaguely-similarly-shaped anyway, ref ADR-0003.

Sorry, I just quite enjoy watching us slowly chip away at all the good ideas in this thread :-)

@rfk
Copy link
Collaborator

rfk commented Mar 18, 2021

Ensuring non-Rust locks are sufficient

FWIW, my current thinking here is that we should go in exactly the opposite direction here, and insist that anything passed over the FFI is Sync + Send, and hence that any locking is managed internally in the Rust code. We might be able to provide some macros to make this easier in simple cases, but we've been bitten a bit recently by the hidden locking that happens inside the UniFFI bindings, so moving it somewhere where it's more clearly visible to developers seems like a win on balance.

@rfk
Copy link
Collaborator

rfk commented May 17, 2021

Finally, and what I think is a better safety vs complexity trade-off: just use pointers. (Note that you need to enforce T: Send
for any type you do this with, and will be relying on locking happening in the caller. More on that in a sec).

Cross-referencing for anyone following this thread - we are actively working on a plan to replace our current use of "hamdle maps" with raw pointers. The initial plan is to wrap object instances in an Arc and use Arc::into_raw/Arc::from_raw for transiting the FFI boundary. We opted to start with the builtin Arc rather an external crate because there are some places where the Arc might end up in user-visible function signatures, but we'll see how it goes in practice.

Ref #430 for evolving details of the plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants