-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unnecessary sources of overhead in uniffi #244
Comments
Thanks for writing this up! Partly inspired by this issue, over in #248 I've tried to write up some ADRs to explain the current set of tradeoffs that have lead to these sources of overhead, and highlight places where we might revisit those choices over time. |
FWIW, we have now switched away from nul-terminated C strings and are doing something similar to the above ptr+len-in-a-struct approach. There's still a lot more copying than there needs to be, but it's a step in the right direction. |
Oh hey, we did this! Or at least something vaguely-similarly-shaped anyway, ref ADR-0003. Sorry, I just quite enjoy watching us slowly chip away at all the good ideas in this thread :-) |
FWIW, my current thinking here is that we should go in exactly the opposite direction here, and insist that anything passed over the FFI is |
Cross-referencing for anyone following this thread - we are actively working on a plan to replace our current use of "hamdle maps" with raw pointers. The initial plan is to wrap object instances in an Ref #430 for evolving details of the plan. |
(Uh, this was way more than I intended to write about this 😓... I think my thoughts on the locking situation probably were worth writing down, since in general it helps explain what I believe the problem space to be, what lead me to the HandleMap approach, why locking on the other langs is tricky, etc. This was probably important to get down somewhere, so I'm glad I did)
I've shared this crate with a few people, but in general they respond negatively when they realize how much overhead uniffi introduces. (Filing this because a friend of mine finally got back to me about uniffi today, but I've heard it 3 or so times)
There are two main causes of overhead:
std::sync::RwLock
, astd::sync::Mutex
, and performing a number of checks on the handle for validity.Serialization
The 1st could be avoided for e.g.
#[repr(C)]
structs and enums with definedrepr
(C, iN, C+iN. note that this well defined for way more cases than you'd expect, thanks to the needs of cbindgen and webrender).Passing and returning
Vec<T>
for ffi-safeT
orString
can be done by converting to a repr-c struct containing (pointer, length, capacity).Hell, right now we use nul-terminated C-strings (I think), and really there's no real reason to do that over e.g.
#[repr(C)] struct Str(*const c_char, usize);
.Locking
Various options.
Supporting the use non-
std::sync
locking (probablyparking_lot
) inffi_support::ConcurrentHandleMap
. This is essentially trivial.A
ffi_support::ConcurrentHandleMap
which is really aRwLock<HandleMap<RwLock>>
orRwLock<HandleMap<AtomicRefcell>>
instead of aMutex
. Note thatRwLock
has higher over head than Mutex, so it would really only be good if we found a way to opt-out of the locking that happens on the non-rust side of the FFI. OTOH,AtomicRefcell
is really only viable because we do all the locking on the other side.There exist algorithms for writing a lock-free HandleMap-alikes*. This is not a good choice for uniffi, as we're using handles because they're more "obviously correct" than pointers, and lock-free data structures are almost never "obviously correct".
Finally, and what I think is a better safety vs complexity trade-off: just use pointers. (Note that you need to enforce
T: Send
for any type you do this with, and will be relying on locking happening in the caller. More on that in a sec). Continued below.Using Pointers
This is what we used to do, but stopped because it was to hard to be sure about correctness, e.g. double-free and such, especially in the face of manually written bindings. Sanitizers help here (issue forthcoming), but it's also just way less of an issue if you can be sure mistakes don't happen becuase you're generating the code.
For anything super tricky, like shared ownership (e.g. cases where you'd have a handle owned by multiple types) https://github.com/Manishearth/triomphe/ is stellar, and contains ffi-safe Arc utilities extracted from code used in servo and firefox. That said I don't think we've ever needed or wanted this, it's better to just have the owner of the handle itself be shared.
Additionally, for
T: Send + Sync
, you still can never give a&mut T
out, but you also don't need to. These types are thread safe, and don't need any locking. We even use this in a-s for the interrupt handles. This capability seems important, even if you don't care about the overhead.Note that you do need to ensure that these pointers are only freed once though. Atomic types can help there though, and triomphe can for the hard cases. See how the destruciton of the interrupt handle was done.
Ensuring non-Rust locks are sufficient
So, locking. Since you'd be relying on the locks on the non-rust side of the FFI. You need be sure that either you lock is sufficiently powerful, or you prevent it being used in a way that would cause an issue. This can be tricky.
Lets look at Kotlin/Java here, since it's a good example and is illuminating:
@Synchronized
in java/kotlin is not equivalent toMutex
, but is a recursive/reentrant mutex. As far as I can tell, there are no non-reentrant locks in java's stdlib, probably because if you don't care about "unique access" as a core part of your safety requirements... a reentrant mutex is just a mutex which deadlocks less.Anyway, Rust cares, and so there's a big difference between a recursive mutex and a non-recursive one: While
&Mutex<T>
=>&mut T
is sound, a&RecursiveMutex<T>
can only soundly give out a&T
. This might seem like it's pointless, butRecursiveMutex<T>
still upgradesT: Send
toRecursiveMutex<T>: Send + Sync
. You could give aArc<RecursiveMutex<RefCell<T>>>
to other threads, although in Rust doing so would seem very confused.Anyway, While
@Synchronized
seems non-viable, java has a recursive mutex in it's concurrent.utils.locks package with enough functionality to easily build what we need: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html. This offers anisHeldByCurrentThread
, which we just need to check before locking. That is, we'd do soif (lock.isHeldByCurrentThread()) { throw SomeException(...); } lock.lock()
for all locking, and then we can soundly hand out &mut T` to Rust.That said, we could build this on top of
@Synchronized
too by testing a flag after we acquired the lock. (Or, equivalently, reimplementing only the writer tracking ofRefCell<T>
)For other targets... Swift has NSLock which is enough, and it can use pthread_mutex_t as well. No clue about Python but I always heard it didn't even have threads? No idea. If that's truly the case and you can't get a system mutex or anything, then you can do it with a flag or a refcell-alike.
Footnote: lock-free handlemaps
* Okay, so, as I mentioned it's a bad fit for uniffi, but I think lock-free concurrent handlemaps are a really cool data structure so I can't resist giving some links:
Tokio has one in here https://docs.rs/tokio/0.2.22/src/tokio/util/slab/mod.rs.html, but they very recently switched away from it for an implementation that's lower performance but can return memory back to the OS rather than always keeping it at the high-water-mark. They haven't cut a release with the new version, so maybe they're still going to modify it to make it lock free? No clue. Maybe it's not worth it.
Tokio's slab itself is a version of the algorithm described in https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf modified.
At my last job, we had one of these in our game engine based on the code in here: https://cbloomrants.blogspot.com/2013/05/05-08-13-lock-free-weak-reference-table.html (note that it looks like it's talking about a different thing but it's actually just a slightly more powerful version of our HandleMap/Handles, in that it's also reference counted — a lot of the implementation is in the source zip file). That said, as-is the algorithm requires an AtomicU128 on 64-bit platforms which rust doesn't expose even where it's available, like x86_64, (although there's a trick to work around this that works on most 64 bit architectures). Also, it looks like c++ game code, which some people find hard to read.
Anyway, yeah, I very strongly doubt that uniffi is interested in being the owner of a tricky concurrent data structure, no matter how interesting it may be.
┆Issue is synchronized with this Jira Task
┆Issue Number: UNIFFI-16
The text was updated successfully, but these errors were encountered: