-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C/C++ FFI for BakedDataProvider #2743
Comments
CC @Manishearth @robertbastian. We were discussing this. |
Yeah so there are two ways to do this:
The second is the preferred option. In general users are likely to want to depend on a shim crate like icu_capi_staticlib instead of directly on icu_capi, and you can put stuff in manually there. In the long run, we would like datagen to autogenerate this code and the headers for it1, but it doesn't right now Footnotes
|
A few sub-issues:
Requires knowledge of how our FFI works. |
@hsivonen had some more ideas here, such as a nullptr data provider indicating that data should instead be loaded from a BakedDataProvider that never itself gets exposed over FFI. |
I think the key observations are:
|
I think either myself or @robertbastian should take this issue. |
Without diving into this, I don't think a null pointer is the right API design in 2022. I would like to explore |
So this seems pretty invasive to me. I think a very important thing to consider is how we can do this in a way that makes sense generally from Diplomat's POV. Perhaps via a Note that from Diplomat's POV it would still need to be an instance of ICU4XDataProvider, we can't get rid of that indirection that easily. I guess it could be You'd also still need to support a way to inject the baked data into In general I'm wary of jumping towards a solution in Diplomat for stuff like this, because it often feels kinda niche and while I definitely want to add Diplomat features I want them to be generally useful so that we can design them right. There's another solution which I find to be cleaner: What if we didn't change Diplomat at all, but icu_capi's ICU4XDataProvider had a fourth Baked option, that when enabled requires baked data to exist in some folder (which we need to do anyway!)? Then users can disable any/buffer, and just set up the baked variant. This still involves the baked data provider crossing FFI, however it doesn't matter since if all other features are disabled, the impl of This can be done purely in CFG magic, and we'd add a configurable ¹ A thing I do want to do is make it possible for Diplomat to also respect CFGs and not generate headers in these cases. |
What do you mean by this? |
Oh, also, a thing that does not solve the monomorphization problem, but does solve the problem of using baked providers, is cross-crate Diplomat support, which I also want to get. |
Technically we could implement the injection thing by doing this, btw: fn try_new(#[cfg(not(baked))] provider: &..., options: ... ) -> ... {
#[cfg(baked)] let provider = BakedProvider;
} with the caveat that Diplomat does not yet support CFGs (but I plan to make that happen) |
+1 |
@sffc @robertbastian and I decided today to go with the model in #2992 (as opposed to #2947), because it maintains the same model across Rust and FFI. The basic idea is that we have a
There is one other model that we explored a bit but decided not to go for at the moment. It will be possible to do in the future and we can do it when someone needs it. The main benefit it has is that it gives FFI the flexibility of having multiple subsets of baked data: with the However, a model we can also have is one where datagen is capable of generating baked data that additionally:
The final staticlib must depend on all such crates for compilation to work (this is not a problem if you're doing rlib-to-rlib-to-native linking the way Blaze does; you can continue doing that). But this gives FFI the level of flexibility pure Rust code already has when it comes to baked data. Writing this down so that in the future if this need crops up, we have a path we can take. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Due to crates.io size limits, we need multiple smaller globaldata crates, so I think we should start with one data crate per component crate. I put minimal value on ergonomics that primarily impact ICU4X engineers. |
I don't think the size limit is something that should affect our design, it's straightforward to get an exception. I don't want to end up with 15 data crates that all need to be replaced individually if they are vendored. |
It's not just the crates.io limit. Smaller globaldata crates means that people who only want to build individual components of ICU4X don't need to download/vendor all data for all components, which is going to get quite large. It might also have a marginal improvement on build times, since the compiler doesn't need to tokenize all the macros we're not using. (I thought we were already aligned on the idea of multiple smaller crates) |
This comment was marked as outdated.
This comment was marked as outdated.
Name bikeshed!
|
It's also global as in global state I think 😄 |
Note: All modern locales in one crate is 5.1MB for stable keys, 6.3MB for stable + experimental. |
I'm going to close this in favour of #2945 as it's basically discussing the same things, and the solution we're working on there trivially extends to FFI. |
icu_capi
cannot handleBakedDataProvider
directly. Is there a way to handle it without modifyingicu_capi
?Of course, we can handle it if we modify
icu_capi
like the following.The text was updated successfully, but these errors were encountered: