-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FFI baked data #2947
FFI baked data #2947
Conversation
4b77e6f
to
0eecb14
Compare
029374e
to
4c77d6f
Compare
ffi/diplomat/build.rs
Outdated
// Empty data generated with | ||
// cargo run -p icu_datagen --features bin,icu_segmenter -- --format mod --use-separate-crates --keys none --out empty_bake | ||
println!( | ||
"cargo:rustc-env=ICU4X_FFI_BAKED_ROOT={}/empty_bake", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: we shouldn't break the ability to build this crate without build scripts working; there are a bunch of build environments that are bad at running build scripts and ideally this should be something completely unnecessary to run 99% of the time.
I guess we can just ask people using strange build systems to just set this env var, but idk.
I was under the impression baked data was going to be behind a feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is why the GN CI is failing. I was hoping we wouldn't need to feature-gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't immediately see why we need a build.rs to do this computation? Can we use std::option_env instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include!
requires a string literal argument, which core::env!("FOO")
is, and core::option_env!("FOO").unwrap_or_else("bar")
isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use https://crates.io/crates/default-env, but that's a proc macro. Still better than a build script I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clients would just have to pass in env=empty.rs
for that to work. I don't consider that a huge change given that data slicing is anyway a weird cyclic part of the build with baked data.
(Literally can't see a clean way to do it with baked data in build systems like Blaze, for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like it if we got automatic DCE-based data slicing in Bazel/GN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bazel/etc really do not like cyclic deps, this would involve a genrule that is able to analyze the build graph, copy it over, and run it once. The compiled binary cannot be an input for generating code
I strongly suggest we make the recommended workflow there to use key files, and provide tools for manually generating that key file (which is fine!)
Anyway, we should have this discussion elsewhere I think (i'm filing an issue for a generic discussion that keeps cropping up, but we need another for how we plan to make keyextract work with such build systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that it would be nice if in Bazel/GN using baked data that we don't need to run datagen against a key file or the binary; that the dataful DataProvider<M>
impls can be DCEd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. We could, though that's going to become a problem again if combinators are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should solve #2945 first?
ffi/diplomat/src/provider.rs
Outdated
ICU4XDataProviderInner::Baked => panic!( | ||
"Locale fallback for baked providers requires the 'any_provider' feature" | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought/Suggestion: You can do this without any_provider
by adding another variant to the enum, BakedFallback
or something like that. I think it's a common enough case to warrant that.
Nit: I think panic!
might be too strong for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that. Problem is that fallback doesn't get DCE'd then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't try exactly that, I tried Baked(Option<LocaleFallbacker>)
. I guess having a separate variant will still work with DCE, but we get exponential blowup of forking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to teach diplomat about cfg, then this whole method can be cfg(any(feature = "any_provider", feature = "buffer_provider"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now we do cfgs inside the methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried again with an extra BakedFallback(Vec<LocaleFallbacker>)
option, but the compiler cannot figure out that a variant is never used, so it always pulls in the fallback code (which has to be in load
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variant should also be cfg'd off, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would require a separate baked_fallback
feature. I'm trying to reduce feature complexity these days...
ffi/diplomat/src/provider.rs
Outdated
ICU4XDataProvider(ICU4XDataProviderInner::Baked) | ||
} | ||
#[cfg(feature = "any_provider")] | ||
(ICU4XDataProviderInner::Any(a), ICU4XDataProviderInner::Baked) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought/Issue: I think this means that if the caller invokes fork_by_key
, even if they aren't using the baked provider, they get the massive Baked AnyProvider impl, with no way to disable it; even if it is an empty baked provider, that's still a lot of vtable and ZeroFrom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but the AnyProvider
itself will be the baked provider wrapped in locale fallback or forking, so there's no additional vtable.
ffi/diplomat/build.rs
Outdated
// Empty data generated with | ||
// cargo run -p icu_datagen --features bin,icu_segmenter -- --format mod --use-separate-crates --keys none --out empty_bake | ||
println!( | ||
"cargo:rustc-env=ICU4X_FFI_BAKED_ROOT={}/empty_bake", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't immediately see why we need a build.rs to do this computation? Can we use std::option_env instead?
26b683e
to
0f1e796
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
5b0c7ea
to
0f1e796
Compare
ffi/gn/icu4x/BUILD.gn
Outdated
if (current_cpu == "wasm32") { | ||
deps += [ ":log-v0_4_14" ] | ||
} | ||
|
||
rustenv = [] | ||
rustenv = [ "ICU4X_FFI_BAKED_ROOT=../empty_bake" ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd still like there to be a feature so that this isn't necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I'm blocking this on #3005 because it would add another feature to datagen. |
As discussed: start with the approach listed in #2992 and maybe revisit this later. |
Fixes #2743
Added a
COMPLETE
option to databake:This will implement all keys on
Provider
, even those that weren't generated. We need this as otherwiseicu_capi
won't compile unless it has complete data, but we don't want to rely just on DCE.Added a baked provider to
icu_capi
. This uses data from theICU4_FFI_BAKED_ROOT
environment variable, or falls back to empty dataSplit the
Empty
variant intoEmpty
andDestroyed
Destroyed
always returns errors (I wonder if we should just make it UB). We should teach Diplomat about ownership so we can let the compatibility layer deal with thisEmpty
now supportsfork_by_key
,fork_by_locale
andenable_fallback
(basically no-ops)Updated the
fixeddecimal_tiny
example to use databake: 26k -> 21kRemoved
any_provider
as a default feature. The baked provider doesn't go throughdyn AnyProvider
, unless locale fallback is enabled, then we need the feature.