Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to WASMI engine in icu_codepointtrie_builder #4621

Merged
merged 16 commits into from
Feb 27, 2024
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Feb 16, 2024

Fixes #3241

I switched our WASM engine to wasmi because it is faster to compile and has fewer dependencies. I changed how we interact with C++, too, in order to improve the performance.

@sffc sffc changed the title Rough upgrade to Wasmer 4 Upgrade to Wasmer 4 Feb 16, 2024
@sffc sffc requested a review from robertbastian February 16, 2024 02:26
sffc

This comment was marked as outdated.

@sffc

This comment was marked as outdated.

@sffc sffc changed the title Upgrade to Wasmer 4 Switch to WASMI engine in icu_codepointtrie_builder Feb 16, 2024
@sffc sffc marked this pull request as ready for review February 16, 2024 11:30
@sffc sffc requested review from echeran and a team as code owners February 16, 2024 11:30
@sffc sffc requested review from Manishearth and removed request for echeran and robertbastian February 16, 2024 11:30
@sffc
Copy link
Member Author

sffc commented Feb 16, 2024

Here is the output of depscheck with this PR:

Found two versions for `syn` (v1.0.109 & v2.0.48)
	Found non-allowlisted crate `ambient-authority`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `anyhow`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `async-trait`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `bitflags`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `cap-fs-ext`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `cap-primitives`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `cap-rand`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `cap-std`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `cap-time-ext`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `dirs`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `dirs-sys`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `downcast-rs`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `fs-set-times`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `getrandom`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `heck`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `indexmap-nostd`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `io-extras`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `io-lifetimes`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `ipnet`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `is-terminal`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `leb[128](https://github.com/unicode-org/icu4x/actions/runs/7929915708/job/21651128951?pr=4621#step:11:129)`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `linux-raw-sys`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `maybe-owned`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `paste`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `pin-project-lite`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `ppv-lite86`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `rand`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `rand_chacha`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `rand_core`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `rustix`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `shellexpand`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `spin`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `system-interface`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `tracing`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `tracing-attributes`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `tracing-core`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasi-cap-std-sync`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasi-common`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasmi`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasmi_arena`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasmi_core`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasmi_wasi`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wasmparser-nostd`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wast`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wiggle`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wiggle-generate`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `wiggle-macro`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional
	Found non-allowlisted crate `witx`, consider adding to `EXTRA_DATAGEN_DEPS` or `EXTRA_ZIP_DEPS` or `EXTRA_RAYON_DEPS` in tools/depcheck/src/allowlist.rs if intentional

@Manishearth

@sffc
Copy link
Member Author

sffc commented Feb 16, 2024

On main:

INFO  [icu_datagen::driver] Generated key segmenter/sentence@1 (3.360s, flushed in 64.444ms)
INFO  [icu_datagen::driver] Generated key segmenter/word@1 (4.779s, flushed in 62.042ms)
INFO  [icu_datagen::driver] Generated key segmenter/grapheme@1 (4.805s, flushed in 55.470ms)
INFO  [icu_datagen::driver] Generated key segmenter/line@1 (9.127s, flushed in 79.752ms)

On this PR:

INFO  [icu_datagen::driver] Generated key segmenter/grapheme@1 (6.838s, flushed in 55.157ms)
INFO  [icu_datagen::driver] Generated key segmenter/sentence@1 (8.447s, flushed in 63.641ms)
INFO  [icu_datagen::driver] Generated key segmenter/word@1 (8.472s, flushed in 61.680ms)
INFO  [icu_datagen::driver] Generated key segmenter/line@1 (29.712s, flushed in 78.175ms)

Please let me know what you all think. I don't want to sink more time into this until we know which direction we're taking.

Manishearth
Manishearth previously approved these changes Feb 17, 2024
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this approach. I'm okay with the slowdown, though I think we can potentially speed things up by building via ranges instead of codepoint by codepoint.

Comment on lines 45 to 46
wasmi = "0.31.2"
wasmi_wasi = "0.31.2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: why are these non-optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

clippy::unwrap_used,
clippy::expect_used,
clippy::panic,
// This Datagen-only builder code may panic when interacting with WASM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: datagen is not really a concept in this crate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docs

@sffc
Copy link
Member Author

sffc commented Feb 24, 2024

I managed to get rid of the WASI requirement at runtime entirely, which eliminates a ton of dependencies. We still need WASI for building the C++ code, but the linker strips out all of the runtime WASI symbols. This is enabled by Reactor Modules which are a relatively recent change to the WASI spec.

robertbastian
robertbastian previously approved these changes Feb 26, 2024
components/collections/codepointtrie_builder/cpp/Makefile Outdated Show resolved Hide resolved
Comment on lines 237 to 239
2 => 0, // UCPTRIE_VALUE_BITS_16
4 => 1, // UCPTRIE_VALUE_BITS_32
1 => 2, // UCPTRIE_VALUE_BITS_8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impressive size reduction!

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also hooray we don't need to dedup syn anymore

either way i have most of the work done for that in wasmi-labs/wasmi#941

@sffc sffc merged commit 952a39f into unicode-org:main Feb 27, 2024
30 checks passed
@sffc sffc deleted the wasmer4 branch February 27, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update wasmer to 3.1.1
3 participants