Bump Diplomat and use `DiplomatStr[16]` #4353

robertbastian · 2023-11-22T20:55:07Z

robertbastian · 2023-11-22T21:01:06Z

ffi/capi/src/bidi.rs

            default_level: u8,
        ) -> Box<ICU4XBidiInfo<'text>> {
+            #[allow(clippy::unwrap_used)] // #2520
+            let text = core::str::from_utf8(text).unwrap();


This is the only user-triggerable panic left in icu_capi

I think we should make this a breaking change for 2.0 and return an Option here, since you're basically making the changes that fix most of #2520

though I guess we can do that today without breaking the ABI (it still breaks the C++ API)

Hmm, I don't want this to become fallible in other languages. I think we should reintroduce str here once Diplomat supports it (rust-diplomat/diplomat#369)
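
For readers following the thread, here is a minimal sketch of the three options being weighed. The function names are hypothetical and count characters only to keep the example small; none of this is the actual `ICU4XBidi` code. It assumes only `core::str::from_utf8` from the standard library.

```rust
// Hypothetical helpers illustrating the options discussed above.
// They are NOT part of icu_capi; names and behavior are assumptions.

/// Option 1 (what the diff does): validate and unwrap.
/// Panics if the caller passes bytes that are not valid UTF-8.
fn char_count_panicking(text: &[u8]) -> usize {
    let text = core::str::from_utf8(text).unwrap();
    text.chars().count()
}

/// Option 2 (suggested for 2.0): surface the failure to the caller.
/// Returns None instead of panicking on invalid UTF-8.
fn char_count_checked(text: &[u8]) -> Option<usize> {
    let text = core::str::from_utf8(text).ok()?;
    Some(text.chars().count())
}

/// Option 3 (the `str` type mentioned above): the signature itself
/// promises valid UTF-8, so there is nothing to validate or unwrap.
fn char_count_str(text: &str) -> usize {
    text.chars().count()
}
```

With option 2 every binding has to handle the `None` case, which is the fallibility the last comment wants to avoid; option 3 moves the guarantee into the type instead.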

sffc · 2023-11-22T23:26:33Z

components/timezone/src/iana_ids.rs

+    }
+
+    #[doc(hidden)]
+    pub fn get_bytes(&self, iana_id: &[u8]) -> Option<TimeZoneBcp47Id> {


Suggestion: this should be get_utf8 and it can be public. Compare to ComposingNormalizer::normalize and ::normalize_utf8

This is more similar to our from_bytes methods than to normalize_utf8, as it's basically a raw tinystr

This takes in a string and looks up in the data payload to find the corresponding tinystr if there is one. It is a data structure function. We're not doing any type conversions here so "from" is not accurate.

Besides, we're going to want get_utf16 here as well at some point, I think. That is, unless, according to #2413, we want to name these functions get, get8, and get16? I think "UTF-16" is seen as more of an adjective though, and the reason get_u32 was confusing to basically everyone was that "u32" is a noun.

This is why I want to keep it doc-hidden for now, I don't want to block this PR on name bikeshedding.

Please make a follow up issue for this because adding doc hidden APIs is tech debt.
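
To make the naming question concrete, here is a rough sketch of the `get` / `get_utf8` pairing being proposed, modeled on `normalize` / `normalize_utf8`. `ToyMapper` is a hypothetical stand-in backed by a `BTreeMap` with two illustrative entries; the real mapper reads from a data payload and returns `TimeZoneBcp47Id`, and this toy does exact-match lookup only.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the IANA-to-BCP-47 mapper.
/// The table contents and the `&'static str` return type are
/// assumptions made for illustration.
struct ToyMapper {
    map: BTreeMap<&'static [u8], &'static str>,
}

impl ToyMapper {
    fn new() -> Self {
        let mut map: BTreeMap<&'static [u8], &'static str> = BTreeMap::new();
        map.insert(b"America/Chicago", "uschi");
        map.insert(b"Europe/Zurich", "chzrh");
        Self { map }
    }

    /// String entry point: the input is already known to be UTF-8.
    fn get(&self, iana_id: &str) -> Option<&'static str> {
        self.get_utf8(iana_id.as_bytes())
    }

    /// Byte entry point: no validation is needed, because bytes that
    /// are not valid UTF-8 simply fail to match any key.
    fn get_utf8(&self, iana_id: &[u8]) -> Option<&'static str> {
        self.map.get(iana_id).copied()
    }
}
```

In this shape the byte-slice method is a lookup, not a conversion, which is the argument above for `get_utf8` over a `from_`-style name.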

sffc · 2023-11-22T23:28:43Z

ffi/capi/src/casemap.rs

            locale: &ICU4XLocale,
            write: &mut DiplomatWriteable,
        ) -> Result<(), ICU4XError> {
-            // #2520
-            // In the future we should be able to make assumptions based on backend


For Diplomat frontends that actually support UTF-8 (like C++20, Swift, Golang, ...) we still want a way to not re-run UTF-8 validation. Is that still in the plan?

This is also true for frontends like JavaScript that use a TextEncoder before giving the strings to ICU4X.

Yes, we're going to need three versions of each string method: UTF-8, maybe UTF-8, maybe UTF-16.

@sffc Yes, that will be the str type.

In the case of JS we should be using the utf-16 endpoint anyway, so JS won't make use of this
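
As an illustration of the "three versions" idea, here is a sketch using a trivial operation (counting scalar values). The function names are hypothetical, and treating ill-formed input as replacement characters is an assumption of this sketch, not something the thread specifies.

```rust
/// Known-valid UTF-8: no validation pass is needed.
fn count_scalars(text: &str) -> usize {
    text.chars().count()
}

/// "Maybe UTF-8": each invalid byte sequence is counted as one
/// U+FFFD replacement character (an assumption of this sketch).
fn count_scalars_utf8(text: &[u8]) -> usize {
    String::from_utf8_lossy(text).chars().count()
}

/// "Maybe UTF-16": each unpaired surrogate is counted as one item,
/// the same as a replacement character would be.
fn count_scalars_utf16(text: &[u16]) -> usize {
    char::decode_utf16(text.iter().copied()).count()
}
```

A frontend that already holds validated UTF-8 would call the first form and skip re-validation, while a UTF-16 frontend such as JS would call the third directly.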

sffc

According to #4343, we should start caring about the stability of our doc hidden internal APIs, so while this is not blocking, we definitely need to discuss this new internal API before 1.5 is released. It puts the repo in a bit of a nonreleasable state.

robertbastian · 2023-11-23T16:07:30Z

According to #4343, we should start caring about the stability of our doc hidden internal APIs, so while this is not blocking, we definitely need to discuss this new internal API before 1.5 is released. It puts the repo in a bit of a nonreleasable state.

I disagree that it puts the repo in a non-releasable state. If we need to release with the doc-hidden API, we just cannot remove it in the future, but that's fine, we just keep it around doc-hidden until the next major version (which turns out to be quite soon).

robertbastian · 2023-11-27T08:22:33Z

#3006

Manishearth

r+ with the doc(hidden), we can bikeshed later (file a followup)

str

029bcd5

robertbastian requested a review from sffc November 22, 2023 20:55

robertbastian requested review from Manishearth, nordzilla and a team as code owners November 22, 2023 20:55

robertbastian removed the request for review from nordzilla November 22, 2023 20:55

robertbastian commented Nov 22, 2023

sffc reviewed Nov 22, 2023

gn

c377da4

robertbastian requested a review from sffc November 23, 2023 00:49

sffc reviewed Nov 23, 2023

robertbastian added 2 commits November 23, 2023 16:46

js fixes

2430172

gn

5a37334

robertbastian requested a review from sffc November 23, 2023 16:07

robertbastian added 3 commits November 27, 2023 08:14

Merge branch 'main' into str

2e1cb40

bump

03c8169

fuck

3f51840

Manishearth approved these changes Nov 27, 2023

sffc approved these changes Nov 27, 2023

robertbastian merged commit afc612d into unicode-org:main Nov 28, 2023

robertbastian deleted the str branch November 28, 2023 18:05

robertbastian mentioned this pull request Apr 23, 2024

Add is_normalized_up_to to Normalizer #4334

Merged
