Bump Diplomat and use DiplomatStr[16] #4353

Merged
7 commits merged Nov 28, 2023

robertbastian
Member

@robertbastian robertbastian requested a review from sffc November 22, 2023 20:55
@robertbastian robertbastian removed the request for review from nordzilla November 22, 2023 20:55
    default_level: u8,
) -> Box<ICU4XBidiInfo<'text>> {
    #[allow(clippy::unwrap_used)] // #2520
    let text = core::str::from_utf8(text).unwrap();
Member Author

This is the only user-triggerable panic left in icu_capi.

Member

@Manishearth Manishearth Nov 22, 2023

I think we should make returning an Option here a 2.0 breaking change, since you're basically making the changes that fix most of #2520.

Though I guess we could do that today without breaking the ABI (it would still break the C++ API).

Member Author

Hmm, I don't want this to become fallible in other languages. I think we should reintroduce str here once Diplomat supports it (rust-diplomat/diplomat#369).
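A minimal sketch of the two shapes weighed in this thread, assuming plain Rust functions over the raw input bytes (text_or_panic and text_or_none are hypothetical names, not icu_capi APIs). The first mirrors the from_utf8(text).unwrap() line in the excerpt above and panics on ill-formed UTF-8; the second is the Option-returning variant, which removes the panic but makes the call fallible for every binding.

```rust
/// Hypothetical helper with the current behavior:
/// panics if `text` is not valid UTF-8.
pub fn text_or_panic(text: &[u8]) -> &str {
    core::str::from_utf8(text).unwrap()
}

/// Hypothetical fallible variant suggested in the thread: ill-formed
/// input becomes `None` instead of a panic, so callers in every
/// language now have to handle the failure case.
pub fn text_or_none(text: &[u8]) -> Option<&str> {
    core::str::from_utf8(text).ok()
}
```

For example, text_or_none(b"abc") gives Some("abc"), while text_or_none(&[0xFF]) gives None where text_or_panic would panic.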

}

#[doc(hidden)]
pub fn get_bytes(&self, iana_id: &[u8]) -> Option<TimeZoneBcp47Id> {
Member

Suggestion: this should be get_utf8, and it can be public. Compare to ComposingNormalizer::normalize and ::normalize_utf8.
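A minimal sketch of the suggested naming pattern, assuming a hypothetical IanaMapper backed by a BTreeMap (the real ICU4X type looks IDs up in a data payload, and its types and signatures may differ). get takes a &str and get_utf8 takes possibly ill-formed bytes. Because the keys are IANA IDs stored as bytes, ill-formed input simply fails to match any key, so the byte variant needs no separate UTF-8 validation pass.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a BCP-47 time zone ID:
/// up to 8 ASCII bytes, zero-padded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bcp47Id(pub [u8; 8]);

/// Hypothetical IANA-to-BCP-47 mapper, keyed by IANA ID bytes.
pub struct IanaMapper {
    map: BTreeMap<Vec<u8>, Bcp47Id>,
}

impl IanaMapper {
    /// Builds the table; panics if a BCP-47 ID is longer than 8 bytes.
    pub fn new(entries: &[(&str, &str)]) -> Self {
        let mut map = BTreeMap::new();
        for (iana, bcp47) in entries {
            let mut id = [0u8; 8];
            id[..bcp47.len()].copy_from_slice(bcp47.as_bytes());
            map.insert(iana.as_bytes().to_vec(), Bcp47Id(id));
        }
        IanaMapper { map }
    }

    /// `&str` entry point: the type already guarantees valid UTF-8.
    pub fn get(&self, iana_id: &str) -> Option<Bcp47Id> {
        self.get_utf8(iana_id.as_bytes())
    }

    /// Byte entry point: ill-formed UTF-8 cannot equal any stored key,
    /// so it falls through to `None` without a validation pass.
    pub fn get_utf8(&self, iana_id: &[u8]) -> Option<Bcp47Id> {
        self.map.get(iana_id).copied()
    }
}
```

With this sketch, get("America/Chicago") and get_utf8(b"America/Chicago") return the same value, and the &str method is just a thin wrapper over the byte method.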

Member Author

This is more similar to our from_bytes methods than to normalize_utf8, as it's basically a raw tinystr.
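For contrast, a from_bytes-style constructor builds a value from raw bytes instead of looking one up. A minimal sketch, assuming a hypothetical 8-byte ASCII string type loosely modeled on a "raw tinystr" (TinyAscii8 is illustrative only and is not the tinystr crate's API). An ASCII check is stricter than UTF-8 validation, so no separate UTF-8 pass is needed.

```rust
/// Hypothetical fixed-capacity ASCII string (not the tinystr crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TinyAscii8 {
    bytes: [u8; 8],
    len: usize,
}

impl TinyAscii8 {
    /// Builds a value from raw bytes. Fails if the input is longer than
    /// 8 bytes or contains any non-ASCII byte; since ASCII is a subset
    /// of UTF-8, this also rules out ill-formed UTF-8.
    pub fn from_bytes(input: &[u8]) -> Option<Self> {
        if input.len() > 8 || !input.is_ascii() {
            return None;
        }
        let mut bytes = [0u8; 8];
        bytes[..input.len()].copy_from_slice(input);
        Some(TinyAscii8 { bytes, len: input.len() })
    }

    /// Views the stored bytes as a `&str`.
    pub fn as_str(&self) -> &str {
        // Only ASCII is ever stored, and ASCII is valid UTF-8,
        // so this conversion cannot fail.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}
```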

Member

This takes a string and looks it up in the data payload to find the corresponding tinystr, if there is one. It is a data-structure function. We're not doing any type conversion here, so "from" is not accurate.

Besides, I think we're going to want get_utf16 here as well at some point. That is, unless, per #2413, we want to name these functions get, get8, and get16? I think "UTF-16" is seen as more of an adjective, though; get_u32 was confusing to basically everyone because "u32" is a noun.
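A minimal sketch of how get, get_utf8, and get_utf16 might sit side by side, assuming a hypothetical two-entry TABLE and free functions (a real implementation would search the data payload). Because every key is ASCII, the UTF-16 variant can compare code units directly: a lone surrogate, or any other unit at or above 0x80, can never equal an ASCII key byte, so ill-formed UTF-16 needs no special handling.

```rust
/// Hypothetical lookup table of (IANA ID, BCP-47 ID) pairs; keys are ASCII.
const TABLE: &[(&str, &str)] = &[
    ("America/Chicago", "uschi"),
    ("Europe/Berlin", "deber"),
];

/// `get`: input is known-valid UTF-8.
pub fn get(iana_id: &str) -> Option<&'static str> {
    get_utf8(iana_id.as_bytes())
}

/// `get_utf8`: input is possibly ill-formed UTF-8; compare bytes directly.
pub fn get_utf8(iana_id: &[u8]) -> Option<&'static str> {
    TABLE
        .iter()
        .find(|(k, _)| k.as_bytes() == iana_id)
        .map(|(_, v)| *v)
}

/// `get_utf16`: input is possibly ill-formed UTF-16. Each ASCII key byte
/// is widened to a u16 and compared unit by unit, so any code unit
/// >= 0x80 (including lone surrogates) cannot match.
pub fn get_utf16(iana_id: &[u16]) -> Option<&'static str> {
    TABLE
        .iter()
        .find(|(k, _)| {
            k.len() == iana_id.len()
                && k.bytes().zip(iana_id).all(|(b, &u)| u16::from(b) == u)
        })
        .map(|(_, v)| *v)
}
```

A JavaScript-style caller could pass "America/Chicago".encode_utf16().collect::<Vec<u16>>() to get_utf16 without transcoding to UTF-8 first.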

Member Author

This is why I want to keep it doc-hidden for now: I don't want to block this PR on name bikeshedding.

Member

Please file a follow-up issue for this, because adding doc-hidden APIs is tech debt.

Member Author

    locale: &ICU4XLocale,
    write: &mut DiplomatWriteable,
) -> Result<(), ICU4XError> {
    // #2520
    // In the future we should be able to make assumptions based on backend
Member

For Diplomat frontends that actually support UTF-8 (like C++20, Swift, Golang, ...), we still want a way to avoid re-running UTF-8 validation. Is that still in the plan?

Member

This is also true for frontends like JavaScript that use a TextEncoder before giving the strings to ICU4X.

Member Author

Yes, we're going to need three versions of each string method: UTF-8, maybe UTF-8, and maybe UTF-16.
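A minimal sketch of those three entry points, assuming one hypothetical operation (counting chars) and assuming that ill-formed input is replaced with U+FFFD, which is only one possible policy for the "maybe" variants. The &str variant does no validation because the type already guarantees UTF-8, which is what frontends with real UTF-8 strings would map to; the other two accept potentially ill-formed UTF-8 and UTF-16.

```rust
/// Hypothetical string operation, written once over `char`s.
fn count_chars(chars: impl Iterator<Item = char>) -> usize {
    chars.count()
}

/// 1. UTF-8: no validation needed, the `&str` type guarantees it.
pub fn count_str(s: &str) -> usize {
    count_chars(s.chars())
}

/// 2. Maybe UTF-8: ill-formed sequences are replaced with U+FFFD.
pub fn count_utf8(bytes: &[u8]) -> usize {
    count_chars(String::from_utf8_lossy(bytes).chars())
}

/// 3. Maybe UTF-16: unpaired surrogates are replaced with U+FFFD.
pub fn count_utf16(units: &[u16]) -> usize {
    count_chars(
        char::decode_utf16(units.iter().copied())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER)),
    )
}
```

Writing the core logic once over chars and keeping the three public functions as thin adapters is one way to avoid tripling the implementation; it is a design sketch, not how ICU4X or Diplomat structure it.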

Member

@sffc Yes, that will be the str type.

In the case of JS, we should be using the UTF-16 endpoint anyway, so JS won't make use of this.

@robertbastian robertbastian requested a review from sffc November 23, 2023 00:49
Member

@sffc sffc left a comment

According to #4343, we should start caring about the stability of our doc hidden internal APIs, so while this is not blocking, we definitely need to discuss this new internal API before 1.5 is released. It puts the repo in a bit of a nonreleasable state.

@robertbastian
Member Author

> According to #4343, we should start caring about the stability of our doc hidden internal APIs, so while this is not blocking, we definitely need to discuss this new internal API before 1.5 is released. It puts the repo in a bit of a nonreleasable state.

I disagree that it puts the repo in a non-releasable state. If we need to release with the doc-hidden API, we just can't remove it in the future. That's fine: we keep it around, doc-hidden, until the next major version (which turns out to be quite soon).

@robertbastian robertbastian requested a review from sffc November 23, 2023 16:07
@robertbastian
Member Author

#3006

Member

@Manishearth Manishearth left a comment

r+ with the doc(hidden); we can bikeshed later (file a follow-up).

@robertbastian robertbastian merged commit afc612d into unicode-org:main Nov 28, 2023
@robertbastian robertbastian deleted the str branch November 28, 2023 18:05
Labels
None yet
Projects
None yet

3 participants