Bump Diplomat and use DiplomatStr[16]
#4353
    default_level: u8,
) -> Box<ICU4XBidiInfo<'text>> {
    #[allow(clippy::unwrap_used)] // #2520
    let text = core::str::from_utf8(text).unwrap();
This is the only user-triggerable panic left in icu_capi
I think we should make this a 2.0 breaking thing to return an Option here since you're basically making the changes that fix most of #2520
though I guess we can do that today without breaking ABI (still, breaks the C++ API)
Hmm, I don't want this to become fallible in other languages. I think we should reintroduce `str` here once we have Diplomat support rust-diplomat/diplomat#369
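To make the trade-off in this thread concrete, here is a minimal sketch, not the actual icu_capi code. The function names `parse_or_panic`, `parse_checked`, and `parse_lossy` are hypothetical; only `core::str::from_utf8` and `String::from_utf8_lossy` are standard-library calls.

```rust
use std::borrow::Cow;

/// Mirrors the pattern in the diff: panics if `text` is not valid UTF-8.
fn parse_or_panic(text: &[u8]) -> &str {
    core::str::from_utf8(text).unwrap()
}

/// Fallible variant (hypothetical): ill-formed input becomes `None`,
/// which pushes the error case onto every caller in every language.
fn parse_checked(text: &[u8]) -> Option<&str> {
    core::str::from_utf8(text).ok()
}

/// Infallible variant (hypothetical): ill-formed sequences are replaced
/// with U+FFFD, so the function can neither panic nor fail.
fn parse_lossy(text: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(text)
}

fn main() {
    let good: &[u8] = b"abc";
    let bad: &[u8] = &[0x61, 0xFF, 0x62]; // 0xFF is never valid in UTF-8

    assert_eq!(parse_or_panic(good), "abc");
    assert_eq!(parse_checked(good), Some("abc"));
    assert_eq!(parse_checked(bad), None);
    assert_eq!(parse_lossy(bad), "a\u{FFFD}b");
    // parse_or_panic(bad) would panic here.
}
```

Taking a validated `str` at the FFI boundary, as proposed above, would avoid all three by making ill-formed input unrepresentable in the signature.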
}

#[doc(hidden)]
pub fn get_bytes(&self, iana_id: &[u8]) -> Option<TimeZoneBcp47Id> {
Suggestion: this should be `get_utf8` and it can be public. Compare to `ComposingNormalizer::normalize` and `::normalize_utf8`
This is more similar to our `from_bytes` methods than to `normalize_utf8`, as it's basically a raw tinystr
This takes in a string and looks up in the data payload to find the corresponding tinystr if there is one. It is a data structure function. We're not doing any type conversions here so "from" is not accurate.
Besides, we're going to want `get_utf16` here as well at some point I think. That is, unless, according to #2413, we want to make these functions named `get`, `get8`, and `get16`? I think "UTF-16" is seen as more of an adjective though, and the reason `get_u32` was confusing to basically everyone was because "u32" is a noun.
This is why I want to keep it doc-hidden for now, I don't want to block this PR on name bikeshedding.
Please make a follow up issue for this because adding doc hidden APIs is tech debt.
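For readers following the naming discussion, a rough sketch of the shape being debated, under loud assumptions: `Mapper` and its `BTreeMap` storage are hypothetical stand-ins for the real data-backed type, the result is a plain `&str` rather than `TimeZoneBcp47Id`, and the `get` / `get_utf8` names follow the suggestion in this thread rather than any merged API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a data-backed IANA-to-BCP-47 mapper.
struct Mapper {
    map: BTreeMap<&'static [u8], &'static str>,
}

impl Mapper {
    /// Lookup from a validated Rust string.
    fn get(&self, iana_id: &str) -> Option<&'static str> {
        self.get_utf8(iana_id.as_bytes())
    }

    /// Lookup from bytes that may or may not be valid UTF-8.
    /// No validation pass is needed: ill-formed input simply
    /// matches no key and yields `None`.
    fn get_utf8(&self, iana_id: &[u8]) -> Option<&'static str> {
        self.map.get(iana_id).copied()
    }
}

fn main() {
    let mapper = Mapper {
        map: BTreeMap::from([(&b"America/Chicago"[..], "uschi")]),
    };
    assert_eq!(mapper.get("America/Chicago"), Some("uschi"));
    assert_eq!(mapper.get_utf8(b"America/Chicago"), Some("uschi"));
    assert_eq!(mapper.get_utf8(&[0xFF, 0xFE]), None);
}
```

The sketch shows why this reads as a data-structure lookup rather than a conversion: the bytes are only compared against stored keys, never turned into another type.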
    locale: &ICU4XLocale,
    write: &mut DiplomatWriteable,
) -> Result<(), ICU4XError> {
    // #2520
    // In the future we should be able to make assumptions based on backend
For Diplomat frontends that actually support UTF-8 (like C++20, Swift, Golang, ...) we still want a way to not re-run UTF-8 validation. Is that still in the plan?
This is also true for frontends like JavaScript that use a TextEncoder before giving the strings to ICU4X.
Yes, we're going to need three versions of each string method: UTF-8, maybe UTF-8, maybe UTF-16.
@sffc Yes, that will be the `str` type.
In the case of JS we should be using the utf-16 endpoint anyway, so JS won't make use of this
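A minimal sketch of what "three versions of each string method" could look like, using a toy code-point counter. The function names are hypothetical and this is plain Rust, not Diplomat bridge code; the parameter types are an assumption about how the three flavors map onto Rust.

```rust
/// "UTF-8": validity is guaranteed by the `&str` type, so no
/// validation is re-run inside the function.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

/// "Maybe UTF-8": arbitrary bytes; ill-formed sequences are
/// replaced with U+FFFD before counting.
fn count_chars_utf8(text: &[u8]) -> usize {
    String::from_utf8_lossy(text).chars().count()
}

/// "Maybe UTF-16": arbitrary 16-bit units; each unpaired surrogate
/// is yielded as one error item and counted as one code point.
fn count_chars_utf16(text: &[u16]) -> usize {
    char::decode_utf16(text.iter().copied()).count()
}

fn main() {
    assert_eq!(count_chars("héllo"), 5);
    // 'h', one replacement character for 0xFF, 'i'
    assert_eq!(count_chars_utf8(&[0x68, 0xFF, 0x69]), 3);
    // 'h' followed by a surrogate pair encoding U+1F600
    assert_eq!(count_chars_utf16(&[0x0068, 0xD83D, 0xDE00]), 2);
    // an unpaired lead surrogate followed by 'a'
    assert_eq!(count_chars_utf16(&[0xD800, 0x0061]), 2);
}
```

Only the first variant lets a frontend that already holds valid UTF-8 skip validation, which is the concern raised earlier in this thread.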
According to #4343, we should start caring about the stability of our doc hidden internal APIs, so while this is not blocking, we definitely need to discuss this new internal API before 1.5 is released. It puts the repo in a bit of a nonreleasable state.
I disagree that it puts the repo in a non-releasable state. If we need to release with the doc-hidden API, we just cannot remove it in the future, but that's fine, we just keep it around doc-hidden until the next major version (which turns out to be quite soon).
r+ with the doc(hidden), we can bikeshed later (file a followup)
#2520