-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor properties to separate crate #1153
Conversation
The "Data" in "UnicodeSetData" doesn't seem necessary. (Compare, say icu_decimal::provider, where we have What do you intend to call the CodePointTrie equivalent? |
I looked at other data provider structs, and it seems like they all have a plural noun name, not a singular word name -- just an observation. As far as how to re-/name the data struct for UnicodeSet data in a way that will be consistent with the data struct for CPT data, I'm open to suggestions for improve my naming ideas. We currently have UnicodePropertyV1 and UnicodePropertyMapV1, but ythen ou could imagine changing UnicodePropertyV1 -> UnicodePropertySetV1 for consistency. But then at that point, why not drop the "Property" -- b/c I think the distinguishing factor between these data structs is the shape of the data (the data structure) that they contain -- and then you get UnicodeSetV1 and UnicodeMapV1. I thought UnicodeSetV1 sounded too much like UnicodeSet itself (which is why I originally said UnicodeSetDataV1), but that's enough naming bikeshedding for me, and I usually defer to others for short better names. |
e5c4dec
to
5b84b60
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
The conversation above is relevant to PR #1167, and I incorporated those ideas into that PR since it seemed necessary to avoid confusing code and naming. However, in order for that PR to proceed, it needs to solve a confusing and misleading circular dependency that is currently caused by having property enums / newtype structs defined in At the least, we should refactor the property enums/newtypes to a separate crate. This PR is being renamed and re-purposed to represent that new scope of work which will unblock #1167. (And if Rust only allows impls of traits on enums/newtypes in the same module/crate in which they are defined, then perhaps the |
Regarding names: I don't have strong feelings, but it occurs to me that if we were doing this from scratch, without basing anything on ICU4C, I think it's likely that we'd omit "Unicode" and call these something more like "CharacterSet" and "CharacterMap". Most of our data is from Unicode; what's interesting about this data is that it represents character properties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure all the points from the renaming issue are covered
pub struct GeneralCategory(pub(crate) u32); | ||
pub struct GeneralCategory(pub u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change, and if you really need it, add a impl TryFrom<u32> for GeneralCategory
if there isn't already such an impl. @iainireland thoughts? Don't we already need this for the Serde?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pub struct Script(pub(crate) u16); | ||
pub struct Script(pub u16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
components/properties/src/lib.rs
Outdated
//! `icu_properties` is a utility crate of the [`ICU4X`] project. | ||
//! | ||
//! This component provides definitions of [Unicode Properties]. | ||
//! | ||
//! [Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please strengthen these docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
components/properties/src/lib.rs
Outdated
mod enum_props; | ||
mod ule; | ||
|
||
pub use enum_props::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: Do we want to pub use enum_props::*
and put everything in the top-level namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two reasonable choices:
icu_properties::get_alnum()
andicu_properties::get_general_category()
icu_properties::sets::get_alnum()
andicu_properties::maps::get_general_category()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly favor hoisting to the top level (1/10), but I can see advantages to the smaller namespaces.
@zbraniecki says:
my intuition is that it's easier to consoidate namespaces than separate them, so I'd start with more specific (
::sets::
) and reconsider once the API surface stabilizes
So I think we should go ahead with the smaller namespaces for now, and revisit this later when we do a comprehensive API review prior to 1.0. (We don't need a separate issue to track that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. We only have getter APIs for UnicodeSet
s, which are in the new icu_properties::sets
module. The enums/newtypes for the actual properties themselves are up in the top-level namespace.
components/icu/src/lib.rs
Outdated
pub use icu_uniset::*; | ||
pub use icu_properties::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename icu::uniset
to icu::properties
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was part of the section defining icu::uniset
, which was removed.
/// use icu::uniset::UnicodeSetBuilder; | ||
/// use icu_uniset::UnicodeSetBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should still be icu::uniset
, pending the discussion below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we decided to remove icu::uniset
, I am keeping it was icu_uniset
because we cannot revert back to icu::uniset
.
… property enums+etc. module (`enum_props`->`props`)
… separate crate)
/// GeneralSubcategory only supports specific subcategories (eg `UppercaseLetter`). | ||
/// It does not support grouped categories (eg `Letter`). For grouped categories, use [`GeneralCategory`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praise: good catch
//! [`ICU4X`]: ../icu/index.html | ||
//! [Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html | ||
//! [`UnicodeSet`]: ../../icu_uniset/struct.UnicodeSet.html | ||
//! [`sets`]: sets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below: Let's fix these crossrefs.
components/properties/src/lib.rs
Outdated
#![no_std] | ||
|
||
mod props; | ||
#[allow(unused)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have the #[allow(unused)]
?
@@ -658,7 +659,7 @@ where | |||
GeneralCategory::SpaceSeparator => key::GENERAL_CATEGORY_SPACE_SEPARATOR_V1, | |||
_ => return Err(UnicodeSetError::UnknownGeneralCategorySet(enum_val.0)), | |||
}; | |||
get_prop(provider, key) | |||
get_uniset(provider, key) | |||
} | |||
|
|||
/// Return a [`UnicodeSet`] for a particular value of the Script Unicode enumerated property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: change this function to get_for_script
so that you call it as icu::properties::sets::get_for_script()
Renaming the
UnicodePropertyV1
data struct, which representsUnicodeSet
data for binary properties and for enumerated properties (per key=val pair). It would allow the corresponding data provider struct forCodePointTrie
data, which would only exist for enumerated properties, to have a consistent name, once created.