-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[char_property] Implement *CharProperty traits #89
Conversation
fef3999
to
e2f3c6f
Compare
The first link gives me a "Oops! That page doesn’t exist or is private." I think you have a trailing The second link follows a couple redirects and then 404s. Since the URL starts with behnam/rust-unic, I suspect github's intra-github linking autocomplete got the better of you. rust-lang/rfcs/pull/1406 |
Oops. Fixed the links! |
03a8168
to
39494f5
Compare
Okay, I think I have covered all the basics of all the UCD properties we already have and now this PR is ready for review and land. Here's the summary of implementations: $ g g -e 'impl \w*CharProperty' unic/ucd/
unic/ucd/age/src/age.rs:impl CharProperty for Age {
unic/ucd/bidi/src/bidi_class.rs:impl CharProperty for BidiClass {
unic/ucd/bidi/src/bidi_class.rs:impl EnumeratedCharProperty for BidiClass {
unic/ucd/category/src/category.rs:impl CharProperty for GeneralCategory {
unic/ucd/category/src/category.rs:impl EnumeratedCharProperty for GeneralCategory {
unic/ucd/normal/src/canonical_combining_class.rs:impl CharProperty for CanonicalCombiningClass {
unic/ucd/normal/src/canonical_combining_class.rs:impl NumericCharProperty<u8> for CanonicalCombiningClass {
unic/ucd/normal/src/decomposition_type.rs:impl OptionCharProperty for DecompositionType {
unic/ucd/normal/src/decomposition_type.rs:impl EnumeratedCharProperty for DecompositionType { What do you think, @CAD97? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
I've just left a few minor notes inline.
As for the matter of having trait methods in scope, it would make sense for the char_property
macro to make these methods intrinsic and only expose them into the traits. Hopefully the future will make the syntax cleaner, but hey, it's hidden behind the macro implementation for now.
unic/utils/src/char_property.rs
Outdated
/// A Character Property defined on all characters. | ||
/// | ||
/// Examples: *Age*, *Name*, *General_Category*, *Bidi_Class* | ||
pub trait CharProperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably extend from OptionCharProperty
, as do Eq : PartialEq
and Ord : PartialOrd
.
I would then do a blanket impl from CharProperty
to OptionCharProperty
:
impl<P: CharProperty> OpionCharProperty for P {
fn of(ch: char) -> Option<Self> {
Some(<Self as CharProperty>::of(ch))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing about Eq
/Ord
and their Partial
equivalents is that the function signature stays the same, and it's only the semantics that different.
Here, we have a different signature for of
, which we can pull of to work somehow, but I don't think it's good practice.
Have you seen anything more like this case on std/3rd-party libs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the only real example is the (ongoing?) discussion of an impl<T: From> TryFrom for T
.
I'd be fine without this, but this would also simplify the marker for Enumerated/NumericCharProperty
to just be OptionCharProperty
since CharProperty
implies CharProperty
.
I don't really have an argument for it, but it just seems like "infallible transformation from char" is a subset of "fallible transformation from char". (Thus, the same use case as Result<_, !>
?)
unic/utils/src/char_property.rs
Outdated
/// Usage Note: If the property is of type *Catalog* (as defined by Unicode), it's recommended to | ||
/// (in some way) mark the type as *non-exhaustive*, so that adding new variants to the `enum` type | ||
/// won't result in API breakage. | ||
pub trait EnumeratedCharProperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably extend from (Option)CharProperty
, unless it would ever make sense to have a EnumeratedCharProperty
which is not also a (Option)CharProperty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I can introduce a marker here to ensure more enforcement.
unic/utils/src/char_property.rs
Outdated
/// A Character Property with numeric values. | ||
/// | ||
/// Examples: *Numeric_Value*, *Canonical_Combining_Class* | ||
pub trait NumericCharProperty<Value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably extend from (Option)CharProperty
, unless it would ever make sense to have a NumericCharProperty
which is not also a (Option)CharProperty
.
unic/utils/src/char_property.rs
Outdated
pub trait NumericCharProperty<Value> | ||
where | ||
Self: Copy + Debug + Default + Display + Eq + Hash, | ||
Value: NumericCharPropertyValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a matter of taste, but I would put this trait bound inside the <>
. (So pub trait NumericCharProperty<Value: NumericCharPropertyValue>
Reasoning: it's simple, and is very important for the definition of the trait.
unic/ucd/normal/src/gen_cat.rs
Outdated
/// Return whether the given character is a combining mark (`General_Category=Mark`) | ||
pub fn is_combining_mark(c: char) -> bool { | ||
GENERAL_CATEGORY_MARK.find(c).is_some() | ||
const TABLE: &'static [(char, char)] = include!("tables/general_category_mark.rsv"); | ||
TABLE.find(c).is_some() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait how is this working when you aren't use
ing unic_utils::CharDataTable
anywhere in the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it working with use unic_utils::CharDataTable;
under mod mark
and CharDataTable::<()>::find(&TABLE, c).is_some()
.
Unfortunately &[(char, char)]
matches &[(char, V)]
as well, thus the reason I want to get around to adding the CharRange
structure.
Once all of unic-gen
is engaged and I can make the transition to actually using it across the board, I'll work on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm splitting the clean up diff, plus a fix for this, into another PR. Will update this PR with only the char-props matters.
Character Properties are of different kinds and shapes, and as UNIC components grow, we need a better way to be able to categorize them by their shape, and a way to make sure we have consistent, noncolliding API for them. This is the first step into building a CharProperty taxonomy, with as little as possibly needed to provide the assurances desired. We hope that the implementation can be improved over time with new features added to the language. There's already some proposals in this front. See these discussions for more details: * [Traits as contract, without changes to call-sites](https://users.rust-lang.org/t/traits-as-contract-without-changes-to-call-sites/11938/11>) * [RFC: delegation of implementation](rust-lang/rfcs#1406)
unic/utils/src/char_property.rs
Outdated
|
||
// Make char property type T also a parameter to this trait, otherwise there will be "conflicting | ||
// implementation" error when marking traits below as `CharProperty`. | ||
impl<T> CharProperty<T> for T {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this makes the marker useless, since every T now is a CharProperty
! :( Need to figure out another solution here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like what you want is CompleteCharProperty
and OptionCharProperty
to be mutually exclusive...
Making OptionCharProperty
a parent of CompleteCharProperty
and using OptionCharProperty
would solve the problem (and in any case I suspect of
will be intrinsic and used as such), but it seems difficult to use this pattern like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can't figure out a way to tell compiler that these two traits are mutually exclusive.
What do you mean by "I suspect of()
will be intrinsic and used as such"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect to see (really sketchy syntax)
enum Property {}
impl Property {
fn of()...
}
impl CharProperty for Property {
fn of() = (Self as Property).of()...
}
if that works such that you can just do Property::of
without knowing about CompleteCharProperty
.
But ¯\_(ツ)_/¯ I'm just assuming shapes of unwritten things right now
/// A Character Property defined on all characters. | ||
/// | ||
/// Examples: *Age*, *Name*, *General_Category*, *Bidi_Class* | ||
pub trait CompleteCharProperty: PartialCharProperty + Default { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works very well. Thanks for the idea, @CAD97!
The only issue is the error if none of these are implemented and a Property Range type is used, in which case rustc claims that CompleteCharProperty
is required to be implemented, which is invalid, and only PartialCharProperty
is required. I suppose I should file an issue for this upstream.
Posted about the side issues notice here on the forum: https://users.rust-lang.org/t/unexpected-behaviors-of-trait-bounds/12286 |
Character Properties are of different kinds and shapes, and as UNIC
components grow, we need a better way to be able to categorize them by
their shape, and a way to make sure we have consistent, noncolliding
API for them.
This is the first step into building a CharProperty taxonomy, with as
little as possibly needed to provide the assurances desired.
We hope that the implementation can be improved over time with new
features added to the language. There's already some proposals in this
front. See these discussions for more details:
Traits as contract, without changes to call-sites
RFC: delegation of implementation