Final crate locations for `unicodeset_parser` #3849

skius · 2023-08-11T13:37:44Z

Where should the parsing crates for UnicodeSets (icu_unicodeset_parser) and transliterators (icu_transliterator_parser) end up?

Suggestion: The UnicodeSet parser could live as a utility crate under icu_collections, as it parses into CodePointInversionListAndStringList.

The text was updated successfully, but these errors were encountered:

sffc · 2023-08-29T15:52:51Z

Decisions:

Transliterator parser behind a datagen feature
Note that the whole icu_transliterate crate is experimental for now
The icu_transliteration crate should be renamed to experimental/icu_transliterate
UnicodeSet Parser goes to experimental/unicodeset_parse
Release icu_transliterate under the metacrate as an experimental component but not unicodeset_parse at this time

LGTM: @sffc @skius @robertbastian @eggrobin @Manishearth

Notes:

@sffc - Should it be icu_translit?
@skius - Looks a lot like "translate"
@robertbastian - We say icu_properties not icu_props
@eggrobin - We're not restricted by the 8-character rule in Apple C
@eggrobin - Some sort of regular set expression library may be a dependency of the unicodeset spec, and it is implemented by ICU4C/ICU4J
@Manishearth - Why not do icu_unicodeset_parser a separate crate like icu_codepointtrie_builder?
@sffc - icu_codepointtrie_builder is separate because of WASM/C build issues. That doesn't apply to unicodeset parser.
@robertbastian - unicodeset parser requires properties, like codepointtrie builder requires wasm.
@robertbastian - It shouldn't be a utils crate because we don't want utils depending on components
@sffc - I think it should be experimental/unicodeset_parse right now and eventually components/unicodeset_parse

skius · 2023-08-29T17:34:19Z

On the topic of icu_transliterate vs icu_transliterator - there's icu_normalizer, icu_collator, icu_segmenter (noun-based) - is icu_casemap (i.e., verb-based) the example we want to follow?

Manishearth · 2023-08-29T19:32:51Z

I would lean towards more consistency (and we still have time to rename casemap) but I do kinda like the shorter name

eggrobin · 2023-08-29T19:41:38Z

I prefer transliterator regardless of consistency.

sffc · 2023-08-29T19:55:57Z

Oh, yeah, I would consider it a mistake that we did noun-based names for the older crates. I think using "er"/"or" for the crate name is wrong because the crate contains functionality and doesn't declare exactly how that functionality is surfaced. The icu_transliterate crate helps you transliterate things, and the Transliterator is the way you do it.

Go English making half of these end with "er" and half with "or".

We don't have the suffix for the formatter crates like icu_decimal, icu_list, and icu_datetime.

If we were starting from scratch I'd like to see:

icu_collate::Collator
icu_transliterate::Transliterator
icu_casemap::CaseMapper
icu_normalize::Normalizer
icu_segments::Segmenter

I think casemap is different enough that we should stick with icu_casemap.

But I'm torn between icu_transliterate and icu_transliterator between consistency and correctness.

Note that googling for "transliterate" gives much more useful results than "transliterator". So "transliterate" is better from an education point of view.

How much are people going to really notice the difference between "collator" and "transliterate"? They serve different users with different use cases.

So I think I lean toward icu_transliterate.

robertbastian · 2023-09-05T16:52:55Z

icu_segment::Segmenter, no? icu_<verb>::<verb>+[eo]r

skius added the discuss Discuss at a future ICU4X-SC meeting label Aug 11, 2023

skius added this to the 1.3 Blocking ⟨P1⟩ milestone Aug 11, 2023

sffc removed the discuss Discuss at a future ICU4X-SC meeting label Aug 29, 2023

sffc assigned robertbastian Aug 29, 2023

sffc added the S-small Size: One afternoon (small bug fix or enhancement) label Aug 29, 2023

skius mentioned this issue Aug 29, 2023

Stabilize UnicodeSet parsing #3959

Open

skius mentioned this issue Aug 29, 2023

Stabilize Transliterators #3961

Open

skius mentioned this issue Sep 1, 2023

Chained Transliterator ID parsing #3991

Open

robertbastian mentioned this issue Sep 5, 2023

Moving icu_transliterator_parser into icu_transliteration #4003

Merged

robertbastian mentioned this issue Sep 5, 2023

s/icu_transliteration/icu_transliterate #4007

Merged

robertbastian changed the title ~~Final crate locations for unicodeset_parser, transliterator_parser~~ Final crate locations for unicodeset_parser Sep 13, 2023

robertbastian modified the milestones: 1.3 Blocking ⟨P1⟩, Priority Backlog ⟨P3⟩ Sep 13, 2023

robertbastian removed their assignment Sep 14, 2023

robertbastian modified the milestones: Priority Backlog ⟨P3⟩, 1.3 Blocking ⟨P1⟩ Sep 14, 2023

robertbastian mentioned this issue Sep 19, 2023

s/icu_unicodeset_parser/icu_unicodeset_parse #4054

Merged

robertbastian self-assigned this Sep 19, 2023

Manishearth closed this as completed in #4054 Sep 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final crate locations for `unicodeset_parser` #3849

Final crate locations for `unicodeset_parser` #3849

skius commented Aug 11, 2023

sffc commented Aug 29, 2023

skius commented Aug 29, 2023 •

edited

Loading

Manishearth commented Aug 29, 2023

eggrobin commented Aug 29, 2023

sffc commented Aug 29, 2023 •

edited

Loading

robertbastian commented Sep 5, 2023

Final crate locations for unicodeset_parser #3849

Final crate locations for unicodeset_parser #3849

Comments

skius commented Aug 11, 2023

sffc commented Aug 29, 2023

skius commented Aug 29, 2023 • edited Loading

Manishearth commented Aug 29, 2023

eggrobin commented Aug 29, 2023

sffc commented Aug 29, 2023 • edited Loading

robertbastian commented Sep 5, 2023

Final crate locations for `unicodeset_parser` #3849

Final crate locations for `unicodeset_parser` #3849

skius commented Aug 29, 2023 •

edited

Loading

sffc commented Aug 29, 2023 •

edited

Loading