Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign combining table #277

Merged
merged 8 commits into from
Dec 29, 2024
Merged

Conversation

eschnett
Copy link
Collaborator

@eschnett eschnett commented Dec 18, 2024

Closes #271.

@eschnett
Copy link
Collaborator Author

This is based on and supersedes #271.

@eschnett
Copy link
Collaborator Author

The previously failing normalization tests now succeed. However, others are failing.

It seems the failing ones are mentioned in the exclusions:

# ================================================
# (2) Post Composition Version precomposed characters
#
# These characters cannot be derived solely from the UnicodeData.txt file
# in this version of Unicode.
#
# Note that characters added to the standard after the
# Composition Version and which have canonical decomposition mappings
# are not automatically added to this list of Post Composition
# Version precomposed characters.
# ================================================

2ADC    #  FORKING
1D15E   #  MUSICAL SYMBOL HALF NOTE
1D15F   #  MUSICAL SYMBOL QUARTER NOTE
1D160   #  MUSICAL SYMBOL EIGHTH NOTE
1D161   #  MUSICAL SYMBOL SIXTEENTH NOTE
1D162   #  MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163   #  MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164   #  MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB   #  MUSICAL SYMBOL MINIMA
1D1BC   #  MUSICAL SYMBOL MINIMA BLACK
1D1BD   #  MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE   #  MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF   #  MUSICAL SYMBOL FUSA WHITE
1D1C0   #  MUSICAL SYMBOL FUSA BLACK

@eschnett
Copy link
Collaborator Author

eschnett commented Dec 18, 2024

Yes, the only failing tests are normalization tests involving these characters. Ideas appreciated.

Edit: Solved.

@eschnett
Copy link
Collaborator Author

@stevengj Can you approve the CI run?

data/pairs Outdated Show resolved Hide resolved
utf8proc.c Outdated Show resolved Hide resolved
@eschnett
Copy link
Collaborator Author

@stevengj Can you approve the CI run?

@stevengj
Copy link
Member

stevengj commented Dec 20, 2024

CI is running, but is failing with:

Error: This request has been automatically failed because it uses a deprecated version of actions/upload-artifact: v2. Learn more: https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/

This was part of the actions added by @inkydragon in #229 … I'm not actually sure why we have them. Is anyone actually using these artifacts?

@eschnett
Copy link
Collaborator Author

@stevengj I updated the actions. Can you approve the CI workflow again?

@eschnett
Copy link
Collaborator Author

@inkydragon Can you approve the CI workflows?

@inkydragon
Copy link

I'm not actually sure why we have them. Is anyone actually using these artifacts?

One of the problems here is that we lack a depbot to automatically upgrade the version of github action.

I would say that this is usually more useful for Windows users who only want compiled libraries, rather than downloading the source code and compiling it themselves.
Of course, since utf8proc is not a program that end-users can use directly, perhaps removing the upload artifacts step would be a good option.

Can you approve the CI workflows?

Sorry, I don't have committing privileges for this repo.

@dundargoc

This comment was marked as resolved.

@stevengj
Copy link
Member

@eschnett, can you add some documentation of the new combining tables, e.g. add a README.md to the data directory?

@eschnett
Copy link
Collaborator Author

@stevengj I added a comment in the definition of utf8proc_property_struct describing the table layout and lookup.

utf8proc.h Outdated Show resolved Hide resolved
utf8proc.c Show resolved Hide resolved
utf8proc.h Show resolved Hide resolved
@eschnett eschnett merged commit c8d815a into JuliaStrings:master Dec 29, 2024
12 checks passed
@stevengj
Copy link
Member

In the future, please squash commits when merging.

inkydragon pushed a commit to JuliaLang/julia that referenced this pull request Jan 2, 2025
Similar to #51799, support [Unicode
16](https://www.unicode.org/versions/Unicode16.0.0/) by bumping utf8proc
to 2.10.0 (thanks to JuliaStrings/utf8proc#277
by @eschnett).

This allows us to use [7 exciting new emoji
characters](https://www.unicode.org/emoji/charts-16.0/emoji-released.html)
as identifiers, including "face with bags under eyes"

![image](https://github.com/user-attachments/assets/4959b7ca-100e-4efc-af58-b03184ae2dca)
`"\U1fae9"` (but still no superscript "q").

Closes #56035.
@c42f
Copy link
Contributor

c42f commented Jan 23, 2025

Just a driveby comment of appreciation for how much nicer the table generator is in the section you've changed. Thank you @eschnett :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants