Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing docs on how to run unicode-table-generator #131640

Closed
RalfJung opened this issue Oct 13, 2024 · 14 comments · Fixed by #132499
Closed

Missing docs on how to run unicode-table-generator #131640

RalfJung opened this issue Oct 13, 2024 · 14 comments · Fixed by #132499
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@RalfJung
Copy link
Member

The file library/core/src/unicode/unicode_data.rs says

///! This file is generated by src/tools/unicode-table-generator; do not edit manually!

However, it doesn't say how to run that tool. The "obvious" ./x.py run src/tools/unicode-table-generator does not work. I didn't find anything in the dev guide either.

So I now edited the file by hand instead 🤷 but that can't be the goal of this.^^

Cc @Mark-Simulacrum

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

maybe

cargo +nightly run src/tools/unicode-table-generator -- library/core/src/unicode/unicode_data.rs

This looks like a helper util that's not managed by bootstrap AFAICT

EDIT: no it's not that simple, this will cause src/tool/miri errors because of how the workspace works.

@jieyouxu
Copy link
Member

I got something working, I think?

PS E:\Repos\rust> ./x run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.06s
Building stage0 tool unicode-table-generator (x86_64-pc-windows-msvc)
   Compiling unicode-table-generator v0.1.0 (E:\Repos\rust\src\tools\unicode-table-generator)
    Finished `release` profile [optimized + debuginfo] target(s) in 1.72s
Alphabetic     : 1727 bytes, 142759 codepoints in 757 ranges (65 - 205744) using skiplist
Case_Ignorable : 1053 bytes, 2749 codepoints in 452 ranges (39 - 918000) using skiplist
Cased          : 407 bytes, 4578 codepoints in 159 ranges (65 - 127370) using skiplist
Cc             : 9 bytes, 65 codepoints in 2 ranges (0 - 160) using skiplist
Grapheme_Extend: 887 bytes, 2193 codepoints in 375 ranges (768 - 918000) using skiplist
Lowercase      : 935 bytes, 2569 codepoints in 675 ranges (97 - 125252) using bitset
N              : 457 bytes, 1911 codepoints in 144 ranges (48 - 130042) using skiplist
Uppercase      : 799 bytes, 1978 codepoints in 656 ranges (65 - 127370) using bitset
there are 25 points
White_Space    : 256 bytes, 25 codepoints in 10 ranges (9 - 12289) using cascading
Total table sizes: 6530 bytes
Build completed successfully in 0:00:15
PS E:\Repos\rust>

@jieyouxu

This comment has been minimized.

@RalfJung
Copy link
Member Author

That's very strange, I just get an error:

$ ./x.py run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.08s
ERROR: no `run` rules matched ["src/tools/unicode-table-generator"]
HELP: run `x.py run --help --verbose` to show a list of available paths
NOTE: if you are adding a new Step to bootstrap itself, make sure you register it with `describe!`
Build completed unsuccessfully in 0:00:00

@jieyouxu
Copy link
Member

Oh sorry to be clear, I said "I got something working" i.e. I had to hook it up in bootstrap 😆

@RalfJung
Copy link
Member Author

Ah :D

@jieyouxu
Copy link
Member

When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

PS E:\Repos\rust> git diff .\library\
diff --git a/library/core/src/unicode/unicode_data.rs b/library/core/src/unicode/unicode_data.rs
index db2e3ddd754..143beb37706 100644
--- a/library/core/src/unicode/unicode_data.rs
+++ b/library/core/src/unicode/unicode_data.rs
@@ -18,14 +18,16 @@ const fn bitset_search<
     let bucket_idx = (needle / 64) as usize;
     let chunk_map_idx = bucket_idx / CHUNK_SIZE;
     let chunk_piece = bucket_idx % CHUNK_SIZE;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let chunk_idx = if chunk_map_idx < chunk_idx_map.len() {
         chunk_idx_map[chunk_map_idx]
     } else {
         return false;
     };
     let idx = bitset_chunk_idx[chunk_idx as usize][chunk_piece] as usize;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let word = if idx < bitset_canonical.len() {
         bitset_canonical[idx]
     } else {

@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

@RalfJung I hooked it up to bootstrap ./x run src/tools/unicode-table-generator in #131647 (maybe you can base your changes on top of that branch if you want to edit the tool itself to generate the unicode tables and run it via bootstrap 🤷). I can't say I have any clue about how that tool is supposed to be run, though.

@jieyouxu jieyouxu added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. labels Oct 13, 2024
@RalfJung
Copy link
Member Author

When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

That's probably because I patched the unicode_data file to update the comments and didn't realize that this is a generated file with its sources somewhere else, and CI did not stop me. 😂

@jieyouxu
Copy link
Member

Ok cool, I synced the comments in the tool to your changes, and ./x run src/tools/unicode-table-generator now no longer produces a diff for library/core/src/unicode/unicode_data.rs.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again. EDIT: the tool was also updated

@jieyouxu
Copy link
Member

I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

@RalfJung
Copy link
Member Author

Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again.

That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

@jieyouxu
Copy link
Member

jieyouxu commented Oct 13, 2024

That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

Ah right good point.

That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

I mean, it can be checked by trying to run ./x run src/tools/unicode-table-generator then asserting nothing is modified. But yeah, it could be a tidy check. Probably Mark can decide on what to do here.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 20, 2024
…r=Mark-Simulacrum

Register `src/tools/unicode-table-generator` as a runnable tool

It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool.

This tool seems to take two possible args:

1. (Mandatory) path to `library/core/src/unicode/unicode_data.rs`, and
2. (Optional) path to generate a test file.

I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure about how this tool is supposed to be run.

`Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name.

I also added a triagebot message to remind to not manually edit the library source file and edit the tool then regenerate instead, but this should probably be a tidy check (if that's desirable then that can be in a follow-up PR, though may be overkill).

Helps with rust-lang#131640 but does not close it because still no docs.

r? `@Mark-Simulacrum` (since I think you authored this tool?)
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 20, 2024
Rollup merge of rust-lang#131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum

Register `src/tools/unicode-table-generator` as a runnable tool

It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool.

This tool seems to take two possible args:

1. (Mandatory) path to `library/core/src/unicode/unicode_data.rs`, and
2. (Optional) path to generate a test file.

I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure about how this tool is supposed to be run.

`Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name.

I also added a triagebot message to remind to not manually edit the library source file and edit the tool then regenerate instead, but this should probably be a tidy check (if that's desirable then that can be in a follow-up PR, though may be overkill).

Helps with rust-lang#131640 but does not close it because still no docs.

r? `@Mark-Simulacrum` (since I think you authored this tool?)
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 2, 2024
unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 3, 2024
unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 3, 2024
unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 3, 2024
unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
@bors bors closed this as completed in b438a5c Nov 3, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 3, 2024
Rollup merge of rust-lang#132499 - RalfJung:unicode_data.rs, r=tgross35

unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants