pr3994 review suggestions #1

achabense · 2023-09-21T17:13:55Z

(will add comment for each change below)

> add comment for `fill_range` > rename some variables > type: bool~>int > refine names for the two function; simplify comments > better message

achabense · 2023-09-21T17:14:49Z

tools/unicode_properties_parse/format_width_estimate_intervals.py

-    UNICODE_TABLE_SIZE: int = MAX_UNICODE_POINT + 1
+class UnicodeWidthTable:
+    # A valid Unicode code point won't exceed MAX_CODE_POINT.
+    MAX_CODE_POINT: int = 0x10FFFF


("Code point" is the established Unicode term.)

achabense · 2023-09-21T17:16:24Z

tools/unicode_properties_parse/format_width_estimate_intervals.py

-        Whenever a code point's width differs from the previous one,
-        the function print the code point to indicate the start of a new range.
-        """
+    def print_width_estimate_intervals(self):


The `[I] != [I-1]` -> print `I` logic and the assertion already make things much clearer, so with the renaming I think it's OK not to add extra comments.
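For readers without the full script at hand, the "differs from the previous one" logic can be sketched roughly as below. This is only an illustration: `range_starts` and the toy table are hypothetical names and data, not the code in this PR, and the sketch returns the boundaries instead of printing them.

```python
def range_starts(widths):
    """Return every code point whose width differs from the previous one.

    `widths` is a list indexed by code point. Hypothetical helper for
    illustration only; the real script prints the boundaries instead.
    """
    starts = []
    for code_point in range(1, len(widths)):
        if widths[code_point] != widths[code_point - 1]:
            # A change in width marks the start of a new range.
            starts.append(code_point)
    return starts


# Toy table: code points 0-2 have width 1, 3-4 have width 2, 5 has width 1.
toy_widths = [1, 1, 1, 2, 2, 1]
print(range_starts(toy_widths))  # prints [3, 5]
```

With well-chosen names, the comparison against the previous entry is the whole algorithm, which is why the extra docstring seems unnecessary.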

achabense · 2023-09-21T17:18:25Z

tools/unicode_properties_parse/format_width_estimate_intervals.py

@@ -140,7 +130,7 @@ def get_width(str: str):
        line = line.strip()
        if line and not line.startswith("#"):
            match = LINE_REGEX.fullmatch(line)
-            assert match, "invalid line"
+            assert match, line # invalid line


This will print the offending line when things go wrong.
(Actually, I've found that the regex is slightly outdated due to some added whitespace in EastAsianWidth-15.1.0.txt (https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt, which was recently updated).)
(I think we don't have to update the regex eagerly in this PR, as it still works for 15.0.0.)

(Was: 0000..001F;N # Cc [32] <control-0000>..<control-001F>
Now: 0000..001F ; N # Cc [32] <control-0000>..<control-001F>)
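A quick check of the two line shapes against the current pattern, as a sketch. `RELAXED_REGEX` is a hypothetical variant that tolerates spaces around the `;`; it is not proposed for this PR, and the sample lines are the whitespace-collapsed ones quoted above.

```python
import re

# The pattern currently used by the script.
OLD_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")
# Hypothetical relaxed variant: optional spaces on both sides of ';'.
RELAXED_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)? *; *(A|F|H|N|Na|W) *#.*")

old_line = "0000..001F;N # Cc [32] <control-0000>..<control-001F>"
new_line = "0000..001F ; N # Cc [32] <control-0000>..<control-001F>"

print(OLD_REGEX.fullmatch(old_line) is not None)      # 15.0.0 style line
print(OLD_REGEX.fullmatch(new_line) is not None)      # 15.1.0 style line
print(RELAXED_REGEX.fullmatch(old_line) is not None)
print(RELAXED_REGEX.fullmatch(new_line) is not None)
```

The current pattern rejects the 15.1.0 style line because it expects `;` immediately after the hex range, which is exactly the case where printing the line in the assertion helps.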

achabense · 2023-09-21T17:20:43Z

tools/unicode_properties_parse/format_width_estimate_intervals.py

@@ -129,6 +117,8 @@ def read_from(source: TextIO) -> UnicodeTable:

    # Read explicitly assigned ranges.
    # The lines that are not empty or pure comment are uniformly of the format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment".
+    LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")


This should be moved back to read_from as it is used and documented here.
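To illustrate the placement I have in mind, here is a minimal sketch with the pattern kept next to the comment that documents it. `parse_assigned_ranges` is a hypothetical stand-in for the relevant part of `read_from`; its tuple return value is an assumption made for the example, not what the script does.

```python
import io
import re
from typing import TextIO


def parse_assigned_ranges(source: TextIO):
    """Hypothetical stand-in for the parsing loop inside read_from."""
    # The lines that are not empty or pure comment are uniformly of the
    # format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment".
    LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")

    ranges = []
    for line in source:
        line = line.strip()
        if line and not line.startswith("#"):
            match = LINE_REGEX.fullmatch(line)
            assert match, line  # invalid line
            first = int(match.group(1), 16)
            # group(2) is "..HEX" when present; a single code point otherwise.
            last = int(match.group(2)[2:], 16) if match.group(2) else first
            ranges.append((first, last, match.group(3)))
    return ranges


sample = io.StringIO("# header comment\n0000..001F;N # Cc\n3000;F # Zs\n")
print(parse_assigned_ranges(sample))  # prints [(0, 31, 'N'), (12288, 12288, 'F')]
```

Keeping the pattern local means the format comment and the regex cannot drift apart in different parts of the file.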

achabense · 2023-09-21T17:29:44Z

tools/unicode_properties_parse/format_width_estimate_intervals.py

@@ -9,38 +9,30 @@
 from pathlib import Path


-LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")


> restore the location of LINE_REGEX

294f157

> add comment for `fill_range` > rename some variables > type: bool~>int > refine names for the two function; simplify comments > better message

achabense mentioned this pull request Sep 21, 2023

P2675R1 <format> generator converted from C++ to Python microsoft/STL#3994

Merged

fsb4000 merged commit c1f1aa6 into fsb4000:fix3908 Sep 21, 2023

achabense deleted the review3994 branch September 22, 2023 04:06
