Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pr3994 review suggestions #1

Merged
merged 1 commit into from
Sep 21, 2023
Merged

Conversation

achabense
Copy link

(will add comment for each change below)

> add comment for `fill_range`
> rename some variables
> type: bool~>int
> refine names for the two function; simplify comments
> better message
UNICODE_TABLE_SIZE: int = MAX_UNICODE_POINT + 1
class UnicodeWidthTable:
# A valid Unicode code point won't exceed MAX_CODE_POINT.
MAX_CODE_POINT: int = 0x10FFFF
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(code point is a thing)

Whenever a code point's width differs from the previous one,
the function print the code point to indicate the start of a new range.
"""
def print_width_estimate_intervals(self):
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the [I]!=[I-1]->print I logic and the assertion already made things much clearer, with renaming I think it's ok not to add extra comments.

@@ -140,7 +130,7 @@ def get_width(str: str):
line = line.strip()
if line and not line.startswith("#"):
match = LINE_REGEX.fullmatch(line)
assert match, "invalid line"
assert match, line # invalid line
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will print line details when things goes wrong.
(Actually, I've found that the regex has been slightly outdated due to some added whitespaces in EastAsianWidth-15.1.0.txt (in https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt; which is recently updated))
(I think we don't have to update the regex eagerly in this pr as it still works for 15.0.0)
image
(Was 0000..001F;N # Cc [32] <control-0000>..<control-001F> ,
|now 0000..001F ; N # Cc [32] <control-0000>..<control-001F>)

@@ -129,6 +117,8 @@ def read_from(source: TextIO) -> UnicodeTable:

# Read explicitly assigned ranges.
# The lines that are not empty or pure comment are uniformly of the format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment".
LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved back to read_from as it is used and documented here.

@@ -9,38 +9,30 @@
from pathlib import Path


LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved here

@fsb4000 fsb4000 merged commit c1f1aa6 into fsb4000:fix3908 Sep 21, 2023
@achabense achabense deleted the review3994 branch September 22, 2023 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants