-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pr3994 review suggestions #1
Conversation
> add comment for `fill_range` > rename some variables > type: bool~>int > refine names for the two function; simplify comments > better message
UNICODE_TABLE_SIZE: int = MAX_UNICODE_POINT + 1 | ||
class UnicodeWidthTable: | ||
# A valid Unicode code point won't exceed MAX_CODE_POINT. | ||
MAX_CODE_POINT: int = 0x10FFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(code point
is a thing)
Whenever a code point's width differs from the previous one, | ||
the function print the code point to indicate the start of a new range. | ||
""" | ||
def print_width_estimate_intervals(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the [I]!=[I-1]->print I
logic and the assertion already made things much clearer, with renaming I think it's ok not to add extra comments.
@@ -140,7 +130,7 @@ def get_width(str: str): | |||
line = line.strip() | |||
if line and not line.startswith("#"): | |||
match = LINE_REGEX.fullmatch(line) | |||
assert match, "invalid line" | |||
assert match, line # invalid line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will print line details when things goes wrong.
(Actually, I've found that the regex has been slightly outdated due to some added whitespaces in EastAsianWidth-15.1.0.txt
(in https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt; which is recently updated))
(I think we don't have to update the regex eagerly in this pr as it still works for 15.0.0)
(Was 0000..001F;N # Cc [32] <control-0000>..<control-001F>
,
|now 0000..001F ; N # Cc [32] <control-0000>..<control-001F>
)
@@ -129,6 +117,8 @@ def read_from(source: TextIO) -> UnicodeTable: | |||
|
|||
# Read explicitly assigned ranges. | |||
# The lines that are not empty or pure comment are uniformly of the format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment". | |||
LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved back to read_from
as it is used and documented here.
@@ -9,38 +9,30 @@ | |||
from pathlib import Path | |||
|
|||
|
|||
LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved here
(will add comment for each change below)