diff --git a/README.md b/README.md index 3d7a821..f4ba419 100644 --- a/README.md +++ b/README.md @@ -2524,7 +2524,7 @@ For more information, see the deep-dive on
-The second object yielded by a +The first object yielded by a [`lines`](#liness-separatorsnone--line_number1-column_number1-tab_width8-kwargs) iterator, containing metadata about the line. You can add your own fields by passing them in @@ -4970,11 +4970,29 @@ best practice for any field in `LineInfo`; you should amend it, rather than set it outright. Speaking of best practices for lines modifier functions, -it's also considered good hygiene to modify the `LineInfo` -object that was yielded to you. Don't create a new one -and yield that instead. Previous lines modifier iterators -may have added fields to the `LineInfo` that you need to -preserve. +it's also best practice to *modify* the *existing* +`LineInfo` object that was yielded to you, rather than +throwing it away, creating a new one, and yielding that +instead. Previous lines modifier iterators may have added +fields to the `LineInfo` that you'd want to preserve. + +### leading + line + trailing + end + +Generally speaking, `LineInfo` objects obey an invariant. +For any `(info, line)` pair yielded by `lines` or a lines +modifier: + + info.leading + line + info.trailing + info.end == info.line + +That is, you can recreate the original line by concatenating +the "leading" string, the modified line, the "trailing" string, +and the "end" string. + +However, this is no longer true when using lines modifiers that +replace characters in the line. For example, `lines_convert_tabs_to_spaces` +replaces tab characters with one or more space characters. +If the original line contains tabs, obviously the above invariant +will no longer hold true.
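The invariant described in the new README section can be sketched in plain Python. This is only an illustration and does not import `big`. `SketchLineInfo`, `strip_trailing_whitespace`, and `invariant_holds` are hypothetical names invented for this sketch, not part of the library, and the sketch assumes (as the equation above implies) that `info.line` includes the line ending.

```python
from dataclasses import dataclass


@dataclass
class SketchLineInfo:
    """Hypothetical stand-in for LineInfo; only the fields the invariant uses."""
    line: str            # the original, unmodified line (assumed to include the line ending)
    leading: str = ""
    trailing: str = ""
    end: str = ""


def invariant_holds(info, line):
    """Check: info.leading + line + info.trailing + info.end == info.line"""
    return info.leading + line + info.trailing + info.end == info.line


def strip_trailing_whitespace(info, line):
    """Move trailing whitespace out of line and into info.trailing.

    The existing info object is amended rather than replaced, so any
    fields added by earlier modifiers would survive.
    """
    stripped = line.rstrip()
    removed = line[len(stripped):]
    info.trailing = removed + info.trailing
    return stripped


original = "    x = 1   \n"
info = SketchLineInfo(line=original, end="\n")
line = original[:-1]                      # the line without its line ending
assert invariant_holds(info, line)

line = strip_trailing_whitespace(info, line)
assert invariant_holds(info, line)        # still true: characters were moved, not changed

# Replacing characters (here, expanding a tab) breaks the invariant.
tabbed_info = SketchLineInfo(line="\tx = 1\n", end="\n")
converted = "\tx = 1".expandtabs(8)
assert not invariant_holds(tabbed_info, converted)
```

The stripping step keeps the invariant because it only moves characters from `line` into `info.trailing`; the tab expansion breaks it because the characters themselves change.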
@@ -5607,45 +5625,59 @@ Lots of changes this time! Grouping by submodule: The old version is still available under a new name: `old_split_quoted_string`. It's deprecated, and will - eventually be removed, but not before August 2025 + eventually be removed, but not before September 2025 (one year from now). Changes: - * `split_quoted_string` used to use a hand-coded parser, - manually analyzing each character in the input text. - Now it uses `multisplit` to only examine the interesting - substrings. `multisplit` has a large startup cost - the first time you use a particular set of iterators, - but this information is cached for subsequent calls. - Bottom line, the new version is much faster - for larger workloads. (It's slower for trivial - examples... where speed doesn't matter.) + * The value it yields has changed: + * The old version yielded `(is_quote, segment)`, where + `is_quote` was a boolean value indicating whether or not + `segment` was quoted. If `segment` was quoted, it began + and ended with (single character) quote marks. To reassemble + the original string, join together all the `segment` strings + in order. + * The new version yields `(leading_quote, segment, trailing_quote)`, + where `leading_quote` and `trailing_quote` are either matching + quote marks or empty. If they're true values, the `segment` + string is inside the quotes. To reassemble the original string, + join together *all* the yielded strings in order. * The `backslash` parameter has been replaced by a new parameter, `escape`. `escape` allows specifying the escape string, which by default is '\\' (backslash). If you specify a false value, there will be no escape character in strings. - * Another benefit of switching to `multisplit`: `quotes` - now supports quote delimiters and an escape string - of any nonzero length. If more than one quote delimiter - matches at a time, `split_quoted_string` will always - pick the longer string. 
* By default `quotes` only contains `'` (single-quote) and `"` (double-quote). The previous version also - supported `"""` and `'''` by default; this is no longer - true, it was too opinionated and Python-specific. + used `"""` and `'''` as multiline quote marks + by default; this is no longer true, as it was too + opinionated and Python-specific. * `split_quoted_string` also accepts a new parameter, `state`, which sets the initial state of quoting. * The `triple_quotes` parameter has been removed. (See next bullet point.) - * `split_quoted_string` is now documented as being completely - agnostic about newlines. The previous version was, too; - even though the documentation discussed triple-quoted - strings vs single-quoted strings, in reality it didn't - ever care about newlines. With the updated API, it's - officially up to you to enforce any rules here - (e.g. "newlines aren't permitted in - single-quoted strings.") + * `split_quoted_string` used to use a hand-coded parser, + manually analyzing each character in the input text. + Now it uses `multisplit` to only examine the interesting + substrings. `multisplit` has a large startup cost + the first time you use a particular set of iterators, + but this information is cached for subsequent calls. + Bottom line, the new version is much faster + for larger workloads. (It's slower for trivial + examples... where speed doesn't matter.) + * Another benefit of switching to `multisplit`: `quotes` + now supports quote delimiters and an escape string + of any nonzero length. In the case of ambiguity--if + more than one quote delimiter matches at a + time--`split_quoted_string` will always pick the + longer string. + * `split_quoted_string` is now deliberately + (and documented-ly) completely agnostic about newlines. + The previous version was, too; even though the + documentation discussed triple-quoted strings vs + single-quoted strings, in reality it didn't ever care + about newlines. 
With the updated API, it's officially + up to you to enforce the rules you want (e.g. "newlines + aren't permitted in single-quoted strings.") * Breaking change: `parse_delimiters` has also been completely re-tooled, re-written... *and* re-named! @@ -5653,7 +5685,8 @@ Lots of changes this time! Grouping by submodule: The old version is still available under the old name. It's deprecated, and will eventually be - removed, but not before August 2025 (one year from now). + removed, but not before September 2025. + However, the old `Delimiter` class has been renamed to `ParseDelimiter`; there's a new `Delimiter` class used by `split_delimiters`. @@ -5668,39 +5701,63 @@ Lots of changes this time! Grouping by submodule: which specifies the initial state of nested delimiters. * `split_delimiters` no longer cares if there were unclosed open delimiters at the end of the string. (It used to - raise `ValueError`.) - * `parse_delimiters` manually parsed the string, character - by character. `split_delimiters` uses `multisplit`, so it - zips past the uninteresting characters to find the delimiters - and escape characters. It's always faster, except for - some trivial calls (which are fast enough anyway). + raise `ValueError`.) This includes quote marks; if you + don't want quoted strings to span multiple lines, it's up + to you to detect it and react (e.g. raise an exception). + * `parse_delimiters` manually parsed the input string + character by character. `split_delimiters` uses `multisplit`, + so it zips past the uninteresting characters and only examines + the delimiters and escape characters. It's always faster, + except for some trivial calls (which are fast enough anyway). * Another benefit of using `multisplit`: open delimiters, close delimiters, and the escape string may now all be any nonzero length. (In the face of ambiguity, `split_delimiters` will always choose the longer delimiter.) 
* The `ParseDelimiter` object used with `parse_delimiters` - has a boolean `backslash` attribute; if it's True, that + has a boolean `backslash` attribute; if it was True, that delimiter allows escaping using a backslash. The new - `Delimiter` class used with `split_delimiters` instead - has an `escape=c` attribute, where `c` is the escape + `Delimiter` class used with `split_delimiters` replaces that + with an `escape=c` attribute, where `c` is the escape character you want to use with that set of delimiters. All the predefined `Delimiter` values have been updated to match. - * As mentioned above, the `Delimiter` object doesn't have - an `open` attribute. (`ParseDelimiter` still does.) - -* Breaking change: the `LineInfo` constructor has added - a new `lines` positional parameter, in front of the - existing positional parameters. This should be the - `lines` iterator that yielded this `LineInfo` object. - It's stored in the `lines` attribute. - -* New feature: `LineInfo` objects yielded by `lines` - previously had many optional fields, which might or might - not be added dynamically. Now all fields are pre-added. - (This makes the CPython 3.13 runtime happier; it really - wants you to set *all* your class's attributes in its - `__init__`.) + * As mentioned above, the new `Delimiter` object doesn't + have an `open` attribute. (`ParseDelimiter` still does.) + +* Breaking change: `lines_strip_comments` has *also* been + completely rewritten and renamed. It's now named + `lines_strip_line_comments`. + + Changes: + * The old function required quote marks and the escape string + to be single characters, and had a slightly-smelly + `triple_quotes` parameter to support multiline strings. + The new function allows quote marks to be of any length, + and has separate parameters for single-line quote marks + and multiline quote marks. + * The `backslash` parameter has been renamed to `escape`. + * The old function didn't enforce that strings shouldn't + span lines. 
The new version raises `SyntaxError` + if quoted strings aren't closed (unless they're explicitly + strings that support multiline). + * Breaking change to the old version: it used to write the + comment it rstripped to `info.comment`, and it threw away + any whitespace it stripped. It now obeys the modern + `LineInfo` aesthetic, and writes *both* the whitespace it + rstripped *and* the comment to `info.trailing`. + + +* Breaking change: the `LineInfo` constructor has a + new `lines` positional parameter, added *in front of* + the existing positional parameters. This new first argument + should be the `lines` iterator that yielded this + `LineInfo` object. It's stored in the `lines` attribute. + +* `LineInfo` objects (yielded by `lines`) previously had + many optional fields, which might or might not be added + dynamically. Now all fields are pre-added. (This makes + the CPython 3.13 runtime happier; it really wants you to + set *all* your class's attributes in its `__init__`.) `LineInfo` objects now always have these attributes: * `lines`, which contains the base lines iterator. @@ -5727,41 +5784,41 @@ Lots of changes this time! Grouping by submodule: was matched with a regular expression, and `None` otherwise. * `LineInfo` now has two new methods: `extend_leading` - and `extend_trailing`. These methods - move a leading or trailing substring from the current `line` - to the relevant field in `LineInfo`, maintaining all the - guaranteed invariants, and updating all related `LineInfo` - fields (like `column_number`). - -* There have been plenty of changes to line modifiers, too: - * `lines_strip_comments` has been renamed to `lines_strip_line_comments`. - It's also been improved: now it raises `SyntaxError` if quoted - strings aren't closed. - * `lines_filter_comment_lines` has been renamed to - `lines_filter_line_comment_lines`. 
`lines_filter_line_comment_lines` - now enforces that single-quoted strings can't span lines, - and multi-quoted strings must be closed before the end of - the last line. - * `lines_strip` and `lines_rstrip` now accept a new `separators` - argument; this is an iterable of separators, like the argument - to `multisplit`. - The default value of `None` preserves the existing behavior, - stripping whitespace. - * `lines_grep` now adds a `match` attribute to the `LineInfo` - object, containing the return value from calling `re.search`. - (If you pass in `invert=True` to `lines_grep`, `lines_grep` - will still write `None` to the `match` attribute.) - * Bugfix: `lines_strip_indent` previously required - whitespace-only lines to obey the indenting rules. - My intention was always for `lines_strip_indent` to - behave like Python, and that includes not really caring - about the intra-line-whitespace for whitespace-only - lines. Now `lines_strip_indent` behaves more like Python: - a whitespace-only line behaves as if it has - the same indent as the previous line. (Not that the - indent value of an empty line should matter; this is - mostly just there to present a consistent interface to - the user.) + and `extend_trailing`. These methods move a leading or + trailing substring from the current `line` to the relevant + field in `LineInfo`, maintaining all the guaranteed + invariants, and updating all related `LineInfo` fields + (like `column_number`). + +* `lines_filter_comment_lines` has been renamed to + `lines_filter_line_comment_lines`. `lines_filter_line_comment_lines` + now enforces that single-quoted strings can't span lines, + and multi-quoted strings must be closed before the end of + the last line. For backwards compatibility, the new function + is also available under the old name; this old name will + eventually be removed, but not before September 2025. 
+ +* `lines_strip` and `lines_rstrip` now accept a new `separators` + argument; this is an iterable of separators, like the argument + to `multisplit`. + The default value of `None` preserves the existing behavior, + stripping whitespace. + +* `lines_grep` now writes to the `match` attribute of the `LineInfo` + object, containing the return value from calling `re.search`. + (If you pass in `invert=True` to `lines_grep`, `lines_grep` + still writes to the `match` attribute--but it always writes `None`.) + +* Bugfix: `lines_strip_indent` previously required + whitespace-only lines to obey the indenting rules, which was + a mistake. My intention was always for `lines_strip_indent` + to behave like Python, and that includes not really caring + about the intra-line-whitespace for whitespace-only + lines. Now `lines_strip_indent` behaves more like Python: + a whitespace-only line behaves as if it has + the same indent as the previous line. (Not that the + indent value of an empty line should matter--but this + behavior is how you'd intuitively expect it to work.) * New function: `split_title_case`, which splits a string at title case change word boundaries. @@ -5791,7 +5848,7 @@ Lots of changes this time! Grouping by submodule: * Another minor speedup for `multisplit`: when `reverse=True`, we used to reverse the results *three times!* We now explicitly - observe and manage the reverse state of the result and avoid + observe and manage the reverse state of the result, to avoid needless reversing. ### scheduler diff --git a/big/text.py b/big/text.py index fc24f99..d60c726 100644 --- a/big/text.py +++ b/big/text.py @@ -1819,7 +1819,8 @@ def old_split_quoted_strings(s, quotes=None, *, triple_quotes=True, backslash=No Returns an iterator yielding 2-tuples: (is_quoted, segment) where segment is a substring of s, and is_quoted is true if the segment is - quoted. Joining all the segments together recreates s. + quoted. Joining all the segments together recreates s. 
(The segment + strings include the quote marks.) If triple_quotes is true, supports "triple-quoted" strings like Python. @@ -2738,38 +2739,38 @@ def __init__(self, lines, line, line_number, column_number, *, leading=None, tra elif is_str: empty = '' else: - raise TypeError("line must be str or bytes") + raise TypeError(f"line must be str or bytes, not {line!r}") if not isinstance(line_number, int): - raise TypeError("line_number must be int") + raise TypeError(f"line_number must be int, not {line_number!r}") if not isinstance(column_number, int): - raise TypeError("column_number must be int") + raise TypeError(f"column_number must be int, not {column_number!r}") line_type = type(line) if leading == None: leading = empty elif not isinstance(leading, line_type): - raise TypeError("leading must be same type as line or None") + raise TypeError(f"leading must be same type as line or None, not {leading!r}") if trailing == None: trailing = empty elif not isinstance(trailing, line_type): - raise TypeError("trailing must be same type as line or None") + raise TypeError(f"trailing must be same type as line or None, not {trailing!r}") if end == None: end = empty elif not isinstance(end, line_type): - raise TypeError("end must be same type as line or None") + raise TypeError(f"end must be same type as line or None, not {end!r}") self.lines = lines self.line = line self.line_number = line_number self.column_number = column_number - self.indent = None self.leading = leading self.trailing = trailing self.end = end + self.indent = None self.match = None self._is_bytes = is_bytes self.__dict__.update(kwargs) @@ -2779,7 +2780,7 @@ def detab(self, s): def extend_leading(self, s, line): if isinstance(s, int): - assert -len(line) <= s < len(line) + assert -len(line) <= s < len(line), f"extend_leading invalid parameters: s={s!r} line={line!r}" s = line[:s] else: assert line.startswith(s), f"line {line!r} doesn't start with s {s!r}" @@ -2793,7 +2794,7 @@ def extend_leading(self, s, 
line): def extend_trailing(self, s, line): if isinstance(s, int): - assert -len(line) <= s < len(line) + assert -len(line) <= s < len(line), f"extend_trailing invalid parameters: s={s!r} line={line!r}" s = line[-s:] else: assert line.endswith(s), f"line {line!r} doesn't end with s {s!r}" @@ -2804,7 +2805,7 @@ def extend_trailing(self, s, line): def __repr__(self): names = list(self.__dict__) - priority_names = ['line', 'lines', 'line_number', 'column_number', 'leading', 'trailing', 'end'] + priority_names = ['lines', 'line', 'line_number', 'column_number', 'leading', 'trailing', 'end'] fields = [] for name in priority_names: names.remove(name) @@ -3023,11 +3024,11 @@ def lines_strip(li, separators=None): yield (info, line) -def lines_filter_line_comment_lines(li, comment_re): +def lines_filter_line_comment_lines(li, match): "The generator function returned by the public lines_filter_line_comment_lines function." for info, line in li: s = line.lstrip() - if comment_re.match(s): + if match(s): continue yield (info, line) @@ -3067,8 +3068,12 @@ def lines_filter_line_comment_lines(li, comment_markers): comment_pattern = _separators_to_re(comment_markers, comment_markers_is_bytes, separate=False, keep=False) comment_re = re.compile(comment_pattern) - return _lines_filter_line_comment_lines(li, comment_re) + return _lines_filter_line_comment_lines(li, comment_re.match) +# old, deprecated name. +# will eventually be deleted, but not before September 2025. +lines_filter_comment_lines = lines_filter_line_comment_lines +_export_name("lines_filter_comment_lines") @_export def lines_containing(li, s, *, invert=False): @@ -3242,18 +3247,18 @@ def lines_strip_line_comments(li, line_comment_markers, *, A lines modifier function. Strips line comments from the lines of a "lines iterator". 
Line comments are substrings beginning with a special marker that mean the rest of the line should be - ignored; lines_strip_comments truncates the line at the + ignored; lines_strip_line_comments truncates the line at the beginning of the leftmost line comment marker. line_comment_markers should be an iterable of line comment marker strings. These are strings that denote a "line comment", which is to say, a comment that extends from the marker to the - end of the line. lines_strip_comments truncates each line + end of the line. lines_strip_line_comments truncates each line starting at the beginning of the leftmost line comment marker found on that line. If quotes is true, it must be an iterable of quote marker - strings, length 1 or more. lines_strip_comments will + strings, length 1 or more. lines_strip_line_comments will parse the line using big's split_quoted_strings function, and ignore comment characters inside quoted strings. If quotes is false, quote characters are ignored and @@ -3261,7 +3266,7 @@ def lines_strip_line_comments(li, line_comment_markers, *, By default quotes is the tuple ("'", '"'). Quoted strings delimited by markers in quotes may not span lines; if a line ends with an unterminated quoted string, - lines_strip_comments will raise a SyntaxError. + lines_strip_line_comments will raise a SyntaxError. If escape is true, it must be a string. This string will "escape" (quote) quote markers, as per backslash @@ -3276,18 +3281,18 @@ def lines_strip_line_comments(li, line_comment_markers, *, in (conventional) quotes are not allowed to. By default multiline_quotes is an empty string. - If rstrip is true (the default), lines_strip_comments calls - the rstrip() method on line after it truncates the line. + If rstrip is true (the default), lines_strip_line_comments + calls the rstrip() method on line after it truncates the line. Updates LineInfo.comment and LineInfo.trailing as appropriate. 
- What's the difference between lines_strip_comments and + What's the difference between lines_strip_line_comments and lines_filter_comment_lines? * lines_filter_comment_lines only recognizes lines that *start* with a comment separator (ignoring leading whitespace). Also, it filters out those lines completely, rather than modifying the line. - * lines_strip_comments handles comment characters + * lines_strip_line_comments handles comment characters anywhere in the line, although it can ignore comments inside quoted strings. It truncates the line but still always yields the line. @@ -3323,8 +3328,109 @@ def lines_strip_line_comments(li, line_comment_markers, *, return _lines_strip_line_comments(li, line_comment_splitter, quotes, multiline_quotes, escape, rstrip, empty_join) -lines_strip_comments = lines_strip_line_comments -_export_name("lines_strip_comments") + +# backwards compatibility +@_export +def lines_strip_comments(li, comment_separators, *, quotes=('"', "'"), backslash='\\', rstrip=True, triple_quotes=True): + """ + NOTE: This function is deprecated. Please use + lines_strip_line_comments instead. lines_strip_comments + will eventually be removed from big, no sooner than September 2025. + + A lines modifier function. Strips comments from the lines + of a "lines iterator". Comments are substrings that indicate + the rest of the line should be ignored; lines_strip_comments + truncates the line at the beginning of the leftmost comment + separator. + + If rstrip is true (the default), lines_strip_comments calls + the rstrip() method on line after it truncates the line. + + If quotes is true, it must be an iterable of quote characters. + (Each quote character MUST be a single character.) + lines_strip_comments will parse the line and ignore comment + characters inside quoted strings. If quotes is false, + quote characters are ignored and lines_strip_comments will + truncate anywhere in the line. 
+ + backslash and triple_quotes are passed in to + split_quoted_string, which is used internally to detect + the quoted strings in the line. + + Sets a new field on the associated LineInfo object for every line: + * comment - the comment stripped from the line, if any. + if no comment was found, "comment" will be an empty string. + + What's the difference between lines_strip_comments and + lines_filter_comment_lines? + * lines_filter_comment_lines only recognizes lines that + *start* with a comment separator (ignoring leading + whitespace). Also, it filters out those lines + completely, rather than modifying the line. + * lines_strip_comments handles comment characters + anywhere in the line, although it can ignore + comments inside quoted strings. It truncates the + line but still always yields the line. + + Composable with all the lines_ modifier functions in the big.text module. + """ + if not comment_separators: + raise ValueError("illegal comment_separators") + + if isinstance(comment_separators, bytes): + comment_separators = _iterate_over_bytes(comment_separators) + comment_separators_is_bytes = True + else: + comment_separators_is_bytes = isinstance(comment_separators[0], bytes) + comment_separators = tuple(comment_separators) + + if comment_separators_is_bytes: + empty = b'' + else: + empty = '' + empty_join = empty.join + + comment_pattern = __separators_to_re(comment_separators, separators_is_bytes=comment_separators_is_bytes, separate=True, keep=True) + re_comment = re.compile(comment_pattern) + split = re_comment.split + + + def lines_strip_comments(li, split, quotes, backslash, rstrip, triple_quotes): + for info, line in li: + if quotes: + i = old_split_quoted_strings(line, quotes, backslash=backslash, triple_quotes=triple_quotes) + else: + i = ((False, line),) + + # iterate over the line until we either hit the end or find a comment. 
+ segments = [] + append = segments.append + for is_quoted, segment in i: + if is_quoted: + append(segment) + continue + + fields = split(segment, maxsplit=1) + leading = fields[0] + if len(fields) == 1: + append(leading) + continue + + # found a comment marker in an unquoted segment! + if rstrip: + leading = leading.rstrip() + append(leading) + break + + keeping = sum(len(s) for s in segments) + removing = len(line) - keeping + if removing: + line = info.extend_trailing(removing, line) + yield (info, line) + return lines_strip_comments(li, split, quotes, backslash, rstrip, triple_quotes) + + + @_export def lines_convert_tabs_to_spaces(li): diff --git a/tests/test_text.py b/tests/test_text.py index 1d0438e..4b9308e 100644 --- a/tests/test_text.py +++ b/tests/test_text.py @@ -3308,6 +3308,9 @@ def test(i, expected, *, test_reconstituted_line=True): print("got:") pprint.pprint(got) print("\n\n") + for e, g in zip(expected, got): + print(e==g) + print("\n\n") self.assertEqual(expected, got) def L(line, line_number, column_number=1, end='\n', final=None, **kwargs): @@ -3694,6 +3697,55 @@ def test_and_remove_lineinfo_match(i, substring, *, invert=False, match='match') ] ) + + ## + ## testing for a deprecated function! 
+ ## lines_strip_comments + ## + lines = big.lines(""" +for x in range(5): # this is a comment + print("# this is quoted", x) + print("") # this "comment" is useless + print(no_comments_or_quotes_on_this_line) +"""[1:]) + test(big.lines_strip_comments(lines, ("#", "//")), + [ + L(line='for x in range(5): # this is a comment', line_number=1, column_number=1, trailing=' # this is a comment', final='for x in range(5):'), + L(line=' print("# this is quoted", x)', line_number=2, column_number=1), + L(line=' print("") # this "comment" is useless', line_number=3, column_number=1, trailing=' # this "comment" is useless', final=' print("")'), + L(line=' print(no_comments_or_quotes_on_this_line)', line_number=4, column_number=1), + L(line='', line_number=5, column_number=1, end=''), + ]) + + # don't get alarmed! we intentionally break quote characters in this test. + lines = big.lines(""" +for x in range(5): # this is a comment + print("# this is quoted", x) + print("") # this "comment" is useless + print(no_comments_or_quotes_on_this_line) +"""[1:]) + test(big.lines_strip_comments(lines, ("#", "//"), quotes=None), + [ + L(line='for x in range(5): # this is a comment', line_number=1, column_number=1, trailing=' # this is a comment', final='for x in range(5):'), + L(line=' print("# this is quoted", x)', line_number=2, column_number=1, trailing='# this is quoted", x)', final=' print("'), + L(line=' print("") # this "comment" is useless', line_number=3, column_number=1, trailing=' # this "comment" is useless', final=' print("")'), + L(line=' print(no_comments_or_quotes_on_this_line)', line_number=4, column_number=1), + L(line='', line_number=5, column_number=1, end=''), + ]) + + with self.assertRaises(ValueError): + list(big.lines_strip_comments(big.lines("a\nb\n"), None)) + + lines = big.lines(b"a\nb# ignored\n c") + test(big.lines_strip_comments(lines, b'#'), + [ + L(b'a', 1), + L(b'b# ignored', 2, 1, trailing=b'# ignored', final=b'b'), + L(b' c', 3, end=b''), + ] + ) + + 
lines = big.lines( " \n" + " a = b \n" + @@ -3713,7 +3765,7 @@ def test_and_remove_lineinfo_match(i, substring, *, invert=False, match='match') def test_lines_strip_indent(self): self.maxDiff = 2**32 - def assert_lines_reconstitutes_properly(i): + def assert_line_reconstitutes_properly(i): for t in i: info, line = t reconstituted_line = info.leading + line + info.trailing @@ -3721,8 +3773,9 @@ def assert_lines_reconstitutes_properly(i): yield t def test(lines, expected, *, tab_width=8): - lines = big.lines(lines, tab_width=tab_width) - i = assert_lines_reconstitutes_properly(big.lines_strip_indent(lines)) + if not isinstance(lines, types.GeneratorType): + lines = big.lines(lines, tab_width=tab_width) + i = assert_line_reconstitutes_properly(big.lines_strip_indent(lines)) got = list(i) # fixup lines objects @@ -3899,6 +3952,8 @@ def LineInfo(lines, line, line_number, column_number, end=_sentinel, **kwargs): # with self.assertRaises(ValueError): # test("first line\n \u3000 second line\nthird line\n", []) + + def test_lines_misc(self): ## error handling with self.assertRaises(TypeError):