diff --git a/README.md b/README.md
index 3d7a821..f4ba419 100644
--- a/README.md
+++ b/README.md
@@ -2524,7 +2524,7 @@ For more information, see the deep-dive on
-
-The second object yielded by a
+The first object yielded by a
[`lines`](#liness-separatorsnone--line_number1-column_number1-tab_width8-kwargs)
iterator, containing metadata about the line.
You can add your own fields by passing them in
@@ -4970,11 +4970,29 @@ best practice for any field in `LineInfo`; you should amend
it, rather than set it outright.
Speaking of best practices for lines modifier functions,
-it's also considered good hygiene to modify the `LineInfo`
-object that was yielded to you. Don't create a new one
-and yield that instead. Previous lines modifier iterators
-may have added fields to the `LineInfo` that you need to
-preserve.
+it's also best practice to *modify* the *existing*
+`LineInfo` object that was yielded to you, rather than
+throwing it away, creating a new one, and yielding that
+instead. Previous lines modifier iterators may have added
+fields to the `LineInfo` that you'd want to preserve.
+
+### leading + line + trailing + end
+
+Generally speaking, `LineInfo` objects obey an invariant.
+For any `(info, line)` pair yielded by `lines` or a lines
+modifier:
+
+ info.leading + line + info.trailing + info.end == info.line
+
+That is, you can recreate the original line by concatenating
+the "leading" string, the modified line, the "trailing" string,
+and the "end" string.
+
+However, this is no longer true when using lines modifiers that
+replace characters in the line. For example, `lines_convert_tabs_to_spaces`
+replaces tab characters with one or more space characters.
+If the original line contains tabs, obviously the above invariant
+will no longer hold true.
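
As a rough illustration of the invariant, here's a minimal sketch. It doesn't import big at all; `SketchLineInfo` and `reconstitutes` are hypothetical stand-ins invented for this example, modeling only the four string fields discussed above. The sample lines have no line terminator, so `end` is simply the empty string here.

```python
from dataclasses import dataclass


@dataclass
class SketchLineInfo:
    """Hypothetical stand-in for LineInfo (not big's real class)."""
    line: str            # the original, unmodified line
    leading: str = ""    # text moved off the front of the line
    trailing: str = ""   # text moved off the back of the line
    end: str = ""        # line terminator (empty in this sketch)


def reconstitutes(info, line):
    """Return True if the invariant holds for this (info, line) pair."""
    return info.leading + line + info.trailing + info.end == info.line


# A modifier that only *moves* text out of the line preserves the invariant.
original = "  x = 1  # comment"
info = SketchLineInfo(line=original, leading="  ", trailing="  # comment")
moved_ok = reconstitutes(info, "x = 1")

# A modifier that *replaces* characters (here, a tab with spaces) breaks it.
original_tabbed = "\tx = 1"
info_tabbed = SketchLineInfo(line=original_tabbed)
replaced_ok = reconstitutes(info_tabbed, original_tabbed.expandtabs(8))
```

In this sketch `moved_ok` is true and `replaced_ok` is false: moving substrings into `leading` and `trailing` keeps the pieces reassemblable, while rewriting characters in the line does not.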
@@ -5607,45 +5625,59 @@ Lots of changes this time! Grouping by submodule:
The old version is still available under a new
name: `old_split_quoted_string`. It's deprecated, and will
- eventually be removed, but not before August 2025
+ eventually be removed, but not before September 2025
(one year from now).
Changes:
- * `split_quoted_string` used to use a hand-coded parser,
- manually analyzing each character in the input text.
- Now it uses `multisplit` to only examine the interesting
- substrings. `multisplit` has a large startup cost
- the first time you use a particular set of iterators,
- but this information is cached for subsequent calls.
- Bottom line, the new version is much faster
- for larger workloads. (It's slower for trivial
- examples... where speed doesn't matter.)
+ * The value it yields has changed:
+ * The old version yielded `(is_quote, segment)`, where
+ `is_quote` was a boolean value indicating whether or not
+ `segment` was quoted. If `segment` was quoted, it began
+ and ended with (single character) quote marks. To reassemble
+ the original string, join together all the `segment` strings
+ in order.
+ * The new version yields `(leading_quote, segment, trailing_quote)`,
+ where `leading_quote` and `trailing_quote` are either matching
+ quote marks or empty. If they're true values, the `segment`
+ string is inside the quotes. To reassemble the original string,
+ join together *all* the yielded strings in order.
* The `backslash` parameter has been replaced by a
new parameter, `escape`.
`escape` allows specifying the escape string, which
by default is '\\' (backslash). If you specify a false
value, there will be no escape character in strings.
- * Another benefit of switching to `multisplit`: `quotes`
- now supports quote delimiters and an escape string
- of any nonzero length. If more than one quote delimiter
- matches at a time, `split_quoted_string` will always
- pick the longer string.
* By default `quotes` only contains `'`` (single-quote)
and `"` (double-quote). The previous version also
- supported `"""` and `'''` by default; this is no longer
- true, it was too opinionated and Python-specific.
+ used `"""` and `'''` as multiline quote marks
+ by default; this is no longer true, as it was too
+ opinionated and Python-specific.
* `split_quoted_string` also accepts a new parameter,
`state`, which sets the initial state of quoting.
* The `triple_quotes` parameter has been removed. (See
next bullet point.)
- * `split_quoted_string` is now documented as being completely
- agnostic about newlines. The previous version was, too;
- even though the documentation discussed triple-quoted
- strings vs single-quoted strings, in reality it didn't
- ever care about newlines. With the updated API, it's
- officially up to you to enforce any rules here
- (e.g. "newlines aren't permitted in
- single-quoted strings.")
+ * `split_quoted_string` used to use a hand-coded parser,
+ manually analyzing each character in the input text.
+ Now it uses `multisplit` to only examine the interesting
+ substrings. `multisplit` has a large startup cost
+ the first time you use a particular set of iterators,
+ but this information is cached for subsequent calls.
+ Bottom line, the new version is much faster
+ for larger workloads. (It's slower for trivial
+ examples... where speed doesn't matter.)
+ * Another benefit of switching to `multisplit`: `quotes`
+ now supports quote delimiters and an escape string
+ of any nonzero length. In the case of ambiguity--if
+ more than one quote delimiter matches at a
+ time--`split_quoted_string` will always pick the
+ longer string.
+ * `split_quoted_string` is now deliberately
+ (and documented-ly) completely agnostic about newlines.
+ The previous version was, too; even though the
+ documentation discussed triple-quoted strings vs
+ single-quoted strings, in reality it didn't ever care
+ about newlines. With the updated API, it's officially
+ up to you to enforce the rules you want (e.g. "newlines
+ aren't permitted in single-quoted strings.")
* Breaking change: `parse_delimiters` has also been
completely re-tooled, re-written... *and* re-named!
@@ -5653,7 +5685,8 @@ Lots of changes this time! Grouping by submodule:
The old version is still available under the old
name. It's deprecated, and will eventually be
- removed, but not before August 2025 (one year from now).
+ removed, but not before September 2025.
+
However, the old `Delimiter` class has been
renamed to `ParseDelimiter`; there's a new
`Delimiter` class used by `split_delimiters`.
@@ -5668,39 +5701,63 @@ Lots of changes this time! Grouping by submodule:
which specifies the initial state of nested delimiters.
* `split_delimiters` no longer cares if there were unclosed
open delimiters at the end of the string. (It used to
- raise `ValueError`.)
- * `parse_delimiters` manually parsed the string, character
- by character. `split_delimiters` uses `multisplit`, so it
- zips past the uninteresting characters to find the delimiters
- and escape characters. It's always faster, except for
- some trivial calls (which are fast enough anyway).
+ raise `ValueError`.) This includes quote marks; if you
+ don't want quoted strings to span multiple lines, it's up
+ to you to detect it and react (e.g. raise an exception).
+ * `parse_delimiters` manually parsed the input string
+ character by character. `split_delimiters` uses `multisplit`,
+ so it zips past the uninteresting characters and only examines
+ the delimiters and escape characters. It's always faster,
+ except for some trivial calls (which are fast enough anyway).
* Another benefit of using `multisplit`: open delimiters,
close delimiters, and the escape string may now all be
any nonzero length. (In the face of ambiguity,
`split_delimiters` will always choose the longer delimiter.)
* The `ParseDelimiter` object used with `parse_delimiters`
- has a boolean `backslash` attribute; if it's True, that
+ has a boolean `backslash` attribute; if it was True, that
delimiter allows escaping using a backslash. The new
- `Delimiter` class used with `split_delimiters` instead
- has an `escape=c` attribute, where `c` is the escape
+ `Delimiter` class used with `split_delimiters` replaces that
+ with an `escape=c` attribute, where `c` is the escape
character you want to use with that set of delimiters.
All the predefined `Delimiter` values have been updated
to match.
- * As mentioned above, the `Delimiter` object doesn't have
- an `open` attribute. (`ParseDelimiter` still does.)
-
-* Breaking change: the `LineInfo` constructor has added
- a new `lines` positional parameter, in front of the
- existing positional parameters. This should be the
- `lines` iterator that yielded this `LineInfo` object.
- It's stored in the `lines` attribute.
-
-* New feature: `LineInfo` objects yielded by `lines`
- previously had many optional fields, which might or might
- not be added dynamically. Now all fields are pre-added.
- (This makes the CPython 3.13 runtime happier; it really
- wants you to set *all* your class's attributes in its
- `__init__`.)
+ * As mentioned above, the new `Delimiter` object doesn't
+ have an `open` attribute. (`ParseDelimiter` still does.)
+
+* Breaking change: `lines_strip_comments` has *also* been
+ completely rewritten and renamed. It's now named
+ `lines_strip_line_comments`.
+
+ Changes:
+ * The old function required quote marks and the escape string
+ to be single characters, and had a slightly-smelly
+ `triple_quotes` parameter to support multiline strings.
+ The new function allows quote marks to be of any length,
+ and has separate parameters for single-line quote marks
+ and multiline quote marks.
+ * The `backslash` parameter has been renamed to `escape`.
+ * The old function didn't enforce the rule that strings
+ can't span lines. The new version raises `SyntaxError`
+ if a quoted string isn't closed by the end of its line
+ (unless it's delimited by multiline quote marks).
+ * Breaking change to the old version: it used to write the
+ comment it rstripped to `info.comment`, and it threw away
+ any whitespace it stripped. It now obeys the modern
+ `LineInfo` aesthetic, and writes *both* the whitespace it
+ rstripped *and* the comment to `info.trailing`.
+
+
+* Breaking change: the `LineInfo` constructor has a
+ new `lines` positional parameter, added *in front of*
+ the existing positional parameters. This new first argument
+ should be the `lines` iterator that yielded this
+ `LineInfo` object. It's stored in the `lines` attribute.
+
+* `LineInfo` objects (yielded by `lines`) previously had
+ many optional fields, which might or might not be added
+ dynamically. Now all fields are pre-added. (This makes
+ the CPython 3.13 runtime happier; it really wants you to
+ set *all* your class's attributes in its `__init__`.)
`LineInfo` objects now always have these attributes:
* `lines`, which contains the base lines iterator.
@@ -5727,41 +5784,41 @@ Lots of changes this time! Grouping by submodule:
was matched with a regular expression, and `None` otherwise.
* `LineInfo` now has two new methods: `extend_leading`
- and `extend_trailing`. These methods
- move a leading or trailing substring from the current `line`
- to the relevant field in `LineInfo`, maintaining all the
- guaranteed invariants, and updating all related `LineInfo`
- fields (like `column_number`).
-
-* There have been plenty of changes to line modifiers, too:
- * `lines_strip_comments` has been renamed to `lines_strip_line_comments`.
- It's also been improved: now it raises `SyntaxError` if quoted
- strings aren't closed.
- * `lines_filter_comment_lines` has been renamed to
- `lines_filter_line_comment_lines`. `lines_filter_line_comment_lines`
- now enforces that single-quoted strings can't span lines,
- and multi-quoted strings must be closed before the end of
- the last line.
- * `lines_strip` and `lines_rstrip` now accept a new `separators`
- argument; this is an iterable of separators, like the argument
- to `multisplit`.
- The default value of `None` preserves the existing behavior,
- stripping whitespace.
- * `lines_grep` now adds a `match` attribute to the `LineInfo`
- object, containing the return value from calling `re.search`.
- (If you pass in `invert=True` to `lines_grep`, `lines_grep`
- will still write `None` to the `match` attribute.)
- * Bugfix: `lines_strip_indent` previously required
- whitespace-only lines to obey the indenting rules.
- My intention was always for `lines_strip_indent` to
- behave like Python, and that includes not really caring
- about the intra-line-whitespace for whitespace-only
- lines. Now `lines_strip_indent` behaves more like Python:
- a whitespace-only line behaves as if it has
- the same indent as the previous line. (Not that the
- indent value of an empty line should matter; this is
- mostly just there to present a consistent interface to
- the user.)
+ and `extend_trailing`. These methods move a leading or
+ trailing substring from the current `line` to the relevant
+ field in `LineInfo`, maintaining all the guaranteed
+ invariants, and updating all related `LineInfo` fields
+ (like `column_number`).
+
+* `lines_filter_comment_lines` has been renamed to
+ `lines_filter_line_comment_lines`. `lines_filter_line_comment_lines`
+ now enforces that single-quoted strings can't span lines,
+ and multi-quoted strings must be closed before the end of
+ the last line. For backwards compatibility, the new function
+ is also available under the old name; this old name will
+ eventually be removed, but not before September 2025.
+
+* `lines_strip` and `lines_rstrip` now accept a new `separators`
+ argument; this is an iterable of separators, like the argument
+ to `multisplit`.
+ The default value of `None` preserves the existing behavior,
+ stripping whitespace.
+
+* `lines_grep` now writes to the `match` attribute of the `LineInfo`
+ object; it stores the return value from calling `re.search`.
+ (If you pass in `invert=True` to `lines_grep`, `lines_grep`
+ still writes to the `match` attribute--but it always writes `None`.)
+
+* Bugfix: `lines_strip_indent` previously required
+ whitespace-only lines to obey the indenting rules, which was
+ a mistake. My intention was always for `lines_strip_indent`
+ to behave like Python, and that includes not really caring
+ about the intra-line-whitespace for whitespace-only
+ lines. Now `lines_strip_indent` behaves more like Python:
+ a whitespace-only line behaves as if it has
+ the same indent as the previous line. (Not that the
+ indent value of an empty line should matter--but this
+ behavior is how you'd intuitively expect it to work.)
* New function: `split_title_case`, which splits a string
at title case change word boundaries.
@@ -5791,7 +5848,7 @@ Lots of changes this time! Grouping by submodule:
* Another minor speedup for `multisplit`: when `reverse=True`,
we used to reverse the results *three times!* We now explicitly
- observe and manage the reverse state of the result and avoid
+ observe and manage the reverse state of the result, to avoid
needless reversing.
### scheduler
diff --git a/big/text.py b/big/text.py
index fc24f99..d60c726 100644
--- a/big/text.py
+++ b/big/text.py
@@ -1819,7 +1819,8 @@ def old_split_quoted_strings(s, quotes=None, *, triple_quotes=True, backslash=No
Returns an iterator yielding 2-tuples:
(is_quoted, segment)
where segment is a substring of s, and is_quoted is true if the segment is
- quoted. Joining all the segments together recreates s.
+ quoted. Joining all the segments together recreates s. (The segment
+ strings include the quote marks.)
If triple_quotes is true, supports "triple-quoted" strings like Python.
@@ -2738,38 +2739,38 @@ def __init__(self, lines, line, line_number, column_number, *, leading=None, tra
elif is_str:
empty = ''
else:
- raise TypeError("line must be str or bytes")
+ raise TypeError(f"line must be str or bytes, not {line!r}")
if not isinstance(line_number, int):
- raise TypeError("line_number must be int")
+ raise TypeError(f"line_number must be int, not {line_number!r}")
if not isinstance(column_number, int):
- raise TypeError("column_number must be int")
+ raise TypeError(f"column_number must be int, not {column_number!r}")
line_type = type(line)
if leading == None:
leading = empty
elif not isinstance(leading, line_type):
- raise TypeError("leading must be same type as line or None")
+ raise TypeError(f"leading must be same type as line or None, not {leading!r}")
if trailing == None:
trailing = empty
elif not isinstance(trailing, line_type):
- raise TypeError("trailing must be same type as line or None")
+ raise TypeError(f"trailing must be same type as line or None, not {trailing!r}")
if end == None:
end = empty
elif not isinstance(end, line_type):
- raise TypeError("end must be same type as line or None")
+ raise TypeError(f"end must be same type as line or None, not {end!r}")
self.lines = lines
self.line = line
self.line_number = line_number
self.column_number = column_number
- self.indent = None
self.leading = leading
self.trailing = trailing
self.end = end
+ self.indent = None
self.match = None
self._is_bytes = is_bytes
self.__dict__.update(kwargs)
@@ -2779,7 +2780,7 @@ def detab(self, s):
def extend_leading(self, s, line):
if isinstance(s, int):
- assert -len(line) <= s < len(line)
+ assert -len(line) <= s < len(line), f"extend_leading invalid parameters: s={s!r} line={line!r}"
s = line[:s]
else:
assert line.startswith(s), f"line {line!r} doesn't start with s {s!r}"
@@ -2793,7 +2794,7 @@ def extend_leading(self, s, line):
def extend_trailing(self, s, line):
if isinstance(s, int):
- assert -len(line) <= s < len(line)
+ assert -len(line) <= s < len(line), f"extend_trailing invalid parameters: s={s!r} line={line!r}"
s = line[-s:]
else:
assert line.endswith(s), f"line {line!r} doesn't end with s {s!r}"
@@ -2804,7 +2805,7 @@ def extend_trailing(self, s, line):
def __repr__(self):
names = list(self.__dict__)
- priority_names = ['line', 'lines', 'line_number', 'column_number', 'leading', 'trailing', 'end']
+ priority_names = ['lines', 'line', 'line_number', 'column_number', 'leading', 'trailing', 'end']
fields = []
for name in priority_names:
names.remove(name)
@@ -3023,11 +3024,11 @@ def lines_strip(li, separators=None):
yield (info, line)
-def lines_filter_line_comment_lines(li, comment_re):
+def lines_filter_line_comment_lines(li, match):
"The generator function returned by the public lines_filter_line_comment_lines function."
for info, line in li:
s = line.lstrip()
- if comment_re.match(s):
+ if match(s):
continue
yield (info, line)
@@ -3067,8 +3068,12 @@ def lines_filter_line_comment_lines(li, comment_markers):
comment_pattern = _separators_to_re(comment_markers, comment_markers_is_bytes, separate=False, keep=False)
comment_re = re.compile(comment_pattern)
- return _lines_filter_line_comment_lines(li, comment_re)
+ return _lines_filter_line_comment_lines(li, comment_re.match)
+# old, deprecated name.
+# will eventually be deleted, but not before September 2025.
+lines_filter_comment_lines = lines_filter_line_comment_lines
+_export_name("lines_filter_comment_lines")
@_export
def lines_containing(li, s, *, invert=False):
@@ -3242,18 +3247,18 @@ def lines_strip_line_comments(li, line_comment_markers, *,
A lines modifier function. Strips line comments from the lines
of a "lines iterator". Line comments are substrings beginning
with a special marker that mean the rest of the line should be
- ignored; lines_strip_comments truncates the line at the
+ ignored; lines_strip_line_comments truncates the line at the
beginning of the leftmost line comment marker.
line_comment_markers should be an iterable of line comment
marker strings. These are strings that denote a "line comment",
which is to say, a comment that extends from the marker to the
- end of the line. lines_strip_comments truncates each line
+ end of the line. lines_strip_line_comments truncates each line
starting at the beginning of the leftmost line comment marker
found on that line.
If quotes is true, it must be an iterable of quote marker
- strings, length 1 or more. lines_strip_comments will
+ strings, length 1 or more. lines_strip_line_comments will
parse the line using big's split_quoted_strings function,
and ignore comment characters inside quoted strings. If
quotes is false, quote characters are ignored and
@@ -3261,7 +3266,7 @@ def lines_strip_line_comments(li, line_comment_markers, *,
By default quotes is the tuple ("'", '"'). Quoted strings
delimited by markers in quotes may not span lines; if a
line ends with an unterminated quoted string,
- lines_strip_comments will raise a SyntaxError.
+ lines_strip_line_comments will raise a SyntaxError.
If escape is true, it must be a string. This string
will "escape" (quote) quote markers, as per backslash
@@ -3276,18 +3281,18 @@ def lines_strip_line_comments(li, line_comment_markers, *,
in (conventional) quotes are not allowed to. By default
multiline_quotes is an empty string.
- If rstrip is true (the default), lines_strip_comments calls
- the rstrip() method on line after it truncates the line.
+ If rstrip is true (the default), lines_strip_line_comments
+ calls the rstrip() method on line after it truncates the line.
Updates LineInfo.comment and LineInfo.trailing as appropriate.
- What's the difference between lines_strip_comments and
+ What's the difference between lines_strip_line_comments and
lines_filter_comment_lines?
* lines_filter_comment_lines only recognizes lines that
*start* with a comment separator (ignoring leading
whitespace). Also, it filters out those lines
completely, rather than modifying the line.
- * lines_strip_comments handles comment characters
+ * lines_strip_line_comments handles comment characters
anywhere in the line, although it can ignore
comments inside quoted strings. It truncates the
line but still always yields the line.
@@ -3323,8 +3328,109 @@ def lines_strip_line_comments(li, line_comment_markers, *,
return _lines_strip_line_comments(li, line_comment_splitter, quotes, multiline_quotes, escape, rstrip, empty_join)
-lines_strip_comments = lines_strip_line_comments
-_export_name("lines_strip_comments")
+
+# backwards compatibility
+@_export
+def lines_strip_comments(li, comment_separators, *, quotes=('"', "'"), backslash='\\', rstrip=True, triple_quotes=True):
+ """
+ NOTE: This function is deprecated. Please use
+ lines_strip_line_comments instead. lines_strip_comments
+ will eventually be removed from big, no sooner than September 2025.
+
+ A lines modifier function. Strips comments from the lines
+ of a "lines iterator". Comments are substrings that indicate
+ the rest of the line should be ignored; lines_strip_comments
+ truncates the line at the beginning of the leftmost comment
+ separator.
+
+ If rstrip is true (the default), lines_strip_comments calls
+ the rstrip() method on line after it truncates the line.
+
+ If quotes is true, it must be an iterable of quote characters.
+ (Each quote character MUST be a single character.)
+ lines_strip_comments will parse the line and ignore comment
+ characters inside quoted strings. If quotes is false,
+ quote characters are ignored and lines_strip_comments will
+ truncate anywhere in the line.
+
+ backslash and triple_quotes are passed in to
+ old_split_quoted_strings, which is used internally to detect
+ the quoted strings in the line.
+
+ Updates the trailing field of the associated LineInfo object:
+ the comment stripped from the line, if any, is moved to
+ info.trailing, along with any whitespace removed by rstrip.
+
+ What's the difference between lines_strip_comments and
+ lines_filter_comment_lines?
+ * lines_filter_comment_lines only recognizes lines that
+ *start* with a comment separator (ignoring leading
+ whitespace). Also, it filters out those lines
+ completely, rather than modifying the line.
+ * lines_strip_comments handles comment characters
+ anywhere in the line, although it can ignore
+ comments inside quoted strings. It truncates the
+ line but still always yields the line.
+
+ Composable with all the lines_ modifier functions in the big.text module.
+ """
+ if not comment_separators:
+ raise ValueError("illegal comment_separators")
+
+ if isinstance(comment_separators, bytes):
+ comment_separators = _iterate_over_bytes(comment_separators)
+ comment_separators_is_bytes = True
+ else:
+ comment_separators_is_bytes = isinstance(comment_separators[0], bytes)
+ comment_separators = tuple(comment_separators)
+
+ if comment_separators_is_bytes:
+ empty = b''
+ else:
+ empty = ''
+ empty_join = empty.join
+
+ comment_pattern = __separators_to_re(comment_separators, separators_is_bytes=comment_separators_is_bytes, separate=True, keep=True)
+ re_comment = re.compile(comment_pattern)
+ split = re_comment.split
+
+
+ def lines_strip_comments(li, split, quotes, backslash, rstrip, triple_quotes):
+ for info, line in li:
+ if quotes:
+ i = old_split_quoted_strings(line, quotes, backslash=backslash, triple_quotes=triple_quotes)
+ else:
+ i = ((False, line),)
+
+ # iterate over the line until we either hit the end or find a comment.
+ segments = []
+ append = segments.append
+ for is_quoted, segment in i:
+ if is_quoted:
+ append(segment)
+ continue
+
+ fields = split(segment, maxsplit=1)
+ leading = fields[0]
+ if len(fields) == 1:
+ append(leading)
+ continue
+
+ # found a comment marker in an unquoted segment!
+ if rstrip:
+ leading = leading.rstrip()
+ append(leading)
+ break
+
+ keeping = sum(len(s) for s in segments)
+ removing = len(line) - keeping
+ if removing:
+ line = info.extend_trailing(removing, line)
+ yield (info, line)
+ return lines_strip_comments(li, split, quotes, backslash, rstrip, triple_quotes)
+
+
+
@_export
def lines_convert_tabs_to_spaces(li):
diff --git a/tests/test_text.py b/tests/test_text.py
index 1d0438e..4b9308e 100644
--- a/tests/test_text.py
+++ b/tests/test_text.py
@@ -3308,6 +3308,9 @@ def test(i, expected, *, test_reconstituted_line=True):
print("got:")
pprint.pprint(got)
print("\n\n")
+ for e, g in zip(expected, got):
+ print(e==g)
+ print("\n\n")
self.assertEqual(expected, got)
def L(line, line_number, column_number=1, end='\n', final=None, **kwargs):
@@ -3694,6 +3697,55 @@ def test_and_remove_lineinfo_match(i, substring, *, invert=False, match='match')
]
)
+
+ ##
+ ## testing for a deprecated function!
+ ## lines_strip_comments
+ ##
+ lines = big.lines("""
+for x in range(5): # this is a comment
+ print("# this is quoted", x)
+ print("") # this "comment" is useless
+ print(no_comments_or_quotes_on_this_line)
+"""[1:])
+ test(big.lines_strip_comments(lines, ("#", "//")),
+ [
+ L(line='for x in range(5): # this is a comment', line_number=1, column_number=1, trailing=' # this is a comment', final='for x in range(5):'),
+ L(line=' print("# this is quoted", x)', line_number=2, column_number=1),
+ L(line=' print("") # this "comment" is useless', line_number=3, column_number=1, trailing=' # this "comment" is useless', final=' print("")'),
+ L(line=' print(no_comments_or_quotes_on_this_line)', line_number=4, column_number=1),
+ L(line='', line_number=5, column_number=1, end=''),
+ ])
+
+ # don't get alarmed! we intentionally break quote characters in this test.
+ lines = big.lines("""
+for x in range(5): # this is a comment
+ print("# this is quoted", x)
+ print("") # this "comment" is useless
+ print(no_comments_or_quotes_on_this_line)
+"""[1:])
+ test(big.lines_strip_comments(lines, ("#", "//"), quotes=None),
+ [
+ L(line='for x in range(5): # this is a comment', line_number=1, column_number=1, trailing=' # this is a comment', final='for x in range(5):'),
+ L(line=' print("# this is quoted", x)', line_number=2, column_number=1, trailing='# this is quoted", x)', final=' print("'),
+ L(line=' print("") # this "comment" is useless', line_number=3, column_number=1, trailing=' # this "comment" is useless', final=' print("")'),
+ L(line=' print(no_comments_or_quotes_on_this_line)', line_number=4, column_number=1),
+ L(line='', line_number=5, column_number=1, end=''),
+ ])
+
+ with self.assertRaises(ValueError):
+ list(big.lines_strip_comments(big.lines("a\nb\n"), None))
+
+ lines = big.lines(b"a\nb# ignored\n c")
+ test(big.lines_strip_comments(lines, b'#'),
+ [
+ L(b'a', 1),
+ L(b'b# ignored', 2, 1, trailing=b'# ignored', final=b'b'),
+ L(b' c', 3, end=b''),
+ ]
+ )
+
+
lines = big.lines(
" \n" +
" a = b \n" +
@@ -3713,7 +3765,7 @@ def test_and_remove_lineinfo_match(i, substring, *, invert=False, match='match')
def test_lines_strip_indent(self):
self.maxDiff = 2**32
- def assert_lines_reconstitutes_properly(i):
+ def assert_line_reconstitutes_properly(i):
for t in i:
info, line = t
reconstituted_line = info.leading + line + info.trailing
@@ -3721,8 +3773,9 @@ def assert_lines_reconstitutes_properly(i):
yield t
def test(lines, expected, *, tab_width=8):
- lines = big.lines(lines, tab_width=tab_width)
- i = assert_lines_reconstitutes_properly(big.lines_strip_indent(lines))
+ if not isinstance(lines, types.GeneratorType):
+ lines = big.lines(lines, tab_width=tab_width)
+ i = assert_line_reconstitutes_properly(big.lines_strip_indent(lines))
got = list(i)
# fixup lines objects
@@ -3899,6 +3952,8 @@ def LineInfo(lines, line, line_number, column_number, end=_sentinel, **kwargs):
# with self.assertRaises(ValueError):
# test("first line\n \u3000 second line\nthird line\n", [])
+
+
def test_lines_misc(self):
## error handling
with self.assertRaises(TypeError):