Eensy-weensy teeny toy_multisplit optimization.
For every position in the string, we shave off a substring as
long as the longest possible separator.  Then we clip off progressively
shorter substrings, until we find a substring that matches one
of the separators of that length.  If the string doesn't start with
any separator, we'll try 'em all.
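
The longest-first matching described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual toy_multisplit: the names `group_by_length` and `sketch_multisplit` are hypothetical, and the choice to keep the matched separators in the output is an assumption.

```python
def group_by_length(separators):
    """Group separators by length, longest first: [(length, {seps}), ...]."""
    by_length = {}
    for sep in separators:
        by_length.setdefault(len(sep), set()).add(sep)
    # Sort on the length only, so the sets themselves are never compared.
    return sorted(by_length.items(), key=lambda item: item[0], reverse=True)


def sketch_multisplit(s, separators):
    """Split s at any separator, preferring the longest match (hypothetical)."""
    separators_by_length = group_by_length(separators)
    segments = []
    word = []
    while s:
        for length, separators_set in separators_by_length:
            substring = s[:length]
            if substring in separators_set:
                # Flush the pending word, then record the separator itself.
                segments.append("".join(word))
                word.clear()
                segments.append(substring)
                s = s[length:]
                break
        else:
            # No separator starts here: every length was tried and failed.
            word.append(s[0])
            s = s[1:]
    if word:
        segments.append("".join(word))
    return segments


result = sketch_multisplit("a, b,c", [",", ", "])
```

Here `", "` wins over `","` at the first comma because the length-2 set is checked before the length-1 set.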

I happen to know that CPython has some hacks that speed up string
manipulation.  If you only have one reference to a string, and
you transform the string somehow and overwrite your reference with
the transformed version, CPython can often simply modify your
existing string in-place.

So, I rewrote toy_multisplit so it pre-creates the transformed
string, then shaves characters off *that*.  And it does appear
to be a teeny tiny bit faster.
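
To make the change concrete, here is a minimal sketch of the two slicing strategies side by side. The function names, the sample separators, and the `time_both` helper are all hypothetical; only the two slicing expressions come from the diff. The shrinking variant is only correct when the lengths are visited longest first, since each slice is taken from the previous, longer one. Whether it is measurably faster likely depends on the interpreter version and the input, so the sketch checks equivalence and leaves timing to the reader.

```python
import timeit

# Hypothetical sample data, ordered longest separator first.
SEPARATORS_BY_LENGTH = [(3, {"-->"}), (2, {", ", "=>"}), (1, {",", " "})]


def match_reslicing(s, separators_by_length):
    """Old approach: slice the full string s afresh for every length."""
    for length, separators_set in separators_by_length:
        substring = s[:length]
        if substring in separators_set:
            return substring
    return None


def match_shrinking(s, separators_by_length):
    """New approach: keep one reference and shave characters off *that*."""
    substring = s
    for length, separators_set in separators_by_length:
        substring = substring[:length]
        if substring in separators_set:
            return substring
    return None


def time_both(text, number=100_000):
    """Return wall-clock timings for both variants (results will vary)."""
    return {
        "reslicing": timeit.timeit(
            lambda: match_reslicing(text, SEPARATORS_BY_LENGTH), number=number),
        "shrinking": timeit.timeit(
            lambda: match_shrinking(text, SEPARATORS_BY_LENGTH), number=number),
    }


samples = ["--> x", ", y", ",z", "plain text", ""]
old_results = [match_reslicing(t, SEPARATORS_BY_LENGTH) for t in samples]
new_results = [match_shrinking(t, SEPARATORS_BY_LENGTH) for t in samples]
```

Calling `time_both("plain text")` exercises the worst case, where no separator matches and every length is tried.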
larryhastings committed Sep 7, 2023
1 parent ff22f46 commit b6077e8
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion tests/test_text.py
@@ -1124,9 +1124,11 @@ def flush_word():
             segments.append(empty.join(word))
             word.clear()
 
+    longest_separator_length = separators_by_length[0][0]
     while s:
+        substring = s
         for length, separators_set in separators_by_length:
-            substring = s[:length]
+            substring = substring[:length]
             # print(f"substring={substring!r} separators_set={separators_set!r}")
             if substring in separators_set:
                 flush_word()