Eensy-weensy teeny toy_multisplit optimization.

For every position in the string, we shave off a substring as long as the longest possible separator. Then we clip off progressively shorter substrings, until we find a substring that matches one of the separators of that length. If the string doesn't start with any separator, we'll try 'em all.

I happen to know that CPython has some hacks that speed up string manipulation. If you only have one reference to a string, and you transform the string somehow and overwrite your reference with the transformed version, CPython can often simply modify your existing string in-place.

So, I rewrote toy_multisplit so it pre-creates the transformed string, then shaves characters off *that*. And it does appear to be a teeny tiny bit faster.
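As a rough illustration of the shaving loop described above, here is a minimal standalone sketch. The helper name `longest_separator_at_start` and the way `separators_by_length` is built are assumptions made for this example; the real toy_multisplit in the test suite does more (word flushing, segments, and so on).

```python
def longest_separator_at_start(s, separators):
    """Return the longest separator that s starts with, or None.

    Hypothetical helper for illustration only; it is not the actual
    toy_multisplit code.
    """
    # Group separators by length, longest first. This mirrors the shape
    # of the separators_by_length list seen in the diff, but how the
    # real code builds that list is an assumption here.
    by_length = {}
    for sep in separators:
        by_length.setdefault(len(sep), set()).add(sep)
    separators_by_length = sorted(by_length.items(), reverse=True)

    # Start from the whole string and keep overwriting the same name
    # with a shorter prefix, instead of re-slicing s on every pass.
    substring = s
    for length, separators_set in separators_by_length:
        substring = substring[:length]
        if substring in separators_set:
            return substring
    return None
```

Because each pass trims the already-shortened `substring`, the result is the same as slicing `s[:length]` every time, as long as the lengths are visited from longest to shortest.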
larryhastings · Sep 7, 2023 · b6077e8
1 parent ff22f46
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/tests/test_text.py b/tests/test_text.py
@@ -1124,9 +1124,11 @@ def flush_word():
                 segments.append(empty.join(word))
                 word.clear()
 
+            longest_separator_length = separators_by_length[0][0]
             while s:
+                substring = s
                 for length, separators_set in separators_by_length:
-                    substring = s[:length]
+                    substring = substring[:length]
                     # print(f"substring={substring!r} separators_set={separators_set!r}")
                     if substring in separators_set:
                         flush_word()