-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-8905] Gaps in String benchmarking #51411
Comments
When implementing these benchmarks please keep in mind that Swift Benchmark Suite is mostly a set of micro benchmarks, so aim for base workloads that can execute in under 10ms, ideally under 1ms (or 1000 microseconds — the unit reported from Benchmark_O). Longer runtimes accumulate measurement errors due to context switching between processes by OS scheduler. |
Good point! I updated each major bullet point with an existing benchmark to refer to as inspiration |
Comment by Patrick Balestra (JIRA) This seems like a good starter issue for a newcomer in the standard library. I just created SR-8908 to try and fill the `insert(_:Character)` gap with some benchmarks. |
@palimondo to clarify, is the goal for a run time of 1-10ms when passed `1` as an argument, or some other value? |
@milseman If you’re asking about command line argument |
Comment by Armen Mkrtchian (JIRA) Hey, I will work on benchmarking non-ASCII on AngryPhonebook |
armenm (JIRA User) could you link the PR? keitaito (JIRA User) opened #25310 for replaceSubrange calls. |
Comment by Armen Mkrtchian (JIRA) Hi @milseman |
Comment by Keita Ito (JIRA) Benchmarks for `String.replaceSubrange(_:with:)` are added by this merged PR #25310 Thank you Pavol and Michael for your patient review and support! |
Comment by Valeriy Van (JIRA) Benchmarks for UTF16 decoding were added by PR #34435. |
Additional Detail from JIRA
md5: 310808e3ff76ad0c9651e7a3550b6d27
Sub-Tasks:
Issue Description:
While inspecting our benchmarking story, there are many micro-benchmark gaps that we should fill. This bug holds a listing of such gaps, and is a start task. Anyone interested in covering a gap should create a new bug for it, assign to themselves, and apply the fix. I can review the PR or provide further guidance.
Some String RangeReplaceableCollection operations. We're missing benchmarking for:
insert<C: Collection>(_: C)
See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
Grapheme breaking
We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.
Also, we have unicode-scalar breaking variants that are enabled, however this is a highly redundant suite as unicode-scalar-breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to: ascii, russian, chinese, and emoji, and renamed to not imply Character iteration
We don't benchmark grapheme-breaking on bridged NSStrings strings. Historically this hasn't exhibited much perf difference, but it's a bind spot currently.
We also count via iteration, but in theory, String.count could be made faster than raw iteration. We should pick one workload to just run String.count on.
See benchmark/single-source/StringWalk.swift.gyb
Substring-based comparison/hashing and benchmark unification
We have some for Substring without a very diverse payload in Substring.swift, and diverse payloads only for String in StringComparison.swift.
We should merge these two benchmarks together, getting the diversity of StringComparison.swift and the same-buffer-but-different-pointer variants for Substring.swift, applied to Substrings as well as Strings.
Similarly, for Strings and Substrings from bridged NSStrings, though we might prune the datasets to reduce a combinatorial explosion in number of benchmarks
See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
Transcoding chunks of data from one encoding to another
Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.
UTF8 -> UTF16 is likely to be an increasingly important one for the future.
See benchmark/single-source/UTF8Decode.swift for some inspiration
Case conversion: String.lowercased() and String.uppercased()
ASCII and non-ASCII bridged NSStrings
See benchmarks/single-source/AngryPhonebook.swift for some guidance
Completed
String Breadcrumbing - https://bugs.swift.org/browse/SR-9226
insert(_:Character) - https://bugs.swift.org/browse/SR-8908
replaceSubrange<C: Collection>(_:C) - [benchmark] Add ReplaceSubrange benchmark #25310
case conversion ASCII and non-ASCII - https://bugs.swift.org/browse/SR-10855
Wait for approval:
The text was updated successfully, but these errors were encountered: