[benchmark] SR-10855: Added non-ASCII characters to String benchmarking #25309
Conversation
@swift-ci Please benchmark
@swift-ci Please smoke test
Hey @gottesmm, I have decreased the workload; it should be enough to pass the check. Could you please re-run the benchmark test?
@swift-ci Please benchmark
@swift-ci Please smoke test
🥳🇦🇲👏🏽🤓💪🏽💻📲
@Armenm Looks like you still need to reduce the workload a little more. Maybe mixed strings with emoji should be measured in separate benchmarks?
Welcome to the Swift project, Armen! Looks like we need to coordinate two PRs modifying the same benchmark… See PR #25305. You filed SR-10855 first, so we'll start here.
Since the addition of other bicameral scripts is turning a single benchmark into a family of benchmarks, we should take this opportunity to properly design it to allow for their relative comparison while we're at it. We'll leave the original AngryPhonebook benchmark without changes.
cc @mdab121
Let's set the workload baseline by requiring the slowest benchmark to be under 1000 μs. Looking at the last measured performance of Armenian (27^2 = 729 name pairs), I'm guessing we could standardize on a workload size of 20 names per script. We should also try to keep the overall size roughly the same. Document this:
// Workloads for various scripts. Always 20 names for 400 pairings.
// To keep the performance of various scripts roughly comparable, aim for
// a total length of approximately 120 characters.
// E.g.: `latin.joined(separator: "").count == 124`
let latin = Array(words.prefix(20))
let cyrillic = [
"Александр", "Аркадий", "Аня", "Даниил", "Дмитрий", "Эдуард", "Юрій", "Давид",
"Анна", "Дмитрий", "Евгений", "Борис", "Владимир", "Артур", "Антон",
"Антон", "Надія", "Алёна", "Алиса"]
let armenian = [
"Արմեն", "Աննա", "Հարություն", "Միքայել", "Մարիա", "Դավիթ", "Վարդան", "Նարինե",
"Հռիփսիմե", "Տիգրան", "Տաթև", "Ադամ", "Ազատ", "Ազնաւուր", "Գրիգոր", "Անի",
"Լիլիթ", "Հայկ", "Անդրանիկ", "Գառնիկ"]
I think `cyrillic` needs a bit of tuning; the first 20 names I took are only 108 characters… could you please bring it up to ~120? Armenian at 114 is fine, I think…
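For reference, the two constraints (20 names per script, joined length close to ~120 characters) can be double-checked with a quick snippet; the array below is just the suggested `armenian` workload from this review, and the check itself is an illustration, not part of the benchmark.

```swift
// Sanity check for the proposed workload sizing: 20 names per script,
// joined length close to ~120 characters.
let armenian = [
    "Արմեն", "Աննա", "Հարություն", "Միքայել", "Մարիա", "Դավիթ", "Վարդան", "Նարինե",
    "Հռիփսիմե", "Տիգրան", "Տաթև", "Ադամ", "Ազատ", "Ազնաւուր", "Գրիգոր", "Անի",
    "Լիլիթ", "Հայկ", "Անդրանիկ", "Գառնիկ"]

print(armenian.count)                        // 20 names
print(armenian.joined(separator: "").count)  // 114 characters
```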
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Armenian and Cyrillic are both 2-byte encoded languages. Unless there are salient differences in how capitalization works (I don't know either enough to know), normalization, or use of combining characters, these would be the same workload from a technical perspective.
As @palimondo pointed out, what's also interesting would be the following:
let smallTwoByte = [
// something up to 15 UTF-8 code units in length, i.e. 7 "characters" in Cyrillic or Armenian ...
]
let largeTwoByte = [
// something > 16 UTF-8 code units in length, i.e. at least 8 "characters" in Cyrillic or Armenian...
]
And the same for 3-byte for some latter-BMP language with capitalization (not sure off the top of my head), and without (e.g. Han ideographs), and similarly for 4-byte, and a large 1-byte (i.e. ASCII) workload. This can be done as a separate PR, though.
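The small/large split described above could be sketched as follows; the helper name and the hard-coded 15-byte threshold (the small-string capacity on 64-bit platforms) are my own illustration, not code from this PR.

```swift
// Hypothetical helper (not from the PR): bucket names into the
// small/large workloads described above, by UTF-8 code unit length.
// 15 UTF-8 code units is the small-string capacity on 64-bit platforms.
func splitBySmallStringFit(_ names: [String]) -> (small: [String], large: [String]) {
    var small: [String] = []
    var large: [String] = []
    for name in names {
        if name.utf8.count <= 15 { small.append(name) } else { large.append(name) }
    }
    return (small, large)
}

// "Աննա" is 8 UTF-8 bytes (fits); "Հարություն" is 20 (does not).
let buckets = splitBySmallStringFit(["Աննա", "Հարություն"])
print(buckets.small)  // ["Աննա"]
print(buckets.large)  // ["Հարություն"]
```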
@palimondo Thanks for the great feedback and support. Yes, I will review and make all necessary changes. Thanks in advance.
Sure,

$ ./Benchmark_Driver --help
usage: Benchmark_Driver [-h] COMMAND ...
optional arguments:
-h, --help show this help message and exit
Swift benchmark driver commands:
COMMAND See COMMAND -h for additional arguments
run Run benchmarks and output results to stdout
check
compare Compare benchmark results
Example: ./Benchmark_Driver run -i 5 -f Prefix -f .*Suffix.*

You want to run the check to validate that your newly added benchmarks pass our sanity checks:

$ ./Benchmark_Driver check --help
usage: Benchmark_Driver check [-h] [-f PATTERN] [-t TESTS] [-o OPT] [-v | -md]
[BENCHMARK [BENCHMARK ...]]
positional arguments:
BENCHMARK benchmark to run (default: all)
optional arguments:
-h, --help show this help message and exit
-f PATTERN, --filter PATTERN
run all tests whose name match regular expression
PATTERN, multiple filters are supported
-t TESTS, --tests TESTS
directory containing Benchmark_O{,none,size} (default:
DRIVER_DIR)
-o OPT, --optimization OPT
optimization level to use: {O,Onone,Osize}, (default:
O)
-v, --verbose show more details during benchmark analysis
  -md, --markdown       format report as Markdown table

So for the testing of new benchmarks:

$ ./Benchmark_Driver run -i 3 --output-dir log -f Angry

To run the benchmark validation locally you would do:

$ ./Benchmark_Driver check -f Angry

This takes a while, because it measures each benchmark for a second, ten times. You'll get color-coded warnings and errors if we find something suspicious. To see all the nitty-gritty while it does its thing, do:

$ ./Benchmark_Driver check -f Angry --verbose
@Armenm I've noticed one more thing that we could take care of here, if we want to be super fair when comparing across different scripts…

$ ./swift
*** You are running Swift's integrated REPL, ***
*** intended for compiler and stdlib ***
*** development and testing purposes only. ***
*** The full REPL is built as part of LLDB. ***
*** Type ':help' for assistance. ***
(swift) "smol"._guts.isSmall
// r0 : Bool = true
(swift) "Long String > 16B"._guts.isSmall
// r1 : Bool = false So to be totally fair, we should probably take the extra step to ensure we'll be comparing apples to apples, because in latin, all names end up being small strings: (swift) latin.allSatisfy { $0._guts.isSmall }
// r7 : Bool = true

but

(swift) cyrillic.filter { !$0._guts.isSmall }
// r5 : [String] = ["Александр", "Владимир"]
(swift) armenian.filter { !$0._guts.isSmall }
// r7 : [String] = ["Հարություն", "Հռիփսիմե", "Ազնաւուր", "Անդրանիկ"]

Could you try to pick names with lengths such that they fit into small strings (up to 15 bytes when UTF-8 encoded) while still keeping the overall length about the same?

@milseman Do you want to have benchmarks per script (this PR adds Cyrillic and Armenian, PR #25305 is poised to add some Extended Latin)? We could probably add Greek to cover all major bicameral scripts… or do you want just one non-ASCII addition?
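Outside the integrated REPL, `_guts.isSmall` isn't accessible (it's stdlib-internal), but `utf8.count <= 15` approximates the same check using the 15-code-unit small-string capacity; the subset below is just for illustration.

```swift
// Approximate the internal `_guts.isSmall` check with the public
// `utf8.count` API: small strings hold up to 15 UTF-8 code units
// on 64-bit platforms.
let cyrillic = ["Александр", "Аркадий", "Аня", "Владимир"]  // sample subset
let notSmall = cyrillic.filter { $0.utf8.count > 15 }
print(notSmall)  // ["Александр", "Владимир"] (matches the REPL output)
```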
Hey @palimondo, this is the first time I am touching this code, and I had only looked at it for 4–5 hours during try! Swift San Jose. Forgive me if it takes too much time to understand what's going on here :)
@Armenm Don't worry, you're doing great and I'm overthinking things.... |
This is looking great, thanks for working on this! Could you incorporate @palimondo's code review feedback (naming convention, extracting out the common code), and then we can merge this?
Sure @milseman, will look into it. Thanks!
Force-pushed "Added Cyrillic and Armenian names with emojis" from 1bf0ba1 to 4483e84
Hey @palimondo. I have refactored the code based on your comments. There are 20 names in each language, ~120 characters total. I also removed all names that did not fit in small strings.
I was not able to run
It's unable to find the benchmark binary. You need to be running it from the…
@swift-ci please benchmark
@swift-ci please benchmark
@swift-ci please smoke test
@milseman since I was fixing the error and had these benchmarks running locally, I've added a…
@Armenm, for future work in this area, you might be interested in what @milseman said in @mdab121's PR #25305 (comment).

@milseman Do you have any explanation for the performance of…? Per iteration, the small variants are doing 800 calls to ICU on the 2-byte variants (Armenian, Cyrillic), vs. 2 calls in the…
@palimondo The naming here is preventing me from being able to discuss this intelligently. Are we using "Latin" (which is an irrelevant designation for benchmark purposes) to mean that it consists entirely of ASCII? If so, we should use "ASCII" for the name. Similarly, are we using "Large" to mean that it will not fit in a small-string representation? Otherwise, we're just rearranging the deck chairs: more, quicker iterations vs. fewer, slower ones.
Correct. The…
I don’t understand your point here. I was saying that the…
It seems we have different assumptions about the implementation of optimizations for case conversion here. If I understand your perspective from the earlier discussion above, it makes sense to distinguish only between cases based on UTF-8 encoding length:
As I’ve explained in #25305 (comment), it seems like it could be possible to create optimizations for case conversions specialized for different Unicode code blocks. Is that a wrong way to think about this? Edit: Looking at Rust's implementation, I see that the case conversions are implemented using a…
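The range-based idea under discussion could be sketched like this; the ranges, the helper name, and the linear lookup below are illustrative assumptions of mine, not the stdlib's or Rust's actual tables.

```swift
// Illustrative sketch (not actual stdlib or Rust code): a fast path for
// case conversion could consult a table of scalar ranges known to be
// case-invariant, and skip calling into ICU for scalars inside them.
let caseInvariantRanges: [ClosedRange<UInt32>] = [
    0x4E00...0x9FFF,  // CJK Unified Ideographs: no case distinction
    0x0590...0x05FF,  // Hebrew: no case distinction
]

func isKnownCaseInvariant(_ scalar: Unicode.Scalar) -> Bool {
    return caseInvariantRanges.contains { $0.contains(scalar.value) }
}

print(isKnownCaseInvariant("中"))  // true: Han ideograph, U+4E2D
print(isKnownCaseInvariant("A"))  // false: "A" lowercases to "a"
```

A real implementation would use binary search or a trie over a much larger, generated table rather than a linear scan.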
Pathological case, uppercase: ß -> SS
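What makes this case pathological is that full Unicode case mapping can change the string's length: U+00DF LATIN SMALL LETTER SHARP S has no single-character uppercase form and expands to "SS". A minimal illustration:

```swift
// Uppercasing "Straße" produces "STRASSE": ß expands to SS under full
// case mapping, so the result is one character longer than the input.
let street = "Straße"
print(street.uppercased())                      // "STRASSE"
print(street.count, street.uppercased().count)  // 6 7
```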
@swift-ci please benchmark
[Benchmark report: performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs) tables and the hardware overview omitted. The tables contain differences in performance larger than 8% and differences in code size larger than 1%; the performance results (not code size!) can contain false alarms due to noise.]
@milseman I've added a benchmark for the pathological case of uppercased ß -> SS:

"Straße".uppercased() == "STRASSE"

You can see that even that one (…

Heaviest Stack Trace from Instruments
In general, Unicode code blocks are an irrelevant categorization for benchmarking, and “Latin” isn’t even the formal name of the code block (nor would the formal name be as recognizable). ASCII vs non-ASCII is a very relevant categorization, which just so happens to align with a Unicode code block because Unicode explicitly chose to be a superset of ASCII. For example, a workload that consists entirely of Han Ideographs would be interesting, not because they all come from a given block, but because of the attributes they have in comparison to other workloads, such as case invariance (and to a lesser extent, encoding width for the BMP ones). Similarly, improvements and additional fast paths would apply based on these kinds of attributes, such as over case-invariant ranges of scalar values. The boundaries of such ranges are not strictly aligned with code block boundaries.
I was confused by your workload in https://github.com/apple/swift/pull/25309/files#r291816905, which had “small” workloads that don’t fit in the small-string representation (15 bytes of UTF-8). If the PR is consistent in small vs large aligning with small vs large string representations, then this is fine. Otherwise I think that “long” vs “short” is better.
This seems unlikely, and perhaps we need to dig into the details if this is happening. Interfacing with ICU requires that we:
In contrast, if we know a String is all-ASCII, we do the following:
Point #2 could probably be improved with wider/bulk operations.
If Extended Latin, Armenian, and Cyrillic have similar/identical case-conversion properties (e.g. invariance), and ICU’s implementation is otherwise not distinguishing between them in its own fast paths, then yes.
Again, code block boundaries are irrelevant. You’re likely going to optimize for ranges of code points that may span or lie within code block boundaries.
A BoolTrie or something similar would be very useful for representing Unicode Scalar Properties in the stdlib without asking ICU on a scalar-by-scalar basis. That includes case conversion properties, though we might want to combine it with some kind of sparse interval set for case invariance.
That profile is demonstrating something highly suspect about the implementation.
This should switch to https://forums.swift.org/t/pitch-add-a-string-initializer-with-access-to-uninitialized-storage/22611 when it happens. @Catfish-Man, here is another example of something that would benefit from this. We should propose it immediately after 5.1 wraps up.
Something is going terribly wrong in the implementation (or benchmark construction, but I suspect the implementation) that’s sending us down a very slow path here.
The difference between these two traces for…
Clarified the naming of tests that exercise `_SmallString`s with a `.Small` suffix and used plain names for regular String workloads. Reordered declarations. Documented workload sizes.
@swift-ci please benchmark |
@swift-ci please smoke test |
In addition to that, I have filed a bug for this performance anomaly: SR-11069. I have renamed the tests that exercise the…
[Benchmark report: performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs) tables and the hardware overview omitted. The tables contain differences in performance larger than 8% and differences in code size larger than 1%; the performance results (not code size!) can contain false alarms due to noise.]
@Armenm, I'll merge this as soon as all the tests pass. I hope you were not discouraged by the meandering progress… Thank you for your contribution!
@palimondo it seems removing all that senseless overhead gives us around a 30x speedup for the large case, and nice wins even for the smaller one.
Not discouraged at all, @palimondo. I would say even the opposite: it was really cool to dive deep into the details. And I am happy to see that my humble help led to some other implementation and optimization work on `_SmallString`.

Tests that exercise the `_SmallString` optimization have the `.Small` suffix.
Resolves SR-10855
Subtask of SR-8905: Gaps in String benchmarking