-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add strutils in-place #13610
Add strutils in-place #13610
Conversation
The code itself is ok (I only skimmed it though) but the ideas behind it need to be polished: Now that we have |
A better question might be, does linearScanEnd provide any performance benefit in this case? |
I can remove the |
I can remove the About
Literally the opposite. Again is not my opinion, is just what the numbers say, I just want std lib to be faster, other parts of std lib will benefit from this eventually, You can read the code of People confirmed that Nim is slower than Interpreted languages about this. Confirmed root cause may be Nim People want to use Nim string find is slower than Interpreted Python. For a Compiled language to be faster than an Interpreted language, is a good Marketing. |
I don't mind "linearScanEnd", I invented it. But I assume the documentation is only understandable by me. cmpIgnoreCase does not use toUpperAscii for strings, only for single chars. Microbenchmarks are always flawed, if people want a faster |
Is Trying the naive in-place version proc toLowerAscii3*(c: var char) =
if c in {'A'..'Z'}:
c = chr(ord(c) + (ord('a') - ord('A'))) and using the full linear scan range required gives me:
|
This has I have seen the comment about hot spot in the compiler etc, Documentation of If a Developer cant understand a |
That's a bold claim. |
lib/pure/strutils.nim
Outdated
template toLowerAsciiInPlace*(c: var char) = | ||
## Returns the lower case version of character ``c``. | ||
## | ||
## This works only for the letters ``A-Z``. See `unicode.toLower | ||
## <unicode.html#toLower,Rune>`_ for a version that works for any Unicode | ||
## character. | ||
## | ||
## See also: | ||
## * `isLowerAscii proc<#isLowerAscii,char>`_ | ||
## * `toLowerAscii proc<#toLowerAscii,string>`_ for converting a string | ||
runnableExamples: | ||
var character = 'F' | ||
toLowerAsciiInPlace(character) | ||
doAssert character == 'f' | ||
var chara = 'A' | ||
toLowerAsciiInPlace(chara) | ||
doAssert chara == 'a' | ||
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A'))) | ||
|
||
|
||
template toUpperAsciiInPlace*(c: var char) = | ||
## Converts character `c` into upper case. | ||
## | ||
## This works only for the letters ``A-Z``. See `unicode.toUpper | ||
## <unicode.html#toUpper,Rune>`_ for a version that works for any Unicode | ||
## character. | ||
## | ||
## See also: | ||
## * `isLowerAscii proc<#isLowerAscii,char>`_ | ||
## * `toUpperAscii proc<#toUpperAscii,string>`_ for converting a string | ||
## * `capitalizeAscii proc<#capitalizeAscii,string>`_ | ||
runnableExamples: | ||
var character = 'f' | ||
toUpperAsciiInPlace(character) | ||
doAssert character == 'F' | ||
var chara = 'z' | ||
toUpperAsciiInPlace(chara) | ||
doAssert chara == 'Z' | ||
if c in {'a'..'z'}: c = chr(ord(c) - (ord('a') - ord('A'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But... why? Is it that slow to return a char?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this makes no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try for yourself:
import times
from strutils import toLowerAscii
template toLowerAscii2(c: var char) =
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A')))
echo "strutils.toLowerAscii"
var char1: char
let t0 = now()
let t0a = cpuTime()
for _ in 0..99:
char1 = 'A'
discard strutils.toLowerAscii(char1)
echo now() - t0, '\t', cpuTime() - t0a
echo "toLowerAscii2"
var char0: char
let t1 = now()
let t1a = cpuTime()
for _ in 0..99:
char0 = 'A'
toLowerAscii2(char0)
echo now() - t1, '\t', cpuTime() - t1a
$ nim c -r example2.nim
strutils.toLowerAscii
55 microseconds and 883 nanoseconds 4.047999999999899e-05
toLowerAscii2
9 microseconds and 76 nanoseconds 1.158199999999883e-05
$ nim c -r -d:release -d:danger example2.nim
strutils.toLowerAscii
32 microseconds and 9 nanoseconds 4.101599999999962e-05
toLowerAscii2
8 microseconds and 804 nanoseconds 1.222200000000023e-05
🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I tweaked the test a bit:
import std/monotimes, times, strutils
proc toLowerAscii1(c: char): char {.inline.} =
# same as stdlib but with {.inline.}
if c in {'A'..'Z'}:
result = chr(ord(c) + (ord('a') - ord('A')))
else:
result = c
proc toLowerAscii2(c: var char) {.inline.} =
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A')))
template bench(body: untyped): untyped =
block:
let start = getMonoTime()
var ch {.inject.}: char
for _ in 0..99:
ch = 'A'
body
echo getMonoTime() - start
echo "strutils.toLowerAscii"
bench: ch = toLowerAscii(ch)
echo "toLowerAscii1"
bench: ch = toLowerAscii1(ch)
echo "toLowerAscii2"
bench: toLowerAscii2(ch)
And the results:
$ nim c -r test.nim
strutils.toLowerAscii
2 microseconds and 170 nanoseconds
toLowerAscii1
1 microsecond and 800 nanoseconds
toLowerAscii2
1 microsecond and 731 nanoseconds
$ nim c -d:danger -r test.nim
strutils.toLowerAscii
330 nanoseconds
toLowerAscii1
20 nanoseconds
toLowerAscii2
20 nanoseconds
It seems to me that the only overhead here is the function call overhead, you can easily fix this by adding {.inline.}
into the stdlib version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats 330
Vs 20
, that confirms is faster. 🤷♂️
I think that the point is that difference accumulates in the loop,
I dont care too much about the char
functions,
but the string
function the difference is bigger.
I got more difference (Microseconds Vs Nanoseconds, but I believe you).
What do we do?, add {.inline.}
to all the strutils
ones?, close this PR?. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats
330
Vs20
, that confirms is faster.
Well of course there will be a difference when you {.inline.}
vs calling functions. The point here is var char
doesn't really improve anything, so we don't need to add new APIs for that, just adding {.inline.}
strategically is enough.
I dont care too much about the
char
functions
You may not care about it, but it does matter when a new API is added to the stdlib. We should be conservative about adding new APIs, since once added we will be stuck with it for the rest of the 1.x series.
With that said, I don't have any objections on adding the in-place versions for the string
variant, as there are actual speed up there.
add
{.inline.}
to all the strutils ones?
We should only target simple procs (ie. <= 3 lines). Better would be to make the compiler able to perform this optimization on it's own, though that's a lot harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Only string
proc is public.
lib/pure/strutils.nim
Outdated
toLowerAsciiInPlace(c) | ||
s[i] = c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in-place already, so why assigning it again?
Interestingly C import std/monotimes, times
proc toLowerAsciiC(c: char): char {.importc: "tolower", header: "ctype.h".}
block:
echo "toLowerAsciiC"
var ch: char
let start = getMonoTime()
for _ in 0..99:
ch = 'A'
ch = toLowerAsciiC(ch)
echo getMonoTime() - start |
lib/pure/strutils.nim
Outdated
template toLowerAsciiInPlace(c: var char) = | ||
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A'))) | ||
|
||
template toUpperAsciiInPlace(c: var char) = | ||
if c in {'a'..'z'}: c = chr(ord(c) - (ord('a') - ord('A'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add {.inline.}
to the current toLowerAscii
and toUpperAscii
instead of duplicating the code here. It's demonstrated that the performance gains of this duplication is negligible compared to when you add {.inline.}
to the current versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/pure/strutils.nim
Outdated
toLowerAsciiInPlace(c) | ||
s[i] = c | ||
inc i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toLowerAsciiInPlace(c) | |
s[i] = c | |
inc i | |
c = toLowerAscii(c) |
You're using mitems
, which allows you to directly modify the elements of the passed string.
lib/pure/strutils.nim
Outdated
toUpperAsciiInPlace(c) | ||
s[i] = c | ||
inc i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toUpperAsciiInPlace(c) | |
s[i] = c | |
inc i | |
c = toUpperAscii(c) |
lib/pure/strutils.nim
Outdated
var stringx = "-bar" | ||
capitalizeAsciiInPlace(stringx) | ||
doAssert stringx == "-bar" | ||
toUpperAsciiInPlace(s[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toUpperAsciiInPlace(s[0]) | |
s[0] = toUpperAscii(s[0]) |
toLowerAsciiInPlace(strng) | ||
doAssert strng == "nim" | ||
for c in mitems(s): | ||
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add {.inline.}
to to{Lower,Upper}Ascii(char)
then use them directly instead? Please don't reply "Done" to my suggestions then do it your way.
if c in {'A'..'Z'}: c = chr(ord(c) + (ord('a') - ord('A'))) | |
c = toLowerAscii(c) |
If you disagree with my suggestions, please voice up instead of keep doing it your way.
I assume what you're referring to is an new strutils that's completely in-place, then users may use |
Yes. |
|
I only use strutils because I have nothing better. I thought about introducing |
@timotheecour If you want to PR do it, I close this no problem, @Araq I think of using a separate file and doing |
I have a lot of PR's open already, and you're almost done (after addressing review comments) so just keep working on your PR instead
include should be almost always be avoided (and always avoided when import works); IMO that new module should be done in a separate PR |
lib/pure/strutils.nim
Outdated
var x = "abcdefghijklmnopqrstuvwxyz01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ*+,-./:;<=>?@[]_{|}~" | ||
toLowerAsciiInPlace(x) | ||
doAssert x == "abcdefghijklmnopqrstuvwxyz01234567890abcdefghijklmnopqrstuvwxyz*+,-./:;<=>?@[]_{|}~" | ||
const t = "abcdefghijklmnopqrstuvwxyz".indent(65).cstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silly, pointless lookup table. Instead use c = chr(ord(c) - ord('A') + ord('a'))
. Basic math, it's a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is slower on my bench, but if you say so 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microbenchmarks are misleading, in reality you can also have I-cache problems and then you appreciate not everything was "optimized" by 8 times loop unrolling with SSE instructions and 1KB lookup tables.
lib/pure/strutils.nim
Outdated
var z = "abcdefghijklmnopqrstuvwxyz01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ*+,-./:;<=>?@[]_{|}~" | ||
toUpperAsciiInPlace(z) | ||
doAssert z == "ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ*+,-./:;<=>?@[]_{|}~" | ||
const t = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".indent(97).cstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
Sorry, I have to reject this before even more time and effort is spent on it.
|
I pressed the Close button too, and it failed, now I know why :P @Araq import forceInline
forceInline( someProc ) # Adds {.inline.} to someProc 🙂 |
No, that's not possible. We would need to delay code generation until the full program was processed, that means it's anti-modular. |
strutils
in-place functions.code-block
,runnableExamples
,since
, changelog.Bench
Try yourself if you want: