-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
system.sets card
is slow
#10617
Labels
Comments
also, maybe I'm missing some impl details, but since it's actually using an int8, wouldn't it be faster to count the set bits on an int8 rather than converting to int32 and counting 4X as many bits? |
here's a diff that counts just the 8 bits. It is a little faster: diff --git a/lib/system/sets.nim b/lib/system/sets.nim
index a5f6c5d..c53097b 100644
--- a/lib/system/sets.nim
+++ b/lib/system/sets.nim
@@ -13,8 +13,8 @@ type
NimSet = array[0..4*2048-1, uint8]
proc countBits32(n: int32): int {.compilerproc.} =
- var v = n
- v = v -% ((v shr 1'i32) and 0x55555555'i32)
+ #var v = n
+ var v = n -% ((n shr 1'i32) and 0x55555555'i32)
v = (v and 0x33333333'i32) +% ((v shr 2'i32) and 0x33333333'i32)
result = ((v +% (v shr 4'i32) and 0xF0F0F0F'i32) *% 0x1010101'i32) shr 24'i32
@@ -22,7 +22,13 @@ proc countBits64(n: int64): int {.compilerproc.} =
result = countBits32(toU32(n and 0xffffffff'i64)) +
countBits32(toU32(n shr 32'i64))
-proc cardSet(s: NimSet, len: int): int {.compilerproc.} =
- result = 0
+proc countBits8(n: uint8): int {.compilerproc.} =
+ var v = n
+ while v != 0:
+ result.inc
+ v = v and (v - 1)
+
+proc cardSet(s: NimSet, len: int): int {.compilerproc, inline.} =
for i in countup(0, len-1):
- inc(result, countBits32(int32(s[i])))
+ if likely(s[i] == 0): continue
+ inc(result, countBits8(s[i])) |
Araq
pushed a commit
that referenced
this issue
Feb 10, 2019
this speeds up the system.sets time from ~0.2 to ~0.06 in release mode. This is still slower than intsets and tables (which both are ~0.01). This assumes that most sets will be sparse. fixes #10617
Araq
pushed a commit
that referenced
this issue
Feb 18, 2019
this speeds up the system.sets time from ~0.2 to ~0.06 in release mode. This is still slower than intsets and tables (which both are ~0.01). This assumes that most sets will be sparse. fixes #10617
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
For the code here:
https://gist.github.com/brentp/7d440560968c3ac97c40de17200e404a
running with
nim c -d:release -r
show that the system.sets are much faster than either hashsets or intsets as expected.however, when running with
nim c -d:card -d:release -r
they become ~20X slower.Possible Solution
This change:
improves the card speed almost 4X, but perhaps using a builtin popcount would be useful.
The text was updated successfully, but these errors were encountered: