-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected performance of hash maps #785
Comments
Thank you @jannisteunissen for this report. I tested your code on my machine, and I got similar results with |
I have limited knowledge of hash maps, but am puzzled by this small modification of @jannisteunissen's program. If replace the From inspection, it seems that for
Does anyone know if this is expected? @wclodius2 program chaining_hashmap_benchmark
use stdlib_kinds, only: dp, int8, int32
use stdlib_hashmaps, only : open_hashmap_type, chaining_hashmap_type
use stdlib_hashmap_wrappers
implicit none
!! Choice of hashmap -- should these do the same thing?
!type(open_hashmap_type) :: map
type(chaining_hashmap_type) :: map
integer, parameter :: n_max = 5*1000*1000
integer :: n
integer(int32), allocatable :: keys(:)
integer, allocatable :: key_counts(:)
real(dp) :: t_start, t_end
real(dp), allocatable :: r_uniform(:)
type(key_type) :: key
logical :: present
integer(int8) :: int32_as_int8(4)
call map%init(fnv_1_hasher, slots_bits=10)
allocate(keys(n_max), r_uniform(n_max))
call random_number(r_uniform)
keys = nint(r_uniform * n_max * 0.25_dp)
call cpu_time(t_start)
do n = 1, n_max
int32_as_int8 = transfer(keys(n), int32_as_int8)
call set(key, int32_as_int8)
call map%key_test(key, present)
if (present) then
!print*, map%entries()
call map%remove(key)
!print*, map%entries() ! Did this reduce?
else
call map%map_entry(key)
end if
end do
call cpu_time(t_end)
write(*, "(A,E12.4)") "Elapsed time (s) ", t_end - t_start
write(*, "(A,E12.4)") "Entries/s ", n_max/(t_end - t_start)
write(*, "(A,I12)") "n_occupied ", map%entries()
write(*, "(A,I12)") "n_buckets ", map%num_slots()
! Count number of keys that occur an odd number of times
allocate(key_counts(minval(keys):maxval(keys)))
key_counts = 0
do n = 1, n_max
key_counts(keys(n)) = key_counts(keys(n)) + 1
end do
n = sum(iand(key_counts, 1))
if (n /= map%entries()) then
error stop "FAILED"
else
print *, "PASSED"
end if
! Clean up allocated storage
deallocate(key_counts)
end program chaining_hashmap_benchmark |
@gareth-nx I spotted the same issue, and plan to submit a PR soon to fix it. |
@jvdp1 I was just looking at this, and found that if we append the line
right at the end of I guess we need to add some additional tests of the hashmaps! |
Glad to see that this test was also useful in other ways! I'd be happy to contribute it to to the test suite. Let me also mention that the code I compared with used open addressing (and quadratic probing), so in principle better performance should be possible for |
I have been very busy lately, and am far behind in responding to my emails. I apologize for only responding at this late a date. I had stopped participating in the stdlib effort, because other aspects of my life made it impractical to participate to the extent that I could still be useful.It looks like you have solved the problem, so my participation is not necessary. If you would find my participation useful I will see what I can do, but I am on vacation and don’t have my full resources, I am busy with the trip, and it will take time to come fully up to speed. On Apr 4, 2024, at 4:10 PM, Jannis Teunissen ***@***.***> wrote:
Glad to see that this test was also useful in other ways! I'd be happy to contribute it to to the test suite. Let me also mention that the code I compared with used open addressing (and quadratic probing), so in principle better performance should be possible for open_hashmap_type.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
No problem at all @wclodius2 -- enjoy your vacation -- and please don't feel obliged to respond unless you feel like it. |
Today I started exploring the stdlib (which I think is a great effort!) and I did a performance test of the hash map implementation. The test I did is rather simple: generate keys (with duplicates), and then add a key if it is not yet present, and remove a key if it is already present. The code is copied below. I compared performance against another Fortran hash map implementation I once made (https://github.com/jannisteunissen/ffhash), which was based on khash from https://github.com/attractivechaos/klib
With standard compile flags (-O2) and a test size of 5 million, on an Intel i7-1185G7 laptop CPU, I got the following numbers:
while the other code (ffhash, murmur3 hasher)
A factor 10 seems a bit much. I would expect some overhead due to the class-type interface, but not this much. Perhaps this benchmark can be useful in improving the performance.
The benchmark I compared against is https://github.com/jannisteunissen/ffhash/blob/master/example_benchmark.f90
The text was updated successfully, but these errors were encountered: