-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SIMD instructions to update data pointers. #153
Conversation
Signed-off-by: Nicola Torracca <nicola.torracca@gmail.com>
Do you know about latencies of [failed] store-forwarding? Read e.g. https://stackoverflow.com/questions/46135766/can-modern-x86-implementations-store-forward-from-more-than-one-prior-store :
OTOH, I don't know the rest of this code. The better strategy may be storing pointers in a SIMD register permanently and using e.g. PEXTRD to copy them into scalar registers. Or even using VPGATHER where it available: https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented |
I'm familiar with store-forwarding, but it doesn't look like it should be a problem here. The pointers are updated only once at the end of the function, outside the hot loop. My patch it's mainly to reduce code size, not for performace. Also, as your quote says the store-fowarding fails if your read back a bigger chuck than the older write, so writing them as a single vector should help by having all the data in one or two store buffers. |
you are right, sorry for the noise. i didn't check the code before commenting :( |
No worries ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR. I left a comment.
Signed-off-by: Nicola Torracca <nicola.torracca@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more change needed, thanks
Fixed wrong vmov instruction type.
Thanks for the changes. I will squash the third commit into the first commit once this is merged. |
sha512_mb/sha512_mb_x2_avx.asm
Outdated
mov [STATE + _data_ptr_sha512 + 0*PTR_SZ], inp0 | ||
add inp1, IDX | ||
mov [STATE + _data_ptr_sha512 + 1*PTR_SZ], inp1 | ||
vmovq xmm0, IDX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After giving an extra thought, I don't think making this change is worth it, for two reasons:
1 - It's replacing 4 instructions with 5 instructions
2 - We are removing AVX code in the next few months (already mentioned it).
So could you drop the changes in the AVX implementations and leave just the other ones?
Many thanks for the work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've just reverted the _avx functions to the old version.
Signed-off-by: Nicola Torracca <nicola.torracca@gmail.com>
Thanks for the work @Shark64. This is now merged in mainline. |
Hi, I've noticed that even if the hashes code uses vector instructions for the computation the update of the data pointers at the end is still done using scalar code. We can save some time and code size by loading the pointers as a vector, update them all together and store them back.
I've run ``make test'' and it worked without errors on Linux_x86-64 with RocketLake CPU.