-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use embedded broadcast for avx512 constants. #139
Conversation
Use embedded broadcast for constants & VPTERNLOG for 3-way bitwise logical operations.
Hi @Shark64. There is only one commit with all the changes proposed. Could you have multiple commits or better, multiple PRs to review this independently? Thanks! (also, the sign-off line should be at the end of the commit) |
Hi, ok, you mean a different PR for each hashing function? I made just one commit because it's the same kind of change for all the hashes, but i can make a separate PR for each one if you prefer. |
No need to do it for every hash. What I meant is that you are also doing more than using broadcasts. |
Oh yeah now i understand what you meant. I did that because they seemed small enough changes that a different pull request and commit would clutter the git log; Do you prefer a separate pull request for them? I'll try to see if i can revert them in this branch, i'm not that expert with git wizardry but i'll give it a try ;) |
Signed-off-by: Nicola Torracca <shark@bitchx.it>
Signed-off-by: Nicola Torracca <shark@bitchx.it>
Hi @Shark64. Right, "git add -p" is your friend :) You can rebase your branch on top of latest code and add your changes one by one, keeping only the ones for the broadcast first, create a commit and then same for the other changes. We have pushed a fix for the clang-format, so no need for your other commit and we are going to push changes on sm3_mb, so I suggest waiting for this change (will be done shortly). |
@@ -354,7 +354,7 @@ func(mh_sha1_murmur3_x64_128_block_avx512) | |||
VMOVPS HH3, [mh_digests_p + 64*3] | |||
VMOVPS HH4, [mh_digests_p + 64*4] | |||
;a mask used to transform to big-endian data | |||
vmovdqa64 SHUF_MASK, [PSHUFFLE_BYTE_FLIP_MASK] | |||
vbroadcasti32x4 SHUF_MASK, [PSHUFFLE_BYTE_FLIP_MASK] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these broadcasts can be part of the first commit.
@@ -219,7 +219,7 @@ section .text | |||
;; H becomes T2, then add T1 for A | |||
;; D becomes D + T1 for E | |||
|
|||
vpaddd T1, H, TMP3 ; T1 = H + Kt | |||
vpaddd T1, H, [TBL + ((%%ROUND)*4)]{1to16} ; T1 = H + Kt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you better leave this out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but I think that we need this to make broadcast work, otherwise we'll end loading the wrong constants (after removing the duplicated values in the .data section).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you are right, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries :D
@@ -341,7 +341,7 @@ section .text | |||
%define %%WT %1 | |||
%define %%OFFSET %2 | |||
mov inp0, [IN + (%%OFFSET*8)] | |||
vmovups %%WT, [inp0+IDX] | |||
vmovdqu64 %%WT, [inp0+IDX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be another commits, turning vmovups into vmovdqu64
vmovups W13,[inp5+IDX] | ||
vmovups W14,[inp6+IDX] | ||
vmovups W15,[inp7+IDX] | ||
lea IDX, [IN] ;; save the base address so the next load have 8bits offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this as it is for now.
@@ -162,6 +162,9 @@ FIELD _rsp, 8, 8 | |||
%define %%t0 %17 | |||
%define %%t1 %18 | |||
|
|||
; keep the transpose masks in registers | |||
vmovdqa32 TMP2, [PSHUFFLE_TRANSPOSE16_MASK1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid potential conflicts, I would pass these TMP2 and TMP3 as parameters of the macro
@@ -335,7 +334,7 @@ func(mh_sha256_block_avx512) | |||
|
|||
vmovdqa32 TMP3, [TBL] ; First K |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be needed then?
Hi @Shark64. We are pretty almost at code freeze now. OK to leave it for next release? |
Hi @pablodelara, I was trying yesterday to rebase the code as you asked. Is the best way to just rebase my branch to the lastest code, save my patches somwhere else and then reapply them one by one? I ask because i ended up with having both my old commits and the new duplicate ones in the `git log', and i wasn't sure if it's the correct way of doing it. Thanks, sorry for the delay |
Yes, that's what I do. Apply them one by one and resolve the conflicts. |
Yes, that's what I do. Apply them one by one and resolve the conflicts. |
I am sure there are other tricks, but I don't know them :) Yes, those for now, thanks! |
I made another branch with just the embedded broadcast changes, this way the git log looks less polluted to me. If it's ok with you I'll open another pull request to merge that? |
Closing this PR, in favour of the new one. |
As asked in the closed pull request (#123) i've reworked the patch to just two commits: the first one is to replicate the constants using embedded broadcast for EVEX-coded instructions. This saves quite a lot of file size, especially for SHA512.
The second patch is for SM3: basically i've tried to use VPTERNLOG wherever possibile, removing a couple of instructions from the critical path in each macro.
On my system (Linux_x86-64, RocketLake CPU) `make test' passes all the tests without any errors, but I haven't tested it on other platforms.