-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What is needed to make this crate ready #1
Comments
I've been substantially reworking One thing I could really use help with, is how to approach code sharing between BLAKE2b and BLAKE2s. I've seen people use giant macros to define the entire crate, such that you can substitute in |
I'll take a look at the latest code and see if I have any ideas. One other question, have you given any thought to being compatible https://crates.io/crates/digest from the rustcrypto project? this would be really useful as it makes it really nice in many places to abstract over the hash function. |
When I looked at it before, it wasn't clear to me that a complete implementation of BLAKE2 would fit well with the traits there. There's the variable output length, the other input parameters, and the final node flag. Those could be exposed through concrete methods apart from the traits, of course, but I wasn't sure the result would be graceful. For example, if I construct a digest with a short output length, but the digest type itself implements There's probably a reasonable answer to all of this, but I haven't thought about it long enough. |
I was just trying to figure out how to add more configuration into the blake impl in there: RustCrypto/hashes#75, I'll give it some more thought how this could fit in there effectively. I believe the current approach taken by the rustcrypto blake impl is that only the standard sizes support fixedoutput, and the configurable version gives you variable output, which seems to be good enough for most use cases. Now it would be great of course if you could do something like |
Yeah I assume a lot of this is going to get overhauled when Rust finally ships const generics. |
|
It looks like WireGuard is GPL. Since my crates are intended to be MIT, I'm afraid I can't use GPL code for inspiration. (Unless an OSS lawyer comes along and tells me I have that wrong?) It's awesome that it's there though. I'm getting very close to a big |
Yeah that is tricky to navigate :/ I have heard advice in both directions. Knowing it exists is great though, in any case.
awesome |
Not promising anything, but if I find the time I'll take a stab converting the existing impl to avx512 myself, I really want to know how fast this can go. |
If you look at By the way, this is where I think the |
lol no, ran into an issue with secret keys that's got me reworking the API again |
well, I am working on adding the avx512 intrinsics to rust in the first place, no "quick experiment" :D |
First benchmarks are looking very promising:
Code: https://github.com/dignifiedquire/blake2s_simd/blob/feat/avx512/src/avx512.rs |
I...wow...that's kind of beyond my wildest dreams for the instruction set. How can it be more than a factor of two? Does that mean it's faster to put a 256-bit vector in a 512-bit register and waste half of it, than it is to work on it in a 256-bit register? |
there are two possible reasons (1) I messed up somewhere |
Good point! God damn I'm so excited to try this out. |
Those instructions are making a good difference, I added a version to the before
after
|
Oh interesting, I might be confused. Does AVX512 add rotation instructions that apply to 128-bit vectors? I didn't realize that. |
|
I think it should also be possible to construct an 8x version of blake2s and an 4x version for blake2b using avx512, by combining things inside a round, in the same way the SSE41 implementation works, or am I missing sth? This would be very nice, because it would allow to use the speed ups for Blake2bp and Blake2sp. Edit: With the difference that only 2 ops instead of 4 ops are combined per round. |
Yes, totally possible. The BLAKE2bp/sp case sounds like the best reason to do it, since that's restricted to 4/8 inputs. There could also be some speedups for the |
I'm really interested to see what speedup we can get for |
Finally finished the |
great stuff 💯 @oconnor663 any chance you could publish the current |
I'm a little hesitant to publish this repo as-is, since the current version isn't well tested. Would it be alright if I tried to get a new version out within the next few days, based on the current |
That would be really cool, if you can do that! |
@oconnor663 how are thing going? anything I can help with? |
It's going well. You can follow along in this branch: https://github.com/oconnor663/blake2b_simd/tree/copying_blake2s Fingers crossed I'll have a first published release of |
We're live! https://crates.io/crates/blake2s_simd |
Awesome, thank you!!! |
Let me know how the new interface works for you, especially if you touch anything in |
@dignifiedquire have you kept working on the AVX512 stuff? I've noticed an issue when I've tried to do it myself, and it seems to repro on your branch too. If I write a function like this:
The assembly output is this:
I think that Did you ever notice this corrupting your output? Is this some kind of LLVM bug |
I haven’t had time to continue yet unfortunately :/
I am not sure if I had the same issue or not, are you using my simd fork for the instructions? it could also be that sth in the bindings between rust and llvm gets screwed up
…On 22. Jun 2019, 23:37 +0200, oconnor663 ***@***.***>, wrote:
@dignifiedquire have you kept working on the AVX512 stuff? I've noticed an issue when I've tried to do it myself, and it seems to repro on your branch. If I write a function like this:
#[target_feature(enable = "avx512f")]
pub unsafe fn example(a: __m512i) -> __m512i {
_mm512_ror_epi32(a, 1)
}
The assembly output is this:
mov rax, rdi
vprorq zmm0, zmmword, ptr, [rsi], 1
vmovdqa64 zmmword, ptr, [rdi], zmm0
vzeroupper
ret
I think that vprorq is wrong. It should be vprord nor vprorq, right? For 32-bit "doublewords"? This seems to give wrong answers when I test it too.
Did you ever notice this corrupting your output? Is this some kind of LLVM bug
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Yeah I was experimenting with some bindings of my own and comparing that to your |
Oh I think I figured it out. The |
Hey, I am interested in using this code, as it is quite a lot faster than the implementation I currently rely on (https://github.com/rustcrypto/hashes). So I was wondering what it would take to get this code ready for publishing and use from your perspective. Happy to help if there is something that I can do.
The text was updated successfully, but these errors were encountered: