-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kyber: avoid ASAN error on arm64 #1914
Conversation
The FIPS202x2 implementation does not play well with ASAN. Specifically, when handling the trailing blocks that are not divisible by 8, the implementation uses an 8 byte load followed by a mask to process the trailing bytes of the input (https://github.com/open-quantum-safe/liboqs/blob/d93a431aaf9ac929f267901509e968a5727c053c/src/kem/kyber/oldpqclean_kyber512_aarch64/fips202x2.c#L475-L476) While this may work in practice, performing an 8 byte load on an input with fewer than 8 bytes remaining leads to UB (and an ASAN error) when the input size cannot accomodate such a load. A simple repro, built with `-fsanitize=address` shows this: ```c static void keypair(const char* alg) { OQS_KEM* kem = OQS_KEM_new(alg); unsigned char* public = malloc(kem->length_public_key); unsigned char* sec = malloc(kem->length_secret_key); kem->keypair(public, sec); free(sec); free(public); OQS_KEM_free(kem); } int main() { keypair(OQS_KEM_alg_kyber_512); keypair(OQS_KEM_alg_kyber_768); keypair(OQS_KEM_alg_kyber_1024); return 0; } ``` results in the following error: ``` ================================================================= ==613880==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xfffff57a5160 at pc 0x0000004e9ef4 bp 0xfffff57a5030 sp 0xfffff57a5020 READ of size 8 at 0xfffff57a5160 thread T0 #0 0x4e9ef0 in keccakx2_absorb.constprop.1 (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0) open-quantum-safe#1 0x4a608c in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a608c) open-quantum-safe#2 0x4a51a8 in PQCLEAN_KYBER512_AARCH64_gen_matrix (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a51a8) open-quantum-safe#3 0x4a5360 in PQCLEAN_KYBER512_AARCH64_indcpa_keypair (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5360) open-quantum-safe#4 0x45e460 in PQCLEAN_KYBER512_AARCH64_crypto_kem_keypair (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x45e460) open-quantum-safe#5 0x4075c0 in keypair (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4075c0) open-quantum-safe#6 0x405598 in main (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x405598) open-quantum-safe#7 0xfffcbbd772fc in __libc_start_call_main (/lib64/libc.so.6+0x272fc) open-quantum-safe#8 0xfffcbbd773d4 in __libc_start_main_impl (/lib64/libc.so.6+0x273d4) open-quantum-safe#9 0x40746c in _start (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x40746c) Address 0xfffff57a5160 is located in stack of thread T0 at offset 64 in frame #0 0x4a5efc in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5efc) This frame has 2 object(s): [32, 66) 'extseed1' (line 59) <== Memory access at offset 64 partially overflows this variable [112, 146) 'extseed2' (line 60) HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0) in keccakx2_absorb.constprop.1 Shadow bytes around the buggy address: 0x200ffeaf49d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf49e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf49f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf4a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf4a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x200ffeaf4a20: 00 00 00 00 f1 f1 f1 f1 00 00 00 00[02]f2 f2 f2 0x200ffeaf4a30: f2 f2 00 00 00 00 02 f3 f3 f3 f3 f3 00 00 00 00 0x200ffeaf4a40: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 0x200ffeaf4a50: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf4a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x200ffeaf4a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==613880==ABORTING ``` This PR sidesteps this issue by ensuring that `extseed1` and `extseed2` are sized to be divisible by 8, which avoids the problematic codepath. Other alternatives considered for this PR were to modify the `keccakx2_absorb` implementation (e.g. to memcpy the remaining data into a local 8 byte buffer first) -- however this approach inhibited inlining of the absorb implementation and resulted in a call to `memcpy` which could not be optimized out due to the variable-length `inlen`. Tested with `ninja run_tests` and ensured that ASAN repro no longer fires. Co-Authored-By: Kyle Nekritz <knekritz@meta.com> Signed-off-by: Mingtao Yang <mingtao@meta.com>
b871739
to
ed86fd0
Compare
Looking more closely, it seems as if PQClean (upstream of this repository?) has this change already implemented. |
Hi @mingtaoy, thank you for identifying the issue and opening the PR!
Correct, PQClean is the upstream of the Kyber arm64 implementation. Does the upstream fix in your view solve the issue? If so, we can apply the upstream fix as a patch to our upstream import script. Please let me know if you'd like to do so, or otherwise I can assist. |
Upstream already changed Kyber to ML-KEM (draft), so we have to back-port the fix. |
So upstream doesn't support Kyber any more? How long does OQS want to do this then? Sounds like a constantly growing obligation, that... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes themselves are fine, but I think they will get wiped out next time we run copy_from_upstream. So they would need to be implemented in patch files, I think?
Good point. I'd suggest we set a timeline for removing Kyber, with input from any interested parties. I think the main reason for keeping Kyber was to maintain (for a time) compatibility with the TLS hybrid Kyber key exchanges that have been deployed in Chrome, Cloudflare, etc. If that's the main driver, then we could set a sunset timeline of say 4 months after we and Chrome ship the new ML-KEM hybrid key exchanges. |
I think that's fine, since the next
I think this would be the reason to implement the change in the patch file. The next time |
All very valid questions/concerns, @zxjtan . All these require manual effort and that is what I questioned above.
I'd second that approach -- under the assumption that interested parties asking for a longer support period than the proposal below contribute to the required work: I'm all for supporting user's needs but we need to balance this with the work load this puts on a small team: Particularly the latter lets me suggest asking for hands-on support by folks asking for specific properties/features. Created #1915 to track.
What about we remove Kyber 1 month after that from "main" -- it's then still in the previous release for users depending on that and we reduce the risk/security exposure to OQS by not suitably quickly following the latest upstream updates (or spend the work required to move all those updates into patches)? That said, I see a slight flaw in this logic:
IMO this is a very weak reason and really has nothing to do with From that perspective, I don't see any problem removing Kyber ASAP (if Chromium/CF TLS compatibility is indeed the main reason for keeping Kyber). |
The FIPS202x2 implementation does not play well with ASAN. Specifically, when handling the trailing blocks that are not divisible by 8, the implementation uses an 8 byte load followed by a mask to process the trailing bytes of the input
(
liboqs/src/kem/kyber/oldpqclean_kyber512_aarch64/fips202x2.c
Lines 475 to 476 in d93a431
While this may work in practice, performing an 8 byte load on an input with fewer than 8 bytes remaining leads to UB (and an ASAN error) when the input size cannot accomodate such a load.
A simple repro, built with
-fsanitize=address
shows this:results in the following error:
This PR sidesteps this issue by ensuring that
extseed1
andextseed2
are sized to be divisible by 8, which avoids the problematic codepath.Other alternatives considered for this PR were to modify the
keccakx2_absorb
implementation (e.g. to memcpy the remaining data into a local 8 byte buffer first) -- however this approach inhibited inlining of the absorb implementation and resulted in a call tomemcpy
which could not be optimized out due to the variable-lengthinlen
.Tested with
ninja run_tests
and ensured that ASAN repro no longer fires.