Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kyber: avoid ASAN error on arm64 #1914

Closed
wants to merge 1 commit into from

Conversation

mingtaoy
Copy link

@mingtaoy mingtaoy commented Sep 4, 2024

The FIPS202x2 implementation does not play well with ASAN. Specifically, when handling the trailing blocks that are not divisible by 8, the implementation uses an 8 byte load followed by a mask to process the trailing bytes of the input
(

a = vld1_u64((uint64_t *)&in0[pos]);
b = vld1_u64((uint64_t *)&in1[pos]);
)

While this may work in practice, performing an 8 byte load on an input with fewer than 8 bytes remaining leads to UB (and an ASAN error) when the input size cannot accomodate such a load.

A simple repro, built with -fsanitize=address shows this:

static void keypair(const char* alg) {
        OQS_KEM* kem = OQS_KEM_new(alg);
        unsigned char* public = malloc(kem->length_public_key);
        unsigned char* sec = malloc(kem->length_secret_key);

        kem->keypair(public, sec);
        free(sec);
        free(public);
        OQS_KEM_free(kem);
}
int main() {
        keypair(OQS_KEM_alg_kyber_512);
        keypair(OQS_KEM_alg_kyber_768);
        keypair(OQS_KEM_alg_kyber_1024);
        return 0;
}

results in the following error:

=================================================================
==613880==ERROR: AddressSanitizer: stack-buffer-overflow on address
0xfffff57a5160 at pc 0x0000004e9ef4 bp 0xfffff57a5030 sp 0xfffff57a5020
READ of size 8 at 0xfffff57a5160 thread T0
    #0 0x4e9ef0 in keccakx2_absorb.constprop.1
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0)
    #1 0x4a608c in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a608c)
    #2 0x4a51a8 in PQCLEAN_KYBER512_AARCH64_gen_matrix
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a51a8)
    #3 0x4a5360 in PQCLEAN_KYBER512_AARCH64_indcpa_keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5360)
    #4 0x45e460 in PQCLEAN_KYBER512_AARCH64_crypto_kem_keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x45e460)
    #5 0x4075c0 in keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4075c0)
    #6 0x405598 in main
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x405598)
    #7 0xfffcbbd772fc in __libc_start_call_main
(/lib64/libc.so.6+0x272fc)
    #8 0xfffcbbd773d4 in __libc_start_main_impl
(/lib64/libc.so.6+0x273d4)
    #9 0x40746c in _start
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x40746c)

Address 0xfffff57a5160 is located in stack of thread T0 at offset 64 in
frame
    #0 0x4a5efc in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5efc)

  This frame has 2 object(s):
    [32, 66) 'extseed1' (line 59) <== Memory access at offset 64
partially overflows this variable
    [112, 146) 'extseed2' (line 60)
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0) in
keccakx2_absorb.constprop.1
Shadow bytes around the buggy address:
  0x200ffeaf49d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf49e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf49f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x200ffeaf4a20: 00 00 00 00 f1 f1 f1 f1 00 00 00 00[02]f2 f2 f2
  0x200ffeaf4a30: f2 f2 00 00 00 00 02 f3 f3 f3 f3 f3 00 00 00 00
  0x200ffeaf4a40: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x200ffeaf4a50: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==613880==ABORTING

This PR sidesteps this issue by ensuring that extseed1 and extseed2 are sized to be divisible by 8, which avoids the problematic codepath.

Other alternatives considered for this PR were to modify the keccakx2_absorb implementation (e.g. to memcpy the remaining data into a local 8 byte buffer first) -- however this approach inhibited inlining of the absorb implementation and resulted in a call to memcpy which could not be optimized out due to the variable-length inlen.

Tested with ninja run_tests and ensured that ASAN repro no longer fires.

  • Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.) -> NO
  • Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged.) -> NO

The FIPS202x2 implementation does not play well with ASAN. Specifically,
when handling the trailing blocks that are not divisible by 8, the
implementation uses an 8 byte load followed by a mask to process the
trailing bytes of the input
(https://github.com/open-quantum-safe/liboqs/blob/d93a431aaf9ac929f267901509e968a5727c053c/src/kem/kyber/oldpqclean_kyber512_aarch64/fips202x2.c#L475-L476)

While this may work in practice, performing an 8 byte load on an input
with fewer than 8 bytes remaining leads to UB (and an ASAN error) when
the input size cannot accomodate such a load.

A simple repro, built with `-fsanitize=address` shows this:

```c

static void keypair(const char* alg) {
        OQS_KEM* kem = OQS_KEM_new(alg);
        unsigned char* public = malloc(kem->length_public_key);
        unsigned char* sec = malloc(kem->length_secret_key);

        kem->keypair(public, sec);
        free(sec);
        free(public);
        OQS_KEM_free(kem);
}
int main() {
        keypair(OQS_KEM_alg_kyber_512);
        keypair(OQS_KEM_alg_kyber_768);
        keypair(OQS_KEM_alg_kyber_1024);
        return 0;
}
```

results in the following error:

```
=================================================================
==613880==ERROR: AddressSanitizer: stack-buffer-overflow on address
0xfffff57a5160 at pc 0x0000004e9ef4 bp 0xfffff57a5030 sp 0xfffff57a5020
READ of size 8 at 0xfffff57a5160 thread T0
    #0 0x4e9ef0 in keccakx2_absorb.constprop.1
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0)
    open-quantum-safe#1 0x4a608c in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a608c)
    open-quantum-safe#2 0x4a51a8 in PQCLEAN_KYBER512_AARCH64_gen_matrix
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a51a8)
    open-quantum-safe#3 0x4a5360 in PQCLEAN_KYBER512_AARCH64_indcpa_keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5360)
    open-quantum-safe#4 0x45e460 in PQCLEAN_KYBER512_AARCH64_crypto_kem_keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x45e460)
    open-quantum-safe#5 0x4075c0 in keypair
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4075c0)
    open-quantum-safe#6 0x405598 in main
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x405598)
    open-quantum-safe#7 0xfffcbbd772fc in __libc_start_call_main
(/lib64/libc.so.6+0x272fc)
    open-quantum-safe#8 0xfffcbbd773d4 in __libc_start_main_impl
(/lib64/libc.so.6+0x273d4)
    open-quantum-safe#9 0x40746c in _start
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x40746c)

Address 0xfffff57a5160 is located in stack of thread T0 at offset 64 in
frame
    #0 0x4a5efc in PQCLEAN_KYBER512_AARCH64_neon_kyber_shake128_absorb
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4a5efc)

  This frame has 2 object(s):
    [32, 66) 'extseed1' (line 59) <== Memory access at offset 64
partially overflows this variable
    [112, 146) 'extseed2' (line 60)
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow
(/data/users/mingtao/3/oqsrepro/buildopt/oqsrepro+0x4e9ef0) in
keccakx2_absorb.constprop.1
Shadow bytes around the buggy address:
  0x200ffeaf49d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf49e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf49f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x200ffeaf4a20: 00 00 00 00 f1 f1 f1 f1 00 00 00 00[02]f2 f2 f2
  0x200ffeaf4a30: f2 f2 00 00 00 00 02 f3 f3 f3 f3 f3 00 00 00 00
  0x200ffeaf4a40: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x200ffeaf4a50: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200ffeaf4a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==613880==ABORTING
```
This PR sidesteps this issue by ensuring that `extseed1` and `extseed2`
are sized to be divisible by 8, which avoids the problematic codepath.

Other alternatives considered for this PR were to modify the
`keccakx2_absorb` implementation (e.g. to memcpy the remaining data
into a local 8 byte buffer first) -- however this approach inhibited
inlining of the absorb implementation and resulted in a call to `memcpy`
which could not be optimized out due to the variable-length `inlen`.

Tested with `ninja run_tests` and ensured that ASAN repro no longer
fires.

Co-Authored-By: Kyle Nekritz <knekritz@meta.com>

Signed-off-by: Mingtao Yang <mingtao@meta.com>
@mingtaoy mingtaoy marked this pull request as ready for review September 4, 2024 20:36
@mingtaoy mingtaoy changed the title Kyber symmetric shake: avoid ASAN error Kyber symmetric shake: avoid ASAN error on arm64 Sep 4, 2024
@mingtaoy mingtaoy changed the title Kyber symmetric shake: avoid ASAN error on arm64 Kyber: avoid ASAN error on arm64 Sep 4, 2024
@mingtaoy
Copy link
Author

mingtaoy commented Sep 4, 2024

Looking more closely, it seems as if PQClean (upstream of this repository?) has this change already implemented.

@bhess
Copy link
Member

bhess commented Sep 5, 2024

Hi @mingtaoy, thank you for identifying the issue and opening the PR!

Looking more closely, it seems as if PQClean (upstream of this repository?) has this change already implemented.

Correct, PQClean is the upstream of the Kyber arm64 implementation. Does the upstream fix in your view solve the issue? If so, we can apply the upstream fix as a patch to our upstream import script. Please let me know if you'd like to do so, or otherwise I can assist.
Directly modifying the sources in liboqs will trigger a CI failure since it will go out of sync with the upstream repository.

@baentsch
Copy link
Member

baentsch commented Sep 5, 2024

we can apply the upstream fix as a patch to our upstream import script.

Isn't this a bit complicated? If it's already landed in PQClean (?), we should be able to simply do a new copy_from_upstream, no, @bhess?

@bhess
Copy link
Member

bhess commented Sep 5, 2024

Isn't this a bit complicated? If it's already landed in PQClean (?), we should be able to simply do a new copy_from_upstream, no, @bhess?

Upstream already changed Kyber to ML-KEM (draft), so we have to back-port the fix.

@baentsch
Copy link
Member

baentsch commented Sep 5, 2024

Upstream already changed Kyber to ML-KEM (draft), so we have to back-port the fix.

So upstream doesn't support Kyber any more? How long does OQS want to do this then? Sounds like a constantly growing obligation, that...

Copy link
Member

@dstebila dstebila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes themselves are fine, but I think they will get wiped out next time we run copy_from_upstream. So they would need to be implemented in patch files, I think?

@dstebila
Copy link
Member

dstebila commented Sep 5, 2024

Upstream already changed Kyber to ML-KEM (draft), so we have to back-port the fix.

So upstream doesn't support Kyber any more? How long does OQS want to do this then? Sounds like a constantly growing obligation, that...

Good point. I'd suggest we set a timeline for removing Kyber, with input from any interested parties. I think the main reason for keeping Kyber was to maintain (for a time) compatibility with the TLS hybrid Kyber key exchanges that have been deployed in Chrome, Cloudflare, etc. If that's the main driver, then we could set a sunset timeline of say 4 months after we and Chrome ship the new ML-KEM hybrid key exchanges.

@zxjtan
Copy link
Contributor

zxjtan commented Sep 5, 2024

The changes themselves are fine, but I think they will get wiped out next time we run copy_from_upstream.

I think that's fine, since the next copy_from_upstream run would bring in the upstream fix.

Directly modifying the sources in liboqs will trigger a CI failure since it will go out of sync with the upstream repository.

I think this would be the reason to implement the change in the patch file.

The next time copy_from_upstream is run, is there some mechanism to notify the maintainer to exclude this change from the patch? A new upstream sync may introduce many merge conflicts with patch files, how would a maintainer know in general which ones to fix/resolve and which ones to take the incoming upstream change?

@baentsch
Copy link
Member

baentsch commented Sep 6, 2024

The next time copy_from_upstream is run, is there some mechanism to notify the maintainer to exclude this change from the patch? A new upstream sync may introduce many merge conflicts with patch files, how would a maintainer know in general which ones to fix/resolve and which ones to take the incoming upstream change?

All very valid questions/concerns, @zxjtan . All these require manual effort and that is what I questioned above.

I'd suggest we set a timeline for removing Kyber, with input from any interested parties

I'd second that approach -- under the assumption that interested parties asking for a longer support period than the proposal below contribute to the required work: I'm all for supporting user's needs but we need to balance this with the work load this puts on a small team: Particularly the latter lets me suggest asking for hands-on support by folks asking for specific properties/features.

Created #1915 to track.

we could set a sunset timeline of say 4 months after we and Chrome ship the new ML-KEM hybrid key exchanges.

What about we remove Kyber 1 month after that from "main" -- it's then still in the previous release for users depending on that and we reduce the risk/security exposure to OQS by not suitably quickly following the latest upstream updates (or spend the work required to move all those updates into patches)?

That said, I see a slight flaw in this logic:

I think the main reason for keeping Kyber was to maintain (for a time) compatibility with the TLS hybrid Kyber key exchanges that have been deployed in Chrome, Cloudflare, etc.

IMO this is a very weak reason and really has nothing to do with liboqs: Anyone interested in still running that could easily build oqsprovider (that actually implements this functionality) with an older release/commit of liboqs (still containing Kyber). Its ability to build against downlevel (possibly pre-installed) liboqs releases is one further reason as to why oqsprovider contains no algorithm-dependent code.

From that perspective, I don't see any problem removing Kyber ASAP (if Chromium/CF TLS compatibility is indeed the main reason for keeping Kyber).

@baentsch
Copy link
Member

Closed via #1922. Thanks again for the heads up and contribution @mingtaoy !

@baentsch baentsch closed this Sep 13, 2024
@mingtaoy mingtaoy deleted the kyber-asan-fix branch September 13, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants