Valgrind Massif Tool Breaks During Verify Operation of Falcon Algorithms on Raspberry Pi #1761

Open
crt26 opened this issue Apr 22, 2024 · 9 comments
Labels: wontfix (This will not be worked on)

crt26 commented Apr 22, 2024

Issue Description

When using Valgrind (version 3.19.0) with the Massif tool to measure the maxHeap and maxStack of an algorithm/operation combination, the verify operation of every Falcon variant crashes Valgrind and produces the error below.

Main Error:

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 140151576 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

According to the Valgrind output, the functions involved are:

  • PQCLEAN_FALCON512_AARCH64_is_short*
  • do_verify
  • OQS_SIG_verify

*This function name comes from the example output given here; the verbose output files list the affected functions for the other Falcon variants.

This issue persists across Raspberry Pi 4 and 5 models and across various build configurations. When test_sig_mem is run by itself, the verification completes without issue. The same is true when it is run under Valgrind's default tool (Memcheck) without Massif. The issue also occurs regardless of whether the ARM PMU is enabled or disabled.
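
For anyone scripting these runs, a crash of this kind can be told apart from an ordinary test failure by looking for the VEX assertion line in Valgrind's output. The sketch below is a minimal, hypothetical helper: the name `find_vex_assertion` and the returned dictionary keys are my own, not part of Valgrind or liboqs, and the pattern is inferred only from the error text shown above.

```python
import re

# Matches lines such as:
#   vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
# The pattern is inferred from the error text in this report, not from Valgrind documentation.
VEX_ASSERT_RE = re.compile(
    r"^vex: (?P<file>\S+):(?P<line>\d+) \((?P<func>\w+)\): "
    r"Assertion `(?P<expr>.+)' failed\.$",
    re.MULTILINE,
)


def find_vex_assertion(output):
    """Return details of the first VEX assertion failure in `output`, or None."""
    match = VEX_ASSERT_RE.search(output)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])  # convert the source line number to an int
    return info


# Excerpt copied from the error output in this report.
SAMPLE = """\
CPU exts active:  AES SHA2 NEON

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 140151576 bytes allocated
"""

if __name__ == "__main__":
    print(find_vex_assertion(SAMPLE))
```

A wrapper script could capture Valgrind's combined output, pass it to this helper, and record the algorithm/operation pair as "profiler crashed" rather than as a failed verification.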

Standard test_sig_mem Verify Output

Command:

./test_sig_mem Falcon-512 2

Output:

Configuration info
==================
Target platform:  aarch64-Linux-6.6.20+rpt-rpi-2712 - ARM PMU options enabled
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       6b4e692b8083f391d181087f500b3389ffb007d8 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.2.1 30 Jan 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_SPEED_USE_ARM_PMU OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES SHA2 NEON
verification passes as expected

Valgrind Output without Massif

Command:

valgrind ./test_sig_mem Falcon-512 2

Output:

==352773== Memcheck, a memory error detector
==352773== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==352773== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==352773== Command: ./test_sig_mem Falcon-512 2
==352773==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.20+rpt-rpi-2712 - ARM PMU options enabled
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       6b4e692b8083f391d181087f500b3389ffb007d8 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.2.1 30 Jan 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_SPEED_USE_ARM_PMU OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES SHA2 NEON
verification passes as expected
==352773==
==352773== HEAP SUMMARY:
==352773==     in use at exit: 0 bytes in 0 blocks
==352773==   total heap usage: 15 allocs, 15 frees, 22,622 bytes allocated
==352773==
==352773== All heap blocks were freed -- no leaks are possible
==352773==
==352773== For lists of detected and suppressed errors, rerun with: -s
==352773== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind with Massif Tool: Full Error Output

Command:

valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-512 2

Output:

==352334== Massif, a heap profiler
==352334== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==352334== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==352334== Command: ./test_sig_mem Falcon-512 2
==352334==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.20+rpt-rpi-2712 - ARM PMU options enabled
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       6b4e692b8083f391d181087f500b3389ffb007d8 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.2.1 30 Jan 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_SPEED_USE_ARM_PMU OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES SHA2 NEON

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 140151576 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==352334==    at 0x58009114: show_sched_status_wrk (m_libcassert.c:406)
==352334==    by 0x58009263: report_and_quit (m_libcassert.c:477)
==352334==    by 0x5800949B: panic (m_libcassert.c:553)
==352334==    by 0x5800949B: vgPlain_core_panic_at (m_libcassert.c:558)
==352334==    by 0x580094BF: vgPlain_core_panic (m_libcassert.c:563)
==352334==    by 0x5808277B: failure_exit (m_translate.c:761)
==352334==    by 0x580EDD27: vex_assert_fail (main_util.c:245)
==352334==    by 0x5814C973: genSpill_ARM64 (host_arm64_defs.c:2829)
==352334==    by 0x581435DB: spill_vreg (host_generic_reg_alloc3.c:338)
==352334==    by 0x58144B3F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==352334==    by 0x580EC843: libvex_BackEnd (main_main.c:1133)
==352334==    by 0x580EC843: LibVEX_Translate (main_main.c:1236)
==352334==    by 0x58084F6F: vgPlain_translate (m_translate.c:1831)
==352334==    by 0x5805664B: handle_chain_me (scheduler.c:1169)
==352334==    by 0x58059227: vgPlain_scheduler (scheduler.c:1514)
==352334==    by 0x580A861F: thread_wrapper (syswrap-linux.c:101)
==352334==    by 0x580A861F: run_a_thread_NORETURN (syswrap-linux.c:154)
==352334==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 352334)
==352334==    at 0x18ADC8: PQCLEAN_FALCON512_AARCH64_is_short (in /home/cuserp3/work/pqc-evaluation-tools/lib/liboqs/build/tests/test_sig_mem)
==352334==    by 0x131F1F: do_verify (in /home/cuserp3/work/pqc-evaluation-tools/lib/liboqs/build/tests/test_sig_mem)
==352334==    by 0x124217: OQS_SIG_verify (in /home/cuserp3/work/pqc-evaluation-tools/lib/liboqs/build/tests/test_sig_mem)
==352334==    by 0x1231AB: main (in /home/cuserp3/work/pqc-evaluation-tools/lib/liboqs/build/tests/test_sig_mem)
client stack range: [0x1FFF000000 0x1FFF003FFF] client SP: 0x1FFF002040
valgrind stack range: [0x1002C18000 0x1002D17FFF] top usage: 19776 of 1048576

Verbose Outputs

Output for the issue with the verbose flag, alongside outputs for each affected Falcon variant, can be found in the following text files:

Build Configurations

The error persists across setups that use different build flags:

Default Setup

Build Commands:

 cmake -GNinja ..
 ninja && sudo ninja install

Configuration Details:

Target platform:  aarch64-Linux-6.6.20+rpt-rpi-2712 - ARM PMU options enabled
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       7b6d9f3326295fc80ea0c9026f3dd9d57f8436de
OpenSSL enabled:  Yes (OpenSSL 3.0.11 19 Sep 2023)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_SPEED_USE_ARM_PMU OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES SHA2 NEON

Custom Setup

Build Commands (contained within a script; all path variables have been verified):

cmake -GNinja  -S "$liboqs_source/" -B "$liboqs_path/build" \
-DCMAKE_INSTALL_PREFIX="$liboqs_path" \
-DOQS_SPEED_USE_ARM_PMU=ON \
-DOQS_USE_OPENSSL=ON \
-DOPENSSL_ROOT_DIR="$open_ssl_path"

ninja && sudo ninja install

Configuration Details:

Target platform:  aarch64-Linux-6.6.20+rpt-rpi-2712 - ARM PMU options enabled
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       6b4e692b8083f391d181087f500b3389ffb007d8 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.2.1 30 Jan 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_SPEED_USE_ARM_PMU OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES SHA2 NEON

Expected behaviour

The setup and configurations detailed in this bug report were also tested on a Debian 12 x86 machine, where the issue was not present. On that system, the same setup gives the following output:

Valgrind with Massif Output

==55823== Massif, a heap profiler
==55823== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==55823== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==55823== Command: ./test_sig_mem Falcon-512 2
==55823==
Configuration info
==================
Target platform:  x86_64-Linux-6.1.0-20-amd64
Compiler:         gcc (12.2.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       6b4e692b8083f391d181087f500b3389ffb007d8
OpenSSL enabled:  Yes (OpenSSL 3.2.1 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
verification

Massif File after Passing to ms_print:
x86-ms_print.txt
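
For comparing runs numerically rather than through the ms_print graph, the peak values can also be read from the raw Massif output file. The sketch below assumes the plain-text `massif.out.<pid>` snapshot format, in which each snapshot starts with a `snapshot=` line and carries `mem_heap_B=`, `mem_heap_extra_B=` and `mem_stacks_B=` lines. The function name `peak_usage` is my own, and the sample numbers are made up for illustration; they are not taken from the attached file.

```python
def peak_usage(massif_text):
    """Return (max_heap_bytes, max_stack_bytes) over all snapshots.

    Heap usage per snapshot is counted as mem_heap_B + mem_heap_extra_B
    (payload plus allocator overhead). Stack usage is mem_stacks_B, which
    is only non-zero when Massif was run with --stacks=yes.
    The two maxima may come from different snapshots.
    """
    snapshots = []
    current = None
    for raw_line in massif_text.splitlines():
        key, sep, value = raw_line.strip().partition("=")
        if not sep:
            continue  # separator or heap-tree line without key=value
        if key == "snapshot":
            current = {"mem_heap_B": 0, "mem_heap_extra_B": 0, "mem_stacks_B": 0}
            snapshots.append(current)
        elif current is not None and key in current:
            current[key] = int(value)
    max_heap = max(
        (s["mem_heap_B"] + s["mem_heap_extra_B"] for s in snapshots), default=0
    )
    max_stack = max((s["mem_stacks_B"] for s in snapshots), default=0)
    return max_heap, max_stack


# Illustrative data only: three snapshots with invented byte counts.
SAMPLE_MASSIF = """\
desc: --stacks=yes
cmd: ./test_sig_mem Falcon-512 2
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=0
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=1
#-----------
time=1000
mem_heap_B=2000
mem_heap_extra_B=16
mem_stacks_B=4096
heap_tree=empty
#-----------
snapshot=2
#-----------
time=2000
mem_heap_B=1500
mem_heap_extra_B=8
mem_stacks_B=5120
heap_tree=empty
"""

if __name__ == "__main__":
    print(peak_usage(SAMPLE_MASSIF))
```

Whether the heap figure should include `mem_heap_extra_B` depends on what the evaluation is meant to report; dropping that term gives the payload-only peak.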

Build details for the x86 machine are given below.

Environment Details

The issue was tested on both Raspberry Pi 4 and 5 models with the following details:

Pi-4:

  • OS - Raspbian OS Lite 64bit (Debian 11 Bullseye)
  • Kernel Version - 6.1.21-v8+
  • Architecture - aarch64
  • OpenSSL Version - OpenSSL 1.1.1w (Default Configuration) and OpenSSL 3.2.1 (Custom Configuration)
  • Compiler - gcc (10.2.1)
  • Build Variables - See Above
  • Liboqs Version - 0.10.0

Pi-5:

  • OS - Raspbian OS Lite 64bit (Debian 12 Bookworm)
  • Kernel Version - 6.6.20+rpt-rpi-2712
  • Architecture - aarch64
  • OpenSSL Version - OpenSSL 3.0.11 (Default Configuration) and OpenSSL 3.2.1 (Custom Configuration)
  • Compiler - gcc (12.2.0)
  • Build Variables - See Above
  • Liboqs Version - 0.10.0

Debian x86 Machine used to Verify Issue:

  • OS - Debian GNU/Linux 12 (bookworm)
  • Kernel Version - 6.1.0-20-amd64
  • Architecture - x86_64
  • OpenSSL Version - OpenSSL 3.2.1 (Custom Configuration)
  • Compiler - gcc (12.2.0)
  • Liboqs Version - 0.10.0

Debian x86 Build Commands

cmake -GNinja  -S "$liboqs_source/" -B "$liboqs_path/build" \
-DCMAKE_INSTALL_PREFIX="$liboqs_path" \
-DOQS_USE_OPENSSL=ON \
-DOPENSSL_ROOT_DIR="$open_ssl_path"

Additional context

I would be happy to provide any additional information or outputs for this issue and, if necessary, the current development branch of the repository where this environment is set up and run.

@cothan cothan self-assigned this Apr 22, 2024
Member

cothan commented Apr 22, 2024

Hi @crt26 ,

Thanks for the detailed report. As far as I can see, the host stack trace is inside the tool itself.
I double-checked the function is_short:

https://github.com/PQClean/PQClean/blob/11441c50730b46a8ba45013a8a055319070ae83e/crypto_sign/falcon-512/aarch64/common.c#L258

It simply loads data from memory and performs the computation in a loop whose bound is FALCON_N and whose increment is 128. Since FALCON_N is either 512 or 1024, the bound is always a multiple of the increment.
No extra memory is needed, neither stack nor heap.

I don't know enough about the Massif tool to comment.
I will try to reproduce this sometime soon.

Member

SWilson4 commented Apr 24, 2024

@cothan thanks for volunteering to take a look at this. Are you OK with my assigning the issue to you?

EDIT: never mind, I missed the self-assignment... thanks again!

Member

cothan commented Jun 22, 2024

Hi @crt26 ,

Thanks for the detailed information.
I can reproduce the bug on my RPi 5.

I developed the Falcon ARM code myself. I have reviewed it and I don't know why it causes problems with --tool=massif.
The error occurs in the verify operation for both FALCON_N = 512 and FALCON_N = 1024.

Commands, using the default build instructions in README.md:

$ valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-512 2
$ valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-1024 2

Output:

================================================================================
Executing verify for SIGALG Falcon-512
================================================================================

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 140602256 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==57126==    at 0x58009114: show_sched_status_wrk (m_libcassert.c:406)
==57126==    by 0x58009263: report_and_quit (m_libcassert.c:477)
==57126==    by 0x5800949B: panic (m_libcassert.c:553)
==57126==    by 0x5800949B: vgPlain_core_panic_at (m_libcassert.c:558)
==57126==    by 0x580094BF: vgPlain_core_panic (m_libcassert.c:563)
==57126==    by 0x5808277B: failure_exit (m_translate.c:761)
==57126==    by 0x580EDD27: vex_assert_fail (main_util.c:245)
==57126==    by 0x5814C973: genSpill_ARM64 (host_arm64_defs.c:2829)
==57126==    by 0x581435DB: spill_vreg (host_generic_reg_alloc3.c:338)
==57126==    by 0x58144B3F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==57126==    by 0x580EC843: libvex_BackEnd (main_main.c:1133)
==57126==    by 0x580EC843: LibVEX_Translate (main_main.c:1236)
==57126==    by 0x58084F6F: vgPlain_translate (m_translate.c:1831)
==57126==    by 0x5805664B: handle_chain_me (scheduler.c:1169)
==57126==    by 0x58059227: vgPlain_scheduler (scheduler.c:1514)
==57126==    by 0x580A861F: thread_wrapper (syswrap-linux.c:101)
==57126==    by 0x580A861F: run_a_thread_NORETURN (syswrap-linux.c:154)
==57126==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 57126)
==57126==    at 0x18AE78: PQCLEAN_FALCON512_AARCH64_is_short (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57126==    by 0x131F9F: do_verify (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57126==    by 0x124297: OQS_SIG_verify (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57126==    by 0x1231A3: main (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
client stack range: [0x1FFF000000 0x1FFF003FFF] client SP: 0x1FFF001F40
valgrind stack range: [0x1002C18000 0x1002D17FFF] top usage: 19776 of 1048576
================================================================================
Executing verify for SIGALG Falcon-1024
================================================================================

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 139649944 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==57362==    at 0x58009114: show_sched_status_wrk (m_libcassert.c:406)
==57362==    by 0x58009263: report_and_quit (m_libcassert.c:477)
==57362==    by 0x5800949B: panic (m_libcassert.c:553)
==57362==    by 0x5800949B: vgPlain_core_panic_at (m_libcassert.c:558)
==57362==    by 0x580094BF: vgPlain_core_panic (m_libcassert.c:563)
==57362==    by 0x5808277B: failure_exit (m_translate.c:761)
==57362==    by 0x580EDD27: vex_assert_fail (main_util.c:245)
==57362==    by 0x5814C973: genSpill_ARM64 (host_arm64_defs.c:2829)
==57362==    by 0x581435DB: spill_vreg (host_generic_reg_alloc3.c:338)
==57362==    by 0x58144B3F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==57362==    by 0x580EC843: libvex_BackEnd (main_main.c:1133)
==57362==    by 0x580EC843: LibVEX_Translate (main_main.c:1236)
==57362==    by 0x58084F6F: vgPlain_translate (m_translate.c:1831)
==57362==    by 0x5805928F: handle_tt_miss (scheduler.c:1141)
==57362==    by 0x5805928F: vgPlain_scheduler (scheduler.c:1503)
==57362==    by 0x580A861F: thread_wrapper (syswrap-linux.c:101)
==57362==    by 0x580A861F: run_a_thread_NORETURN (syswrap-linux.c:154)
==57362==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 57362)
==57362==    at 0x13F808: PQCLEAN_FALCON1024_AARCH64_verify_raw (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57362==    by 0x13CA07: do_verify (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57362==    by 0x124297: OQS_SIG_verify (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
==57362==    by 0x1231A3: main (in /home/cothan/Work/liboqs/build/tests/test_sig_mem)
client stack range: [0x1FFF000000 0x1FFF003FFF] client SP: 0x1FFF000F60
valgrind stack range: [0x1002C18000 0x1002D17FFF] top usage: 19776 of 1048576

When I build with cmake -DCMAKE_BUILD_TYPE=Debug -GNinja .., the problem does not appear. Of course, that build is not optimized.

cothan@pi5:~/Work/liboqs/build/tests $ valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-512 2
==61948== Massif, a heap profiler
==61948== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==61948== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==61948== Command: ./test_sig_mem Falcon-512 2
==61948==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.31+rpt-rpi-2712
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.10.2-dev
Git commit:       e3f05cbfba4552067e2c0de524c1049a864c5f2d
OpenSSL enabled:  Yes (OpenSSL 3.0.11 19 Sep 2023)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug
CPU exts active:  AES SHA2 NEON
================================================================================
Executing verify for SIGALG Falcon-512
================================================================================
verification passes as expected

I tried another build option to show the exact location of the crash:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja ..

cothan@pi5:~/Work/liboqs/build/tests $ valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-512 2
==76067== Massif, a heap profiler
==76067== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==76067== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==76067== Command: ./test_sig_mem Falcon-512 2
==76067==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.31+rpt-rpi-2712
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.10.2-dev
Git commit:       e3f05cbfba4552067e2c0de524c1049a864c5f2d
OpenSSL enabled:  Yes (OpenSSL 3.0.11 19 Sep 2023)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active:  AES SHA2 NEON
================================================================================
Executing verify for SIGALG Falcon-512
================================================================================

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 139932256 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==76067==    at 0x58009114: show_sched_status_wrk (m_libcassert.c:406)
==76067==    by 0x58009263: report_and_quit (m_libcassert.c:477)
==76067==    by 0x5800949B: panic (m_libcassert.c:553)
==76067==    by 0x5800949B: vgPlain_core_panic_at (m_libcassert.c:558)
==76067==    by 0x580094BF: vgPlain_core_panic (m_libcassert.c:563)
==76067==    by 0x5808277B: failure_exit (m_translate.c:761)
==76067==    by 0x580EDD27: vex_assert_fail (main_util.c:245)
==76067==    by 0x5814C973: genSpill_ARM64 (host_arm64_defs.c:2829)
==76067==    by 0x581435DB: spill_vreg (host_generic_reg_alloc3.c:338)
==76067==    by 0x58144B3F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==76067==    by 0x580EC843: libvex_BackEnd (main_main.c:1133)
==76067==    by 0x580EC843: LibVEX_Translate (main_main.c:1236)
==76067==    by 0x58084F6F: vgPlain_translate (m_translate.c:1831)
==76067==    by 0x5805928F: handle_tt_miss (scheduler.c:1141)
==76067==    by 0x5805928F: vgPlain_scheduler (scheduler.c:1503)
==76067==    by 0x580A861F: thread_wrapper (syswrap-linux.c:101)
==76067==    by 0x580A861F: run_a_thread_NORETURN (syswrap-linux.c:154)
==76067==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 76067)
==76067==    at 0x12E9CC: PQCLEAN_FALCON512_AARCH64_verify_raw (vrfy.c:54)
==76067==    by 0x12CD83: do_verify (pqclean.c:279)
==76067==    by 0x12419F: OQS_SIG_verify (sig.c:445)
==76067==    by 0x12319F: sig_test_correctness (test_sig_mem.c:129)
==76067==    by 0x12319F: main (test_sig_mem.c:196)
client stack range: [0x1FFF000000 0x1FFF003FFF] client SP: 0x1FFF001F50
valgrind stack range: [0x1002F88000 0x1003087FFF] top usage: 19776 of 1048576

The Rizin disassembly of the function where it crashes:

[0x08000740]> pdf
┌ int sym.PQCLEAN_FALCON512_AARCH64_is_short(int16_t *s1, int16_t *s2);
│           ; arg int16_t *s1 @ x0
│           ; arg int16_t *s2 @ x1
│           0x08000740      movi  v1.4, 0
│           0x08000744      add   x5, x0, 0x400                        ; s1
│           0x08000748      mov   v0.16b, v1.16b
│       ┌─> 0x0800074c      mov   x2, x0                               ; s1
│       ╎   0x08000750      add   x4, x0, 0x80                         ; s1
│       ╎   0x08000754      add   x3, x0, 0xc0                         ; s1
│       ╎   0x08000758      add   x0, x0, 0x100                        ; s1
│       ╎   0x0800075c      ld1   { v16.8h, v17.8h, v18.8h, v19.8h }, [x2], 64
│       ╎   0x08000760      ld1   { v4.8h, v5.8h, v6.8h, v7.8h }, [x4]
│       ╎   0x08000764      sqdmlal v0.4, v16.4h, v16.4h
│       ╎   0x08000768      sqdmlal2 v1.4, v16.8h, v16.8h
│       ╎   0x0800076c      ld1   { v20.8h, v21.8h, v22.8h, v23.8h }, [x2]
│       ╎   0x08000770      sqdmlal v0.4, v17.4h, v17.4h
│       ╎   0x08000774      sqdmlal2 v1.4, v17.8h, v17.8h
│       ╎   0x08000778      sqdmlal v0.4, v18.4h, v18.4h
│       ╎   0x0800077c      sqdmlal2 v1.4, v18.8h, v18.8h
│       ╎   0x08000780      sqdmlal v0.4, v19.4h, v19.4h
│       ╎   0x08000784      sqdmlal2 v1.4, v19.8h, v19.8h
│       ╎   0x08000788      ld1   { v16.8h, v17.8h, v18.8h, v19.8h }, [x3]
│       ╎   0x0800078c      sqdmlal v0.4, v20.4h, v20.4h
│       ╎   0x08000790      sqdmlal2 v1.4, v20.8h, v20.8h
│       ╎   0x08000794      sqdmlal v0.4, v21.4h, v21.4h
│       ╎   0x08000798      sqdmlal2 v1.4, v21.8h, v21.8h
│       ╎   0x0800079c      sqdmlal v0.4, v22.4h, v22.4h
│       ╎   0x080007a0      sqdmlal2 v1.4, v22.8h, v22.8h
│       ╎   0x080007a4      sqdmlal v0.4, v23.4h, v23.4h
│       ╎   0x080007a8      sqdmlal2 v1.4, v23.8h, v23.8h
│       ╎   0x080007ac      sqdmlal v0.4, v4.4h, v4.4h
│       ╎   0x080007b0      sqdmlal2 v1.4, v4.8h, v4.8h
│       ╎   0x080007b4      sqdmlal v0.4, v5.4h, v5.4h
│       ╎   0x080007b8      sqdmlal2 v1.4, v5.8h, v5.8h
│       ╎   0x080007bc      sqdmlal v0.4, v6.4h, v6.4h
│       ╎   0x080007c0      sqdmlal2 v1.4, v6.8h, v6.8h
│       ╎   0x080007c4      sqdmlal v0.4, v7.4h, v7.4h
│       ╎   0x080007c8      sqdmlal2 v1.4, v7.8h, v7.8h
│       ╎   0x080007cc      sqdmlal v0.4, v16.4h, v16.4h
│       ╎   0x080007d0      sqdmlal2 v1.4, v16.8h, v16.8h
│       ╎   0x080007d4      sqdmlal v0.4, v17.4h, v17.4h
│       ╎   0x080007d8      sqdmlal2 v1.4, v17.8h, v17.8h
│       ╎   0x080007dc      sqdmlal v0.4, v18.4h, v18.4h
│       ╎   0x080007e0      sqdmlal2 v1.4, v18.8h, v18.8h
│       ╎   0x080007e4      sqdmlal v0.4, v19.4h, v19.4h
│       ╎   0x080007e8      sqdmlal2 v1.4, v19.8h, v19.8h
│       ╎   0x080007ec      cmp   x0, x5                               ; s1
│       └─< 0x080007f0      b.ne  0x800074c
│           0x080007f4      add   x4, x1, 0x400                        ; s2
│       ┌─> 0x080007f8      mov   x0, x1                               ; s2
│       ╎   0x080007fc      add   x3, x1, 0x80                         ; s2
│       ╎   0x08000800      add   x2, x1, 0xc0                         ; s2
│       ╎   0x08000804      add   x1, x1, 0x100                        ; s2
│       ╎   0x08000808      ld1   { v16.8h, v17.8h, v18.8h, v19.8h }, [x0], 64
│       ╎   0x0800080c      ld1   { v4.8h, v5.8h, v6.8h, v7.8h }, [x3]
│       ╎   0x08000810      sqdmlal v0.4, v16.4h, v16.4h
│       ╎   0x08000814      sqdmlal2 v1.4, v16.8h, v16.8h
│       ╎   0x08000818      ld1   { v20.8h, v21.8h, v22.8h, v23.8h }, [x0]
│       ╎   0x0800081c      sqdmlal v0.4, v17.4h, v17.4h
│       ╎   0x08000820      sqdmlal2 v1.4, v17.8h, v17.8h
│       ╎   0x08000824      sqdmlal v0.4, v18.4h, v18.4h
│       ╎   0x08000828      sqdmlal2 v1.4, v18.8h, v18.8h
│       ╎   0x0800082c      sqdmlal v0.4, v19.4h, v19.4h
│       ╎   0x08000830      sqdmlal2 v1.4, v19.8h, v19.8h
│       ╎   0x08000834      ld1   { v16.8h, v17.8h, v18.8h, v19.8h }, [x2]
│       ╎   0x08000838      sqdmlal v0.4, v20.4h, v20.4h
│       ╎   0x0800083c      sqdmlal2 v1.4, v20.8h, v20.8h
│       ╎   0x08000840      sqdmlal v0.4, v21.4h, v21.4h
│       ╎   0x08000844      sqdmlal2 v1.4, v21.8h, v21.8h
│       ╎   0x08000848      sqdmlal v0.4, v22.4h, v22.4h
│       ╎   0x0800084c      sqdmlal2 v1.4, v22.8h, v22.8h
│       ╎   0x08000850      sqdmlal v0.4, v23.4h, v23.4h
│       ╎   0x08000854      sqdmlal2 v1.4, v23.8h, v23.8h
│       ╎   0x08000858      sqdmlal v0.4, v4.4h, v4.4h
│       ╎   0x0800085c      sqdmlal2 v1.4, v4.8h, v4.8h
│       ╎   0x08000860      sqdmlal v0.4, v5.4h, v5.4h
│       ╎   0x08000864      sqdmlal2 v1.4, v5.8h, v5.8h
│       ╎   0x08000868      sqdmlal v0.4, v6.4h, v6.4h
│       ╎   0x0800086c      sqdmlal2 v1.4, v6.8h, v6.8h
│       ╎   0x08000870      sqdmlal v0.4, v7.4h, v7.4h
│       ╎   0x08000874      sqdmlal2 v1.4, v7.8h, v7.8h
│       ╎   0x08000878      sqdmlal v0.4, v16.4h, v16.4h
│       ╎   0x0800087c      sqdmlal2 v1.4, v16.8h, v16.8h
│       ╎   0x08000880      sqdmlal v0.4, v17.4h, v17.4h
│       ╎   0x08000884      sqdmlal2 v1.4, v17.8h, v17.8h
│       ╎   0x08000888      sqdmlal v0.4, v18.4h, v18.4h
│       ╎   0x0800088c      sqdmlal2 v1.4, v18.8h, v18.8h
│       ╎   0x08000890      sqdmlal v0.4, v19.4h, v19.4h
│       ╎   0x08000894      sqdmlal2 v1.4, v19.8h, v19.8h
│       ╎   0x08000898      cmp   x4, x1                               ; s2
│       └─< 0x0800089c      b.ne  0x80007f8
│           0x080008a0      shadd v0.4, v0.4s, v1.4s
│           0x080008a4      mov   w1, 0x5426                           ; '&T'
│           0x080008a8      movk  w1, 0x207, lsl 16
│           0x080008ac      mov   d1, v0.d[1]
│           0x080008b0      sqadd v0.2, v1.2s, v0.2s
│           0x080008b4      mov   s1, v0.s[1]
│           0x080008b8      sqadd s0, s1, s0
│           0x080008bc      fmov  w0, s0
│           0x080008c0      cmp   w0, w1
│           0x080008c4      cset  w0, ls
└           0x080008c8      ret

As the assembly output shows, I can confirm that the function does not use any additional memory.

I honestly don't know why this happens, and I don't know enough about Valgrind and Massif to make the current code work with Massif.

Since you are using Valgrind and Massif to get the stack/heap usage, I suggest you replace these functions with non-optimized C functions. I hope that helps.
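
As a rough illustration of what such a plain replacement has to compute, the check can be modeled in a few lines. The sketch below is written in Python rather than C purely for brevity and is not the PQClean code: it sums the squares of both vectors, clamps the total at 2^31 - 1, and compares it with the bound. The Falcon-512 bound is rebuilt from the `mov`/`movk` pair in the disassembly above; the function and constant names are my own.

```python
INT32_MAX = 2**31 - 1

# Bound loaded by `mov w1, 0x5426` followed by `movk w1, 0x207, lsl 16`
# in the disassembly: (0x207 << 16) | 0x5426 = 34034726.
L2_BOUND_FALCON512 = (0x207 << 16) | 0x5426


def is_short_model(s1, s2, bound=L2_BOUND_FALCON512):
    """Return True when the squared norm of (s1, s2) is within `bound`.

    The NEON routine accumulates with saturating instructions (sqdmlal,
    sqadd). This model only mirrors that by clamping the running total,
    which is assumed (not verified against hardware) to give the same
    accept/reject decision.
    """
    total = 0
    for coeff in list(s1) + list(s2):
        total = min(total + coeff * coeff, INT32_MAX)
    return total <= bound


if __name__ == "__main__":
    # 1024 coefficients of magnitude 100 give a squared norm of 10,240,000.
    print(is_short_model([100] * 512, [100] * 512))
```

A C version would follow the same loop structure. Whether swapping it in is enough to get Massif past the crash is untested here.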

@cothan cothan added the wontfix This will not be worked on label Jun 22, 2024
Member

cothan commented Jun 22, 2024

I am marking this issue as won't fix because the bug is in an external tool.

Member

baentsch commented

Quoting cothan: "I am marking this issue as won't fix because the bug is in an external tool."

@cothan Thanks for looking into this -- I'm just not sure I understand: Are you saying this is a bug in Valgrind and not in the ARM-optimized Falcon code? What makes you think so? Can you reproduce the bug with non-Falcon code? If so, would it be worthwhile reporting this to the maintainers of Valgrind?

Member

cothan commented Jun 23, 2024

Hi @baentsch ,

I wouldn't dare say the bug is in Valgrind itself; the bug is in the Massif tool. TL;DR: Massif is a memory profiling tool whose output shows the stack/heap memory usage of the program.

Here is the run with plain Valgrind (default Memcheck tool):

cothan@pi5 ~/W/l/b/tests (main) [1]> valgrind  ./test_sig_mem Falcon-1024 2
==81716== Memcheck, a memory error detector
==81716== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==81716== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==81716== Command: ./test_sig_mem Falcon-1024 2
==81716==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.31+rpt-rpi-2712
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.10.2-dev
Git commit:       e3f05cbfba4552067e2c0de524c1049a864c5f2d
OpenSSL enabled:  Yes (OpenSSL 3.0.11 19 Sep 2023)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active:  AES SHA2 NEON
================================================================================
Executing verify for SIGALG Falcon-1024
================================================================================
verification passes as expected
==81716==
==81716== HEAP SUMMARY:
==81716==     in use at exit: 0 bytes in 0 blocks
==81716==   total heap usage: 15 allocs, 15 frees, 25,252 bytes allocated
==81716==
==81716== All heap blocks were freed -- no leaks are possible
==81716==
==81716== For lists of detected and suppressed errors, rerun with: -s
==81716== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

And here is the run with Valgrind and Massif:

cothan@pi5 ~/W/l/b/tests (main)> valgrind --tool=massif --stacks=yes ./test_sig_mem Falcon-1024 2

==81727== Massif, a heap profiler
==81727== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==81727== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==81727== Command: ./test_sig_mem Falcon-1024 2
==81727==
Configuration info
==================
Target platform:  aarch64-Linux-6.6.31+rpt-rpi-2712
Compiler:         gcc (12.2.0)
Compile options:  [-march=armv8-a+crypto;-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.10.2-dev
Git commit:       e3f05cbfba4552067e2c0de524c1049a864c5f2d
OpenSSL enabled:  Yes (OpenSSL 3.0.11 19 Sep 2023)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active:  AES SHA2 NEON
================================================================================
Executing verify for SIGALG Falcon-1024
================================================================================

vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 139470880 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==81727==    at 0x58009114: show_sched_status_wrk (m_libcassert.c:406)
==81727==    by 0x58009263: report_and_quit (m_libcassert.c:477)
==81727==    by 0x5800949B: panic (m_libcassert.c:553)
==81727==    by 0x5800949B: vgPlain_core_panic_at (m_libcassert.c:558)
==81727==    by 0x580094BF: vgPlain_core_panic (m_libcassert.c:563)
==81727==    by 0x5808277B: failure_exit (m_translate.c:761)
==81727==    by 0x580EDD27: vex_assert_fail (main_util.c:245)
==81727==    by 0x5814C973: genSpill_ARM64 (host_arm64_defs.c:2829)
==81727==    by 0x581435DB: spill_vreg (host_generic_reg_alloc3.c:338)
==81727==    by 0x58144B3F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==81727==    by 0x580EC843: libvex_BackEnd (main_main.c:1133)
==81727==    by 0x580EC843: LibVEX_Translate (main_main.c:1236)
==81727==    by 0x58084F6F: vgPlain_translate (m_translate.c:1831)
==81727==    by 0x5805928F: handle_tt_miss (scheduler.c:1141)
==81727==    by 0x5805928F: vgPlain_scheduler (scheduler.c:1503)
==81727==    by 0x580A861F: thread_wrapper (syswrap-linux.c:101)
==81727==    by 0x580A861F: run_a_thread_NORETURN (syswrap-linux.c:154)
==81727==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 81727)
==81727==    at 0x13402C: PQCLEAN_FALCON1024_AARCH64_verify_raw (vrfy.c:54)
==81727==    by 0x1323EB: do_verify (pqclean.c:279)
==81727==    by 0x12419F: OQS_SIG_verify (sig.c:445)
==81727==    by 0x12319F: sig_test_correctness (test_sig_mem.c:129)
==81727==    by 0x12319F: main (test_sig_mem.c:196)
client stack range: [0x1FFF000000 0x1FFF003FFF] client SP: 0x1FFF000F70
valgrind stack range: [0x1002F88000 0x1003087FFF] top usage: 19776 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

I think the bug is definitely in the massif tool. The bug is not in ARM-optimized code.

I don't know whether this is worth reporting to the Valgrind maintainers. I will let @crt26 decide.

@baentsch
Member

Well, I consider massif to be an integral part of valgrind, hence my comment. Sorry for the ambiguity.

"The bug is not in ARM-optimized code."

How did you deduce this? The line valgrind died in seems to be

return PQCLEAN_FALCON1024_AARCH64_is_short(tt, s2);

At first blush, this looks like a massive piece of ARM-specific code (depending on where within the called function the error occurred), no? Thus, it could be a Valgrind issue, but it could also be an issue in the code, no? Why, for example, does the report list an allocation of nearly 140 MB on an embedded platform? Or an apparent offset overflow (the failed assertion is offsetB < 4096, so the offset was at least 4096)? Why not ask the Valgrind folks for their opinion?

You're the expert on this code of course and Raspberry Pi is not a platform supported by liboqs, so this is my last comment on the issue.

@cothan
Member

cothan commented Jun 25, 2024

Oh yeah, I didn't notice the 139,470,880-byte figure (~140 MB). My Falcon ARM implementation never uses that much memory.
I have no idea where the number comes from. My guess is that Massif emulates NEON instructions and at some point its buffer exploded.
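
For reference, the figure can be read straight out of the log rather than by eye. The snippet below is a minimal sketch that parses the two "vex storage" lines from the Massif run above and converts the T total to MB and MiB. The helper name `vex_storage_totals` is made up for illustration, and the snippet only does unit conversion: it does not establish what allocated the memory or whether it belongs to the Falcon code.

```python
import re

# Lines copied verbatim from the Massif run quoted earlier in this thread.
LOG = """\
vex: priv/host_arm64_defs.c:2829 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 139470880 bytes allocated
vex storage: P total 0 bytes allocated
"""


def vex_storage_totals(log: str) -> dict[str, int]:
    """Return the byte counts from 'vex storage' lines, keyed by 'T' or 'P'.

    Hypothetical helper, written only for this example.
    """
    pattern = re.compile(
        r"^vex storage: ([TP]) total (\d+) bytes allocated$", re.MULTILINE
    )
    return {kind: int(count) for kind, count in pattern.findall(log)}


totals = vex_storage_totals(LOG)
t_bytes = totals["T"]

# Decimal megabytes (10**6) versus binary mebibytes (2**20).
print(f"T total: {t_bytes:,} bytes")
print(f"       = {t_bytes / 10**6:.1f} MB")
print(f"       = {t_bytes / 2**20:.1f} MiB")
```

So "~140 MB" is the decimal reading; in binary units the same count is about 133 MiB.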

@crt26
Author

crt26 commented Aug 31, 2024

Hi @cothan, apologies for taking so long to get back to you. I saw this issue had been moved to liboqs planning and labelled as wontfix. If the consensus is that the fault lies with the Massif tool, then I would be happy to report the issue to the Valgrind team, as you suggested in your previous message, and see what they say. Thank you for taking a look at this issue :)
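
For a report to the Valgrind team, it may help to show side by side that the default tool completes while Massif hits the assertion. The following is a rough sketch under stated assumptions: it assumes the non-Massif run was a plain `valgrind ./test_sig_mem Falcon-1024 2` (memcheck is Valgrind's default tool), that `valgrind` is on `PATH`, and that it is started from the `tests` build directory. The function names are invented for this example.

```python
import os
import shutil
import subprocess

# Assertion text as it appears in the failing Massif run in this thread.
ASSERTION = "Assertion `offsetB < 4096' failed"

# Target command from the runs above.
TARGET = ["./test_sig_mem", "Falcon-1024", "2"]

# Massif flags are the ones used in this thread; the plain invocation
# for memcheck is an assumption about how the passing run was started.
TOOL_ARGS = {
    "memcheck": ["valgrind"],
    "massif": ["valgrind", "--tool=massif", "--stacks=yes"],
}


def hits_assertion(output: str) -> bool:
    """True if the captured output contains the VEX assertion failure."""
    return ASSERTION in output


def run_tool(args, target=TARGET, runner=subprocess.run) -> str:
    """Run one Valgrind configuration and return stdout and stderr combined."""
    proc = runner(
        args + target,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout


def compare_tools(runner=subprocess.run) -> dict:
    """Map each tool name to whether its run hit the assertion."""
    return {
        name: hits_assertion(run_tool(args, runner=runner))
        for name, args in TOOL_ARGS.items()
    }


# Only attempt real runs when both Valgrind and the test binary are present.
if shutil.which("valgrind") and os.path.exists(TARGET[0]):
    for name, failed in compare_tools().items():
        print(f"{name}: {'assertion hit' if failed else 'no assertion'}")
else:
    print("valgrind or ./test_sig_mem not found; nothing was run")
```

The `runner` parameter exists only so the comparison logic can be exercised without Valgrind installed. The combined output of each run, plus the Valgrind version and OS details, is what the note at the end of the crash output asks to be included in a bug report.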
