This note explains the format of the "signature" which is inserted in pointers using the PACxx instructions in the Arm64 instruction set. This signature is called the "pointer authentication code", or PAC.
The PACxx instructions take a 64-bit address as input and insert a signature of the address value in the upper bits of this address value. The result is a "signed pointer" or "authenticated pointer".
In practice, the virtual space of a process never uses 2^64 bytes. This means that the upper bits of an address are not used and can be used for various purposes. This is where the PAC is stored.
The number of bits in the PAC is variable, typically from 7 to 16 bits.
The actual number of bits in the PAC depends on several factors:
- The size of the virtual address space for the processes, typically 47 or 48 bits. For the record, 47 and 48 bits respectively address 128 and 256 TB of virtual memory, which is probably sufficient for most applications.
- The usage of other security mechanisms which also use the upper bits. The optional Memory Tagging Extension (MTE) - which is a more recent Arm extension than PAC - uses the upper 8 bits of data addresses to "tag" their usage and check illegal mixing of data spaces. When MTE is used, the PAC cannot use the upper 8 bits and must be relocated to lower bits (typically bits 54:48 or 54:47).
- Other operating system constraints which are not fully clear.
The location and size of the PAC also depend on two additional independent factors:
- Instructions vs. data: Instruction addresses are signed by the PACIA and PACIB instructions. Data addresses are signed by the PACDA and PACDB instructions.
- Lower vs. upper addresses: Although no operating system uses all 64 address bits, some systems use "lower" (addresses starting in 0x0000...) and "upper" (addresses starting in 0xFFFF...) spaces. The upper bits of the plain addresses are not identical in the two spaces. When they are overwritten by a PAC, the processor must keep track of the original value of the upper bits, either 0x0000 or 0xFFFF.
This means that, on a given system, there are four different combinations of instructions / data and lower / upper addresses. Each combination can have a specific PAC size and location. Additionally, when several virtual machines run on the same processor, each operating system (host and virtual machines) may have its own specific combination.
Understanding the exact size and location of the PAC for a given address is not easy.
There is only one invariant: bit 55 is never modified by a PACxx instruction. For the record, bit 0 is the least significant bit and bit 63 is the most significant one, so bit 55 is the bit immediately below the top byte. This bit differentiates lower 0x0000... addresses from upper 0xFFFF... addresses: it is 0 in lower addresses and 1 in upper addresses. It is always left unchanged by the PACxx instructions. Its value is used by the AUTxx instructions and other authentication instructions to determine the original address range, lower or upper.
Note that the processor uses bit 55 and not bit 63 to differentiate between lower and upper addresses. We can speculate the reason for this: when MTE is used, the upper byte (bits 63:56) is used for the tag. In that case, bit 55 is the last bit before the tag.
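To illustrate the role of bit 55, the following Python sketch models how the PAC field of a signed pointer can be replaced by the original upper bits. It is only a model of the idea described above, not the Arm pseudo-code: the function name strip_pac and its parameters (bottom, the lowest PAC bit, and top_byte_ignored, whether bits 63:56 are excluded from the PAC) are assumptions made for this example.

```python
def strip_pac(ptr: int, bottom: int, top_byte_ignored: bool) -> int:
    """Replace the PAC field of a 64-bit pointer with copies of bit 55.

    bottom: lowest bit of the PAC field (47 or 48 in the examples of this note).
    top_byte_ignored: True when bits 63:56 are not part of the PAC.
    """
    top = 55 if top_byte_ignored else 63
    # Mask covering bits top:bottom, i.e. the PAC field plus bit 55 itself.
    field = ((1 << (top + 1)) - 1) & ~((1 << bottom) - 1)
    if (ptr >> 55) & 1:
        # Upper address range: the original upper bits were all 1.
        return ptr | field
    # Lower address range: the original upper bits were all 0.
    return ptr & ~field


# A lower data pointer with a PAC in bits 54:47 (hypothetical value).
print(hex(strip_pac(0x002A_8001_0000_1000, 47, True)))
# An upper pointer with a PAC in bits 63:56 and 54:47 (hypothetical value).
print(hex(strip_pac(0x5A95_7FFF_0000_1000, 47, False)))
```

Because bit 55 is never overwritten, the original range can be recovered without knowing the key.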
The pseudo-code in the Arm Architecture Reference Manual is complex and depends on several system registers and optional architecture extensions.
The command sysregs -p from this project emulates the Arm pseudo-code for the four combinations of instructions / data and lower / upper addresses and displays the PAC location. Note that this command can predict the PAC location at EL0 and EL1 only (user and kernel). The hypervisor (EL2) and monitor (EL3) have additional mechanisms to trap the usage of PAC instructions and key registers, leading to potentially different behaviors.
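As a rough illustration of what such an emulation involves, here is a much simplified Python sketch. It assumes that the PAC location at EL0 and EL1 is driven only by three fields of the TCR_EL1 register for the relevant address range (TnSZ, TBIn and TBIDn). It deliberately ignores MTE, EL2 / EL3 and every other extension, so it is not a substitute for the Arm pseudo-code or for sysregs. The function name pac_field is hypothetical.

```python
def pac_field(tsz: int, tbi: bool, tbid: bool, is_instruction: bool):
    """Return (size_in_bits, [(high, low), ...]) for the PAC field.

    tsz:  TnSZ field, the virtual address size is 64 - tsz bits.
    tbi:  TBIn field, top byte ignored for this address range.
    tbid: TBIDn field, when set, TBIn applies to data addresses only.
    """
    bottom = 64 - tsz
    # With TBIDn = 1, the top byte of instruction addresses is not ignored.
    top_byte_ignored = tbi and not (tbid and is_instruction)
    ranges = [(54, bottom)]          # bit 55 is never part of the PAC
    if not top_byte_ignored:
        ranges.insert(0, (63, 56))   # the top byte is available for the PAC
    size = sum(high - low + 1 for high, low in ranges)
    return size, ranges


# Example with T0SZ = 17, TBI0 = 1, TBID0 = 1 (lower addresses).
print("Data (lower): ", pac_field(17, True, True, is_instruction=False))
print("Instr (lower):", pac_field(17, True, True, is_instruction=True))
```

With these inputs, the sketch yields 8 bits at 54:47 for data and 16 bits at 63:56,54:47 for instructions.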
On an Apple M1 chip (MacBook laptop), the command sysregs -p displays this in the macOS host system:
Data (lower): PAC size: 8 bits, bit range: 54:47 (top: 55, sel: 55, bottom: 47)
Data (upper): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 55, bottom: 47)
Instr (lower): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
Instr (upper): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
On the same machine with an Apple M1 chip, the same command displays this in a virtual machine running Linux Ubuntu 23 and the Linux kernel 6.2:
Data (lower): PAC size: 7 bits, bit range: 54:48 (top: 55, sel: 55, bottom: 48)
Data (upper): PAC size: 7 bits, bit range: 54:48 (top: 55, sel: 55, bottom: 48)
Instr (lower): PAC size: 7 bits, bit range: 54:48 (top: 55, sel: 55, bottom: 48)
Instr (upper): PAC size: 15 bits, bit range: 63:56,54:48 (top: 63, sel: 55, bottom: 48)
In all cases, we see that bit 55 is excluded from the PAC.
The upper byte is excluded from the PAC in one case in the macOS host and in three cases in the Linux virtual machine. Not using the upper byte for the PAC is required when memory tagging (MTE) is used. However, MTE is not implemented in the Apple M1 chip. The reason why the upper byte is not used by the PAC in some cases is unknown.
Another peculiarity is the number of virtual address bits. On macOS, the system uses 47 bits. On Linux, it uses 48 bits, reducing the entropy of the PAC by one bit. The reason for this is unknown (47 bits address 128 TB of virtual address space and very few systems require more than this).
Even more peculiar, on the same machine with an Apple M1 chip, in a virtual machine running Linux Debian 11 and an older Linux kernel 5.10, we get this:
Data (lower): PAC size: 7 bits, bit range: 54:48 (top: 55, sel: 55, bottom: 48)
Data (upper): PAC size: 15 bits, bit range: 63:56,54:48 (top: 63, sel: 55, bottom: 48)
Instr (lower): PAC size: 7 bits, bit range: 54:48 (top: 55, sel: 55, bottom: 48)
Instr (upper): PAC size: 15 bits, bit range: 63:56,54:48 (top: 63, sel: 55, bottom: 48)
The reason for using 15 bits instead of 7 for the PAC on upper data addresses, for different versions of the kernel, is also unknown.
Finally, still on the same MacBook machine with an Apple M1 chip, the command displays this in a Windows 11 virtual machine:
Data (lower): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
Data (upper): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
Instr (lower): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
Instr (upper): PAC size: 16 bits, bit range: 63:56,54:47 (top: 63, sel: 63, bottom: 47)
In theory, the PAC uses its maximum size. In practice, however, the PAC instructions are inoperative: they leave the pointers unmodified, without a signature.
We have seen two categories of signatures: pointers to instructions (PACIA and PACIB instructions) and pointers to data (PACDA and PACDB instructions). There is a third category: the "generic" PAC computation, using the PACGA instruction.
The PACIx and PACDx instructions modify a pointer value and overwrite its upper bits. These upper bits must be initially unused or, more precisely, they must have a known value: all 0 for lower addresses, all 1 for upper addresses. Consequently, these instructions cannot sign an arbitrary 64-bit value without losing information in the upper bits. They can only sign address values.
The PACGA instruction is used to compute a keyed hash over general purpose data. It takes an arbitrary 64-bit value from a source register and generates a 32-bit output value in a different destination register. For some reason, the 32-bit PAC is written in the upper 32 bits of the destination register. The lower 32 bits of the destination register are zeroed by PACGA.
To compute a keyed hash over a memory area, simply iterate PACGA over that area, 64 bits at a time, reinjecting the 32-bit PAC in the next iteration.
The previously mentioned rules about the PAC size and location do not apply to PACGA. Its output is always a 32-bit value.
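The chaining described above can be sketched in Python. Since PACGA cannot be executed from Python, the function pacga_model below is a stand-in: it uses HMAC-SHA256 truncated to 32 bits, which is NOT the algorithm used by any Arm core. It only reproduces the shape of the output (32-bit PAC in the upper half, zero in the lower half). Feeding the previous result back as the modifier operand is also an assumption about how "reinjecting" is done. All names are hypothetical.

```python
import hashlib
import hmac
import struct


def pacga_model(value: int, modifier: int, key: bytes) -> int:
    """Stand-in for PACGA (assumption: HMAC-SHA256, not QARMA)."""
    message = struct.pack("<QQ", value, modifier)
    mac = hmac.new(key, message, hashlib.sha256).digest()
    pac32 = int.from_bytes(mac[:4], "little")
    return pac32 << 32  # PAC in the upper 32 bits, lower 32 bits are zero


def hash_area(data: bytes, key: bytes) -> int:
    """Chain the stand-in over a memory area, 64 bits at a time."""
    if len(data) % 8 != 0:
        raise ValueError("this sketch only handles multiples of 8 bytes")
    result = 0
    for (word,) in struct.iter_unpack("<Q", data):
        # The previous result is reinjected as the modifier of the next step.
        result = pacga_model(word, result, key)
    return result


print(hex(hash_area(bytes(range(32)), b"\x00" * 16)))
```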
On Windows, the PACGA instruction works as expected. Only the PACIx and PACDx are inoperative.
The pointer authentication code of a value is a salted keyed hash of that value.
Computing a PAC uses a symmetric encryption algorithm as base. This encryption algorithm uses a 128-bit symmetric key.
Additionally, all PAC instructions use a "salt" value, also called "modifier" in the Arm Architecture Reference Manual, or "discriminator" in the Apple LLVM project, or "tweak" in the QARMA-64 algorithm. This salt / modifier / discriminator / tweak is a clear contextual value.
The Arm architecture leaves the choice of algorithm to the implementers of the CPU core.
Arm-designed cores use the QARMA-64 algorithm. This is a public, low-latency, multi-round, symmetric encryption algorithm. The initial Arm implementation uses QARMA5, the QARMA-64 algorithm with 5 rounds. The Arm architecture also allows the use of QARMA3, the same algorithm with 3 rounds only. The latter is faster but less secure than the former.
The Arm architecture also allows an "implementation-defined" algorithm. The Apple M1 and M2 chips use Apple-designed cores which use their own private implementation-defined algorithm for the PAC computation.
The algorithm is also displayed by the command sysregs -p, along with the various extensions to the PAC features.
On an Apple M1 chip, macOS host or Linux VM, the command displays this. We see that an implementation-defined algorithm is used.
Summary: PAC: yes, PACGA: yes
PAuth: yes, PAuth2: no, EPAC: yes, FPAC: no, FPACCOMBINE: no, CONSTPACFIELD: no
Algorithms: QARMA3: no, QARMA5: no, implementation-defined: yes
On an AWS Graviton3 chip, using an Arm-designed Neoverse V1 core, the same command displays this. We see that Arm-designed cores use QARMA5.
Summary: PAC: yes, PACGA: yes
PAuth: yes, PAuth2: yes, EPAC: yes, FPAC: no, FPACCOMBINE: no, CONSTPACFIELD: no
Algorithms: QARMA3: no, QARMA5: yes, implementation-defined: no
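For readers who want to derive the "Algorithms" line themselves, here is a minimal Python sketch. It assumes that the raw values of the ID_AA64ISAR1_EL1 and ID_AA64ISAR2_EL1 registers are already available (reading them is not shown) and that the address-authentication algorithm is reported by the APA, API and APA3 fields, a non-zero field meaning that the algorithm is implemented. The function names are hypothetical, and this is not necessarily how sysregs does it.

```python
def id_field(reg: int, lsb: int) -> int:
    """Extract a 4-bit ID field starting at bit 'lsb'."""
    return (reg >> lsb) & 0xF


def pac_algorithms(isar1: int, isar2: int = 0) -> dict:
    """Report which PAC algorithm the ID registers advertise."""
    return {
        "QARMA5": id_field(isar1, 4) != 0,                  # APA, bits 7:4
        "implementation-defined": id_field(isar1, 8) != 0,  # API, bits 11:8
        "QARMA3": id_field(isar2, 12) != 0,                 # APA3, bits 15:12
    }


# Hypothetical register value with only the API field set.
print(pac_algorithms(0x2 << 8))
```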
In the Arm architecture, there are 5 different key registers for PAC computation:
- Key IA, used by PACIA and AUTIA instructions.
- Key IB, used by PACIB and AUTIB instructions.
- Key DA, used by PACDA and AUTDA instructions.
- Key DB, used by PACDB and AUTDB instructions.
- The generic key, used by PACGA instruction.
These key registers are not accessible at EL0 (user mode). Typically, the kernel allocates 5 random keys for each newly created process. Each time this process is scheduled on a core, the 5 PAC key registers are set with these process-specific values.
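The per-process key handling can be pictured with the following Python sketch. It is a toy model, not kernel code: the class Core and the functions are hypothetical, and the only facts taken from the text are the five keys, their 128-bit size and the reload on each scheduling.

```python
import secrets

KEY_NAMES = ("IA", "IB", "DA", "DB", "GA")


def new_process_keys() -> dict:
    """Allocate five random 128-bit keys for a newly created process."""
    return {name: secrets.token_bytes(16) for name in KEY_NAMES}


class Core:
    """Toy model of the PAC key registers of one CPU core."""

    def __init__(self) -> None:
        self.key_registers = {}

    def schedule(self, process_keys: dict) -> None:
        # On each context switch, the registers get the process-specific keys.
        self.key_registers = dict(process_keys)


core = Core()
process_a = new_process_keys()
process_b = new_process_keys()
core.schedule(process_a)
core.schedule(process_b)
print(sorted(core.key_registers))
```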
On Linux, the PAC key registers are directly written by the kernel.
On macOS, an EL3 monitor configuration (probably using SCR_EL3, the Secure Configuration Register) prevents the kernel from accessing the PAC key registers. Reading or writing them at EL1 crashes the system, probably because the corresponding trap at EL3 is not a valid key change request.
In each PACxx or AUTxx instruction, the modifier (let's call it this way from now on) is explicitly provided through one of the following mechanisms:
- Zero. This constant hard-coded value is used by instructions such as PACIAZ, AUTIAZ or LDRAA. This is not used in practice in current compilers.
- The value from a register. This register is specified in the instruction and its value (the modifier) must have been computed by the generated code before executing the PACxx or AUTxx instruction.
- The stack pointer. This context-dependent value is used by instructions such as PACIASP, AUTIASP or RETAA.
When a pointer is pushed on the stack, it is vulnerable to a stack overflow attack. Consequently, it is signed with a PAC. Each time a function is called, for instance, the return address is signed before being pushed on the stack. To add some entropy to the PAC, the value of the stack pointer before pushing the address is used as the PAC modifier. Thus, the signature depends on the stack context and cannot be reused elsewhere.
For instance, the typical prolog and epilog of a function are illustrated here:
paciasp ; add a PAC to return address in x30 before pushing it on stack
; (key = IA, mod = current stack pointer)
stp x29, x30, [sp, #-16]! ; push the stack frame (previous frame pointer x29, signed return address x30)
mov x29, sp ; update the frame pointer to current stack frame
....
ldp x29, x30, [sp], #16 ; restore the previous frame pointer in x29 and signed return address in x30
retaa ; authenticate the return address in x30 before jumping to it
; (generate an invalid return address and/or generate an exception
; if the value was overwritten during a stack overflow)
Another common practice consists in using the address where a value is stored as modifier when computing a PAC on that value. In other words, when a 64-bit pointer value V is stored at address A and we want to compute a PAC for V, we use A as modifier. The Apple LLVM article on pointer authentication calls this technique address diversity.
Additionally, Apple recommends using an additional discriminator which depends on the semantics of the signed address. Typically, a pointer to a C++ virtual function, inside a vtable, is signed using a modifier which is built from the address where it is stored in the vtable (address diversity) and an additional discriminator which is built from the fully qualified names of the class and method.
Note that stack diversity is not exactly the same thing as address diversity since the modifier is the value of the stack pointer before pushing the value.
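The effect of the modifier can be modeled in Python. In the sketch below, the keyed hash is a stand-in (HMAC-SHA256 truncated to 7 bits, an assumption, not the algorithm of any real core) and the PAC is placed in bits 54:48 of a lower address, as in the "Data (lower)" case on Linux. The names sign and authenticate are hypothetical. A failed authentication is modeled by returning None, whereas a real AUTxx instruction produces an invalid address or an exception.

```python
import hashlib
import hmac
import struct

PAC_MASK = 0x7F << 48  # 7-bit PAC in bits 54:48


def _pac(ptr: int, modifier: int, key: bytes) -> int:
    """Stand-in keyed hash of (pointer, modifier), placed in bits 54:48."""
    message = struct.pack("<QQ", ptr, modifier)
    mac = hmac.new(key, message, hashlib.sha256).digest()
    return (mac[0] & 0x7F) << 48


def sign(ptr: int, modifier: int, key: bytes) -> int:
    """Model of PACxx on a lower address whose bits 54:48 are zero."""
    return ptr | _pac(ptr, modifier, key)


def authenticate(signed: int, modifier: int, key: bytes):
    """Model of AUTxx: return the plain pointer, or None on mismatch."""
    ptr = signed & ~PAC_MASK
    if (signed & PAC_MASK) != _pac(ptr, modifier, key):
        return None
    return ptr


key = b"\x11" * 16
return_address = 0x0000_0001_0000_2000
stack_pointer = 0x0000_7FFF_FFFF_E000   # modifier, as with PACIASP
signed = sign(return_address, stack_pointer, key)
print(hex(signed), authenticate(signed, stack_pointer, key) == return_address)
print(authenticate(signed ^ (1 << 48), stack_pointer, key))  # corrupted PAC
```

Passing the stack pointer as modifier models stack diversity, and passing the address where the pointer is stored models address diversity. With only 7 bits, two different modifiers can produce the same PAC by chance, which is why a larger PAC field gives better protection.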
The program demo-pac from this project demonstrates the computation of pointer authentication codes in various situations:
- Instructions vs. data, lower vs. upper addresses, to exhibit the PAC size and location.
- Executing the PACxx instructions in user mode (EL0) and kernel mode (EL1), to check if the same PAC key is used in both modes.
- If the PAC algorithm is QARMA5 or QARMA3 and the PAC key registers are readable from EL1, compare the signed address with a software encryption using that algorithm.
The results of three successive executions of this program on each platform are available in the collect directory.
The following subsections are some observations from these results. The next section contains a full comparison of results per platform.
- The PAC values are different for the same input values each time the program is executed. Conclusion: the kernel of all operating systems provides new PAC keys for new processes.
- The PAC values are different when using PACIA, PACIB, PACDA, PACDB on the same input. Conclusion: within a process, each PAC key has a different value.
- We verified that AUTxx matches the corresponding PACxx. We also verified that corrupting a signed pointer results in an invalid address after AUTxx. With cores implementing the FPAC feature, this AUTxx instruction should raise an exception. This cannot be verified for now since none of the tested cores implements FPAC.
- The bit 55 of all addresses is preserved, in all cases.
- The meaningful virtual address part of a pointer uses 48 bits, leaving 16 bits for PAC, MTE and selector bit.
- Most PAC values use 7 bits, except on upper addresses: instruction pointers use 15 bits and data pointers use either 7 or 15 bits, depending on the kernel version.
- Executing a PACxx instruction in user mode (EL0) and kernel mode (EL1) on the same input value gives the same result, except with PACIA. Conclusion: only the IA key is different between user and kernel. Each time an application switches from user mode to kernel mode or vice versa, the value of the IA key is changed. All other PAC keys remain unchanged.
- This is consistent with another observation: if we change the value of the IA key using the kernel module of this project, the system crashes. All other PAC keys can be safely changed. In standard Ubuntu distributions for arm64, the kernel is now compiled with option -mbranch-protection=pac-ret, meaning that the call return addresses on the stack are signed with the IA key. Changing this key while in kernel mode crashes on return.
- As a final consequence, we may say that the kernel uses only one PAC key and updates it each time we enter kernel mode, leaving all other PAC keys unchanged.
- The PAC key registers cannot be read or written in kernel mode (EL1). Trying to do so using the kernel extension of this project crashes the system. Access to these registers is probably configured at EL3 to trap at EL3.
- The meaningful virtual address part of a pointer uses 47 bits, leaving 17 bits for PAC, MTE and selector bit (compared to 48 and 16 bits, respectively, on Linux).
- Most PAC values use 16 bits, except for data pointers on lower addresses where the PAC uses 8 bits.
- Unexpected note on the selector bit: in the case of pointers to instructions on macOS, the command sysregs -p displays that the selector bit is 63 instead of 55. However, we can see that the PAC computation on pointers to instructions sometimes changes bit 63 while it always preserves bit 55. Maybe this is an interpretation error on the nature of the selector bit or an error in the implementation in sysregs.
- Executing any PACxx instruction in user mode (EL0) and kernel mode (EL1) on the same input value gives distinct results. Conclusion: the 5 PAC keys are different between user and kernel. Each time an application switches from user mode to kernel mode or vice versa, the values of all PAC keys are changed.
- As previously observed, the kernel cannot access the PAC key registers at EL1. This means that switching from EL0 to EL1 implies a detour through EL3 to reprogram the PAC key registers, and again when switching back from EL1 to EL0.
- Note a difference with Linux here. On macOS using the "arm64e" platform, the generated code uses three PAC keys, IA, IB, DA, but not DB and GA. Since the kernel is compiled for "arm64e", it uses these three PAC registers. However, unlike the Linux kernel, the macOS kernel chooses to change all PAC keys when switching to kernel mode, even those it doesn't use.
- The PAC key registers can be read and modified. Their initial value in a process is zero. A value assigned to them remains valid in the process.
- The PACIx and PACDx instructions are inoperative in practice. They leave the pointer unmodified.
- The computed PAC values match the QARMA5 software computation. This was expected but still nice to verify.
- The PAC computation uses an implementation-defined algorithm and it is not possible to anticipate a PAC value, even when the context is fully known. Whether this is a good or bad thing is left to the reader...
The following table compares various forms of PAC computations on several platforms.
Platform | macOS host (M1) | Linux VM (M1) | Linux VM (M1) | Windows VM (M1) | Linux host (G3) | Linux VM (G3) |
---|---|---|---|---|---|---|
OS | macOS | Linux (Ubuntu) | Linux (Debian) | Windows 11 | Linux (Amazon) | Linux (Ubuntu) |
OS kernel version | 13.3.1 | 6.2.0-20 | 5.10.0-23 | 10.0.22000 | 6.1.27-43.48 | 5.15.0-1030-aws |
Virtualization | Host | VM | VM | VM | Host | VM |
Processor chip | Apple M1 | Apple M1 | Apple M1 | Apple M1 | AWS Graviton 3 | AWS Graviton 3 |
CPU cores | Apple Firestorm | Apple Firestorm | Apple Firestorm | Apple Firestorm | Arm Neoverse V1 | Arm Neoverse V1 |
PAC algorithm | private | private | private | private | QARMA5 | QARMA5 |
PAuth / PAuth2 | yes / no | yes / no | yes / no | yes / no | yes / yes | yes / yes |
EPAC / FPAC | yes / no | yes / no | yes / no | yes / no | yes / no | yes / no |
MTE tagging | no | no | no | no | no | no |
PAC size | ||||||
data, lower | 8 bits | 7 bits | 7 bits | 16 bits | 7 bits | 7 bits |
data, upper | 16 bits | 7 bits | 15 bits | 16 bits | 7 bits | 7 bits |
instruction, lower | 16 bits | 7 bits | 7 bits | 16 bits | 7 bits | 7 bits |
instruction, upper | 16 bits | 15 bits | 15 bits | 16 bits | 15 bits | 15 bits |
PAC position | ||||||
data, lower | 54:47 | 54:48 | 54:48 | 63:56,54:47 | 54:48 | 54:48 |
data, upper | 63:56,54:47 | 54:48 | 63:56,54:48 | 63:56,54:47 | 54:48 | 54:48 |
instruction, lower | 63:56,54:47 | 54:48 | 54:48 | 63:56,54:47 | 54:48 | 54:48 |
instruction, upper | 63:56,54:47 | 63:56,54:48 | 63:56,54:48 | 63:56,54:47 | 63:56,54:48 | 63:56,54:48 |
PAC selector bit | ||||||
data, lower | bit 55 | bit 55 | bit 55 | bit 63 | bit 55 | bit 55 |
data, upper | bit 55 | bit 55 | bit 55 | bit 63 | bit 55 | bit 55 |
instruction, lower | bit 63 | bit 55 | bit 55 | bit 63 | bit 55 | bit 55 |
instruction, upper | bit 63 | bit 55 | bit 55 | bit 63 | bit 55 | bit 55 |
EL0/EL1 PAC keys | ||||||
DA | distinct keys | same key | same key | zero | same key | same key |
DB | distinct keys | same key | same key | zero | same key | same key |
IA | distinct keys | distinct keys | distinct keys | zero | distinct keys | distinct keys |
IB | distinct keys | same key | same key | zero | same key | same key |
Generic (PACGA) | distinct keys | same key | same key | zero | same key | same key |
TCR_EL1 register | ||||||
TBI0 | 1 | 1 | 1 | 0 | 1 | 1 |
TBID0 | 1 | 0 | 0 | 0 | 0 | 0 |
T0SZ | 17 | 16 | 16 | 17 | 16 | 16 |
TBI1 | 0 | 1 | 0 | 0 | 1 | 1 |
TBID1 | 0 | 1 | 0 | 0 | 1 | 1 |
T1SZ | 17 | 16 | 16 | 17 | 16 | 16 |
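
The TCR_EL1 fields listed at the end of the table can be extracted from a raw register value with a few shifts. The following Python sketch assumes the usual bit positions of these fields in TCR_EL1 (T0SZ in bits 5:0, T1SZ in bits 21:16, TBI0 and TBI1 in bits 37 and 38, TBID0 and TBID1 in bits 51 and 52). The function name decode_tcr_el1 is hypothetical and reading the register itself, which requires EL1, is not shown.

```python
def decode_tcr_el1(tcr: int) -> dict:
    """Extract the TCR_EL1 fields which are listed in the table."""
    return {
        "TBI0": (tcr >> 37) & 1,
        "TBID0": (tcr >> 51) & 1,
        "T0SZ": tcr & 0x3F,
        "TBI1": (tcr >> 38) & 1,
        "TBID1": (tcr >> 52) & 1,
        "T1SZ": (tcr >> 16) & 0x3F,
    }


# Register value rebuilt from the "macOS host (M1)" column, other bits zero.
tcr = 17 | (17 << 16) | (1 << 37) | (1 << 51)
fields = decode_tcr_el1(tcr)
print(fields)
print("lower VA bits:", 64 - fields["T0SZ"], "upper VA bits:", 64 - fields["T1SZ"])
```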