ACPI: Only create numa nodes from entries in SRAT or SRAT emulation.
Here, I will use the term Proximity Domains for the ACPI description and Numa Nodes for the in-kernel representation.

Until ACPI 6.3 it was arguably possible to interpret the specification as allowing _PXM in DSDT and similar to define additional Proximity Domains. In reality that was never the intent, and a 'clarification' was added in ACPI 6.3 [1]. In practice I think the kernel has never allowed any other interpretation, except possibly on an ad hoc basis within some out-of-tree driver (using it very, very carefully given the potential to crash when using various standard calls such as devm_kzalloc).

Proximity Domains are always defined in SRAT. ACPI defines methods that allow their characteristics to be tweaked later, but Proximity Domains have to be referenced in this table at boot, thus allowing Linux to instantiate the relevant Numa Node data structures.

We ran into a problem when enabling _PXM handling for PCI devices and found there were boards out there advertising devices in proximity domains that didn't exist [2].

The fix suggested here is to modify the function acpi_map_pxm_to_node. This function is used both to create and to look up proximity domains. A parameter is added to specify whether it should create a new proximity domain when it encounters a Proximity Domain ID that it hasn't seen before.

Naturally there is a quirk. For SRAT ITS entries on ARM64 the handling is done with an additional pass of SRAT, potentially later in the boot. We could modify that behaviour so we could identify the existence of Proximity Domains unique to the ITS structures, and handle them as a special case of a Generic Initiator (once support for those merges) however...

Currently (5.8-rc2) setting the Proximity Domain of an ITS to one that hasn't been instantiated by being specified in another type of SRAT resource entry results in:

ITS [mem 0x202100000-0x20211ffff]
ITS@0x0000000202100000: Using ITS number 0
Unable to handle kernel paging request at virtual address 0000000000001a08
Mem abort info:
  ESR = 0x96000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
[0000000000001a08] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G A 5.8.0-rc2 #483
pstate: 80000089 (Nzcv daIf -PAN -UAO BTYPE=--)
pc : __alloc_pages_nodemask+0xe8/0x338
lr : __alloc_pages_nodemask+0xc0/0x338
sp : ffffa81540c139b0
x29: ffffa81540c139b0 x28: 0000000000000001
x27: 0000000000000100 x26: ffffa81540c1ad38
x25: 0000000000000000 x24: 0000000000000000
x23: ffffa81540c23c00 x22: 0000000000000004
x21: 0000000000000002 x20: 0000000000001a00
x19: 0000000000000100 x18: 0000000000000010
x17: 000000000001f000 x16: 000000000000007f
x15: ffffa81540c24070 x14: ffffffffffffffff
x13: ffffa815c0c137d7 x12: ffffa81540c137e4
x11: ffffa81540c3e000 x10: ffffa81540ecee68
x9 : ffffa8153f0f61d8 x8 : ffffa81540ecf000
x7 : 0000000000000141 x6 : ffffa81540ecf401
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 0000000000000000 x2 : 0000000000000000
x1 : 0000000000000081 x0 : 0000000000001a00
Call trace:
 __alloc_pages_nodemask+0xe8/0x338
 alloc_pages_node.constprop.0+0x34/0x40
 its_probe_one+0x2f8/0xb18
 gic_acpi_parse_madt_its+0x108/0x150
 acpi_table_parse_entries_array+0x17c/0x264
 acpi_table_parse_entries+0x48/0x6c
 acpi_table_parse_madt+0x30/0x3c
 its_init+0x1c4/0x644
 gic_init_bases+0x4b8/0x4ec
 gic_acpi_init+0x134/0x264
 acpi_match_madt+0x4c/0x84
 acpi_table_parse_entries_array+0x17c/0x264
 acpi_table_parse_entries+0x48/0x6c
 acpi_table_parse_madt+0x30/0x3c
 __acpi_probe_device_table+0x8c/0xe8
 irqchip_init+0x3c/0x48
 init_IRQ+0xcc/0x100
 start_kernel+0x33c/0x548

As we die in this case in existing kernels, we can be fairly sure that no one actually has such a firmware in production. As such, this patch avoids the complexity that would be needed to handle this corner case, and simply does not allow the ITS entry parsing code to instantiate new Numa Nodes. If one is encountered that does not already exist, then NUMA_NO_NODE is assigned and a warning is printed, just as if the value had been greater than the allowed number of Numa Nodes.

"SRAT: Invalid NUMA node -1 in ITS affinity"

I have only tested this for now on our ARM64 Kunpeng920 servers.

Open questions:
* Should we warn about a broken firmware or an insufficient value of NUMA_NODES_SHIFT if we find a firmware trying to assign any device to a non-existent Proximity Domain?
* Previously an smmuv3 in IORT with a Proximity Domain set to a non-existent value would have resulted in a failure to add the device. After this change it will be added to the default node. Is that a problem?

[1] Note in ACPI Specification 6.3 5.2.16 System Resource Affinity Table (SRAT)
[2] https://patchwork.kernel.org/patch/10597777/

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
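
For illustration only, the create-versus-lookup behaviour described for acpi_map_pxm_to_node can be modelled in user space roughly as follows. This is a minimal sketch and not the kernel code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the table sizes are arbitrary, and the real function works on kernel node masks rather than a simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits for this sketch; the kernel uses its own constants. */
#define SKETCH_MAX_PXM   8
#define SKETCH_MAX_NODES 4
#define SKETCH_NO_NODE   (-1)   /* stands in for NUMA_NO_NODE */

/* Stores node + 1 per Proximity Domain, so 0 means "not mapped yet". */
static int sketch_pxm_slot[SKETCH_MAX_PXM];
static int sketch_nodes_in_use;

/*
 * Look up the node for a Proximity Domain.
 * create == true  models SRAT parsing: an unseen domain gets a new node.
 * create == false models later users (_PXM, ITS, IORT): an unseen domain
 *                 is reported as SKETCH_NO_NODE instead of being created.
 */
int sketch_map_pxm_to_node(int pxm, bool create)
{
	assert(sketch_nodes_in_use <= SKETCH_MAX_NODES);

	if (pxm < 0 || pxm >= SKETCH_MAX_PXM)
		return SKETCH_NO_NODE;

	/* Already instantiated: both modes return the existing node. */
	if (sketch_pxm_slot[pxm] != 0)
		return sketch_pxm_slot[pxm] - 1;

	/* Unknown domain: only the creating caller may allocate a node. */
	if (!create || sketch_nodes_in_use >= SKETCH_MAX_NODES)
		return SKETCH_NO_NODE;

	sketch_pxm_slot[pxm] = ++sketch_nodes_in_use;
	return sketch_pxm_slot[pxm] - 1;
}
```

In this model a caller that passes create as false for a domain SRAT never mentioned gets SKETCH_NO_NODE back, which corresponds to the ITS case above falling through to the "Invalid NUMA node -1" warning rather than allocating from a node that was never set up.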