Simulating complex memory with Qemu
EFI attributes may be used to mark some memory ranges as "soft-reserved" instead of normal RAM, so that the kernel doesn't use them by default. This is useful for memory with different performance that should be reserved for specific uses or applications. Such ranges are exposed as DAX devices by default, and possibly as NUMA nodes later. This requires booting in UEFI mode (instead of legacy BIOS) and passing something like efi_fake_mem=1G@4G:0x40000 to mark the 4-5GB range as soft-reserved.
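To see which /proc/iomem lines an efi_fake_mem entry should affect, convert its size@start part into the hexadecimal range the kernel prints. Below is a minimal shell sketch. The helper names to_bytes and fake_mem_range are hypothetical. They handle only decimal values with an optional K/M/G suffix, whereas the kernel accepts more forms. The trailing 0x40000 is the EFI_MEMORY_SP (specific-purpose) attribute, and the sketch ignores it.

```shell
# Hypothetical helper: convert a size such as "1G" or "512M" to bytes.
# Assumption: decimal numbers with an optional K/M/G suffix only.
to_bytes() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *K) echo $(( ${1%K} * 1024 )) ;;
    *)  echo $(( $1 )) ;;
  esac
}

# Hypothetical helper: print the physical range covered by one
# efi_fake_mem entry ("size@start[:attribute]") in /proc/iomem format.
fake_mem_range() {
  entry=${1%%:*}                   # drop the ":0x40000" attribute, if any
  size=$(to_bytes "${entry%@*}")   # part before "@"
  start=$(to_bytes "${entry#*@}")  # part after "@"
  printf '%x-%x\n' "$start" $(( start + size - 1 ))
}

fake_mem_range 1G@4G:0x40000   # prints 100000000-13fffffff
```

The result matches the first soft-reserved range in the /proc/iomem output shown later on this page.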
The 0-4GB physical memory range is quite complicated when booting Qemu because it contains many reserved ranges, including the 3-4GB range reserved for PCI. It's better to use ranges above 4GB to find large ranges of normal memory. So make the first NUMA node 3GB and use the other nodes: they will be mapped after the PCI range, above 4GB.
If two NUMA nodes with consecutive memory ranges are marked as soft-reserved, it looks like we get a single range with the locality of the first one. So if you want two separate memories, don't use consecutive ranges. For instance, use two non-consecutive NUMA nodes.
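A little arithmetic models the resulting layout and shows which nodes end up consecutive. This is a rough sketch with some assumptions. Nodes are laid out in order. Memory below 4GB stops at 3GB and the rest continues at 4GB, which is what Qemu's default pc machine type is assumed to do with this much RAM. Other reserved ranges are ignored. node_layout and LOWMEM_GB are hypothetical names.

```shell
# Assumption: guest RAM below 4GB stops at 3GB (the 3-4GB hole is left
# for PCI), and the remaining RAM continues at 4GB.
LOWMEM_GB=3

# Hypothetical helper: print where each NUMA node's RAM lands in guest
# physical memory, given the node sizes in GB, in node order.
node_layout() {
  hole=$(( 4 - LOWMEM_GB ))   # size of the PCI hole, in GB
  offset=0                    # position in the flat RAM space, in GB
  node=0
  for size in "$@"; do
    end=$(( offset + size ))
    if [ "$end" -le "$LOWMEM_GB" ]; then
      echo "node$node: ${offset}G-${end}G"
    elif [ "$offset" -ge "$LOWMEM_GB" ]; then
      echo "node$node: $(( offset + hole ))G-$(( end + hole ))G"
    else
      # The node straddles the hole and is split in two pieces.
      echo "node$node: ${offset}G-${LOWMEM_GB}G 4G-$(( end + hole ))G"
    fi
    offset=$end
    node=$(( node + 1 ))
  done
}

node_layout 3 1 1 1
```

With the 3/1/1/1GB configuration used on this page, this places node0 at 0-3GB and nodes 1, 2 and 3 at 4-5GB, 5-6GB and 6-7GB. Nodes 1 and 2 (or 2 and 3) are therefore consecutive, while nodes 1 and 3 are not.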
kvm \
-drive if=pflash,format=raw,file=./OVMF.fd \
-drive media=disk,format=qcow2,file=efi.qcow2 \
-smp 4 -m 6G \
-object memory-backend-ram,size=3G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-object memory-backend-ram,size=1G,id=m3 \
-numa node,nodeid=0,memdev=m0,cpus=0-1 \
-numa node,nodeid=1,memdev=m1,cpus=2-3 \
-numa node,nodeid=2,memdev=m2 \
-numa node,nodeid=3,memdev=m3
OVMF is required for booting in UEFI mode (during both VM install and later).
On the kernel boot command line, pass efi_fake_mem=1G@4G:0x40000,1G@6G:0x40000 to mark NUMA node #1 (the one with CPUs) and node #3 (CPU-less) as soft-reserved. Their memory disappears from System RAM, and a DAX device appears for each of them.
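An efi_fake_mem value can also be assembled from a list of ranges, with a check for the consecutive-range caveat mentioned earlier. This is a minimal shell sketch. efi_fake_mem_arg is a hypothetical helper that takes entries in GB as size@start, sorted by start address, and compares each entry only with the previous one.

```shell
# Hypothetical helper: join "size@start" entries (in GB, sorted by start)
# into an efi_fake_mem= value using the soft-reserved attribute 0x40000.
# It refuses an entry that starts exactly where the previous one ends,
# since consecutive soft-reserved ranges seem to get merged.
efi_fake_mem_arg() {
  arg= prev_end=
  for e in "$@"; do
    size=${e%@*}
    start=${e#*@}
    if [ -n "$prev_end" ] && [ "$start" -eq "$prev_end" ]; then
      echo "${e}: consecutive with the previous range" >&2
      return 1
    fi
    arg="${arg:+$arg,}${size}G@${start}G:0x40000"
    prev_end=$(( start + size ))
  done
  echo "efi_fake_mem=$arg"
}

efi_fake_mem_arg 1@4 1@6   # node #1 and node #3
```

This prints the same value as the kernel parameter given above. Asking for 1@4 1@5 (nodes #1 and #2) fails instead.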
% cat /proc/iomem
100000000-13fffffff : hmem.0 <- node #1 is soft-reserved
100000000-13fffffff : Soft Reserved
100000000-13fffffff : dax0.0
140000000-17fffffff : System RAM <- node #2 is normal memory
180000000-1bfffffff : hmem.1 <- node #3 is soft-reserved
180000000-1bfffffff : Soft Reserved
180000000-1bfffffff : dax1.0
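A few lines of shell can extract the soft-reserved ranges and their sizes from /proc/iomem. This sketch assumes the line format shown above, and soft_reserved is a hypothetical helper. Run it as root, because /proc/iomem shows zero addresses to unprivileged users.

```shell
# Hypothetical helper: print each "Soft Reserved" range of /proc/iomem
# (or of the file given as $1) together with its size in MB.
# Run as root: unprivileged users see all addresses as zero.
soft_reserved() {
  grep 'Soft Reserved' "${1:-/proc/iomem}" | while read -r range rest; do
    # "rest" holds the ": Soft Reserved" label and is not used
    start=${range%-*}
    end=${range#*-}
    echo "$range $(( (0x$end - 0x$start + 1) / 1024 / 1024 ))MB"
  done
}
```

On the guest configured above, each of the two ranges should be reported as 1024MB.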
Those DAX devices under /sys/bus/dax/devices point to platform hmem devices, but there isn't much useful information in there.
dax0.0 -> ../../../devices/platform/hmem.0/dax0.0
dax1.0 -> ../../../devices/platform/hmem.1/dax1.0
dax0.0 has target_node=numa_node=1 in its sysfs attributes because node #1 is online thanks to its CPUs.
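Both attributes can be read for all DAX devices at once. This is a minimal sketch. dax_nodes is a hypothetical helper, and SYSFS is a variable that defaults to /sys; it exists only so the function can run against another tree.

```shell
# Root of sysfs; overridable only to run the function on a copy of the tree.
SYSFS=${SYSFS:-/sys}

# Hypothetical helper: print target_node and numa_node for each DAX device.
dax_nodes() {
  for dev in "$SYSFS"/bus/dax/devices/*; do
    [ -e "$dev" ] || continue    # no DAX device: the glob did not match
    printf '%s target_node=%s numa_node=%s\n' "${dev##*/}" \
      "$(cat "$dev/target_node")" "$(cat "$dev/numa_node")"
  done
}
```

In this configuration, one would expect dax0.0 target_node=1 numa_node=1 and dax1.0 target_node=3 numa_node=0.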
Node #3 is offline since it contains neither CPUs nor RAM. dax1.0 has target_node=3 as expected, but numa_node=0 because numa_node must be an online node during boot. Node #0 was chosen because it's the closest. We didn't specify any distance matrix on the Qemu command line, so the default (10 for local, 20 for remote) is used. Hence 20 is the minimal distance from node #3 to online nodes, and node #0 is the first of those.
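A small model reproduces the choice of node #0: among online nodes, take the one at minimal distance, keeping the first one on ties. It only mimics the behavior described above and is not the kernel's code. closest_online is a hypothetical helper. It takes the offline node's distance row and the list of online nodes.

```shell
# Hypothetical model: $1 = distances from the offline node to nodes 0..N-1,
# $2 = space-separated list of online nodes. Prints the first online node
# at minimal distance.
closest_online() {
  best= best_dist= node=0
  for d in $1; do
    case " $2 " in
      *" $node "*)   # this node is online
        if [ -z "$best_dist" ] || [ "$d" -lt "$best_dist" ]; then
          best=$node best_dist=$d
        fi ;;
    esac
    node=$(( node + 1 ))
  done
  echo "$best"
}

# Default distances seen from node #3 (10 local, 20 remote); nodes 0-2 online.
closest_online "20 20 20 10" "0 1 2"   # prints 0
```

With a custom distance matrix where node #1 is closer, the same model would pick node #1 instead.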
% daxctl reconfigure-device --mode=system-ram all
% cat /proc/iomem
[...]
100000000-13fffffff : hmem.0
100000000-13fffffff : Soft Reserved
100000000-13fffffff : dax0.0
100000000-13fffffff : System RAM (kmem) <- node#1 is back as a NUMA node
140000000-17fffffff : System RAM
180000000-1bfffffff : hmem.1
180000000-1bfffffff : Soft Reserved
180000000-1bfffffff : dax1.0
180000000-1bfffffff : System RAM (kmem) <- node#3 is back as a NUMA node
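Once the DAX memory is onlined, the node state files under /sys/devices/system/node (such as online, has_cpu and has_memory) can confirm which nodes now have memory. These files use the kernel's list format, for instance 0-1,3. Below is a minimal shell sketch that expands that format. expand_nodelist and nodes_with are hypothetical helper names, and SYSFS overrides /sys.

```shell
# Root of sysfs; overridable only to run the functions on a copy of the tree.
SYSFS=${SYSFS:-/sys}

# Hypothetical helper: expand a kernel node list such as "0-1,3" to "0 1 3".
expand_nodelist() {
  out=
  old_ifs=$IFS
  IFS=,                         # split the list on commas
  for part in $1; do
    case "$part" in
      *-*) i=${part%-*} last=${part#*-} ;;   # a range like "0-1"
      *)   i=$part last=$part ;;             # a single node
    esac
    while [ "$i" -le "$last" ]; do
      out="${out:+$out }$i"
      i=$(( i + 1 ))
    done
  done
  IFS=$old_ifs
  echo "$out"
}

# Hypothetical helper: list the nodes in one node state file,
# e.g. "nodes_with has_memory" or "nodes_with online".
nodes_with() {
  expand_nodelist "$(cat "$SYSFS/devices/system/node/$1")"
}
```

After daxctl reconfigure-device, nodes_with has_memory would be expected to list nodes 1 and 3 in addition to 0 and 2.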