Memory Management
At boot time on x86_64, in `arch_setup_free_memory()`, OSv discovers how much physical memory is available by reading the e820 entries and then linearly maps the identified memory ranges by calling `memory::free_initial_memory_range()`. On aarch64, the available physical memory information is retrieved from the DTB, as coded in `dtb_setup()` in `arch/aarch64/arch-dtb.cc`. Once the memory is discovered, all corresponding memory ranges are ultimately registered in `memory::free_page_ranges`, an instance of `page_range_allocator` that effectively tracks all used/free physical memory and implements the lowest-level memory allocation logic. The key fields of `page_range_allocator` are `_free_huge` and `_free`. The former is an intrusive multiset of page ranges of size >= 256 MB; the latter is an array of 16 intrusive lists, where each list stores page ranges of the corresponding logarithmic size. At this level, memory is tracked, allocated, and freed in 4K chunks (pages) aligned at 0x...000 addresses, which means that an individual page range is a contiguous area of physical memory N pages long.
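To make the bucketing concrete, below is a minimal, self-contained C++ sketch of the routing logic. Only the names `page_range`, `_free` and `_free_huge` are borrowed from OSv; everything else is illustrative, since the real `page_range_allocator` (in `core/mempool.cc`) uses `boost::intrusive` containers, keeps the `page_range` headers inside the free memory itself, and coalesces adjacent ranges:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <set>

constexpr size_t page_size = 4096;
constexpr size_t huge_threshold = 256 * 1024 * 1024; // >= 256 MB => _free_huge

struct page_range {
    uintptr_t addr; // physical address, 4K-aligned
    size_t size;    // length in bytes, a multiple of 4K
    bool operator<(const page_range& other) const { return size < other.size; }
};

class page_range_allocator_sketch {
    // Bucket k holds ranges with 2^k <= size/page_size < 2^(k+1),
    // so the 16 buckets cover everything below the 256 MB threshold.
    std::list<page_range> _free[16];
    // Very large ranges are kept sorted by size instead.
    std::multiset<page_range> _free_huge;

    static unsigned bucket_of(size_t size) {
        size_t pages = size / page_size;
        unsigned k = 0;
        while (pages >>= 1) {
            k++;
        }
        return k; // always < 16 for sizes below huge_threshold
    }
public:
    void insert(page_range r) {
        // The real allocator also merges adjacent free ranges; omitted here.
        if (r.size >= huge_threshold) {
            _free_huge.insert(r);
        } else {
            _free[bucket_of(r.size)].push_back(r);
        }
    }
};
```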
So for example, given 100MB from the host on QEMU on x86_64, OSv would find 3 memory ranges: a small one of ~640KB in lower memory, a medium 1MB one located in the 2nd MB, and the largest one starting wherever `loader.elf` ends (at roughly a 9.5MB offset) and ending at 100MB. With OSv running with 100MB of RAM and gdb paused right after `arch_setup_free_memory()`, `free_page_ranges` looks like this:
```
(gdb) osv heap
0x0000400000001000 0x000000000009e000 // Lower RAM < 640KB
0x0000400000100000 0x0000000000100000 // 2nd MB - ends right below the kernel
0x000040000094d000 0x0000000005a90000 // Starts right above the kernel
```
For more details on how memory is managed and set up at the lowest level, please read Managing Memory Pages.
From this point on, OSv is ready to handle the "malloc/free" family and `memory::alloc_page()`/`free_page()` calls by drawing/releasing memory from/to `free_page_ranges` in the form of `page_range` objects (see the methods `page_range_allocator::alloc()`, `alloc_aligned()` and `free()`) and mapping them to virtual address ranges. However, until SMP is enabled much later (when multiple vCPUs are fully activated), allocations are handled at a different granularity than afterwards. In addition, in the first (pre-SMP) phase, allocations draw pages directly from the `free_page_ranges` object, whereas after SMP is enabled they draw memory from the L1/L2 pools.
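As a rough, runnable illustration of that split, single-page allocation can be pictured like the toy model below. All names and helpers are stand-ins, not OSv's actual signatures; both paths simply grab a 4K page from libc so the sketch is self-contained:

```cpp
#include <cstdlib>

static void* page_from_free_page_ranges() {
    // Pre-SMP: OSv calls into free_page_ranges (page_range_allocator::alloc()).
    return std::aligned_alloc(4096, 4096);
}
static void* page_from_l1_pool() {
    // Post-SMP: OSv pops a page cached in the current vCPU's L1 pool.
    return std::aligned_alloc(4096, 4096);
}

// OSv flips an equivalent switch once the per-CPU pools are up.
static bool smp_allocator = false;

static void* alloc_page_sketch() {
    return smp_allocator ? page_from_l1_pool()
                         : page_from_free_page_ranges();
}

int main() {
    void* early = alloc_page_sketch(); // early boot: straight from free_page_ranges
    smp_allocator = true;              // vCPUs and L1/L2 pools are now active
    void* late = alloc_page_sketch();  // now served by the per-CPU L1 pool
    std::free(early);
    std::free(late);
}
```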
There are as many L1 pools as vCPUs (a per-cpu construct) and a single global L2 pool - `global_l2`. The L1 pools draw pages in the form of a `page_batch` from the global L2 pool, which in turn draws page ranges from `free_page_ranges`. Both L1 and L2 pools operate at page-size granularity and implement a low/high watermark algorithm (for example, L1 pools keep at least 128 pages of memory available). The high-level memory allocation functions (like `malloc`) draw memory from the L1 pool using `untracked_alloc_page` and `untracked_free_page`.
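The watermark scheme can be sketched as follows. Only the 128-page low watermark is taken from the text above; the high watermark, the batch size, the stub types and the synchronous refill are assumptions made to keep the example short (the real code moves a `page_batch` between the pools and hands the refill work to a per-CPU background thread):

```cpp
#include <cstdlib>
#include <vector>

constexpr size_t l1_low_watermark  = 128; // keep at least this many pages
constexpr size_t l1_high_watermark = 512; // hand pages back above this (assumed)
constexpr size_t batch_size        = 32;  // pages per page_batch (assumed)

struct l2_pool_stub {
    // Stands in for global_l2, which itself draws page ranges
    // from free_page_ranges.
    void* alloc_page() { return std::aligned_alloc(4096, 4096); }
    void free_page(void* p) { std::free(p); }
} global_l2_stub;

struct l1_pool_sketch {
    std::vector<void*> pages; // pages cached for one vCPU

    void* alloc_page() {
        if (pages.size() <= l1_low_watermark) {
            // Below the low watermark: pull a batch from L2.
            for (size_t i = 0; i < batch_size; i++) {
                pages.push_back(global_l2_stub.alloc_page());
            }
        }
        void* p = pages.back();
        pages.pop_back();
        return p;
    }

    void free_page(void* p) {
        pages.push_back(p);
        if (pages.size() >= l1_high_watermark) {
            // Above the high watermark: return a batch to L2.
            for (size_t i = 0; i < batch_size; i++) {
                global_l2_stub.free_page(pages.back());
                pages.pop_back();
            }
        }
    }
};
```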
TODO: Describe L1 and L2 in more detail
It is also worth noting that most malloc functions (except for `malloc_large`) end up calling `std_malloc()`, which allocates virtual memory in different ways depending on whether we are in the pre- or post-SMP-enabled mode and on the size of the memory request. The size ranges are:
- x <= 1024 (page size/4)
- 1024 < x <= 4096
- x > 4096
If we are in SMP-enabled mode and the requested size is less than or equal to 1024 bytes, the allocation is delegated to malloc pools. Malloc pools are set up per CPU, each dedicated to a specific size range (2^(k-1) < x <= 2^k, where k <= 10). The way `std_malloc()` handles allocations of <= 4K directly impacts how efficiently the underlying physical memory is utilized. For example, any request above 1024 bytes will consume a whole page and, in the worst case, waste almost 3K of physical memory. Similarly, a malloc pool allocation may in the worst case waste almost half of its 2^k segment size.
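Putting the three ranges together, the dispatch can be sketched as below. The helper bodies, the 16-byte minimum pool class, and the use of libc allocation as a stand-in for OSv's internals are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of std_malloc()'s size-based dispatch in the post-SMP case.
static size_t round_up_pow2(size_t x) {
    size_t p = 16; // smallest pool size class (assumed)
    while (p < x) {
        p <<= 1;
    }
    return p;
}

static void* pool_alloc(size_t size) {
    // Per-CPU malloc pool for class 2^k, where 2^(k-1) < size <= 2^k
    // and k <= 10; a 513-byte request gets a 1024-byte segment, so up
    // to almost half of the segment can go unused.
    return std::malloc(round_up_pow2(size));
}

static void* page_alloc() {
    // 1024 < size <= 4096 consumes a whole 4K page; a 1025-byte request
    // therefore wastes just under 3K of physical memory.
    return std::aligned_alloc(4096, 4096);
}

static void* large_alloc(size_t size) {
    // size > 4096 goes to malloc_large(), which draws page ranges
    // directly from free_page_ranges (see below).
    return std::malloc(size);
}

void* std_malloc_sketch(size_t size) {
    if (size <= 1024) {        // page size / 4
        return pool_alloc(size);
    } else if (size <= 4096) {
        return page_alloc();
    }
    return large_alloc(size);
}
```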
TODO: Describe how exactly pre-SMP and post-SMP memory allocation differs.
The `malloc_large()`/`free_large()` functions draw/release memory directly from/to `free_page_ranges` in both the pre- and post-SMP-enabled phases.
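A minimal sketch of that path, with `std::aligned_alloc` standing in for `page_range_allocator::alloc()`:

```cpp
#include <cstddef>
#include <cstdlib>

constexpr size_t page = 4096;

// malloc_large() in spirit: round the request up to whole 4K pages and
// carve a contiguous page range straight out of free_page_ranges,
// bypassing the L1/L2 pools. The real function also reserves room for a
// header recording the allocation size, so that free_large() knows how
// much to return to free_page_ranges.
void* malloc_large_sketch(size_t size) {
    size_t pages = (size + page - 1) / page; // round up to whole pages
    return std::aligned_alloc(page, pages * page);
}
```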
On x86_64, the linear mappings (as reported by the `osv linear_mmap` gdb command) look like this:

```
vaddr        paddr    size     perm memattr name
40200000     200000   67c434   rwxp normal  kernel
400000000000 0        40000000 rwxp normal  main
4000000f0000 f0000    10000    rwxp normal  dmi
4000000f5a10 f5a10    17c      rwxp normal  smbios
400040000000 40000000 3ffdd000 rwxp normal  main
40007fe00000 7fe00000 200000   rwxp normal  acpi
4000feb91000 feb91000 1000     rwxp normal  pci_bar
4000feb92000 feb92000 1000     rwxp normal  pci_bar
4000fec00000 fec00000 1000     rwxp normal  ioapic
500000000000 0        40000000 rwxp normal  page
500040000000 40000000 3ffdd000 rwxp normal  page
600000000000 0        40000000 rwxp normal  mempool
600040000000 40000000 3ffdd000 rwxp normal  mempool
```
And on aarch64:

```
vaddr        paddr      size     perm memattr name
8000000      8000000    10000    rwxp dev     gic_dist
8010000      8010000    10000    rwxp dev     gic_cpu
9000000      9000000    1000     rwxp dev     pl011
9010000      9010000    1000     rwxp dev     pl031
10000000     10000000   2eff0000 rwxp dev     pci_mem
3eff0000     3eff0000   10000    rwxp dev     pci_io
fc0000000    40000000   7de000   rwxp normal  kernel
4010000000   4010000000 10000000 rwxp dev     pci_cfg
40000a000000 a000000    200      rwxp normal  virtio_mmio_cfg
40000a000200 a000200    200      rwxp normal  virtio_mmio_cfg
40000a000400 a000400    200      rwxp normal  virtio_mmio_cfg
40000a000600 a000600    200      rwxp normal  virtio_mmio_cfg
40000a000800 a000800    200      rwxp normal  virtio_mmio_cfg
40000a000a00 a000a00    200      rwxp normal  virtio_mmio_cfg
40000a000c00 a000c00    200      rwxp normal  virtio_mmio_cfg
40000a000e00 a000e00    200      rwxp normal  virtio_mmio_cfg
4000407de000 407de000   7f822000 rwxp normal  main
5000407de000 407de000   7f822000 rwxp normal  page
6000407de000 407de000   7f822000 rwxp normal  mempool
```
All the non-linear mappings fall within the range `0x000000000000 : 0x400000000000`, minus any collisions with the linear mappings of device memory. For example, with OSv paused under gdb while running a simple hello world application, the non-linear mappings look like this:
```
(gdb) osv mmap
0x0000000000000000 0x0000000000000000 [0.0 kB] flags=none perm=none
0x0000100000000000 0x0000100000009000 [36.0 kB] flags=fpmF perm=r offset=0x00000000 path=/usr/lib/fs/libsolaris.so
0x0000100000009000 0x000010000009c000 [588.0 kB] flags=fpmF perm=rx offset=0x00009000 path=/usr/lib/fs/libsolaris.so
0x000010000009c000 0x00001000000c4000 [160.0 kB] flags=fpmF perm=r offset=0x0009c000 path=/usr/lib/fs/libsolaris.so
0x00001000000c4000 0x00001000000c6000 [8.0 kB] flags=fpmF perm=r offset=0x000c3000 path=/usr/lib/fs/libsolaris.so
0x00001000000c6000 0x00001000000c9000 [12.0 kB] flags=fpmF perm=rw offset=0x000c5000 path=/usr/lib/fs/libsolaris.so
0x00001000000c9000 0x00001000000e2000 [100.0 kB] flags=fp perm=rw
0x00001000000e2000 0x00001000000e3000 [4.0 kB] flags=fmF perm=r offset=0x00000000 path=/libvdso.so
0x00001000000e3000 0x00001000000e4000 [4.0 kB] flags=fmF perm=rx offset=0x00001000 path=/libvdso.so
0x00001000000e4000 0x00001000000e5000 [4.0 kB] flags=fmF perm=r offset=0x00002000 path=/libvdso.so
0x00001000000e5000 0x00001000000e6000 [4.0 kB] flags=fmF perm=r offset=0x00002000 path=/libvdso.so
0x00001000000e6000 0x00001000000e7000 [4.0 kB] flags=fmF perm=rw offset=0x00003000 path=/libvdso.so
0x00001000000e7000 0x00001000000e8000 [4.0 kB] flags=fmF perm=r offset=0x00000000 path=/hello
0x00001000000e8000 0x00001000000e9000 [4.0 kB] flags=fmF perm=rx offset=0x00001000 path=/hello
0x00001000000e9000 0x00001000000ea000 [4.0 kB] flags=fmF perm=r offset=0x00002000 path=/hello
0x00001000000ea000 0x00001000000eb000 [4.0 kB] flags=fmF perm=r offset=0x00002000 path=/hello
0x00001000000eb000 0x00001000000ec000 [4.0 kB] flags=fmF perm=rw offset=0x00003000 path=/hello
0x0000200000000000 0x0000200000001000 [4.0 kB] flags=p perm=none
0x0000200000001000 0x0000200000002000 [4.0 kB] flags=p perm=none
0x0000200000002000 0x0000200000101000 [1020.0 kB] flags=p perm=rw // Most likely stack
0x0000200000101000 0x0000200000102000 [4.0 kB] flags=p perm=none
0x0000200000102000 0x0000200000201000 [1020.0 kB] flags=p perm=rw // Most likely stack
0x0000400000000000 0x0000400000000000 [0.0 kB] flags=none perm=none
```