libmem: implement policy-agnostic memory allocation/accounting. #332

Open
wants to merge 14 commits into
base: main

Commits on Sep 17, 2024

  1. resmgr: use better container names in dumps.

    Use PrettyName()-compatible container names together with
    container IDs in NRI event dumps.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 17, 2024
    fdb6660
  2. resmgr: use module-global logger.

    Having loggers embedded in resmgr types as members was a bad idea.
    Replace them with module-global logger instances.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 17, 2024
    d399547

Commits on Sep 18, 2024

  1. resmgr: lifecycle overlap detection and workaround.

    We recently discovered a problem with the stream of container
    lifecycle events generated by some runtime versions. A side
    effect is that we get Create/Stop events for multiple container
    instances with seemingly overlapping lifecycles: the later
    instance gets created before the earlier one is stopped.
    
    When undetected, such a false overlap might cause overcommit
    of resources, with both instances temporarily using the full
    resource set of the container. As a workaround, we now also
    track containers by fully qualified name ($namespace/pod/ctr)
    and internally generate an event for releasing the resources
    of the old instance whenever we notice that a creation event
    would cause a duplicate instance for the same name.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    d551afc
  2. topology-aware: check grants, look for stale allocations and duplicates.

    Check grants, looking for grants with stale allocations
    or duplicate containers (detected using fully qualified
    names). Dump total memory and CPU granted.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    f609d93
  3. libmem: initial policy-agnostic memory allocator.

    Initial implementation of a policy-agnostic memory accounting
    and allocation library.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    f301e5d
  4. libmem: add unit tests and sample sysfs test data.

    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    bca8148
  5. libmem: add a short description of libmem.

    Add a brief description of libmem, its basic ideas, and its core
    concepts as package-level documentation.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    b7a7750
  6. topology-aware: initial libmem conversion.

    Cut out the original memory accounting and allocation code.
    Plug in a libmem-based memory allocator instead.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    22831c7
  7. topology-aware: update unit tests.

    Update unit tests after libmem conversion.
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    08c8711
  8. balloons: initial libmem conversion.

    Plug in libmem-based memory allocation (and accounting).
    
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    klihub committed Sep 18, 2024
    029e531
  9. balloons,cache: support memory-type annotations.

    Add support for per-balloon-type memory configuration and
    per-container overrides using pod annotations. Pass configured
    or annotated memory types to libmem for allocation.
    
    TODO(klihub): per balloon configuration still missing (?)
    
    Co-authored-by: Krisztian Litkey <krisztian.litkey@intel.com>
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    askervin and klihub committed Sep 18, 2024
    ec6c303
  10. e2e: add topology-aware memory allocation tests.

    Add first e2e tests for topology-aware policy memory allocation
    and type control.
    
    Co-authored-by: Krisztian Litkey <krisztian.litkey@intel.com>
    Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
    askervin and klihub committed Sep 18, 2024
    fade0eb
  11. e2e: support memory-type annotations in created pods

    "MEMTYPE=x create ..." in test scripts creates a guaranteed or
    burstable pod with a "memory-type...: x" annotation that applies
    to all containers in the pod.
    
    Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
    askervin authored and klihub committed Sep 18, 2024
    1585cb6
  12. e2e: add fuzz test sources for hbm/dram/pmem pods

    Add a fuzz test generator script, model, and runner for topology-aware
    policy reliability tests on an HBM+DRAM+PMEM platform.
    
    Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
    askervin authored and klihub committed Sep 18, 2024
    0bb2c73
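
The lifecycle overlap workaround described in commit d551afc (track containers by fully qualified name and release the old instance when a creation event would produce a duplicate) can be illustrated with a minimal sketch. This is not the code from the PR: the Container, Tracker, Create, and Stop names, and the release callback, are all hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Container is a hypothetical, minimal stand-in for a tracked container.
type Container struct {
	ID        string
	Namespace string
	Pod       string
	Name      string
}

// QualifiedName returns the namespace/pod/container name of c.
func (c Container) QualifiedName() string {
	return c.Namespace + "/" + c.Pod + "/" + c.Name
}

// Tracker (hypothetical) indexes live containers by ID and by qualified name.
type Tracker struct {
	byID   map[string]Container
	byName map[string]string // qualified name -> container ID
}

// NewTracker returns an empty tracker.
func NewTracker() *Tracker {
	return &Tracker{byID: map[string]Container{}, byName: map[string]string{}}
}

// Create registers c. If a different live instance already uses the same
// qualified name, that instance is released first and its ID is returned.
func (t *Tracker) Create(c Container, release func(Container)) string {
	staleID := ""
	name := c.QualifiedName()
	if oldID, ok := t.byName[name]; ok && oldID != c.ID {
		release(t.byID[oldID]) // internally generated release for the old instance
		delete(t.byID, oldID)
		staleID = oldID
	}
	t.byID[c.ID] = c
	t.byName[name] = c.ID
	return staleID
}

// Stop releases the container with the given ID. It returns false if the
// container is unknown, for example because Create already released it.
func (t *Tracker) Stop(id string, release func(Container)) bool {
	c, ok := t.byID[id]
	if !ok {
		return false
	}
	release(c)
	delete(t.byID, id)
	if t.byName[c.QualifiedName()] == id {
		delete(t.byName, c.QualifiedName())
	}
	return true
}

func main() {
	t := NewTracker()
	release := func(c Container) { fmt.Println("release", c.ID, c.QualifiedName()) }

	t.Create(Container{ID: "a1", Namespace: "default", Pod: "pod0", Name: "ctr0"}, release)
	// The new instance arrives before the old one is stopped.
	stale := t.Create(Container{ID: "a2", Namespace: "default", Pod: "pod0", Name: "ctr0"}, release)
	fmt.Println("stale instance:", stale)
	// The late Stop event for the old instance is now a no-op.
	fmt.Println("late stop handled:", t.Stop("a1", release))
}
```

In this sketch the late Stop event for the old instance finds nothing to release, so the resources are released exactly once and the two instances never hold the full resource set at the same time.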