From 39ff40d45ee7603f5bac01f7f38e79cff02deb09 Mon Sep 17 00:00:00 2001
From: "Wang, Yu W"
Date: Wed, 6 Apr 2022 16:10:28 +0800
Subject: [PATCH] publish from 0ba1c207 (gopub-stage)

---
 deployment/README.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/deployment/README.md b/deployment/README.md
index 491584fce..391614eaa 100644
--- a/deployment/README.md
+++ b/deployment/README.md
@@ -1,5 +1,12 @@
+# Intel XPU Manager
+Intel XPU Manager is an in-band, node-level tool that provides local and remote GPU management. It integrates easily with cluster management solutions and cluster schedulers. GPU users can also use it to manage Intel GPUs locally.
+It supports a local command-line interface, local library calls, and a remote RESTful API.
+
+***So far, this container image is targeted as a Prometheus exporter.***
+
+The Intel XPU Manager source repository can be found at [intel/xpumanager](https://github.com/intel/xpumanager/).
+
 # Run XPU Manager in Docker
-*So far, XPUM container image is targeted as a Prometheus exporter.*
 
 ## Enable TLS
 Generate certificate for TLS and configure REST user credential:
@@ -51,3 +58,28 @@ docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
 -e XPUM_REST_PORT=12345 \
 ${xpum_image}
 ```
+
+## Support PCIe Throughput
+PCIe throughput metrics collection depends on the kernel module '**msr**'. Load it on the host with "***modprobe msr***".
+
+XPUM does not collect PCIe throughput metrics by default. To enable collection, pass the environment variable ***XPUM_METRICS*** with a value that includes the PCIe throughput metrics index.
+
+This example shows how to get the list of metric indexes from the XPUM daemon help text:
+```sh
+docker run --rm --entrypoint /opt/xpum/bin/xpumd ${xpum_image} -h
+```
+This example shows how to make XPUM in a container collect PCIe throughput metrics by passing the environment variable ***XPUM_METRICS***:
+```sh
+docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
+--cap-add=SYS_RAWIO \
+--publish 29999:29999 \
+--device /dev/dri:/dev/dri \
+--device /dev/cpu:/dev/cpu \
+-v /sys/firmware/acpi/tables/MCFG:/pcm/sys/firmware/acpi/tables/MCFG:ro \
+-v /proc/bus/pci/:/pcm/proc/bus/pci/ \
+-v /proc/sys/kernel/nmi_watchdog:/pcm/proc/sys/kernel/nmi_watchdog \
+-v $(pwd)/rest/conf:/opt/xpum/rest/conf:ro \
+-e XPUM_REST_NO_TLS=1 \
+-e XPUM_METRICS=0-37 \
+${xpum_image}
+```
\ No newline at end of file