The acct_gather_profile/influxdb plugin uses the same base as the HDF5 profiling plugin. It allows Slurm to coordinate collecting data on jobs it runs on a cluster that is more detailed than is practical to include in its database. The data comes from periodically sampling various performance data either collected by Slurm, the operating system, or component software. The plugin will record the data from each source as a Time Series into a custom InfluxDB server.
Collects exactly the same information as the HDF5 plugin:
Measurement | Description |
---|---|
CPUFrequency | CPU Frequency at time of sample |
CPUTime | Seconds of CPU time used during the sample |
CPUUtilization | CPU Utilization during the interval |
Pages | Pages used in sample |
ReadMB | Number of megabytes read from local disk |
RSS | Value of RSS at time of sample |
VMSize | Value of VM Size at time of sample |
WriteMB | Number of megabytes written to local disk |
A small buffer (16KB) is used to avoid sending data for every sample collected. After task ended, plugin will send buffered data.
Information is sent to the central server using libcurl-devel library, so you should use this configure option:
--with-libcurl
It is a good idea to have a web layer over your InfluxDB server, such as Grafana, in order to visualize the data.
Here you can find some Screenshots.
Please, refer to INSTALL.md for installation instructions.