PMIx storage system query support #277

Closed
shanedsnyder opened this issue Sep 4, 2020 · 5 comments
Labels
enhancement RFC Request for Comment

Comments

@shanedsnyder
Contributor

Overview

The PMIx storage WG is investigating how to integrate storage system support into the PMIx standard. As a precursor to introducing broader storage APIs into the standard, we would first like to propose extensions to PMIx's existing query interface. These extensions give PMIx users a way to query and learn more about the storage resources available on a given system, and they also flesh out the PMIx constructs needed to implement storage support.

Motivation

An ability to query storage system characteristics, parameters, and other state is critical to ensuring efficient use of HPC storage resources by application developers, I/O libraries and middleware services, and workflow management systems. This is due to an increasingly complex storage landscape in which a hierarchy of available storage systems is presented to users, each with distinct usage and performance characteristics that may affect how it should be leveraged efficiently. For instance, many modern HPC systems offer not only the large parallel file systems we are accustomed to seeing in HPC, but also new high-performance, low-latency storage systems like burst buffers (which may be deployed on the fabric alongside compute nodes or even directly on the compute nodes themselves). Furthermore, novel object-based storage systems (e.g., DAOS) are likely to play a more prominent role in HPC data management going forward, yet traditional file system users may struggle to understand the characteristics of these systems and how they compare to their file system counterparts.

The primary motivation of this work is to formalize the state and characteristics of these diverse storage systems available on different platforms and to make this information available to PMIx users via the existing query interface defined in the standard (Chapter 7.1). This query ability will help inform storage system users about which storage systems are most suitable for their I/O needs. For instance, knowing general storage system capacity and bandwidth characteristics could inform users on how much data can fit in different storage tiers and at what speed this data can be accessed.

Discussion Items

At a high level, our integration into the existing query interface only requires specifying new query attributes, which describe the information we want to query from storage systems, and new query qualifiers, which describe the additional storage system context needed to handle queries.

To provide more concrete details on what we are proposing, the new and existing query attributes/qualifiers we intend to use for describing storage queries are defined below:

New query qualifiers

  • PMIX_STORAGE_ID: a unique ID string (either assigned by an administrator or generated by PMIx) that references a given storage system -- this is the primary way of identifying distinct PMIx storage systems
  • PMIX_STORAGE_PATH: secondary ID method allowing POSIX-based file systems to be referenced by mount point for convenience, rather than by PMIX_STORAGE_ID
  • PMIX_STORAGE_TYPE: qualifier to limit the query to a particular storage type (e.g., Lustre, DAOS, PFS, burst buffer, …)
  • PMIX_STORAGE_HOST: qualifier to limit the query to a particular storage host
  • PMIX_STORAGE_DEVICE: qualifier to limit the query to a particular storage device

PMIX_STORAGE_ID and PMIX_STORAGE_PATH are the most fundamental qualifiers, used to direct queries at specific storage systems the user is aware of.

We are in the process of better defining PMIX_STORAGE_TYPE, but it will most likely be split into separate, more appropriately scoped qualifiers (e.g., local vs. remote storage, file vs. object storage).

PMIX_STORAGE_HOST and PMIX_STORAGE_DEVICE are used to answer some queries at finer granularity. For instance, for Lustre file systems, these qualifiers could restrict queries to a particular OSS and/or OST rather than the entire system.
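To make the usage pattern concrete, here's a rough sketch (in C, using the existing PMIX_INFO_CREATE/PMIX_INFO_LOAD macros) of how these qualifiers could be attached to a pmix_query_t. The PMIX_STORAGE_* names are only the ones proposed above, and the storage ID and host values are placeholders:

```c
#include <pmix.h>

/* Sketch only: the PMIX_STORAGE_* qualifiers are the names proposed in this
 * RFC, and "lus-thetafs0" / "nid00042" are placeholder values. */
static void add_storage_qualifiers(pmix_query_t *query)
{
    query->nqual = 2;
    PMIX_INFO_CREATE(query->qualifiers, query->nqual);

    /* direct the query at one specific storage system by its ID ... */
    PMIX_INFO_LOAD(&query->qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);
    /* ... and optionally narrow it to a single host (e.g., one Lustre OSS) */
    PMIX_INFO_LOAD(&query->qualifiers[1], PMIX_STORAGE_HOST,
                   "nid00042", PMIX_STRING);
}
```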

Existing query qualifiers

  • PMIX_USERID: restrict given query to a specific user
  • PMIX_GRPID: restrict given query to a specific group (project)

These two qualifiers will allow some storage system queries to be restricted to certain users and/or projects. For instance, if we wanted to know the available capacity of a given user's home file system rather than the total capacity of the entire mount, we would pass their user ID in PMIX_USERID.
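For example, restricting a query to the calling user would just mean loading one more qualifier; this is a sketch, but PMIX_USERID is already defined by the standard as a uint32_t:

```c
#include <unistd.h>

/* Sketch: limit a storage query to the calling user's own usage/limits.
 * Assumes the qualifier array was created with room for this extra entry. */
static void add_user_qualifier(pmix_query_t *query, size_t idx)
{
    uint32_t uid = (uint32_t)geteuid();
    PMIX_INFO_LOAD(&query->qualifiers[idx], PMIX_USERID, &uid, PMIX_UINT32);
}
```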

New query attributes

  • PMIX_QUERY_STORAGE_LIST: comma-delimited list of identifiers for all available storage systems (e.g., “gpfs-mirafs0,lus-thetafs0”)

This query would be used to provide the user with the list of storage systems PMIx is aware of.
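A rough end-to-end sketch of this query, assuming the blocking PMIx_Query_info call from PMIx v4 and the proposed (not yet standardized) PMIX_QUERY_STORAGE_LIST attribute; error handling and cleanup are omitted:

```c
#include <pmix.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    pmix_proc_t myproc;
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return EXIT_FAILURE;
    }

    /* ask for the comma-delimited list of storage systems PMIx knows about */
    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_QUERY_STORAGE_LIST);  /* proposed attribute */

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0 && PMIX_STRING == results[0].value.type) {
        /* e.g. "gpfs-mirafs0,lus-thetafs0" -- split on commas */
        char *list = strdup(results[0].value.data.string);
        for (char *id = strtok(list, ","); NULL != id; id = strtok(NULL, ",")) {
            printf("storage system: %s\n", id);
        }
        free(list);
    }

    PMIx_Finalize(NULL, 0);
    return EXIT_SUCCESS;
}
```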

  • PMIX_STORAGE_CAPACITY_LIMIT: overall capacity (in MB) of specified storage system
  • PMIX_STORAGE_CAPACITY_USED: used capacity (in MB) of specified storage system
  • PMIX_STORAGE_OBJECT_LIMIT: overall limit on number of objects (e.g., inodes) of specified storage system
  • PMIX_STORAGE_OBJECTS_USED: number of used objects (e.g., inodes) of specified storage system
  • PMIX_STORAGE_XFER_SIZE: optimal transfer size (in KB) of specified storage system

These queries are all inspired by the traditional statfs() call, but attempt to generalize the concept beyond file-based storage systems to object-based storage systems. They mainly report total resources (storage capacity and number of objects), consumed resources, and the optimal transfer size for the underlying storage system.
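As a sketch of what a capacity query against a specific system might look like (reusing the includes and setup from the example above; the attribute names, the MB units, and the uint64 value type are all as proposed here, not final):

```c
/* Sketch: query capacity limit/usage of one storage system.
 * "lus-thetafs0" is a placeholder ID; units are MB as proposed above. */
static void print_capacity(void)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t n, nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(3, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_CAPACITY_LIMIT);
    query.keys[1] = strdup(PMIX_STORAGE_CAPACITY_USED);

    query.nqual = 1;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults)) {
        for (n = 0; n < nresults; n++) {
            if (PMIX_UINT64 == results[n].value.type) {
                printf("%s = %llu MB\n", results[n].key,
                       (unsigned long long)results[n].value.data.uint64);
            }
        }
    }
}
```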

  • PMIX_STORAGE_BW_LIMIT: overall bandwidth limit (in MB/sec) of specified storage system
  • PMIX_STORAGE_BW: overall observed bandwidth (in MB/sec) of specified storage system

These two query attributes report the peak bandwidth and the currently observed bandwidth for a given storage system, informing users of both system limits and current system state.

  • PMIX_STORAGE_ID: identifier of the storage system being referenced
  • PMIX_STORAGE_PATH: mount point corresponding to a specified storage ID
  • PMIX_STORAGE_TYPE: type of the specified storage system (e.g., Lustre, DAOS, PFS, burst buffer, …)

These query attributes are just used to map between different qualifiers discussed previously. For instance, we could query a PMIX_STORAGE_ID by providing a PMIX_STORAGE_PATH as a qualifier.
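A sketch of that ID-from-path lookup (the mount point is a placeholder, and the attribute names are as proposed above):

```c
/* Sketch: resolve the storage ID behind a POSIX mount point. */
static void print_storage_id_for_mount(const char *mount)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_ID);          /* what we want back */

    query.nqual = 1;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_PATH, mount, PMIX_STRING);

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0 && PMIX_STRING == results[0].value.type) {
        printf("storage ID for %s: %s\n", mount, results[0].value.data.string);
    }
}
```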

Please let me know if I can provide more details.

@adilger

adilger commented Sep 4, 2020

I would also add PMIX_STORAGE_IOPS_LIMIT and PMIX_STORAGE_IOPS values, since these may be considerably different from the BW numbers. You might consider naming these PMIX_STORAGE_IOPS_{MAX,CUR} or similar? It probably makes sense to add caveats that *_CUR is the current observed value and may be subject to rapid changes, while *_MAX may be an observed limit or a theoretical maximum; together they can be used by applications to make a judgment on the current utilization of the storage system (i.e. either "available or maximum expected throughput" and/or "percent busy").

The intent is to allow applications to verify that the storage is suitable for their needs (e.g. not accidentally writing TB to an NFS share with BW=50MB/s instead of a PFS with BW=50GB/s). In addition, applications with flexibility in their I/O scheduling (e.g. whether to write a checkpoint for the current timestep) could use the storage utilization to wait for a relatively idle period.

@shanedsnyder
Contributor Author

@adilger, that makes a lot of sense about adding an IOPS equivalent to what we are doing with bandwidth. I'll go ahead and add it and bounce it off others at our meeting next week, but I don't imagine that's a controversial addition.

I think the suggestion to convert to a naming convention of PMIX_STORAGE_{BW,IOPS}_{MAX,CUR} rather than what we currently have is also probably a good idea -- it's a bit more explicitly named and leaves less to the imagination. We'll make sure the actual definitions of the counters make clear that one is for a limit or some theoretical max, while the other is a recently observed value.

One question is whether we should include a qualifier attribute, usable with the CUR attribute variants, that specifies the time interval over which the measurement is made, or whether we should just pick a reasonable value and require all storage systems to provide the measurement over that duration (i.e., as part of the attribute definition, specify that it covers the most recent 5-second or 1-minute interval, something like that). Any thoughts? If we do want to allow users to specify the interval, I think we'll need to define an additional attribute for it.

@shanedsnyder
Contributor Author

We should also consider defining qualifier attributes specifying the access type for BW and IOPS calculations, right? Something like PMIX_STORAGE_ACCESS_TYPE, with possible values PMIX_STORAGE_ACCESS_{RD,WR,RDWR} (with RDWR as the default, providing calculations over both read and write accesses). That was brought up in a previous meeting w.r.t. PMIX_STORAGE_XFER_SIZE (the idea being that some systems may have different optimal transfer sizes depending on the access type), but obviously it could also be useful to specifically query read or write bandwidths, for instance.
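To illustrate, querying the currently observed read bandwidth of one system might look like the sketch below; PMIX_STORAGE_ACCESS_TYPE and PMIX_STORAGE_ACCESS_RD are only the names suggested here (not standardized), and passing the value as a string is just a placeholder encoding:

```c
/* Sketch: query the currently observed *read* bandwidth of one storage system.
 * PMIX_STORAGE_ACCESS_TYPE / PMIX_STORAGE_ACCESS_RD are only proposed names,
 * and "lus-thetafs0" is a placeholder ID. */
static void query_read_bw(void)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_BW);

    query.nqual = 2;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);
    PMIX_INFO_LOAD(&query.qualifiers[1], PMIX_STORAGE_ACCESS_TYPE,
                   PMIX_STORAGE_ACCESS_RD, PMIX_STRING);  /* encoding TBD */

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0) {
        /* returned value would be MB/sec as proposed; final type/units TBD */
    }
}
```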

@adilger

adilger commented Sep 10, 2020

Yes, separating read and write requests makes sense, since the performance for the two can be quite different.

@jjhursey
Member

Storage query support was added as Provisional in v4.1 (#280 and #346). Please reopen or file a new ticket if/when further work is required in this area.

@jjhursey jjhursey added this to the PMIx v4.1 Standard milestone Jan 12, 2023
@jjhursey jjhursey added enhancement RFC Request for Comment labels Jan 12, 2023