PMIx storage system query support #277

Closed
shanedsnyder opened this issue Sep 4, 2020 · 5 comments
Labels
enhancement RFC Request for Comment

Comments

@shanedsnyder
Contributor

Overview

The PMIx storage WG is investigating how to integrate storage system support into the PMIx standard. As a precursor to introducing broader storage APIs into the standard, we would first like to propose extensions to PMIx's existing query interface. These extensions give PMIx users a way to query and learn more about the storage resources available on a given system, and they also flesh out the PMIx constructs needed to implement storage support.

Motivation

An ability to query storage system characteristics, parameters, and other state is critical to ensuring efficient use of HPC storage resources by application developers, I/O libraries and middleware services, and workflow management systems. This is due to an increasingly complex storage landscape in which a hierarchy of available storage systems is presented to users, each with distinct usage and performance characteristics that may affect how it should be leveraged efficiently. For instance, many modern HPC systems offer not only the large parallel file systems we are accustomed to seeing in HPC, but also new high-performance, low-latency storage systems like burst buffers (which may be deployed on the fabric alongside compute nodes or even directly on the compute nodes themselves). Furthermore, novel object-based storage systems (e.g., DAOS) are likely to play a more prominent role in HPC data management going forward, yet traditional file system users may struggle to understand the characteristics of these systems and how they compare to their file system counterparts.

The primary motivation of this work is to formalize the state and characteristics of these diverse storage systems available on different platforms and to make this information available to PMIx users via the existing query interface defined in the standard (Chapter 7.1). This query ability will help inform storage system users about which storage systems are most suitable for their I/O needs. For instance, knowing general storage system capacity and bandwidth characteristics could inform users on how much data can fit in different storage tiers and at what speed this data can be accessed.

Discussion Items

At a high level, our integration into the existing query interface only requires specifying new query attributes, which describe the information we want to query from storage systems, and new query qualifiers, which describe the additional storage system context needed to handle queries.

To provide more concrete details on what we are proposing, the new and existing query attributes/qualifiers we intend to use for describing storage queries are defined below:

New query qualifiers

  • PMIX_STORAGE_ID: a unique ID string (either assigned by an administrator or generated by PMIx) that references a given storage system -- this is the primary way of identifying distinct PMIx storage systems
  • PMIX_STORAGE_PATH: secondary ID method allowing POSIX-based file systems to be referenced by mount point for convenience, rather than by PMIX_STORAGE_ID
  • PMIX_STORAGE_TYPE: qualifier to limit the query to a particular storage type (e.g., Lustre, DAOS, PFS, burst buffer, …)
  • PMIX_STORAGE_HOST: qualifier to limit the query to a particular storage host
  • PMIX_STORAGE_DEVICE: qualifier to limit the query to a particular storage device

PMIX_STORAGE_ID and PMIX_STORAGE_PATH are the most fundamental qualifiers, used to direct queries at specific storage systems the user is aware of.

We are in the process of better defining PMIX_STORAGE_TYPE, but it will most likely be split into separate, more appropriately scoped qualifiers (e.g., local vs. remote storage, file vs. object storage).

PMIX_STORAGE_HOST and PMIX_STORAGE_DEVICE are used to answer some queries at finer granularity. For instance, for Lustre file systems, these qualifiers could restrict queries to a particular OSS and/or OST rather than the entire system.
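To make the usage pattern concrete, here's a rough sketch (in C, using the existing PMIX_INFO_CREATE/PMIX_INFO_LOAD macros) of how these qualifiers could be attached to a pmix_query_t. The PMIX_STORAGE_* names are only the ones proposed above, and the storage ID and host values are placeholders:

```c
#include <pmix.h>

/* Sketch only: the PMIX_STORAGE_* qualifiers are the names proposed in this
 * RFC, and "lus-thetafs0" / "nid00042" are placeholder values. */
static void add_storage_qualifiers(pmix_query_t *query)
{
    query->nqual = 2;
    PMIX_INFO_CREATE(query->qualifiers, query->nqual);

    /* direct the query at one specific storage system by its ID ... */
    PMIX_INFO_LOAD(&query->qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);
    /* ... and optionally narrow it to a single host (e.g., one Lustre OSS) */
    PMIX_INFO_LOAD(&query->qualifiers[1], PMIX_STORAGE_HOST,
                   "nid00042", PMIX_STRING);
}
```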

Existing query qualifiers

  • PMIX_USERID: restrict given query to a specific user
  • PMIX_GRPID: restrict given query to a specific group (project)

These two qualifiers will allow some storage system queries to be restricted to certain users and/or projects. For instance, if we wanted to know the available capacity of a given user's home file system rather than the total capacity of the entire mount, we would pass their user ID in PMIX_USERID.
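For example, restricting a query to the calling user would just mean loading one more qualifier; this is a sketch, but PMIX_USERID is already defined by the standard as a uint32_t:

```c
#include <unistd.h>

/* Sketch: limit a storage query to the calling user's own usage/limits.
 * Assumes the qualifier array was created with room for this extra entry. */
static void add_user_qualifier(pmix_query_t *query, size_t idx)
{
    uint32_t uid = (uint32_t)geteuid();
    PMIX_INFO_LOAD(&query->qualifiers[idx], PMIX_USERID, &uid, PMIX_UINT32);
}
```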

New query attributes

  • PMIX_QUERY_STORAGE_LIST: comma-delimited list of identifiers for all available storage systems (e.g., “gpfs-mirafs0,lus-thetafs0”)

This query would be used to provide the user with the list of storage systems PMIx is aware of.
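A rough end-to-end sketch of this query, assuming the blocking PMIx_Query_info call from PMIx v4 and the proposed (not yet standardized) PMIX_QUERY_STORAGE_LIST attribute; error handling and cleanup are omitted:

```c
#include <pmix.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    pmix_proc_t myproc;
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return EXIT_FAILURE;
    }

    /* ask for the comma-delimited list of storage systems PMIx knows about */
    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_QUERY_STORAGE_LIST);  /* proposed attribute */

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0 && PMIX_STRING == results[0].value.type) {
        /* e.g. "gpfs-mirafs0,lus-thetafs0" -- split on commas */
        char *list = strdup(results[0].value.data.string);
        for (char *id = strtok(list, ","); NULL != id; id = strtok(NULL, ",")) {
            printf("storage system: %s\n", id);
        }
        free(list);
    }

    PMIx_Finalize(NULL, 0);
    return EXIT_SUCCESS;
}
```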

  • PMIX_STORAGE_CAPACITY_LIMIT: overall capacity (in MB) of specified storage system
  • PMIX_STORAGE_CAPACITY_USED: used capacity (in MB) of specified storage system
  • PMIX_STORAGE_OBJECT_LIMIT: overall limit on number of objects (e.g., inodes) of specified storage system
  • PMIX_STORAGE_OBJECTS_USED: number of used objects (e.g., inodes) of specified storage system
  • PMIX_STORAGE_XFER_SIZE: optimal transfer size (in KB) of specified storage system

These queries are all inspired by the traditional statfs() call, but attempt to generalize the concept beyond file-based storage systems to object-based storage systems. They mainly report total resources (storage capacity and number of objects), consumed resources, and the optimal transfer size for the underlying storage system.
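As a sketch of what a capacity query against a specific system might look like (reusing the includes and setup from the example above; the attribute names, the MB units, and the uint64 value type are all as proposed here, not final):

```c
/* Sketch: query capacity limit/usage of one storage system.
 * "lus-thetafs0" is a placeholder ID; units are MB as proposed above. */
static void print_capacity(void)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t n, nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(3, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_CAPACITY_LIMIT);
    query.keys[1] = strdup(PMIX_STORAGE_CAPACITY_USED);

    query.nqual = 1;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults)) {
        for (n = 0; n < nresults; n++) {
            if (PMIX_UINT64 == results[n].value.type) {
                printf("%s = %llu MB\n", results[n].key,
                       (unsigned long long)results[n].value.data.uint64);
            }
        }
    }
}
```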

  • PMIX_STORAGE_BW_LIMIT: overall bandwidth limit (in MB/sec) of specified storage system
  • PMIX_STORAGE_BW: overall observed bandwidth (in MB/sec) of specified storage system

These two query attributes report the peak bandwidth and the currently observed bandwidth for a given storage system, informing users of both system limits and current system state.

  • PMIX_STORAGE_ID: identifier of the storage system being referenced
  • PMIX_STORAGE_PATH: mount point corresponding to a specified storage ID
  • PMIX_STORAGE_TYPE: type of the specified storage system (e.g., Lustre, DAOS, PFS, burst buffer, …)

These query attributes are just used to map between different qualifiers discussed previously. For instance, we could query a PMIX_STORAGE_ID by providing a PMIX_STORAGE_PATH as a qualifier.
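A sketch of that ID-from-path lookup (the mount point is a placeholder, and the attribute names are as proposed above):

```c
/* Sketch: resolve the storage ID behind a POSIX mount point. */
static void print_storage_id_for_mount(const char *mount)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_ID);          /* what we want back */

    query.nqual = 1;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_PATH, mount, PMIX_STRING);

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0 && PMIX_STRING == results[0].value.type) {
        printf("storage ID for %s: %s\n", mount, results[0].value.data.string);
    }
}
```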

Please let me know if I can provide more details.

@adilger

adilger commented Sep 4, 2020

I would also add PMIX_STORAGE_IOPS_LIMIT and PMIX_STORAGE_IOPS values, since these may be considerably different from the BW numbers. You might consider naming these PMIX_STORAGE_IOPS_{MAX,CUR} or similar? It probably makes sense to add caveats that *_CUR is the current observed value and may be subject to rapid changes, while *_MAX may be an observed limit or a theoretical maximum; together they can be used by applications to make a judgment on the current utilization of the storage system (i.e. either "available or maximum expected throughput" and/or "percent busy").

The intent is to allow applications to verify that the storage is suitable for their needs (e.g. not accidentally writing TB to an NFS share with BW=50MB/s instead of a PFS with BW=50GB/s). In addition, applications with flexibility in their I/O scheduling (e.g. whether to write a checkpoint for the current timestep) could use the storage utilization to wait for a relatively idle period.

@shanedsnyder
Contributor Author

@adilger, that makes a lot of sense about adding an IOPS equivalent to what we are doing with bandwidth. I'll go ahead and add it and bounce it off others at our meeting next week, but I don't imagine that's a controversial addition.

I think the suggestion to convert to a naming convention of PMIX_STORAGE_{BW,IOPS}_{MAX,CUR} rather than what we currently have is also probably a good idea -- it's a bit more explicitly named and leaves less to the imagination. We'll make sure the actual definitions of the counters make clear that one is for a limit or some theoretical max, while the other is a recently observed value.

One question is whether we should include a qualifier attribute, usable with the CUR attribute variants, that specifies the time interval over which the measurement is made, or whether we should just pick a reasonable value and require all storage systems to provide the measurement over that duration (i.e., as part of the attribute definition, specify that it covers the most recent 5-second or 1-minute interval, something like that). Any thoughts? If we do want to allow users to specify the interval, I think we'll need to define an additional attribute for it.

@shanedsnyder
Contributor Author

We should also consider defining qualifier attributes specifying the access type for BW and IOPS calculations, right? Something like PMIX_STORAGE_ACCESS_TYPE, with possible values PMIX_STORAGE_ACCESS_{RD,WR,RDWR} (with RDWR as the default, providing calculations over both read and write accesses). That was brought up in a previous meeting w.r.t. PMIX_STORAGE_XFER_SIZE (the idea being that some systems may have different optimal transfer sizes depending on the access type), but obviously it could also be useful to specifically query read or write bandwidths, for instance.
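To illustrate, querying the currently observed read bandwidth of one system might look like the sketch below; PMIX_STORAGE_ACCESS_TYPE and PMIX_STORAGE_ACCESS_RD are only the names suggested here (not standardized), and passing the value as a string is just a placeholder encoding:

```c
/* Sketch: query the currently observed *read* bandwidth of one storage system.
 * PMIX_STORAGE_ACCESS_TYPE / PMIX_STORAGE_ACCESS_RD are only proposed names,
 * and "lus-thetafs0" is a placeholder ID. */
static void query_read_bw(void)
{
    pmix_query_t query;
    pmix_info_t *results = NULL;
    size_t nresults = 0;

    PMIX_QUERY_CONSTRUCT(&query);
    query.keys = (char **)calloc(2, sizeof(char *));
    query.keys[0] = strdup(PMIX_STORAGE_BW);

    query.nqual = 2;
    PMIX_INFO_CREATE(query.qualifiers, query.nqual);
    PMIX_INFO_LOAD(&query.qualifiers[0], PMIX_STORAGE_ID,
                   "lus-thetafs0", PMIX_STRING);
    PMIX_INFO_LOAD(&query.qualifiers[1], PMIX_STORAGE_ACCESS_TYPE,
                   PMIX_STORAGE_ACCESS_RD, PMIX_STRING);  /* encoding TBD */

    if (PMIX_SUCCESS == PMIx_Query_info(&query, 1, &results, &nresults) &&
        nresults > 0) {
        /* returned value would be MB/sec as proposed; final type/units TBD */
    }
}
```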

@adilger

adilger commented Sep 10, 2020

Yes, separating read and write requests makes sense, since the performance for the two can be quite different.

@jjhursey
Member

Storage query support was added as Provisional in v4.1 (#280 and #346). Please reopen or file a new ticket if/when further work is required in this area.

@jjhursey jjhursey added this to the PMIx v4.1 Standard milestone Jan 12, 2023
@jjhursey jjhursey added enhancement RFC Request for Comment labels Jan 12, 2023