Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor nccl_ofi_idpool to C++ #830

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AvivBenchorin
Copy link
Contributor

Refactors the nccl_ofi_idpool implementation to C++, creating a nccl_ofi_idpool_t class with std::vector and std::mutex member variables and thread-safe wrappers to interact with the std:vector. Updates all usage of the previous nccl_ofi_idpool C implementation to use the new class.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@AvivBenchorin AvivBenchorin added the enhancement New feature or request label Mar 26, 2025
@AvivBenchorin AvivBenchorin requested a review from a team as a code owner March 26, 2025 22:40
@AvivBenchorin AvivBenchorin marked this pull request as draft March 27, 2025 15:26
@AvivBenchorin AvivBenchorin force-pushed the feature/cpp_migration_idpool branch 2 times, most recently from b131013 to 99fcd75 Compare March 27, 2025 19:59
class nccl_ofi_idpool_t {
public:
/* Disable default constructor to require passing in a size */
nccl_ofi_idpool_t() = delete;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you provide the constructor below, you shouldn't need to explicitly delete the default constructor. I think.

On the other hand, we probably do want to delete the implicit copy constructor and just not have one. I'm not sure I can come up with reasonable copy semantics for an id pool without way overcomplicating the id pool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and you are correct, I wouldn't need to explicitly delete the default constructor. And makes sense for the implicit copy constructor, I'll delete it along with the copy assignment operator:

nccl_ofi_idpool_t(const nccl_ofi_idpool_t&) = delete;
nccl_ofi_idpool_t& operator=(const nccl_ofi_idpool_t&) = delete;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in force push.

free(idpool->ids);
idpool->ids = NULL;
return ret;
if (size_ % 8 != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw a comment in here about what this is doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in force push, although my comment may need a revision for clarity.


/* Iterate over each of the idpool_ vector elements */
for (size_t i = 0; i < idpool_.size(); i++) {
entry_index = __builtin_ffsll(idpool_.at(i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a vector. I'd just write this and line 53 below as idpool[i].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in force push. I had originally used .at since it has built-in bound-checking, but it would make sense with our current exception handling that we would rather then the bounds ourselves instead of throwing an exception.

rauteric
rauteric previously approved these changes Mar 28, 2025
Refactors the nccl_ofi_idpool implementation to C++, creating a
nccl_ofi_idpool_t class with std::vector and std::mutex member variables
and thread-safe wrappers to interact with the std:vector. Updates all
usage of the previous nccl_ofi_idpool C implementation to use the new
class.

Signed-off-by: Aviv Benchorin <benchori@amazon.com>
@AvivBenchorin AvivBenchorin force-pushed the feature/cpp_migration_idpool branch from 4b0ba16 to e2e92d8 Compare March 31, 2025 20:24
@AvivBenchorin
Copy link
Contributor Author

I rebased the draft PR on top of head of master and addressed previous feedback, moving PR to "ready-for-review".

@AvivBenchorin AvivBenchorin marked this pull request as ready for review March 31, 2025 20:34
#include <mutex>
#include <vector>

#include <unistd.h>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the testing for this PR, I had ran into an issue where ssize_t was not defined on Ubuntu 22.04 without including unistd.h despite it being defined on AL2/AL2023. Given this, would it be appropriate for me to still include errno.h for the error codes and stdint.h for uint64_t in case an OS would not have those definitions by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the ssize_t build failure again here, the difference for whether ssize_t was defined without including unistd.h was not between Ub22 vs AL2/AL2023, and instead was between which compiler and compiler version was used on Ub22 (gcc-latest failed the build, while gcc-legacy/clang-legacy/clang-latest succeeded).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should include cstddef, not unistd.h.

* @return 0 on success
* non-zero on error
*/
int free_id(size_t id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any time that returning non-zero here should be anything other than a catastrophic failure? Wondering if this shouldn't just be void free_id() and throw an exception if something's really bad.

#include <mutex>
#include <vector>

#include <unistd.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should include cstddef, not unistd.h.


/* Return the idpool element value of a valid vector index, return
negative error code on out-of-bound index */
uint64_t get_element(size_t index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of making public interface functions just for testing. I think instead I'd make the variables protected instead of private, and write a class in your test function that inherits from idpool and provides accessors to idpool. That way, you don't have this weird interface that confuses everyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants