-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor nccl_ofi_idpool to C++ #830
base: master
Are you sure you want to change the base?
Refactor nccl_ofi_idpool to C++ #830
Conversation
b131013
to
99fcd75
Compare
include/nccl_ofi_idpool.h
Outdated
class nccl_ofi_idpool_t { | ||
public: | ||
/* Disable default constructor to require passing in a size */ | ||
nccl_ofi_idpool_t() = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you provide the constructor below, you shouldn't need to explicitly delete the default constructor. I think.
On the other hand, we probably do want to delete the implicit copy constructor and just not have one. I'm not sure I can come up with reasonable copy semantics for an id pool without way overcomplicating the id pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it and you are correct, I wouldn't need to explicitly delete the default constructor. And makes sense for the implicit copy constructor, I'll delete it along with the copy assignment operator:
nccl_ofi_idpool_t(const nccl_ofi_idpool_t&) = delete;
nccl_ofi_idpool_t& operator=(const nccl_ofi_idpool_t&) = delete;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in force push.
src/nccl_ofi_idpool.cpp
Outdated
free(idpool->ids); | ||
idpool->ids = NULL; | ||
return ret; | ||
if (size_ % 8 != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw a comment in here about what this is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in force push, although my comment may need a revision for clarity.
src/nccl_ofi_idpool.cpp
Outdated
|
||
/* Iterate over each of the idpool_ vector elements */ | ||
for (size_t i = 0; i < idpool_.size(); i++) { | ||
entry_index = __builtin_ffsll(idpool_.at(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a vector. I'd just write this and line 53 below as idpool[i]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in force push. I had originally used .at
since it has built-in bound-checking, but it would make sense with our current exception handling that we would rather then the bounds ourselves instead of throwing an exception.
99fcd75
to
4b0ba16
Compare
Refactors the nccl_ofi_idpool implementation to C++, creating a nccl_ofi_idpool_t class with std::vector and std::mutex member variables and thread-safe wrappers to interact with the std:vector. Updates all usage of the previous nccl_ofi_idpool C implementation to use the new class. Signed-off-by: Aviv Benchorin <benchori@amazon.com>
4b0ba16
to
e2e92d8
Compare
I rebased the draft PR on top of head of master and addressed previous feedback, moving PR to "ready-for-review". |
#include <mutex> | ||
#include <vector> | ||
|
||
#include <unistd.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the testing for this PR, I had ran into an issue where ssize_t
was not defined on Ubuntu 22.04 without including unistd.h
despite it being defined on AL2/AL2023. Given this, would it be appropriate for me to still include errno.h
for the error codes and stdint.h
for uint64_t
in case an OS would not have those definitions by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the ssize_t
build failure again here, the difference for whether ssize_t
was defined without including unistd.h
was not between Ub22 vs AL2/AL2023, and instead was between which compiler and compiler version was used on Ub22 (gcc-latest
failed the build, while gcc-legacy
/clang-legacy
/clang-latest
succeeded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should include cstddef
, not unistd.h
.
* @return 0 on success | ||
* non-zero on error | ||
*/ | ||
int free_id(size_t id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any time that returning non-zero here should be anything other than a catastrophic failure? Wondering if this shouldn't just be void free_id()
and throw an exception if something's really bad.
#include <mutex> | ||
#include <vector> | ||
|
||
#include <unistd.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should include cstddef
, not unistd.h
.
|
||
/* Return the idpool element value of a valid vector index, return | ||
negative error code on out-of-bound index */ | ||
uint64_t get_element(size_t index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of making public interface functions just for testing. I think instead I'd make the variables protected instead of private, and write a class in your test function that inherits from idpool and provides accessors to idpool. That way, you don't have this weird interface that confuses everyone.
Refactors the nccl_ofi_idpool implementation to C++, creating a nccl_ofi_idpool_t class with std::vector and std::mutex member variables and thread-safe wrappers to interact with the std:vector. Updates all usage of the previous nccl_ofi_idpool C implementation to use the new class.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.