-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MPI/TCP/FS for NCCL-init #632
Conversation
I tried to use NCCL instead of MPI but realized there are some hard dependencies:
So as mentioned above we're left with MPI, filesysystem sync, TCP. Will test the latter 2 tomorrow. |
The PR is tested and ready @karpathy - comments / feedback welcome! I personally find TCP setup the most useful since I don't have the shared filesystem. The only variable that one needs to set is a single IP address, so fairly simple to run. |
7907176
to
fbf4f59
Compare
NOTE: Sockets are currently not cross-platform that's why Windows test is failing. Linux & Mac are supported. So we either have to tweak it a bit to make it x-platform OR we remove support on Windows OR we remove sockets altogether. |
printf("Failed to send nccl_id"); | ||
exit(EXIT_FAILURE); | ||
} | ||
close(client_sockets[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closeCheck?
// Step 6) accept connections from clients | ||
printf("Waiting for clients to connect...\n"); | ||
while (num_clients < MAX_CLIENTS) { | ||
if ((new_socket = accept(server_socket, (struct sockaddr *)&address, (socklen_t*)&addrlen)) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is addrlen of type int
and not socklen_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have a satisfying answer other than it's a pattern i've observed people use everywhere, see:
- https://gist.github.com/SkrewEverything/2c535e83a3a7b8e5b7aa490009a87fbb
- https://stackoverflow.com/questions/61106749/how-to-read-data-from-socket-correctly
- your favorite LLM
it is casted to socklen_t
before calling accept
.
train_gpt2.cu
Outdated
int gpus_per_node = 8; // this should be set by the slurm environment | ||
char nccl_init_method[256] = "mpi"; // "tcp" or "fs" or "mpi" | ||
char server_ip[256] = "-1"; // used if init_method set to "tcp" -> set to your server ip address | ||
char fs_path[256] = "/tmp"; // used if init_method set to "fs" -> set to a shared filesystem path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't /tmp usually be a path local to the current node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of agree, in a previous PR it was /dfs
or so, worried that a person might get a wrong impression of this flag if they see its default as /tmp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it to "" (empty) as it doesn't matter since the init method is set to "mpi" by default. This var is used only when init is set to "fs".
llmc/zero.cuh
Outdated
@@ -193,6 +212,126 @@ ncclUniqueId get_nccl_id_via_tcp(MultiGpuConfig* result, const char* server_ip) | |||
return nccl_id; | |||
} | |||
|
|||
#ifdef _WIN32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any way to put this stuff into unistd.h, which is our windows-specific code? I think it would need a tiny refactor so that the needed imports are minimal, e.g. maybe not passing in a MultiGPUConfig but its members more directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can do that in the follow-up PR if you're ok with that? e.g. @rosslwheeler has a windows setup ready and he said he can pick it up
4e7980a
to
2667aa1
Compare
Instead of mixing NCCL & Open MPI during training: let's transition to using only NCCL. To the best of my knowledge there are no downsides here, they're equivalent and speedwise i couldn't observe any difference.
By doing this we scope down our MPI dependence to
multi_gpu_config_init
.Context: why are we trying to reduce dependence on Open MPI?
slurm-wlm
package - which is the easiest and thus likely the most frequent way folks setup their slurm clusters - dropped PMIx support which means we can't use slurm in a multi node setup together with Open MPI.Regarding
multi_gpu_config_init
we have a few options as far as i can tell:So: