Commit 6250b0d, committed Jul 15, 2024

[Cleanup] Remove libuv from run_llama_train.sh

libuv is now enabled by default, so we can probably do without the educational blurb there, and we no longer need the env var since the default has landed.

ghstack-source-id: 68c8d2abe7eb0777e2add8df7634367c31b7ec06
Pull Request resolved: pytorch#453

1 parent: c26e5b3

File tree: 3 files changed, +0 −5 lines

create_seed_checkpoint.sh (1 deletion)

@@ -18,7 +18,6 @@
 
 set -ex
 
-export USE_LIBUV=1
 TRAINER_DIR=${1:-/home/$USER/local/torchtitan}
 NGPU=1
 LOG_RANK=0

multinode_trainer.slurm (1 deletion)

@@ -53,7 +53,6 @@ export NCCL_SOCKET_IFNAME="eth0,en,eth,em,bond"
 export NCCL_BUFFSIZE=2097152
 #export TORCH_DIST_INIT_BARRIER=1
 export FI_EFA_SET_CUDA_SYNC_MEMOPS=0
-#export USE_LIBUV=1
 CONFIG_FILE=${CONFIG_FILE:-"./train_configs/llama2_13b.toml"}
 
 dcgmi profile --pause

run_llama_train.sh (3 deletions)

@@ -7,9 +7,6 @@
 
 set -ex
 
-# libUV is a scalable backend for TCPStore which is used in processGroup
-# rendezvous. This is the recommended backend for distributed training.
-export USE_LIBUV=1
 TRAINER_DIR=${TRAINER_DIR:-/home/$USER/local/torchtitan}
 
 # use envs as local overrides for convenience
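The deleted comment explained that libuv is a scalable TCPStore backend used for process-group rendezvous; since it is now PyTorch's default, exporting USE_LIBUV=1 is redundant. As a sketch (an assumed pattern, not part of this commit), a launch script could still forward an explicit opt-out, e.g. `USE_LIBUV=0 ./run_llama_train.sh`, for platforms where the libuv listener misbehaves:

```shell
#!/usr/bin/env bash
# Sketch (assumption, not from this commit): rely on torch's libuv
# default, but pass USE_LIBUV through if the caller set it explicitly.
libuv_env_summary() {
  if [ -n "${USE_LIBUV+x}" ]; then
    # Caller set the variable (even to an empty value); forward it.
    echo "USE_LIBUV explicitly set to ${USE_LIBUV}"
  else
    # Nothing set: torch picks its built-in libuv default.
    echo "USE_LIBUV unset; relying on torch's libuv default"
  fi
}

libuv_env_summary
```

The `${USE_LIBUV+x}` expansion distinguishes "unset" from "set to empty", so the script never exports a value the user did not ask for.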

0 commit comments