rocm docker and test scripts #5
Conversation
Force-pushed from 26f78e6 to c4f0b5f
Force-pushed from c4f0b5f to aeed2cd
Dockerfile_amd (Outdated)
##############################################################################
# Apex
##############################################################################
#RUN git clone https://github.com/ROCm/apex.git ${STAGE_DIR}/apex
Remove this code if it's not needed.
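If the Apex build does turn out to be needed, a minimal sketch of the install step might look like this (the ROCm/apex URL comes from the diff; the pip flags are an assumption, not the actual Dockerfile content):

```bash
# Hypothetical Apex install step for the ROCm fork (flags are an assumption).
git clone https://github.com/ROCm/apex.git "${STAGE_DIR}/apex"
cd "${STAGE_DIR}/apex"
pip install -v --no-cache-dir ./
```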
train_llama.sh (Outdated)
TE_FP16="${TE_FP16:-1}"

export CUDA_DEVICE_MAX_CONNECTIONS=1
Do we need this one on ROCm?
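If the setting turns out to be NVIDIA-specific, one option is to make it overridable rather than hard-coded (the variable and default come from the diff; the guard style is a suggestion):

```bash
# Sketch: keep the variable overridable so ROCm runs can change or drop it.
export CUDA_DEVICE_MAX_CONNECTIONS="${CUDA_DEVICE_MAX_CONNECTIONS:-1}"
```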
train_llama.sh (Outdated)
wget -O $TOKENIZER_MODEL https://huggingface.co/NousResearch/Llama-2-7b-chat-hf/resolve/main/tokenizer.model
fi

# Prepare the dataset
Separate out the dataset preparation into a different script, so that we don't need to download it every time or make changes to this script. We can then use it as is, without any modifications.
It seems that once the dataset is downloaded, it will not be downloaded again when we rerun the script.
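For reference, a minimal sketch of a standalone prepare_dataset.sh, assuming the same tokenizer URL as the diff and a guard that skips the download when the file already exists (script name and default path are assumptions):

```bash
#!/bin/bash
# Hypothetical prepare_dataset.sh: download the tokenizer only once.
TOKENIZER_MODEL="${TOKENIZER_MODEL:-tokenizer.model}"
if [ ! -f "$TOKENIZER_MODEL" ]; then
  wget -O "$TOKENIZER_MODEL" https://huggingface.co/NousResearch/Llama-2-7b-chat-hf/resolve/main/tokenizer.model
fi
```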
Force-pushed from bdf4c6b to 6dbde6a
Dockerfile_amd (Outdated)
@@ -0,0 +1,84 @@
ARG BASE_DOCKER=rocm/pytorch:latest
#ARG BASE_DOCKER=rocm/pytorch-private:exec_dashboard_nightly
Remove the commented-out lines.
Dockerfile_amd (Outdated)
WORKDIR $WORKSPACE_DIR
RUN git clone https://github.com/ROCm/Megatron-LM.git Megatron-LM &&\
    cd Megatron-LM &&\
    git checkout rocm_megatron_lm_upstream &&\
Now we will use rocm_dev as the main branch.
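The clone step in Dockerfile_amd would then presumably change along these lines (a sketch, assuming only the branch name changes):

```bash
# Assumed update: track the rocm_dev branch instead of rocm_megatron_lm_upstream.
git clone https://github.com/ROCm/Megatron-LM.git Megatron-LM && \
    cd Megatron-LM && \
    git checkout rocm_dev
```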
train_llama.sh (Outdated)
SEQ_PARALLEL="${SEQ_PARALLEL:-1}"
CONTI_PARAMS="${CONTI_PARAMS:-0}"
OPTIMIZER="${OPTIMIZER:-sgd}"
TE_FP16="${TE_FP16:-1}"
The name TE_FP16 is confusing because I think it is actually using bf16.
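A hypothetical rename that keeps the old variable as a fallback so existing invocations don't break (TE_BF16 is a suggested name, not from the script):

```bash
# Hypothetical rename: TE_BF16 reflects the actual dtype; fall back to the old name.
TE_BF16="${TE_BF16:-${TE_FP16:-1}}"
```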
train_llama.sh (Outdated)
# Change for multinode config
MASTER_ADDR=localhost
Do we intend to use this script for single-node only, or do we also want to use it for multi-node? If the latter, we should make the multi-node options configurable from the command line as well.
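If multi-node runs are in scope, a sketch of making the rendezvous settings overridable (only MASTER_ADDR=localhost appears in the diff; the other variable names and defaults are assumptions):

```bash
# Sketch: multi-node options overridable from the environment or command line.
MASTER_ADDR="${MASTER_ADDR:-localhost}"
MASTER_PORT="${MASTER_PORT:-6000}"
NNODES="${NNODES:-1}"
NODE_RANK="${NODE_RANK:-0}"
```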
@gurpreet-dhami Put these scripts under the examples/llama folder, similar to how other workloads are arranged, create a README.md on how to create the dataset, and add the script there.
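A hypothetical layout, mirroring how other workloads are arranged (file names are assumptions):

```
examples/llama/
├── README.md            # how to prepare the dataset and launch training
├── prepare_dataset.sh   # dataset/tokenizer download, run once
└── train_llama.sh       # training launcher
```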
Force-pushed from ec8bfa6 to 4ec12e4
Force-pushed from 4ec12e4 to 77113cc
examples/llama2/train_llama2.sh (Outdated)
--eval-iters -1
"

# --save-interval $TOTAL_ITERS \
Remove the commented lines.
examples/llama2/train_llama2.sh (Outdated)
--no-masked-softmax-fusion \
--overlap-grad-reduce \
"
# --no-masked-softmax-fusion \
Remove the commented lines.
examples/llama2/train_llama2.sh (Outdated)
MEAN_LOG_SCRIPT=examples/llama2/mean_log_value.py
TMP_FILE=${TMP_DIR}/tmp.txt
# echo '============================================================================================================'
Remove the commented lines.
examples/llama2/train_llama2.sh (Outdated)
echo "throughput per GPU (TFLOPs/GPU): ${THROUGHPUT}" | ||
rm $TMP_FILE | ||
|
||
# echo '============================================================================================================' |
Remove the commented lines.
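For context, the averaging that mean_log_value.py performs over TMP_FILE could be sketched inline with awk (a stand-in assuming one numeric value per line; not the actual script):

```bash
# Hypothetical stand-in for mean_log_value.py: average one number per line.
THROUGHPUT=$(awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f", sum / n }' "$TMP_FILE")
echo "throughput per GPU (TFLOPs/GPU): ${THROUGHPUT}"
```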