
Add Ascend NPU accelerator support #1676

Merged — 4 commits merged into huggingface:main from the npu-support branch on Jul 12, 2023
Conversation

@statelesshz (Contributor) commented Jul 2, 2023

What does this PR do?

According to the review of a previous PR, if I want to use Ascend NPUs to train 🤗 Transformers models, the support should be added to Accelerate first; it will then come to the Trainer for free.
This PR adds support for the Ascend NPU accelerator (two short illustrative sketches follow after this list):

  1. Sample config after running the `accelerate config` command:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: multi-NPU
downcast_bf16: 'no'
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 8
use_cpu: false
```
  2. Run nlp_example.py with NPUs:

```bash
time accelerate launch nlp_example.py
```

Comparison with A100:

| Device | Training + evaluation time (s) | Accuracy after training (%) |
| --- | --- | --- |
| NPU (8 cards) | 67 | 77.70 |
| A100-80G (8 cards) | 55 | 75.74 |

Below are the output logs:

```text
Found cached dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 712.59it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-4000ef9af4a1aaa0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-31fde3331ae1cb26.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f2650fc966d327c7.arrow
Found cached dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 687.29it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-4000ef9af4a1aaa0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-31fde3331ae1cb26.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f2650fc966d327c7.arrow
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[W LegacyTypeDispatch.h:79] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
[W LegacyTypeDispatch.h:79] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
epoch 0: {'accuracy': 0.6862745098039216, 'f1': 0.8134110787172011}
epoch 1: {'accuracy': 0.7598039215686274, 'f1': 0.8382838283828382}
epoch 2: {'accuracy': 0.7769607843137255, 'f1': 0.848585690515807}

real	1m7.440s
user	5m23.423s
sys	0m58.491s
```
  3. About Ascend NPU: the Ascend NPU is an AI processor that supports AI frameworks like PyTorch, TensorFlow, etc., so I think it is possible to run Transformers/Accelerate on NPUs to train foundation models (see the second sketch after this list). Their website: https://www.hiascend.com/en/
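To illustrate why NPU support lands in user scripts "for free": Accelerate scripts are device-agnostic, so the same code path that runs on CUDA runs on NPU once the backend is detected. Below is a minimal sketch of the pattern used by scripts like nlp_example.py; the toy model, data, and hyperparameters are illustrative stand-ins, not the actual example script.

```python
# Minimal device-agnostic training loop in the style of nlp_example.py.
# The toy model and random data below are illustrative stand-ins.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # selects CUDA, NPU, or CPU from the environment/config

model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 128), torch.randint(0, 2, (64,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves the model and batches to the selected device and wraps them
# for distributed training when the script is started via `accelerate launch`.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # used instead of loss.backward()
    optimizer.step()
```

Because no device name appears in the training code, adding a new backend to Accelerator is enough for existing scripts to pick it up.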
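On item 3, a note on how PyTorch sees these devices: Ascend ships a torch_npu adapter package that registers an "npu" device type with PyTorch. The probe below is a rough sketch under that assumption; the attribute names mirror the torch.cuda conventions but may vary between torch_npu releases.

```python
# Hedged sketch: detect Ascend NPUs through the torch_npu adapter.
# Assumes torch_npu is installed; attribute names follow its torch.cuda-like
# convention and may differ across releases.
import torch

try:
    import torch_npu  # noqa: F401  (importing registers the "npu" device type)
    npu_available = torch.npu.is_available()
except ImportError:
    npu_available = False

device = torch.device("npu:0") if npu_available else torch.device("cpu")
if npu_available:
    print(f"{torch.npu.device_count()} NPU(s) visible")

x = torch.ones(2, 2, device=device)  # allocates on the NPU when one is present
print(x.device)
```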

@statelesshz statelesshz changed the title from "[wip]Add Ascend NPu accelerator support" to "Add Ascend NPu accelerator support" Jul 2, 2023
@statelesshz statelesshz changed the title from "Add Ascend NPu accelerator support" to "[wip]Add Ascend NPu accelerator support" Jul 2, 2023
@statelesshz statelesshz changed the title from "[wip]Add Ascend NPu accelerator support" to "[wip]Add Ascend NPU accelerator support" Jul 2, 2023
@statelesshz statelesshz closed this Jul 2, 2023
@statelesshz statelesshz reopened this Jul 3, 2023
@statelesshz statelesshz closed this Jul 3, 2023
@statelesshz statelesshz changed the title from "[wip]Add Ascend NPU accelerator support" to "Add Ascend NPU accelerator support" Jul 4, 2023
@statelesshz statelesshz reopened this Jul 4, 2023
@HuggingFaceDocBuilderDev commented Jul 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@statelesshz (Contributor, Author) commented:

@sgugger Good day. Could you please review this PR?

@muellerzr muellerzr self-requested a review July 5, 2023 14:30
@sgugger (Collaborator) left a comment:

This is looking great! @muellerzr, can you also have a second look and make sure all the slow tests pass as well? (We don't have a way to test on NPUs, but we want to make sure this doesn't break existing stuff.)

@muellerzr (Collaborator) commented Jul 11, 2023

@statelesshz can you solve the merge conflict please? :) Otherwise, I'm running through the slow tests now; if those all pass and the merge conflict is resolved, we're good! ✔️

Edit: can confirm that the tests pass, so let's go ahead and fix that merge conflict. Great job!

@statelesshz statelesshz force-pushed the npu-support branch 2 times, most recently from 29b4ca4 to bed1578, on July 12, 2023 03:59
@statelesshz (Contributor, Author) commented Jul 12, 2023

> @statelesshz can you solve the merge conflict please? :) Otherwise, I'm running through the slow tests now; if those all pass and the merge conflict is resolved, we're good! ✔️
>
> Edit: can confirm that the tests pass, so let's go ahead and fix that merge conflict. Great job!

@muellerzr Thanks for your reply. I have rebased my commits onto the master HEAD to resolve the merge conflict.

@statelesshz statelesshz reopened this Jul 12, 2023
@HuggingFaceDocBuilderDev commented Jul 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@muellerzr (Collaborator) commented:
Great work! Thanks!

@muellerzr muellerzr merged commit c33adec into huggingface:main Jul 12, 2023
49 checks passed
@statelesshz statelesshz deleted the npu-support branch September 18, 2023 03:24