
Add Ascend NPU accelerator support #1676

Merged — 4 commits merged into huggingface:main from the npu-support branch on Jul 12, 2023
Conversation

@statelesshz (Contributor) commented Jul 2, 2023

What does this PR do?

According to the review of a previous PR, if I want to use Ascend NPUs to train 🤗 Transformers models, the support should be added to Accelerate first; it will then come to the Trainer for free.
This PR adds support for the Ascend NPU accelerator (two short illustrative sketches follow after this list):

  1. Sample config after running the `accelerate config` command:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: multi-NPU
downcast_bf16: 'no'
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 8
use_cpu: false
```
  2. Run nlp_example.py with NPUs:

```bash
time accelerate launch nlp_example.py
```

Comparison with A100:

| Device | Training + evaluation time (s) | Accuracy after training (%) |
| --- | --- | --- |
| NPU (8 cards) | 67 | 77.70 |
| A100-80G (8 cards) | 55 | 75.74 |

Below are the output logs:

```text
Found cached dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 712.59it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-4000ef9af4a1aaa0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-31fde3331ae1cb26.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f2650fc966d327c7.arrow
Found cached dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 687.29it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-4000ef9af4a1aaa0.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-31fde3331ae1cb26.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f2650fc966d327c7.arrow
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[W LegacyTypeDispatch.h:79] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
[W LegacyTypeDispatch.h:79] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
epoch 0: {'accuracy': 0.6862745098039216, 'f1': 0.8134110787172011}
epoch 1: {'accuracy': 0.7598039215686274, 'f1': 0.8382838283828382}
epoch 2: {'accuracy': 0.7769607843137255, 'f1': 0.848585690515807}

real	1m7.440s
user	5m23.423s
sys	0m58.491s
```
  3. About Ascend NPU: the Ascend NPU is an AI processor that supports AI frameworks like PyTorch, TensorFlow, etc., so I think it is possible to run Transformers/Accelerate on NPUs to train foundation models (see the second sketch after this list). Their website: https://www.hiascend.com/en/
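To illustrate why NPU support lands in user scripts "for free": Accelerate scripts are device-agnostic, so the same code path that runs on CUDA runs on NPU once the backend is detected. Below is a minimal sketch of the pattern used by scripts like nlp_example.py; the toy model, data, and hyperparameters are illustrative stand-ins, not the actual example script.

```python
# Minimal device-agnostic training loop in the style of nlp_example.py.
# The toy model and random data below are illustrative stand-ins.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # selects CUDA, NPU, or CPU from the environment/config

model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 128), torch.randint(0, 2, (64,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves the model and batches to the selected device and wraps them
# for distributed training when the script is started via `accelerate launch`.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # used instead of loss.backward()
    optimizer.step()
```

Because no device name appears in the training code, adding a new backend to Accelerator is enough for existing scripts to pick it up.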
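On item 3, a note on how PyTorch sees these devices: Ascend ships a torch_npu adapter package that registers an "npu" device type with PyTorch. The probe below is a rough sketch under that assumption; the attribute names mirror the torch.cuda conventions but may vary between torch_npu releases.

```python
# Hedged sketch: detect Ascend NPUs through the torch_npu adapter.
# Assumes torch_npu is installed; attribute names follow its torch.cuda-like
# convention and may differ across releases.
import torch

try:
    import torch_npu  # noqa: F401  (importing registers the "npu" device type)
    npu_available = torch.npu.is_available()
except ImportError:
    npu_available = False

device = torch.device("npu:0") if npu_available else torch.device("cpu")
if npu_available:
    print(f"{torch.npu.device_count()} NPU(s) visible")

x = torch.ones(2, 2, device=device)  # allocates on the NPU when one is present
print(x.device)
```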

@statelesshz statelesshz changed the title from "[wip]Add Ascend NPu accelerator support" to "Add Ascend NPu accelerator support" Jul 2, 2023
@statelesshz statelesshz changed the title from "Add Ascend NPu accelerator support" to "[wip]Add Ascend NPu accelerator support" Jul 2, 2023
@statelesshz statelesshz changed the title from "[wip]Add Ascend NPu accelerator support" to "[wip]Add Ascend NPU accelerator support" Jul 2, 2023
@statelesshz statelesshz closed this Jul 2, 2023
@statelesshz statelesshz reopened this Jul 3, 2023
@statelesshz statelesshz closed this Jul 3, 2023
@statelesshz statelesshz changed the title from "[wip]Add Ascend NPU accelerator support" to "Add Ascend NPU accelerator support" Jul 4, 2023
@statelesshz statelesshz reopened this Jul 4, 2023
@HuggingFaceDocBuilderDev commented Jul 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@statelesshz (Contributor, Author) commented:

@sgugger Good day. Could you please review this PR?

@muellerzr muellerzr self-requested a review July 5, 2023 14:30
@sgugger (Collaborator) left a comment:

This is looking great! @muellerzr, can you also have a second look and make sure all the slow tests pass as well? (We don't have a way to test on NPUs, but we want to make sure this doesn't break existing stuff.)

@muellerzr (Collaborator) commented Jul 11, 2023

@statelesshz can you solve the merge conflict please? :) Otherwise, I'm running through the slow tests now; if those all pass and the merge conflict is resolved, we're good! ✔️

Edit: can confirm that the tests pass, so let's go ahead and fix that merge conflict. Great job!

@statelesshz statelesshz force-pushed the npu-support branch 2 times, most recently from 29b4ca4 to bed1578, on July 12, 2023 03:59
@statelesshz (Contributor, Author) commented Jul 12, 2023

> @statelesshz can you solve the merge conflict please? :) Otherwise, I'm running through the slow tests now; if those all pass and the merge conflict is resolved, we're good! ✔️
>
> Edit: can confirm that the tests pass, so let's go ahead and fix that merge conflict. Great job!

@muellerzr Thanks for your reply. I have rebased my commits onto the master HEAD to resolve the merge conflict.

@statelesshz statelesshz reopened this Jul 12, 2023
@HuggingFaceDocBuilderDev commented Jul 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@muellerzr (Collaborator) commented:
Great work! Thanks!

@muellerzr muellerzr merged commit c33adec into huggingface:main Jul 12, 2023
49 checks passed
@statelesshz statelesshz deleted the npu-support branch September 18, 2023 03:24