[Accelerator] Cambricon MLU support #6472

Andy666G · 2024-09-02T03:40:51Z

Description

This PR adds Cambricon MLU accelerator support.
With this PR, DeepSpeed supports MLU as a backend for training and inference tasks.

accelerator/mlu_accelerator.py

tjruwase · 2024-09-03T12:51:36Z

Please see here to fix formatting issues.

Also, consider updating the following accelerator docs:

Tutorial.
HW list.

loadams · 2024-09-11T23:22:27Z

@Andy666G - curious if you would be able to run the precommit formatter to resolve the formatting errors?

Andy666G · 2024-09-13T03:32:48Z

> @Andy666G - curious if you would be able to run the precommit formatter to resolve the formatting errors?

Hi, the formatting errors have been resolved.

tjruwase · 2024-09-26T13:10:18Z

@Andy666G, do you plan to address my feedback regarding documentation in a separate PR?
#6472 (comment)

…ch workflow triggers (#6584)

Changes from #6472 caused the no-torch workflow, which serves as an example of how we build the DeepSpeed release package, to fail (we caught this before a release; see #6402 for more).

The fix copies the style used to include torch in other accelerator op_builder implementations, such as npu [here](https://github.com/microsoft/DeepSpeed/blob/master/op_builder/npu/fused_adam.py#L8) and hpu [here](https://github.com/microsoft/DeepSpeed/blob/828ddfbbda2482412fffc89f5fcd3b0d0eba9a62/op_builder/hpu/fused_adam.py#L15).

This also updates the no-torch workflow to run on all changes to the op_builder directory. The test runs quickly and shouldn't add any additional testing burden.

Resolves: #6576
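For readers unfamiliar with why a torch import can break a no-torch build, here is a minimal sketch of a guarded-import pattern. It assumes the referenced npu and hpu builders tolerate a missing torch at import time; the names `optional_import` and `FusedAdamBuilderSketch` are hypothetical and are not taken from DeepSpeed's code.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical stand-in for an accelerator op builder; not DeepSpeed's actual class.
class FusedAdamBuilderSketch:
    NAME = "fused_adam"

    def __init__(self):
        # A failed import is tolerated here, so importing this file and
        # constructing the builder both work in an environment without torch.
        self._torch = optional_import("torch")

    def is_compatible(self):
        # The op can only be used when torch is actually present.
        return self._torch is not None
```

With this shape, a packaging job that imports the builder module without torch installed does not raise; only the compatibility check reports that the op is unavailable.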

[Accelerator] Cambricon MLU support

1ad2216

tjruwase reviewed Sep 3, 2024

accelerator/mlu_accelerator.py

Update real_accelerator for MLU support

2473e9d

tjruwase approved these changes Sep 3, 2024

Merge branch 'master' into Cambricon-MLU-accelerator

f7a7d09

Andy666G added 2 commits September 13, 2024 11:27

Update mlu_accelerator.py

ee97f8d

Update fused_adam.py

12c943d

Andy666G and others added 4 commits September 13, 2024 11:34

Merge branch 'master' into Cambricon-MLU-accelerator

08a1276

fix formatting errors

21f83e4

Merge branch 'master' into Cambricon-MLU-accelerator

b0b5d32

Merge branch 'master' into Cambricon-MLU-accelerator

d8d66d0

tjruwase added this pull request to the merge queue Sep 26, 2024

Merged via the queue into microsoft:master with commit 0fbe96a Sep 26, 2024
12 checks passed

loadams mentioned this pull request Sep 27, 2024

Fix torch include in op_builder/mlu/fused_adam.py and update no-torch workflow triggers #6584

Merged

This was referenced Oct 24, 2024

[MLU] remove deepspeed-mlu dependency huggingface/transformers#34362

Open

[MLU] update deepspeed-mlu dependency huggingface/accelerate#3192

Closed
