[Feature Request] Support deepspeed integration #627

Open
nijkah opened this issue Oct 18, 2022 · 3 comments
nijkah commented Oct 18, 2022

Describe the feature

Motivation
DeepSpeed has become a fundamental framework for training and running inference with large-scale and foundation models.
We are developing a deepspeed integration for mmengine that adds a deepspeed-specific runner and optim_wrapper.

Does MMEngine plan to support deepspeed?
If so, we can contribute our implementation to MMEngine :)

Please share any guidance, plans, or opinions on this. :)

C1rN09 commented Oct 18, 2022

Hi @nijkah, we welcome any kind of contribution, and deepspeed integration is definitely something we want!
However, could you clarify what you mean by "deepspeed-specified runner and optim_wrapper"? If you plan to write a new runner that only serves deepspeed models, that does not seem quite reasonable and we might need more discussion on it ^_^

C1rN09 commented Nov 2, 2022

Hi @nijkah, have you made any progress on the deepspeed integration? We hope to discuss it before you post a PR, because it will probably not be a small or easy one. If you have any ideas, problems, or progress to share, we are always open to discussion, either in this issue or on our discussion board.

nijkah commented Nov 4, 2022

Hi @C1rN09. Our integration work is almost done, although several design choices are still open.

Our current implementation supports:

  1. ZeRO-1, ZeRO-2, and ZeRO-3
  2. Saving a monolithic weight file (DeepSpeed saves its model weights and optimizer state in separate files, and the number of saved files is multiplied by the world_size; a consolidation sketch follows these lists)

It does not yet support:

  1. FP16 (there is a way to support it, but the solution is quite messy)
  2. Mixture of Experts
  3. Pipeline parallelism (it requires logic to sequentialize MM models)
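
For context on item 2, DeepSpeed ships a zero_to_fp32 utility that can consolidate the per-rank ZeRO shards offline; a minimal sketch of that path (the work_dir paths are placeholders, and this is separate from our in-runner saving logic):

        import torch
        from deepspeed.utils.zero_to_fp32 import \
            get_fp32_state_dict_from_zero_checkpoint

        # Gather the sharded ZeRO checkpoint (one file per rank) into a
        # single fp32 state dict, then save it as a plain PyTorch file.
        state_dict = get_fp32_state_dict_from_zero_checkpoint(
            'work_dir/checkpoints')
        torch.save(state_dict, 'work_dir/model_fp32.pth')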

There are several reasons why we wrote a new deepspeed-dedicated runner.
Although we follow most of mmengine's Runner logic, some modifications are needed to support deepspeed.

The main logic of DeepSpeedRunner looks like this:

        self.model = self.build_model(model)
        self.optim_wrapper = self.build_optim_wrapper(optim_wrapper)
        with open(cfg.deepspeed_config) as f:
            ds_config = json.load(f)
        # deepspeed.initialize returns a 4-tuple
        # (engine, optimizer, dataloader, lr_scheduler);
        # only the first two are used here.
        self.model, optimizer, _, _ = deepspeed.initialize(
            model=self.model,
            optimizer=self.optim_wrapper.optimizer,
            model_parameters=self.model.parameters(),
            config=ds_config)
        self.optim_wrapper.optimizer = optimizer
        self.inject_base_model_methods()

First, the order of the setup logic has to change when using deepspeed. There was a similar modification in your FSDP PR, so this concern may go away in the future.
Second, to use deepspeed it seems better to rely on DeepSpeedEngine's internal optimizer logic, which means we have to pass the optimizer to deepspeed.initialize or DeepSpeedEngine.

Moreover, DeepSpeedEngine requires users to update parameters through engine.step(), which wraps optimizer.step and the related logic. This is what led us to write a new DeepSpeedOptimWrapper class, sketched below.
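
A rough sketch of what that wrapper could look like, assuming mmengine's OptimWrapper interface (the engine wiring here is illustrative, not our final implementation):

        from mmengine.optim import OptimWrapper

        class DeepSpeedOptimWrapper(OptimWrapper):
            """Delegate backward/step to a DeepSpeedEngine (sketch)."""

            def __init__(self, optimizer, engine):
                super().__init__(optimizer=optimizer)
                # `engine` is the DeepSpeedEngine from deepspeed.initialize.
                self.engine = engine

            def backward(self, loss, **kwargs):
                # DeepSpeed handles loss scaling and gradient accumulation
                # internally, so delegate instead of calling loss.backward().
                self.engine.backward(loss, **kwargs)

            def step(self, **kwargs):
                # engine.step() runs optimizer.step(), gradient clipping and
                # zero_grad as configured in the DeepSpeed config.
                self.engine.step()

            def zero_grad(self, **kwargs):
                # No-op: engine.step() already zeroes the gradients.
                pass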

I think it is better to share our prototype code when it is ready rather than explain everything in writing.
We can share a link to the repo containing the code before posting the PR.
