
params partition for skip_init #4722

Merged: 8 commits merged into microsoft:master on Jan 18, 2024

Conversation

inkcherry
Contributor

Some models use skip_init to initialize weights. skip_init first initializes the module on a meta device in its __init__ and then materializes it with to_empty(). This conflicts with DeepSpeed's hook on module.__init__: it is necessary to wait for skip_init to finish before executing _post_init_method. However, since the from ... import skip_init typically occurs outside the zero.Init context, there seems to be no good way to hook into skip_init directly. Hence, the approach here is to delay the execution of _post_init_method to resolve this issue.
Known affected models include HuggingFace models such as chatglm2 and chatglm3.
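For reference, a minimal sketch of the skip_init pattern described above (assuming a recent PyTorch where torch.nn.utils.skip_init is available); it only illustrates the meta-device-then-to_empty() flow, not DeepSpeed's hook:

```python
import torch
from torch.nn.utils import skip_init

# skip_init constructs the module with device="meta" inside __init__ and
# then calls to_empty() to allocate (uninitialized) storage on the target
# device. Any hook attached to module.__init__ therefore fires while the
# parameters are still meta tensors.
linear = skip_init(torch.nn.Linear, 10, 5)

print(linear.weight.device)  # cpu -- storage allocated but left uninitialized
```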

@tjruwase tjruwase removed the request for review from jeffra January 5, 2024 20:57
@tjruwase tjruwase requested review from tohtana and removed request for mrwyattii and loadams January 5, 2024 20:58
@tohtana
Contributor

tohtana commented Jan 8, 2024

Hi @inkcherry,
Just for clarification, it seems that _post_init_method runs after the __init__ of the top-level module. Is this correct?
If so, can we reduce the peak memory footprint of parameters on initialization?

@inkcherry
Contributor Author

inkcherry commented Jan 9, 2024

Hi @inkcherry, Just for clarification, it seems that _post_init_method runs after the __init__ of the top-level module. Is this correct? If so, can we reduce the peak memory footprint of parameters on initialization?

Thanks for the review, @tohtana.
The change defers _post_init_method until the next non-meta module finishes its initialization (the peer module immediately below, or, if there is none, the containing module).
I think this is correct because the order in which modules enter and exit the post-init machinery remains unchanged. However, peak memory usage might be higher, because the module passed to skip_init (which may contain several layers) has to be fully materialized on one rank before it is partitioned. To reduce memory (i.e., partition each child layer as soon as it is initialized on a real device), I think some hook-and-restore around the skip_init lifespan is necessary. I could try to keep that code within a conditional scope. What do you think?
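As a rough illustration of the "delay" idea only (a hypothetical sketch, not the actual DeepSpeed code; install_deferred_post_init and post_init_method are made-up names):

```python
import functools
import torch

def install_deferred_post_init(module_cls, post_init_method):
    """Wrap module_cls.__init__ so the post-init callback is deferred while
    the module's parameters still live on the meta device (e.g. skip_init)."""
    orig_init = module_cls.__init__

    @functools.wraps(orig_init)
    def wrapped_init(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        if any(p.is_meta for p in self.parameters(recurse=False)):
            # Meta parameters cannot be partitioned yet; remember the
            # callback and run it once a later, non-meta module finishes.
            self._pending_post_init = post_init_method
        else:
            post_init_method(self)

    module_cls.__init__ = wrapped_init
```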

@tohtana
Contributor

tohtana commented Jan 9, 2024

Thank you for the clarification, @inkcherry.

I am wondering what happens if almost all modules are declared with skip_init. In that case, partitioning with zero.Init() won't work, and we would need host memory of size |all parameters| * |number of local GPUs (processes)| on a server, right?

I understand that it is difficult to set a hook in skip_init. But can we set one after skip_init as another approach?
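To make the memory concern above concrete, here is a back-of-the-envelope estimate; the parameter count, dtype, and GPU count are illustrative assumptions (roughly a chatglm2-6B-sized model in fp16 on a node with 8 GPUs):

```python
params = 6.2e9        # assumed parameter count (~chatglm2-6B)
bytes_per_param = 2   # fp16
local_procs = 8       # assumed GPUs (processes) on one server

# Every local process materializes the full parameter set before it can
# be partitioned, so the host needs roughly this much memory at peak:
host_bytes = params * bytes_per_param * local_procs
print(f"~{host_bytes / 2**30:.0f} GiB of host memory")  # ~92 GiB
```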

@inkcherry
Contributor Author

inkcherry commented Jan 12, 2024

@tohtana, thanks for your suggestion!
Yes, if the device is not set to a GPU, the host memory behaves like this.
My concern is that adding logic after skip_init may also lead to high memory usage (before your reminder I had been more focused on functionality). For example, if the module passed to skip_init contains a ModuleList of all transformer blocks, essentially almost all of the model's parameters, we would still need num_processes * all_transformer_blocks_parameters of host memory for the to_empty() call.

Currently I have implemented a hook that avoids this memory issue (after a child module's initialization is completed by _apply(empty_like) inside to_empty, the parameters are partitioned), and it provides better functionality. I decoupled it into a separate function.
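A hypothetical sketch of that hook (illustrative only, not the code in this PR; hook_to_empty and partition_params are made-up names), assuming the torch.nn.Module.to_empty API with a device keyword:

```python
import torch

def hook_to_empty(partition_params):
    """Patch Module.to_empty so partitioning runs as soon as skip_init
    materializes the parameters on a real device."""
    orig_to_empty = torch.nn.Module.to_empty

    def to_empty_and_partition(self, *, device, **kwargs):
        module = orig_to_empty(self, device=device, **kwargs)
        # Parameters now live on a real device; partition them immediately
        # instead of waiting for a later module's __init__ to finish.
        partition_params(module)
        return module

    torch.nn.Module.to_empty = to_empty_and_partition
    return orig_to_empty  # caller restores this after the skip_init call
```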

I tested full-parameter fine-tuning of chatglm2-6B with ZeRO-3, comparing a run without skip_init and without this patch against a run using the model's default skip_init together with this patch. The loss is exactly the same for the first 50 steps.

@inkcherry
Contributor Author

inkcherry commented Jan 16, 2024

@tohtana I just made some changes; could you please take a look? Thanks!

@tohtana
Contributor

tohtana commented Jan 16, 2024

Thank you @inkcherry, the hook you implemented should work. This is intricate and refined work!
It looks good to me to merge once this change passes the tests.

@inkcherry
Contributor Author

inkcherry commented Jan 18, 2024

@tohtana All CI checks have passed. Just a reminder in case you missed it.
Also, thanks to @delock and @guoyejun for their internal help.

@tjruwase tjruwase added this pull request to the merge queue Jan 18, 2024
Merged via the queue into microsoft:master with commit 3110c38 Jan 18, 2024
12 checks passed
@tohtana
Contributor

tohtana commented Jan 18, 2024

@inkcherry This PR was merged. Thank you for your great contribution!

mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
Some models use ```skip_init``` to initialize weights. ```skip_init```
first initializes the module on a meta device in its ```__init__``` and
then materializes it with ```to_empty()```. This conflicts with DeepSpeed's
hook on ```module.__init__```: it is necessary to wait for
```skip_init``` to finish before executing ```_post_init_method```.
However, since the ```from ... import skip_init``` typically occurs
outside the zero.Init context, there seems to be no good way to hook into
```skip_init``` directly. Hence, the approach here is to delay the
execution of ```_post_init_method``` to resolve this issue.
Known affected models include HuggingFace models such as chatglm2 and
chatglm3.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>