[BUG] Unused params lead to "still have inflight params" error #4094

Open
tiwargau opened this issue Aug 4, 2023 · 11 comments
Assignees: HeyangQin
Labels: bug (Something isn't working), inference

Comments

@tiwargau

tiwargau commented Aug 4, 2023

Bug description
Context: I am running inference on a multi-modal LLM where the parts of the network used at each decoding step depend on the input modality. On my second step, DeepSpeed prefetches a part of the network that ends up not being used. The code does anticipate that this can happen and correctly invalidates the trace. However, the params that were prefetched but never used are detected as in-flight at the end of the step, resulting in RuntimeError(f"still have inflight params").

To Reproduce
My setup is a bit involved, and I think the issue is clear from the description above. However, if the team would benefit from a simple reproduction, I can work on creating one; please let me know.
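
For illustration, here is a minimal, untested sketch of the kind of model that triggers the behavior described above; the module names and config values are placeholders rather than the actual setup:

```python
# Untested sketch: a model whose forward pass skips a submodule depending on the
# input, so ZeRO-3 prefetches params on step 2 that are never used.
# All names and shapes here are illustrative placeholders.
import torch
import torch.nn as nn
import deepspeed

class ConditionalNet(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.text_branch = nn.Linear(hidden, hidden)
        self.image_branch = nn.Linear(hidden, hidden)  # only used for image inputs
        self.head = nn.Linear(hidden, hidden)

    def forward(self, x, use_image_branch=False):
        x = self.text_branch(x)
        if use_image_branch:  # modality-dependent path
            x = self.image_branch(x)
        return self.head(x)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

# launch with the deepspeed launcher, e.g. `deepspeed repro.py`
engine = deepspeed.initialize(model=ConditionalNet(), config_params=ds_config)[0]
engine.eval()

with torch.no_grad():
    x = torch.randn(1, 128, device=engine.device)
    engine(x, use_image_branch=True)   # step 1: trace records image_branch
    engine(x, use_image_branch=False)  # step 2: image_branch prefetched but unused
    # -> RuntimeError: still have inflight params
```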

Expected behavior
I would have expected that once we notice the order of params is not the same as before, we would also stop demanding that all of them be used. Right now, we tolerate a different ordering but still require that every param previously used (and hence prefetched) be used at some point.

ds_report output

Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.6'
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/torch']
torch version .................... 1.13.0
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.10.0, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

System info (please complete the following information):

  • OS: AL2 (Amazon Linux) 5.10.149-133.644.amzn2.x86_64 #1 SMP Tue Oct 18 16:52:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • GPU count and types: p3.16xlarge instance from aws, 8 V100 with 16 GB per device
  • Deepspeed version: 0.10.0
  • (if applicable) Hugging Face Transformers/Accelerate/etc. versions: transformers: 4.29.1 accelerate 0.21.0
  • Python version: 3.9.15
@tiwargau tiwargau added bug Something isn't working inference labels Aug 4, 2023
@tiwargau tiwargau changed the title [BUG] Unused params lead to "till have inflight params" error [BUG] Unused params lead to "still have inflight params" error Aug 4, 2023
@alexwangmac

Have you solved the problem? My situation is exactly the same as yours.

@tiwargau
Author

Hi @alexwangmac, I haven't really solved this problem, just worked around it by setting "stage3_prefetch_bucket_size": 0. This is not an ideal solution, as you lose the prefetching efficiency.

Hoping deepspeed team can help with this soon.
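
For reference, the relevant piece of the ZeRO config with this workaround would look roughly like the sketch below; only the "stage3_prefetch_bucket_size" entry is the workaround itself, the surrounding keys are illustrative.

```python
# Sketch of a ZeRO-3 config with prefetching disabled. Only
# "stage3_prefetch_bucket_size": 0 is the workaround discussed here;
# the other keys are illustrative.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "stage3_prefetch_bucket_size": 0,  # do not speculatively prefetch params
    },
}
```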

@HeyangQin HeyangQin self-assigned this Aug 10, 2023
@hatrexltd

Same

@haixpham

haixpham commented Dec 1, 2023

Hi @alexwangmac, I haven't really solved this problem, just worked around it by setting "stage3_prefetch_bucket_size": 0. This is not an ideal solution, as you lose the prefetching efficiency.

Hoping deepspeed team can help with this soon.

I ran into the same problem and your fix worked!
Indeed, the problem arises when not all model params are used during inference.

@siddk

siddk commented Dec 18, 2023

Any update on this? Running into the same issue when I have unused parameters for a given forward pass!

@haixpham

Any update on this? Running into the same issue when I have unused parameters for a given forward pass!

In the config JSON, set "stage3_prefetch_bucket_size": 0; that should work.

@andre-bauer

andre-bauer commented Jan 3, 2024

In the config JSON, set "stage3_prefetch_bucket_size": 0; that should work.

While this might "work", it still does not solve the problem, for example with Mixtral, since this kind of MoE does not work properly with DeepSpeed. I also tried Mixtral on a multi-GPU setup and, instead of getting this error message, the process just hangs indefinitely, most likely because parameters are fetched but not used and therefore never released, even with prefetch_bucket_size=0.

@BBerabi

BBerabi commented Jan 11, 2024

In the config JSON, set "stage3_prefetch_bucket_size": 0; that should work.

While this might "work", it still does not solve the problem, for example with Mixtral, since this kind of MoE does not work properly with DeepSpeed. I also tried Mixtral on a multi-GPU setup and, instead of getting this error message, the process just hangs indefinitely, most likely because parameters are fetched but not used and therefore never released, even with prefetch_bucket_size=0.

I have exactly the same issue. When will Mixtral support be added to DeepSpeed?

@tohtana
Contributor

tohtana commented Jan 16, 2024

(I posted a similar comment on #4808)
I will investigate this issue, but you can use DeepSpeed-FastGen (DeepSpeed-MII) for text generation. The example is available here. I verified that Mixtral works just by modifying the model name.
It is easier to use "non-persistent" mode for testing purposes, but "persistent" mode will give you the best performance. Please refer to DeepSpeed-MII for more details.
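
For reference, the non-persistent path looks roughly like the sketch below (following the DeepSpeed-MII examples; the model name and arguments are assumptions and may differ by MII version):

```python
# Rough sketch of DeepSpeed-MII "non-persistent" text generation, based on the
# MII examples; model name and arguments are illustrative and version-dependent.
import mii

pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")
responses = pipe(["DeepSpeed is"], max_new_tokens=200)
print(responses[0])
# for multi-GPU, launch with: deepspeed --num_gpus <n> this_script.py
```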

github-merge-queue bot pushed a commit that referenced this issue Jan 19, 2024
…oks (#4966)

ZeRO3 does not work with MoE models because the order of executing
modules can change at every forward/backward pass (#4094, #4808).

This PR adds an API to stop breaking down a module for parameter
fetching. The following shows an example of the usage:
```python
import torch
import deepspeed
import deepspeed.comm as dist
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

model_id = "mistralai/Mixtral-8x7B-v0.1"
ds_config = {
      "bf16": {
          "enabled": True,
      },
      "zero_optimization": {
          "stage": 3,
      },
      "train_micro_batch_size_per_gpu": 1,
  }

hfdsc = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

deepspeed.utils.set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
model.eval()

ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()
model = ds_engine.module

inputs = tokenizer.encode("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=200)
output_str = tokenizer.decode(outputs[0])
if dist.get_rank() == 0:
  print(f"output: {output_str}")
```

By passing module classes to `set_z3_leaf_modules`, the DeepSpeed engine stops breaking down those modules.

In this example, `MixtralSparseMoeBlock` has multiple experts as its
submodule. Using `set_z3_leaf_modules`, the DeepSpeed engine fetches
parameters of all the submodules when pre-fetching the parameters of
`MixtralSparseMoeBlock`.
@tohtana
Contributor

tohtana commented Jan 19, 2024

Hi everyone,
#4966 should have fixed this issue; you can find a working example there.
The PR has already been merged into master. Please feel free to try it, but I still recommend using DeepSpeed-FastGen for text generation.

mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this issue Feb 17, 2024
…oks (deepspeedai#4966)

@matthewdm0816

Hi, I also found this problem in my experiments. It seems that during generation some parameters are not used.
Besides the PR, a simple workaround is to pass a dummy input that invokes the unused parameters during inference.
Warnings like "Invalidate trace cache @ step 1: expected module 1704, but got module 1703" still appear, but training and generation seem to be fine.
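
A rough sketch of that dummy-input workaround (untested; the call signature and tensor shape are placeholders for whichever branch goes unused in a given setup):

```python
# Untested sketch of the dummy-input workaround: run one forward pass that
# touches the otherwise-unused branch so its prefetched params count as used.
# `model`, `pixel_values`, and the tensor shape are illustrative placeholders.
import torch

with torch.no_grad():
    dummy_image = torch.zeros(1, 3, 224, 224, device=model.device, dtype=model.dtype)
    _ = model(pixel_values=dummy_image)  # exercise the vision branch once
# subsequent generation proceeds as usual
```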
