Error: "Only able to place X replicas, but Y replicas were requested" #381

Open
spring1915 opened this issue Jan 19, 2024 · 2 comments

@spring1915

I ran

import mii

client = mii.serve("mistralai/Mistral-7B-Instruct-v0.2")
response = client.generate(inputs, max_new_tokens=128, tensor_parallel=2, replica_num=2)

on an AWS ml.g5.12xlarge instance (4 GPUs on a single instance) and got the error "Only able to place 1 replicas, but 2 replicas were requested." A similar error ("Only able to place 1 replicas, but 4 replicas were requested.") also occurred when I used client.generate(inputs, max_new_tokens=128, replica_num=4).

I ran it with AWS DJL and the DeepSpeed engine, using this serving.properties file:

engine=DeepSpeed
option.entrypoint=model.py

model.py is a custom entry-point file that contains the code above, plus the simple glue code the DJL server requires.

spring1915 changed the title from "Error: "Only able to place X replicas, but 4 replicas were requested"" to "Error: "Only able to place X replicas, but Y replicas were requested"" on Jan 19, 2024
@mrwyattii (Contributor)

Hi @spring1915, the tensor_parallel and replica_num values should be passed to mii.serve, not to generate. I've updated MII in #386 so that the generate method raises an error when it receives kwargs it does not support. Can you please update your code to the following and try again?

import mii

client = mii.serve("mistralai/Mistral-7B-Instruct-v0.2", tensor_parallel=2, replica_num=2)
response = client.generate(inputs, max_new_tokens=128)
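To make the numbers in the error message concrete, here is a simplified, hypothetical model of replica placement. It is not MII's actual _allocate_devices (the function named in the traceback further down); it assumes that every replica needs tensor_parallel GPUs on a single host and that each host's GPU count comes from a DeepSpeed-style hostfile line such as "localhost slots=4". The function names parse_hostfile and place_replicas are made up for illustration.

```python
"""Simplified, hypothetical model of MII-style replica placement.

Assumptions (not MII's real implementation): every replica needs
tensor_parallel GPUs on a single host, and GPU counts come from a
DeepSpeed-style hostfile with simple lines like "localhost slots=4".
"""


def parse_hostfile(text: str) -> dict[str, int]:
    """Parse 'hostname slots=N' lines into {hostname: N}, skipping blanks and comments."""
    slots: dict[str, int] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, slot_field = line.split()
        key, value = slot_field.split("=")
        if key != "slots":
            raise ValueError(f"Unexpected hostfile field: {slot_field!r}")
        slots[host] = int(value)
    return slots


def place_replicas(hostfile_text: str, tensor_parallel: int, replica_num: int) -> list[tuple[str, list[int]]]:
    """Give each replica a contiguous block of tensor_parallel GPU ids on one host.

    Raises ValueError, with a message modeled on the one in this issue,
    when fewer than replica_num replicas fit.
    """
    placements: list[tuple[str, list[int]]] = []
    for host, num_gpus in parse_hostfile(hostfile_text).items():
        # Number of whole tensor-parallel groups that fit on this host.
        for group in range(num_gpus // tensor_parallel):
            if len(placements) == replica_num:
                break
            start = group * tensor_parallel
            placements.append((host, list(range(start, start + tensor_parallel))))
    if len(placements) < replica_num:
        raise ValueError(
            f"Only able to place {len(placements)} replicas, "
            f"but {replica_num} replicas were requested."
        )
    return placements


if __name__ == "__main__":
    four_gpus = "localhost slots=4"
    # Fits: 2 replicas x 2 GPUs each = 4 GPUs.
    print(place_replicas(four_gpus, tensor_parallel=2, replica_num=2))
    # Does not fit: 4 replicas x 2 GPUs each would need 8 GPUs.
    try:
        place_replicas(four_gpus, tensor_parallel=2, replica_num=4)
    except ValueError as err:
        print(err)
```

Under these assumptions, a result of 0 placed replicas means no host lists at least tensor_parallel GPUs, for example a hostfile reporting slots=0, or tensor_parallel=2 on a one-GPU machine.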

mrwyattii self-assigned this on Jan 23, 2024
@gangooteli

gangooteli commented Feb 19, 2024

I'm getting the error below:

python3 -m api_server
[2024-02-19 20:51:50,842] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
args.replica_num  1
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
[2024-02-19 20:51:51,516] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/api_server.py", line 192, in <module>
    mii.serve(args.model,
  File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 124, in serve
    import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
  File "/tmp/mii_cache/deepspeed-mii/score.py", line 33, in init
    mii.backend.MIIServer(mii_config)
  File "/usr/local/lib/python3.10/dist-packages/mii/backend/server.py", line 44, in __init__
    mii_config.generate_replica_configs()
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 302, in generate_replica_configs
    replica_pool = _allocate_devices(self.hostfile,
  File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 350, in _allocate_devices
    raise ValueError(
ValueError: Only able to place 0 replicas, but 1 replicas were requested.

Is DeepSpeed-MII not suitable for a single-GPU environment (one A40)? Here is the nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     On  | 00000000:00:07.0 Off |                    0 |
|  0%   53C    P8              23W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Please help with this.
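For the single-A40 case, a minimal pre-flight check can confirm that the Python process itself sees a GPU before mii.serve is called; nvidia-smi succeeding on the host does not by itself guarantee that a container or virtual environment can see the device. This is a hypothetical helper, not part of MII. It assumes PyTorch is installed and uses torch.cuda.device_count(), and it assumes the required GPU count is tensor_parallel × replica_num, since each replica needs tensor_parallel GPUs.

```python
"""Hypothetical pre-flight check to run before mii.serve (not part of MII)."""
import os


def visible_gpu_count() -> int:
    """Return the number of CUDA devices this process can see (0 if PyTorch or CUDA is unavailable)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def check_gpu_budget(num_gpus: int, tensor_parallel: int = 1, replica_num: int = 1) -> str:
    """Report whether tensor_parallel * replica_num GPUs fit in num_gpus."""
    needed = tensor_parallel * replica_num
    if num_gpus >= needed:
        return f"OK: {replica_num} replica(s) x {tensor_parallel} GPU(s) = {needed} of {num_gpus} visible GPU(s)"
    return (
        f"Not enough GPUs: need {needed} ({replica_num} replica(s) x tensor_parallel={tensor_parallel}), "
        f"but this process sees {num_gpus}. "
        f"CUDA_VISIBLE_DEVICES={os.environ.get('CUDA_VISIBLE_DEVICES')!r}"
    )


if __name__ == "__main__":
    # On a single A40, tensor_parallel=1 and replica_num=1 is the only layout that fits.
    print(check_gpu_budget(visible_gpu_count(), tensor_parallel=1, replica_num=1))
```

If this reports 0 visible GPUs even though nvidia-smi lists the A40, the problem is likely device visibility in the environment running api_server rather than MII's support for a single GPU.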


3 participants