[Usage]: No tokenizer while ray usage #3517

dmayboroda · 2024-03-20T01:22:30Z

Your current environment

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-1014-aws-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 550.54.14
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      48 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             16
On-line CPU(s) list:                0-15
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 7R32
CPU family:                         23
Model:                              49
Thread(s) per core:                 2
Core(s) per socket:                 8
Socket(s):                          1
Stepping:                           0
BogoMIPS:                           5599.92
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          256 KiB (8 instances)
L1i cache:                          256 KiB (8 instances)
L2 cache:                           4 MiB (8 instances)
L3 cache:                           32 MiB (2 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-15            0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How would you like to use vllm

I am trying to run Mistral with vLLM using this command:

python -u -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --max-model-len 15000 --port 80 --enforce-eager --device cuda --engine-use-ray --tokenizer mistralai/Mistral-7B-Instruct-v0.2 --tokenizer-mode auto

I have an NVIDIA A10. Without Ray everything works perfectly, but when I enable Ray (--engine-use-ray) I get this error:
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 168, in create_chat_completion
    generator = await openai_serving_chat.create_chat_completion(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 68, in create_chat_completion
    request, self.engine.get_tokenizer()))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 337, in get_tokenizer
    return self.engine.tokenizer.tokenizer
  File "/usr/local/lib/python3.10/dist-packages/ray/actor.py", line 1368, in __getattr__
    raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'tokenizer'

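The last two frames of the traceback suggest the mechanism: with --engine-use-ray, `self.engine` appears to be a Ray `ActorHandle` rather than a local engine object, and the handle's `__getattr__` rejects `tokenizer` because it is an instance attribute, not an actor method. The sketch below reproduces that behavior in plain Python without importing Ray. `Engine` and `FakeActorHandle` are hypothetical stand-ins written for illustration; they are not vLLM or Ray classes, and a real Ray handle returns remote method objects called via `.remote()` rather than bound methods.

```python
class Engine:
    """Hypothetical stand-in for an engine that owns a tokenizer."""

    def __init__(self):
        # Instance attribute: lives on the object, not on the class.
        self.tokenizer = "tokenizer-object"

    def get_tokenizer(self):
        # Method: defined on the class, so a proxy can discover it.
        return self.tokenizer


class FakeActorHandle:
    """Hypothetical proxy that, like an actor handle, exposes methods only."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, i.e. for forwarded names.
        attr = getattr(type(self._target), name, None)
        if not callable(attr):
            raise AttributeError(
                f"'FakeActorHandle' object has no attribute {name!r}"
            )
        return getattr(self._target, name)


local_engine = Engine()
handle = FakeActorHandle(local_engine)

# Direct attribute access works on the local object ...
local_tokenizer = local_engine.tokenizer

# ... but fails through the handle, mirroring the error in the traceback.
try:
    handle.tokenizer
    attribute_error = None
except AttributeError as exc:
    attribute_error = str(exc)

# Going through a method is the access pattern a handle can support.
via_method = handle.get_tokenizer()
```

Under these assumptions, code that reads `engine.tokenizer` directly works only when the engine lives in the same process; once the engine sits behind a handle, the tokenizer has to be obtained through a method call or loaded separately on the caller's side.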

esmeetu · 2024-03-20T04:35:23Z

Hi @dmayboroda, this should be resolved by #3301.

whybeyoung · 2024-09-11T08:43:19Z

I see the same issue when deploying qwen72b.

github-actions · 2024-12-11T02:07:26Z

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

ruisearch42 · 2024-12-21T00:38:45Z

Closing this issue since it should be fixed by #3301. Feel free to reopen if the issue still exists.

dmayboroda added the usage How to use vllm label Mar 20, 2024

syGOAT mentioned this issue Apr 24, 2024

[Bug]: 'ActorHandle' object has no attribute 'decoding_config' when setting --engine-use-ray #4317

Closed

github-actions bot added the stale Over 90 days of inactivity label Dec 11, 2024

richardliaw added the ray anything related with ray label Dec 11, 2024

github-actions bot added unstale Received activity after being labelled stale and removed stale Over 90 days of inactivity labels Dec 12, 2024

ruisearch42 closed this as completed Dec 21, 2024

hmellor moved this to Backlog in Ray Feb 28, 2025

hmellor added this to Ray Feb 28, 2025

hmellor moved this from Backlog to Done in Ray Feb 28, 2025
