
[Bug]: FastAPI 0.113.0 breaks vLLM OpenAPI #8212

Closed
drikster80 opened this issue Sep 5, 2024 · 9 comments · Fixed by #8435 or #8764
Labels
bug Something isn't working

Comments

@drikster80

Your current environment

The output of `python collect_env.py`
Collecting environment information...
WARNING 09-05 21:11:49 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead, and make sure to uninstall `pynvml`. When both of them are installed, `pynvml` will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
WARNING 09-05 21:11:49 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/vllm/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm.commit_id'
  from vllm.version import __version__ as VLLM_VERSION
PyTorch version: 2.4.0a0+3bcc3cddb5.nv24.07
Is debug build: False
CUDA used to build PyTorch: 12.5
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.0
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-1024-nvidia-64k-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GH200 480GB
Nvidia driver version: 560.35.03
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             72
On-line CPU(s) list:                0-71
Vendor ID:                          ARM
Model name:                         Neoverse-V2
Model:                              0
Thread(s) per core:                 1
Core(s) per socket:                 72
Socket(s):                          1
Stepping:                           r0p0
Frequency boost:                    disabled
CPU max MHz:                        3492.0000
CPU min MHz:                        81.0000
BogoMIPS:                           2000.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
L1d cache:                          4.5 MiB (72 instances)
L1i cache:                          4.5 MiB (72 instances)
L2 cache:                           72 MiB (72 instances)
L3 cache:                           114 MiB (1 instance)
NUMA node(s):                       9
NUMA node0 CPU(s):                  0-71
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
NUMA node8 CPU(s):
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cudnn-frontend==1.5.1
[pip3] nvidia-dali-cuda120==1.39.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-modelopt==0.13.0
[pip3] nvidia-nvimgcodec-cu12==0.2.0.7
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.16.0
[pip3] optree==0.12.1
[pip3] pynvml==11.4.1
[pip3] pytorch-triton==3.0.0+989adb9a2
[pip3] pyzmq==26.0.3
[pip3] torch==2.4.0a0+3bcc3cddb5.nv24.7
[pip3] torch-tensorrt==2.5.0a0
[pip3] torchvision==0.19.0a0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.0@COMMIT_HASH_PLACEHOLDER
vLLM Build Flags:
CUDA Archs: 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    NODE    0-71    0               1
NIC0    NODE     X      PIX
NIC1    NODE    PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1

🐛 Describe the bug

FastAPI released 0.113.0 about 5 hours ago. The release includes a major refactor of its Pydantic support, which appears to cause a Pydantic failure when calling the OpenAI-compatible API.

Confirmed that reverting to FastAPI 0.112.2 resolves the problem (pip install fastapi==0.112.2).

Here are the logs from the failure:

INFO:     172.16.10.6:40700 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     172.16.10.6:39032 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
    self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
    raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 291, in app
    solved_result = await solve_dependencies(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 639, in solve_dependencies
    ) = await request_body_to_args(  # body_params checked above
  File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 810, in request_body_to_args
    fields_to_extract = get_model_fields(first_field.type_)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 283, in get_model_fields
    return [
  File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 284, in <listcomp>
    ModelField(field_info=field_info, name=name)
  File "<string>", line 6, in __init__
  File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 109, in __post_init__
    self._type_adapter: TypeAdapter[Any] = TypeAdapter(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 264, in __init__
    self._init_core_attrs(rebuild_mocks=False)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 142, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
    self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 102, in _get_schema
    schema = gen.generate_schema(type_)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
    return self._annotated_schema(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
    schema = self._apply_annotations(source_type, annotations)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
    schema = get_inner_schema(source_type)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
    lambda source, handler: handler(source)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
    items_schema = handler.generate_schema(self.item_source_type)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
    return self._generate_schema.generate_schema(source_type)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
    return self.match_type(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
    return self._match_generic_type(obj, origin)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
    return self._union_schema(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
    choices.append(self.generate_schema(arg))
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
    return self.match_type(obj)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
    return self._typed_dict_schema(obj, None)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
    for field_name, annotation in get_type_hints_infer_globalns(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
    return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
  File "/usr/lib/python3.10/typing.py", line 1833, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
  File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.10/typing.py", line 694, in _evaluate
    eval(self.__forward_code__, globalns, localns),
  File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
INFO:     172.16.10.6:39048 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
(identical traceback as above, repeated for the second request)

@drikster80
Author

I believe I was able to find a solution to this. It is related to OpenAI-Python #1454.

I'm not sure why it works with fastapi 0.112.2 but fails in 0.113.0.

Problem line:

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L286

async def create_chat_completion(request: ChatCompletionRequest,
                                 raw_request: Request):

Confirmed Fix:

async def create_chat_completion(request: Annotated[dict, ChatCompletionRequest],
                                 raw_request: Request):

I'll make a PR for this and reference the issue. I can also add some try/except handling with TypeAdapter validation, unless that's seen as unnecessary or as a performance concern.
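
A minimal sketch of the try/except TypeAdapter validation mentioned above (hypothetical shape; the handler signature and error handling here are assumptions, not the code that ends up in the PR):

from fastapi import Request
from fastapi.responses import JSONResponse
from pydantic import TypeAdapter, ValidationError

from vllm.entrypoints.openai.protocol import ChatCompletionRequest

# Build the adapter once at import time so the core schema is not regenerated per request.
chat_request_adapter = TypeAdapter(ChatCompletionRequest)

async def create_chat_completion(raw_request: Request):
    body = await raw_request.json()
    try:
        request = chat_request_adapter.validate_python(body)
    except ValidationError as exc:
        # Return a 422 with the validation details instead of a bare 500.
        return JSONResponse(status_code=422, content={"error": str(exc)})
    ...  # hand `request` off to the existing chat completion logic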

@drikster80
Author

Minimal example of triggering the issue:

A quick guide to running the latest vllm-openai container, upgrading fastapi, and triggering the issue. It also includes instructions for quickly switching to an editable install.

Pre-requisites:

  • Requires a system with a GPU and the ability to pass the GPU into a Docker container (e.g. nvidia-container-toolkit)

Download and start the latest vllm container:

docker run --gpus all -it --rm --network=host --ipc=host --entrypoint /bin/bash vllm/vllm-openai:latest

Working Example with 0.112.2:

Show current fastapi version:

python3 -c "import fastapi; print(fastapi.__version__)"
0.112.2

Start the server with a small model:

python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m

POST to /v1/chat/completions:

curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "facebook/opt-125m",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ]
}'

Non-working example after upgrading fastapi:

Upgrade fastapi to 0.113.0 or higher:

pip install --upgrade fastapi==0.113.0

Start openai-compatible api_server:

python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m

From outside the container, attempt a POST to /v1/chat/completions:

curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "facebook/opt-125m",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ]
}'

Dev setup using pre-compiled C binaries (saves hours of compilation when running pip install -e .):

Start the Docker container as above.

Install the NVIDIA development packages:

apt-get update && apt-get install -y --no-install-recommends libtinfo5 libncursesw5 \
	cuda-cudart-dev-12-4=12.4.127-1 \
	cuda-command-line-tools-12-4=12.4.1-1 \
	cuda-minimal-build-12-4=12.4.1-1 \
	cuda-libraries-dev-12-4=12.4.1-1 \
	cuda-nvml-dev-12-4=12.4.127-1 \
	cuda-nvprof-12-4=12.4.127-1 \
	libnpp-dev-12-4=12.2.5.30-1 \
	libcusparse-dev-12-4=12.3.1.170-1 \
	libcublas-dev-12-4=12.4.5.8-1 \
	libnccl2=2.21.5-1+cuda12.4 \
	libnccl-dev=2.21.5-1+cuda12.4 \
	cuda-nsight-compute-12-4=12.4.1-1

Build vLLM as an editable install using the precompiled binaries:

git clone https://github.com/vllm-project/vllm.git /vllm
cd /vllm
cp /usr/local/lib/python3.10/dist-packages/vllm/*.so /vllm/vllm
VLLM_USE_PRECOMPILED=1 pip install -e .
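
A quick sanity check that the editable install is the one being imported (a minimal one-liner, not part of the original steps; vllm exposes __version__ and __file__ like any package):

python3 -c "import vllm; print(vllm.__version__, vllm.__file__)"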

Run the API server:

python3 ./vllm/entrypoints/openai/api_server.py --model facebook/opt-125m

Example of an inference request:

curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "facebook/opt-125m",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "chat_template": "{% if messages[0][\"role\"] == \"system\" %}{{ messages[0][\"content\"] }}\n{% endif %}{% for message in messages[1:] %}{% if message[\"role\"] == \"user\" %}Human: {{ message[\"content\"] }}\n{% elif message[\"role\"] == \"assistant\" %}Assistant: {{ message[\"content\"] }}\n{% endif %}{% endfor %}Assistant:",
  "max_tokens": 100
}'

@arshadshk

I resolved the issue by downgrading FastAPI to version 0.111.0:

pip install fastapi==0.111.0

For reference, I'm using vllm==0.6.0.

@pachewise

A few things I noticed:

  1. These issues seem to be related to this change: fastapi/fastapi@aa21814 (the new get_model_fields() list comprehension).
  2. This particular issue seems to happen around TypedDicts; from the stack trace above:
# ...
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
    return self._typed_dict_schema(obj, None)

This may be a red herring, but I'm wondering if there's some weirdness with Required or similar TypedDict hints.
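
For context, a simplified sketch of the kind of TypedDict the OpenAI client defines for message params (illustrative only; the real definitions in the openai package have more fields and rely on forward references):

from typing_extensions import Literal, Required, TypedDict

class ChatCompletionUserMessageParamSketch(TypedDict, total=False):
    # Hints like Required[...] are resolved via get_type_hints() when pydantic
    # builds a schema for the TypedDict, which is the call that blows up in the
    # traceback above.
    role: Required[Literal["user"]]
    content: Required[str]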

Anyway, here is the smallest reproducible example:

$ pip install vllm==0.6.0 fastapi==0.113.0 pydantic==2.8.2
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:16:23 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:16:23 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
Traceback (most recent call last):
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
    self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
    raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 283, in get_model_fields
    return [
           ^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 284, in <listcomp>
    ModelField(field_info=field_info, name=name)
  File "<string>", line 6, in __init__
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 109, in __post_init__
    self._type_adapter: TypeAdapter[Any] = TypeAdapter(
                                           ^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 264, in __init__
    self._init_core_attrs(rebuild_mocks=False)
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 142, in wrapped
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
    self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 102, in _get_schema
    schema = gen.generate_schema(type_)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
    return self._annotated_schema(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
    schema = self._apply_annotations(source_type, annotations)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
    schema = get_inner_schema(source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
    lambda source, handler: handler(source)
                            ^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
    items_schema = handler.generate_schema(self.item_source_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
    return self._generate_schema.generate_schema(source_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
    return self.match_type(obj)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
    return self._match_generic_type(obj, origin)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
    return self._union_schema(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
    choices.append(self.generate_schema(arg))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
    schema = self._generate_schema_inner(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
    return self.match_type(obj)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
    return self._typed_dict_schema(obj, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
    for field_name, annotation in get_type_hints_infer_globalns(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
    return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 2336, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 371, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 877, in _evaluate
    eval(self.__forward_code__, globalns, localns),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
>>> 

Note that pydantic==2.9.0 does not have this issue.

$ pip install pydantic==2.9.0
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:26:12 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:26:12 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
[ModelField(field_info=FieldInfo(annotation=List[Union[ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam, ChatCompletionAssistantMessageParam, ChatCompletionToolMessageParam, ChatCompletionFunctionMessageParam, CustomChatCompletionMessageParam]], required=True), name='messages', mode='validation'), ModelField(field_info=FieldInfo(annotation=str, required=True), name='model', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='frequency_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default=None), name='logit_bias', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=0), name='top_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='max_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=1), name='n', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='presence_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[ResponseFormat, NoneType], required=False, default=None), name='response_format', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None, metadata=[Ge(ge=-9223372036854775808), Le(le=9223372036854775807)]), name='seed', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, List[str], NoneType], required=False, default_factory=list), name='stop', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='stream', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[StreamOptions, NoneType], required=False, default=None), name='stream_options', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.7), name='temperature', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=1.0), name='top_p', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[ChatCompletionToolsParam], NoneType], required=False, default=None), name='tools', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Literal['none'], Literal['auto'], ChatCompletionNamedToolChoiceParam, NoneType], required=False, default='none'), name='tool_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='parallel_tool_calls', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None), name='user', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='best_of', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='use_beam_search', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=-1), name='top_k', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=0.0), name='min_p', mode='validation'), 
ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='repetition_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='length_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='early_stopping', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[int], NoneType], required=False, default_factory=list), name='stop_token_ids', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='include_stop_str_in_output', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='ignore_eos', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=0), name='min_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='skip_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='spaces_between_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1)])], NoneType], required=False, default=None), name='truncate_prompt_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='prompt_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, the new message will be prepended with the last message if they belong to the same role.'), name='echo', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True, description='If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.'), name='add_generation_prompt', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).'), name='add_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[Dict[str, str]], NoneType], required=False, default=None, description='A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.'), name='documents', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.'), name='chat_template', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None, description='Additional kwargs to pass to the template renderer. 
Will be accessible by the chat template.'), name='chat_template_kwargs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, dict, BaseModel, NoneType], required=False, default=None, description='If specified, the output will follow the JSON schema.'), name='guided_json', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the regex pattern.'), name='guided_regex', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None, description='If specified, the output will be exactly one of the choices.'), name='guided_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the context free grammar.'), name='guided_grammar', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description="If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'"), name='guided_decoding_backend', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, will override the default whitespace pattern for guided json decoding.'), name='guided_whitespace_pattern', mode='validation')]
>>> 

This makes me think it's a pydantic issue, or at least a confluence of factors across openai / pydantic / fastapi.
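
For anyone comparing environments, a quick way to print which combination is installed (minimal sketch, mirroring the version checks above):

python3 -c "import fastapi, pydantic; print('fastapi', fastapi.__version__, 'pydantic', pydantic.VERSION)"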

@tiangolo

Checking @pachewise's code, I was able to reduce the error reproduction to:

from typing_extensions import Annotated
from typing import List
from vllm.entrypoints.chat_utils import (
    ChatCompletionMessageParam,
)
from vllm.entrypoints.openai.protocol import ChatCompletionRequest


from pydantic import TypeAdapter


for name, field in ChatCompletionRequest.model_fields.items():
    print(name, field)
    TypeAdapter(Annotated[List[ChatCompletionMessageParam], field])

That doesn't use FastAPI, it's just Pydantic. And indeed, it's fixed by upgrading Pydantic to 2.9.0. 🎉


It wasn't breaking with FastAPI before because, prior to 0.113.0, that part of the code wasn't using TypeAdapter yet, and it seems the previous version of Pydantic had a bug there (not sure exactly where, but it's already solved in 2.9.0).
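
To illustrate the difference (a minimal sketch with a made-up model, not FastAPI's actual code path): reusing a model's already-built fields avoids schema generation, while constructing a TypeAdapter per field annotation regenerates the pydantic-core schema from scratch, which is where the TypedDict forward-reference evaluation in the traceback happens.

from typing import List, Optional
from pydantic import BaseModel, TypeAdapter

class Example(BaseModel):
    values: Optional[List[int]] = None

# Pre-0.113.0 style: inspect the fields the model has already built.
print(Example.model_fields["values"].annotation)

# 0.113.0 style: build a fresh TypeAdapter for the field's annotation,
# which triggers core-schema generation for that annotation all over again.
adapter = TypeAdapter(Example.model_fields["values"].annotation)
print(adapter.validate_python([1, 2, 3]))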

@DarkLight1337
Member

Glad that it's resolved! Does the issue still occur in FastAPI 0.113.1 with Pydantic 2.8? If so, we may have to update either fastapi or pydantic in our dependencies to make sure that users don't install the faulty versions.

@pachewise

@DarkLight1337 yes, I'd recommend fastapi >= 0.114.1 (to fix a performance issue related to this part of their code) and pydantic >= 2.9.0 (to fix the actual issue we're seeing here).
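
Expressed as requirement specifiers, that recommendation would look roughly like this (a sketch; the exact entries in vLLM's requirements files may differ):

fastapi>=0.114.1
pydantic>=2.9.0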

@darthhexx
Contributor

Unfortunately the fastapi bump has broken Ray 2.9 compatibility.

$ pip install vllm==0.6.1.post2 'ray[serve]==2.9.3'
... snip...
The conflict is caused by:
    vllm 0.6.1.post2 depends on fastapi>=0.114.1; python_version >= "3.9"
    ray[serve] 2.9.3 depends on fastapi<=0.108.0; extra == "serve"

I've prepped a fix for the Ray 2.9 regression introduced in a different PR, but it won't really help unless we address the fastapi pin here as well.

Can we lower the pinned fastapi version, since it wasn't actually the cause of the issue, so that we maintain Ray 2.9 compatibility?

@DarkLight1337
Member

Can we lower the pinned fastapi version, since it wasn't actually the cause of the issue, so that we maintain Ray 2.9 compatibility?

On it!
