
[Deployment Issues] docling-serve on AWS g5.xlarge Instance #54

Closed
emyco opened this issue Feb 19, 2025 · 4 comments

emyco commented Feb 19, 2025

Description

I'm encountering errors while deploying docling-serve on an AWS g5.xlarge instance. Below are the details of my setup:

  • docling-serve Image: docker pull ghcr.io/ds4sd/docling-serve:sha256-e797326e42984edac7f6640fba316dd908a7cfd23265e0d43616be6b6448e55e
  • AWS Instance Type: g5.xlarge
  • AWS AMI: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Amazon Linux 2023) 20250216
  • Command Used: docker run --gpus all --shm-size=8g -e CUDA_HOME=/usr/local/cuda -e PATH='/opt/app-root/bin:/opt/app-root/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -e TORCH_CUDA_ARCH_LIST='8.0 8.6 8.9+PTX' -it -p 5001:5001 docling:test
  • Error Messages:
docker run --gpus all --shm-size=8g -e CUDA_HOME=/usr/local/cuda -e PATH='/opt/app-root/bin:/opt/app-root/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -e TORCH_CUDA_ARCH_LIST='8.0 8.6 8.9+PTX' -it  -p 5001:5001 docling:test
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:   2025-02-19 20:43:39,416 - docling.utils.accelerator_utils - Accelerator device: 'cuda:0'
WARNING:        2025-02-19 20:43:39,418 - easyocr.easyocr - Downloading detection model, please wait. This may take several minutes depending upon your network connection.
INFO:   2025-02-19 20:43:39,708 - httpx - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
INFO:   2025-02-19 20:43:41,052 - easyocr.easyocr - Download complete
WARNING:        2025-02-19 20:43:41,052 - easyocr.easyocr - Downloading recognition model, please wait. This may take several minutes depending upon your network connection.
INFO:   2025-02-19 20:43:41,995 - easyocr.easyocr - Download complete.
INFO:   2025-02-19 20:43:44,083 - docling.utils.accelerator_utils - Accelerator device: 'cuda:0'
Could not load the custom kernel for multi-scale deformable attention: Error building extension 'MultiScaleDeformableAttention': [1/4] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
FAILED: ms_deform_attn_cuda.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output ms_deform_attn_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++17 -c /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cuda/ms_deform_attn_cuda.cu -o ms_deform_attn_cuda.cuda.o
/bin/sh: line 1: /usr/local/cuda/bin/nvcc: No such file or directory
[2/4] c++ -MMD -MF ms_deform_attn_cpu.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DWITH_CUDA=1 -c /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cpu/ms_deform_attn_cpu.cpp -o ms_deform_attn_cpu.o
FAILED: ms_deform_attn_cpu.o
c++ -MMD -MF ms_deform_attn_cpu.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DWITH_CUDA=1 -c /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cpu/ms_deform_attn_cpu.cpp -o ms_deform_attn_cpu.o
In file included from /opt/app-root/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContext.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cpu/ms_deform_attn_cpu.cpp:14:
/opt/app-root/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContextLight.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory
    6 | #include <cuda_runtime_api.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
[3/4] c++ -MMD -MF vision.o.d -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.12/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DWITH_CUDA=1 -c /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/vision.cpp -o vision.o
In file included from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/vision.cpp:11:
/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h: In function ‘at::Tensor ms_deform_attn_forward(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int)’:
/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h:29:19: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
   29 |     if (value.type().is_cuda())
      |         ~~~~~~~~~~^~
In file included from /opt/app-root/lib/python3.12/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/ATen/Tensor.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/extension.h:5,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cpu/ms_deform_attn_cpu.h:12,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h:13,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/vision.cpp:11:
/opt/app-root/lib/python3.12/site-packages/torch/include/ATen/core/TensorBody.h:225:30: note: declared here
  225 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/vision.cpp:11:
/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h: In function ‘std::vector<at::Tensor> ms_deform_attn_backward(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int)’:
/opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h:51:19: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
   51 |     if (value.type().is_cuda())
      |         ~~~~~~~~~~^~
In file included from /opt/app-root/lib/python3.12/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/ATen/Tensor.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /opt/app-root/lib/python3.12/site-packages/torch/include/torch/extension.h:5,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/cpu/ms_deform_attn_cpu.h:12,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/ms_deform_attn.h:13,
                 from /opt/app-root/lib/python3.12/site-packages/transformers/kernels/deformable_detr/vision.cpp:11:
/opt/app-root/lib/python3.12/site-packages/torch/include/ATen/core/TensorBody.h:225:30: note: declared here
  225 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
ninja: build stopped: subcommand failed.

Could not load the custom kernel for multi-scale deformable attention: /opt/app-root/src/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /opt/app-root/src/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /opt/app-root/src/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /opt/app-root/src/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
Could not load the custom kernel for multi-scale deformable attention: /opt/app-root/src/.cache/torch_extensions/py312_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
INFO:   2025-02-19 20:44:04,255 - docling.utils.accelerator_utils - Accelerator device: 'cuda:0'
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)

Could you provide guidance on the recommended hardware requirements for deploying docling-serve? Any insights on troubleshooting this issue would also be greatly appreciated.
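
For reference, the key failure in the log above is that the kernel build inside the container cannot find nvcc or the CUDA headers (/usr/local/cuda/bin/nvcc: No such file or directory, and cuda_runtime_api.h missing). A quick sanity check of what the container itself can see, assuming the image's entrypoint allows the command to be overridden, would be something like:

docker run --rm --gpus all docling:test ls /usr/local/cuda/bin/nvcc
docker run --rm --gpus all docling:test python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"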

Thanks!

@dolfim-ibm
Contributor

Do you know which CUDA driver version is installed on your machine? The image should be using CUDA 12.4.
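
For context, nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit version; the two often differ, and the torch wheel inside the image carries its own CUDA build (cu124, judging by the py312_cu124 cache path in the log). Both can be checked on the host, for example:

nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvcc --version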

emyco commented Feb 19, 2025

@dolfim-ibm, the g5.xlarge instance is using CUDA 12.6 (per nvidia-smi); nvcc reports 12.4:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

nvidia-smi
Wed Feb 19 21:45:39 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
|  0%   31C    P0             60W /  300W |     861MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7932      C   python                                        852MiB |
+-----------------------------------------------------------------------------------------+

emyco commented Feb 19, 2025

I have updated my docker command to:

docker run --gpus all \
  --shm-size=4g \
  -e CUDA_HOME="/usr/local/cuda-12.4" \
  -e PATH='/usr/local/cuda-12.4/bin:/opt/app-root/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' \
  -e TORCH_CUDA_ARCH_LIST='8.0 8.6 8.9+PTX' \
  -v /usr/local/cuda/cuda-12.4:/usr/local/cuda-12.4 \
  -it -p 5001:5001 docling:test

I still get the same errors as above. Based on my research, this appears to be related to huggingface/transformers#35349.
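
To verify that the mount actually exposes the toolkit at the paths the build looks for, a check along these lines could help (same entrypoint-override assumption as in the note above):

docker run --rm --gpus all \
  -v /usr/local/cuda/cuda-12.4:/usr/local/cuda-12.4 \
  docling:test ls /usr/local/cuda-12.4/bin/nvcc /usr/local/cuda-12.4/include/cuda_runtime_api.h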

emyco commented Feb 21, 2025

Downgrading the torch version solved the issue: torch==2.5.1 torchvision==0.20.1
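
For anyone hitting the same thing, one way to apply that pin (for example when building a derived image) would be the following; the cu124 index is an assumption based on the py312_cu124 cache path in the log above:

pip install "torch==2.5.1" "torchvision==0.20.1" \
  --index-url https://download.pytorch.org/whl/cu124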

emyco closed this as completed Feb 21, 2025