
Error in Docker for RTX 6000 Ada: nvcc fatal : Unsupported gpu architecture 'compute_89' #30

Closed
monajalal opened this issue Apr 5, 2024 · 1 comment


@monajalal

monajalal commented Apr 5, 2024

I built the Docker image from scratch and then ran build_all.sh:

(my) root@ada:/data/FoundationPose# bash build_all.sh
CMake Error at /usr/local/share/cmake/pybind11/FindPythonLibsNew.cmake:147 (message):
  Python config failure:

Call Stack (most recent call first):
  /usr/local/share/cmake/pybind11/pybind11Tools.cmake:50 (find_package)
  /usr/local/share/cmake/pybind11/pybind11Common.cmake:180 (include)
  /usr/local/share/cmake/pybind11/pybind11Config.cmake:248 (include)
  CMakeLists.txt:13 (find_package)


-- Configuring incomplete, errors occurred!
See also "/data/FoundationPose/mycpp/build/CMakeFiles/CMakeOutput.log".
Obtaining file:///kaolin
  Preparing metadata (setup.py) ... done
Requirement already satisfied: ipycanvas in /opt/conda/envs/my/lib/python3.8/site-packages/ipycanvas-0.13.1-py3.8.egg (from kaolin==0.15.0) (0.13.1)
Requirement already satisfied: ipyevents in /opt/conda/envs/my/lib/python3.8/site-packages/ipyevents-2.0.2-py3.8.egg (from kaolin==0.15.0) (2.0.2)
Requirement already satisfied: jupyter_client<8 in /opt/conda/envs/my/lib/python3.8/site-packages/jupyter_client-7.4.9-py3.8.egg (from kaolin==0.15.0) (7.4.9)
Requirement already satisfied: pyzmq<25 in /opt/conda/envs/my/lib/python3.8/site-packages/pyzmq-24.0.1-py3.8-linux-x86_64.egg (from kaolin==0.15.0) (24.0.1)
Requirement already satisfied: flask in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (3.0.2)
Requirement already satisfied: tornado in /opt/conda/envs/my/lib/python3.8/site-packages/tornado-6.4-py3.8-linux-x86_64.egg (from kaolin==0.15.0) (6.4)
Requirement already satisfied: comm>=0.1.3 in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (0.2.2)
Requirement already satisfied: numpy in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (1.24.3)
Requirement already satisfied: pybind11 in /opt/conda/envs/my/lib/python3.8/site-packages/pybind11-2.12.0-py3.8.egg (from kaolin==0.15.0) (2.12.0)
Requirement already satisfied: Pillow>=8.0.0 in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (10.2.0)
Requirement already satisfied: tqdm>=4.51.0 in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (4.66.2)
Requirement already satisfied: scipy in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (1.10.1)
Requirement already satisfied: pygltflib in /opt/conda/envs/my/lib/python3.8/site-packages/pygltflib-1.16.2-py3.8.egg (from kaolin==0.15.0) (1.16.2)
Requirement already satisfied: usd-core<=23.5 in /opt/conda/envs/my/lib/python3.8/site-packages/usd_core-23.5-py3.8-linux-x86_64.egg (from kaolin==0.15.0) (23.5)
Requirement already satisfied: ipython<8.13 in /opt/conda/envs/my/lib/python3.8/site-packages (from kaolin==0.15.0) (8.12.3)

etc
etc

Requirement already satisfied: mypy-extensions>=0.3.0 in /opt/conda/envs/my/lib/python3.8/site-packages/mypy_extensions-1.0.0-py3.8.egg (from typing-inspect<1,>=0.4.0->dataclasses-json>=0.0.25->pygltflib->kaolin==0.15.0) (1.0.0)
Installing collected packages: kaolin
  Running setup.py develop for kaolin
    error: subprocess-exited-with-error
    
    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [292 lines of output]
        Warning: passing language='c++' to cythonize() is deprecated. Instead, put "# distutils: language=c++" in your .pyx or .pxd file(s)
        /kaolin/setup.py:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
          from pkg_resources import parse_version
        /kaolin/setup.py:51: UserWarning: Kaolin requires cython == 0.29.20, but couldn't find the module installed. This setup is gonna try to install it...
          warnings.warn(
        /kaolin/setup.py:74: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
        !!
        
                ********************************************************************************
                Requirements should be satisfied by a PEP 517 installer.
                If you are using pip, you can try `pip install --use-pep517`.
                ********************************************************************************
        
        !!
          dist.Distribution().fetch_build_eggs(missing_modules)
        INFO - running develop
        /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
        !!
        
                ********************************************************************************
                Please avoid running ``setup.py`` and ``easy_install``.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.
        
                See https://github.com/pypa/setuptools/issues/917 for details.
                ********************************************************************************
        
        !!
          easy_install.initialize_options(self)
        /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
        !!
        
                ********************************************************************************
                Please avoid running ``setup.py`` directly.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.
        
                See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
                ********************************************************************************
        
        !!
          self.initialize_options()
        INFO - running egg_info
        INFO - creating kaolin.egg-info
        INFO - writing kaolin.egg-info/PKG-INFO
        INFO - writing dependency_links to kaolin.egg-info/dependency_links.txt
        INFO - writing requirements to kaolin.egg-info/requires.txt
        INFO - writing top-level names to kaolin.egg-info/top_level.txt
        INFO - writing manifest file 'kaolin.egg-info/SOURCES.txt'
        INFO - dependency /opt/conda/envs/my/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h won't be automatically included in the manifest: the path must be relative
        INFO - dependency /opt/conda/envs/my/lib/python3.8/site-packages/numpy/core/include/numpy/arrayscalars.h won't be automatically included in the manifest: the path must be relative
        INFO - dependency /opt/conda/envs/my/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h won't be automatically included in the manifest: the path must be relative
        INFO - dependency /opt/conda/envs/my/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h won't be automatically included in the manifest: the path must be relative
        INFO - dependency /opt/conda/envs/my/lib/python3.8/site-packages/numpy/core/include/numpy/ufuncobject.h won't be automatically included in the manifest: the path must be relative
        INFO - reading manifest file 'kaolin.egg-info/SOURCES.txt'
        INFO - reading manifest template 'MANIFEST.in'
        INFO - adding license file 'LICENSE'
        INFO - adding license file 'LICENSE.NSCL'
        INFO - writing manifest file 'kaolin.egg-info/SOURCES.txt'
        INFO - running build_ext
        /opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.3) has a minor version mismatch with the version that was used to compile PyTorch (11.8). Most likely this shouldn't be a problem.
          warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
        INFO - building 'kaolin._C' extension
        INFO - creating /kaolin/build


etc 
etc

        INFO - creating /kaolin/build/temp.linux-x86_64-cpython-38/kaolin/csrc/render/spc
        Emitting ninja build file /kaolin/build/temp.linux-x86_64-cpython-38/build.ninja...
        Compiling objects...
        Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
        [1/38] /usr/local/cuda/bin/nvcc  -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/envs/my/include/python3.8 -c -c /kaolin/kaolin/csrc/metrics/sided_distance_cuda.cu -o /kaolin/build/temp.linux-x86_64-cpython-38/kaolin/csrc/metrics/sided_distance_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17
        FAILED: /kaolin/build/temp.linux-x86_64-cpython-38/kaolin/csrc/metrics/sided_distance_cuda.o
        /usr/local/cuda/bin/nvcc  -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/my/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/envs/my/include/python3.8 -c -c /kaolin/kaolin/csrc/metrics/sided_distance_cuda.cu -o /kaolin/build/temp.linux-x86_64-cpython-38/kaolin/csrc/metrics/sided_distance_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 -std=c++17
        nvcc fatal   : Unsupported gpu architecture 'compute_89'

etc
etc

    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/data/FoundationPose/bundlesdf/mycuda/setup.py", line 21, in <module>
        setup(
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
        return distutils.core.setup(**attrs)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
        super().run_command(command)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 109, in install_for_development
        self.run_command('build_ext')
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
        super().run_command(command)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 88, in run
        _build_ext.run(self)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
        self.build_extensions()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
        self._build_extensions_serial()
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
        self.build_extension(ext)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
        _build_ext.build_extension(self, ext)
      File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
        objects = self.compiler.compile(
      File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
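For context on where compute_89 comes from: the failing nvcc command carries -gencode=arch=compute_89,code=compute_89 and -gencode=arch=compute_89,code=sm_89. When TORCH_CUDA_ARCH_LIST is not set, torch.utils.cpp_extension appears to build its architecture list from the compute capability of the visible GPU (8.9 for the RTX 6000 Ada) and to mark the last entry "+PTX". The sketch below is a simplified, hypothetical re-creation of that mapping (gencode_flags is an illustrative name, not PyTorch's code), only meant to show how an arch string turns into the flags seen in the log.

```python
def gencode_flags(arch_list):
    """Map arch strings like "8.9" or "8.9+PTX" to nvcc -gencode flags.

    Hypothetical helper: a simplified sketch of the mapping, not PyTorch's code.
    """
    flags = set()
    for arch in arch_list:
        ptx = arch.endswith("+PTX")
        version = arch[: -len("+PTX")] if ptx else arch
        major, minor = version.split(".")
        num = major + minor  # "8.9" -> "89"
        # Real machine code (SASS) for exactly this architecture
        flags.add(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # Also embed PTX so newer GPUs can JIT-compile it at load time
            flags.add(f"-gencode=arch=compute_{num},code=compute_{num}")
    # Sorted for a stable order; this matches the order in the failing command
    return sorted(flags)


# Capability 8.9 (RTX 6000 Ada) with +PTX on the last entry:
print(" ".join(gencode_flags(["8.9+PTX"])))
# prints: -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89
```

Both flags name compute_89, which the CUDA 11.3 nvcc inside the container (see the "detected CUDA version (11.3)" warning) does not accept. TORCH_CUDA_ARCH_LIST is the documented way to override this list; under the untested assumption that the rest of the build is fine, a value nvcc 11.3 does know, such as "8.6+PTX", would yield compute_86/sm_86 flags instead, and the driver would JIT-compile the compute_86 PTX for the Ada GPU at load time.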

Docker build

(base) mona@ada:/data/FoundationPose/docker$ docker build --network host -t foundationpose .
[+] Building 2090.7s (16/16) FINISHED                                                                                                                                                        docker:default
 => [internal] load build definition from dockerfile                                                                                                                                                   0.0s
 => => transferring dockerfile: 3.06kB                                                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/nvidia/cudagl:11.3.0-devel-ubuntu20.04                                                                                                                      0.5s
 => CACHED [ 1/12] FROM docker.io/nvidia/cudagl:11.3.0-devel-ubuntu20.04@sha256:9d87d2a797e19927369a0c8d83c3c5b3699b7f05c14638128c30ffdac600ef75                                                       0.0s
 => [ 2/12] RUN ln -snf /usr/share/zoneinfo/US/Pacific /etc/localtime && echo US/Pacific > /etc/timezone                                                                                               0.1s
 => [ 3/12] RUN apt-get update --fix-missing &&     apt-get install -y libgtk2.0-dev &&     apt-get install -y wget bzip2 ca-certificates curl git vim tmux g++ gcc build-essential cmake checkinst  105.9s
 => [ 4/12] RUN cd / && git clone https://github.com/pybind/pybind11 &&    cd pybind11 && git checkout v2.10.0 &&    mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DPYBIND11_INSTAL  9.3s
 => [ 5/12] RUN cd / && wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz &&    tar xvzf ./eigen-3.4.0.tar.gz &&    cd eigen-3.4.0 &&    mkdir build &&    cd build &&    cma  9.4s 
 => [ 6/12] RUN cd / && wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /miniconda.sh &&     /bin/bash /miniconda.sh -b -p /opt/conda &&    ln -s /opt/conda/e  17.7s 
 => [ 7/12] RUN conda init bash &&    echo "conda activate my" >> ~/.bashrc &&    conda activate my &&    pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url ht  846.6s 
 => [ 8/12] RUN cd / && git clone --recursive https://github.com/NVIDIAGameWorks/kaolin                                                                                                               23.4s 
 => [ 9/12] RUN conda activate my && cd /kaolin &&    FORCE_CUDA=1 python setup.py develop                                                                                                           957.0s 
 => [10/12] RUN cd / && git clone https://github.com/NVlabs/nvdiffrast &&    conda activate my && cd /nvdiffrast && pip install .                                                                      3.8s 
 => [11/12] RUN conda activate my &&    pip install scikit-image meshcat webdataset omegaconf pypng roma seaborn opencv-contrib-python openpyxl wandb imgaug Ninja xlsxwriter timm albumentations xa  81.7s 
 => [12/12] RUN ln -sf /bin/bash /bin/sh                                                                                                                                                               0.1s 
 => exporting to image                                                                                                                                                                                35.1s 
 => => exporting layers                                                                                                                                                                               35.1s 
 => => writing image sha256:1473fc0ec8fd8054503c5ae36ae46d1ce6a1c82791581255cd8f1eb284d4c61f                                                                                                           0.0s 
 => => naming to docker.io/library/foundationpose                             

I tried the other cudagl versions suggested in other issues, and none of them worked. The base image I built the Docker image with is:
FROM nvidia/cudagl:11.3.0-devel-ubuntu20.04
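This base image ships CUDA 11.3, while step 7 of the Docker build installs torch==2.0.0+cu118, which is why build_ext warns that the detected CUDA version (11.3) has a minor version mismatch with the 11.8 used to compile PyTorch. That warning only compares version numbers; it does not say which GPU architectures the older nvcc can target. The sketch below checks both for the versions in this issue. The helper names are hypothetical, and the minimum-version table is a small hand-written assumption covering only a few architectures.

```python
# Minimum CUDA toolkit (major, minor) whose nvcc accepts each compute capability.
# Hand-written assumption covering a few recent architectures only.
MIN_CUDA_FOR_ARCH = {
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere consumer/workstation cards
    (8, 9): (11, 8),  # Ada Lovelace (e.g. RTX 6000 Ada)
    (9, 0): (11, 8),  # Hopper
}


def parse_version(text):
    """Turn a version string like "11.3" into the tuple (11, 3)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_toolkit(nvcc_cuda, torch_cuda, capability):
    """Return a list of human-readable problems (empty if none were found)."""
    nvcc_v = parse_version(nvcc_cuda)
    torch_v = parse_version(torch_cuda)
    problems = []
    if nvcc_v[0] != torch_v[0]:
        problems.append(f"major mismatch: nvcc {nvcc_cuda} vs torch {torch_cuda}")
    elif nvcc_v != torch_v:
        problems.append(f"minor mismatch: nvcc {nvcc_cuda} vs torch {torch_cuda}")
    needed = MIN_CUDA_FOR_ARCH.get(capability)
    if needed is None:
        problems.append(f"capability {capability} not in table")
    elif nvcc_v < needed:
        problems.append(
            f"nvcc {nvcc_cuda} cannot target sm_{capability[0]}{capability[1]} "
            f"(needs CUDA {needed[0]}.{needed[1]}+)"
        )
    return problems


# Versions from this issue: container nvcc 11.3, torch built with 11.8, GPU capability 8.9
for line in check_toolkit("11.3", "11.8", (8, 9)):
    print(line)
```

With these inputs the minor mismatch is not harmless after all: the second reported problem is exactly the compute_89 failure in the build log.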

(base) mona@ada:/data/FoundationPose$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(base) mona@ada:/data/FoundationPose$ nvidia-smi
Fri Apr  5 17:42:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    On  | 00000000:52:00.0  On |                  Off |
| 30%   42C    P8              27W / 300W |   1674MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4427      G   /usr/lib/xorg/Xorg                          831MiB |
|    0   N/A  N/A      4598      G   /usr/bin/gnome-shell                         54MiB |
|    0   N/A  N/A      5274      G   ...ures=SpareRendererForSitePerProcess      124MiB |
|    0   N/A  N/A    124228      G   ...2208806,17303107387989153126,262144       62MiB |
|    0   N/A  N/A   1357737      G   ...sion,SpareRendererForSitePerProcess      101MiB |
|    0   N/A  N/A   1603088      G   ...irefox/3941/usr/lib/firefox/firefox      232MiB |
|    0   N/A  N/A   2730073      G   ...AAAAAAAACAAAAAAAAAA= --shared-files       74MiB |
|    0   N/A  N/A   3037902      G   meshlab                                       8MiB |
+---------------------------------------------------------------------------------------+
(base) mona@ada:/data/FoundationPose$ uname -a
Linux ada 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
(base) mona@ada:/data/FoundationPose$ lsb_release -a
LSB Version:	core-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy
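The nvcc --version and nvidia-smi output above were captured on the host ((base) mona@ada), where nvcc reports release 11.8 and the driver supports up to CUDA 12.2. The failing build appears to run inside the container instead ((my) root@ada, with /kaolin and /opt/conda/envs/my), whose nvcc is 11.3 according to the build_ext warning. One way to compare the two is to run nvcc --version in both places and extract the release number. The sketch below does that with a regex; the function names and the 11.8 threshold for sm_89 are assumptions noted in the code.

```python
import re

# Sample captured on the host in this issue (nvcc --version)
HOST_NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
"""

# Assumption: Ada (sm_89) needs CUDA 11.8 or newer
ADA_MIN_CUDA = (11, 8)


def nvcc_release(output):
    """Extract (major, minor) from the 'release X.Y' field of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_ada(output):
    """True if this nvcc is new enough to compile for compute_89."""
    return nvcc_release(output) >= ADA_MIN_CUDA


print(nvcc_release(HOST_NVCC_OUTPUT), supports_ada(HOST_NVCC_OUTPUT))
# prints: (11, 8) True
```

Feeding it output containing "release 11.3" (the container's toolkit) would return False, consistent with the compute_89 failure even though the host's nvcc is new enough.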

@wenbowen123
Collaborator

#27
