Dockerfile / docker-compose to help streamline build process #547

Closed
loeken wants to merge 35 commits

loeken
Contributor

@loeken loeken commented Mar 24, 2023

I wanted to run this in Docker, so I used https://github.com/RedTopper's version from #174 as a base and modified it slightly.

I added a small section to the README explaining how to start it up. The defaults in this config run with < 4GB of VRAM.

@loeken loeken mentioned this pull request Mar 24, 2023
@deece
Contributor

deece commented Mar 25, 2023

Thanks! I was just about to start work on a similar PR.

I'm testing it now.

I think it would make more sense to use the source from the current directory, rather than pulling from the public git repo. This would make it easier for devs to test their patches within an isolated environment.

@deece
Contributor

deece commented Mar 25, 2023

Unfortunately, it looks like testing failed:

#19 58.41 [2/2] /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 FAILED: /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o 
#19 58.41 /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(149): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant2MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (87): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(261): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant3MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (171): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(337): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant4MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (283): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(409): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant8MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (359): here
#19 58.41 
#19 58.41 4 errors detected in the compilation of "/build/quant_cuda_kernel.cu".
#19 58.42 ninja: build stopped: subcommand failed.
#19 58.42 Traceback (most recent call last):
#19 58.42   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
#19 58.42     subprocess.run(
#19 58.42   File "/usr/lib/python3.10/subprocess.py", line 524, in run
#19 58.42     raise CalledProcessError(retcode, process.args,
#19 58.42 subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
#19 58.42 
#19 58.42 The above exception was the direct cause of the following exception:
#19 58.42 
#19 58.42 Traceback (most recent call last):
#19 58.42   File "/build/setup_cuda.py", line 4, in <module>
#19 58.42     setup(
#19 58.42   File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
#19 58.42     return distutils.core.setup(**attrs)
#19 58.42   File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
#19 58.42     dist.run_commands()
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
#19 58.43     self.run_command(cmd)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
#19 58.43     self.run_command('build')
#19 58.43   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43     self.distribution.run_command(command)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
#19 58.43     self.run_command(cmd_name)
#19 58.43   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43     self.distribution.run_command(command)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
#19 58.43     _build_ext.run(self)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
#19 58.43     self.build_extensions()
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
#19 58.43     build_ext.build_extensions(self)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
#19 58.43     self._build_extensions_serial()
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
#19 58.43     self.build_extension(ext)
#19 58.43   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
#19 58.43     _build_ext.build_extension(self, ext)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
#19 58.43     objects = self.compiler.compile(sources,
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
#19 58.43     _write_ninja_file_and_compile_objects(
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
#19 58.43     _run_ninja_build(
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
#19 58.44     raise RuntimeError(message) from e
#19 58.44 RuntimeError: Error compiling objects for extension
------
executor failed running [/bin/sh -c python3 setup_cuda.py bdist_wheel -d .]: exit code: 1

@deece
Contributor

deece commented Mar 25, 2023

It looks like it wants this patch:
qwopqwop200/GPTQ-for-LLaMa#58

Bumping the GPTQ SHA to 841feedde876785bc8022ca48fd9c3ff626587e2 gets past this error.

@loeken
Contributor Author

loeken commented Mar 25, 2023

@deece have you tried setting TORCH_CUDA_ARCH_LIST in the docker-compose to what your graphics card needs? The error you posted suggests that you didn't.

@deece
Contributor

deece commented Mar 25, 2023 via email

@loeken
Contributor Author

loeken commented Mar 25, 2023

@deece by M40, do you mean a Quadro M4000?

@deece
Contributor

deece commented Mar 25, 2023 via email

@loeken
Contributor Author

loeken commented Mar 25, 2023

Based on https://developer.nvidia.com/cuda-gpus, your M40 expects compute capability 5.2. Try changing TORCH_CUDA_ARCH_LIST from 7.5 to 5.2 in the docker-compose.yml.
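
For reference, a minimal Python sketch of how a compute capability maps to a TORCH_CUDA_ARCH_LIST value. The helper names are made up for illustration and are not part of this PR. The second helper relies on the CUDA rule that the built-in atomicAdd(double*, double) only exists from compute capability 6.0 upward, which would explain why the build log above fails on the double kernels when older architectures such as compute_35 and compute_50 are targeted.

```python
# Sketch only: helper names are hypothetical, not taken from the PR.

def arch_list_value(major: int, minor: int) -> str:
    """Format a compute capability such as (5, 2) as the string "5.2"."""
    return f"{major}.{minor}"


def has_native_double_atomic_add(major: int, minor: int) -> bool:
    """The built-in atomicAdd(double*, double) needs compute capability >= 6.0."""
    return (major, minor) >= (6, 0)


if __name__ == "__main__":
    # The two capabilities quoted in this thread: 5.2 (M40) and 7.5 (PR default).
    for cc in [(5, 2), (7, 5)]:
        print(arch_list_value(*cc), has_native_double_atomic_add(*cc))
```

On a machine with PyTorch and a visible GPU, torch.cuda.get_device_capability() returns the (major, minor) pair that could be passed to these helpers.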

@deece
Contributor

deece commented Mar 25, 2023 via email

@loeken
Contributor Author

loeken commented Mar 25, 2023

@deece I tried your suggested SHA 841feedde876785bc8022ca48fd9c3ff626587e2 and HEAD, which made it fail with load_quant() missing 1 required positional argument: 'pre_layer'

I've updated the PR and moved all configs into a .env file, which might make it easier to test/compare.
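
A rough illustration of how that kind of error comes about when a dependency adds a required parameter. The two load_quant_* functions below are hypothetical stand-ins, not the real GPTQ-for-LLaMa code; only the parameter name pre_layer comes from the error message above, and the other parameter names are assumptions.

```python
import inspect


def required_positional(func) -> list:
    """Names of positional parameters that have no default value."""
    params = inspect.signature(func).parameters.values()
    return [
        p.name
        for p in params
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


# Hypothetical stand-ins for an older and a newer revision of a loader.
def load_quant_old(model, checkpoint, wbits):
    return (model, checkpoint, wbits)


def load_quant_new(model, checkpoint, wbits, pre_layer):
    return (model, checkpoint, wbits, pre_layer)


if __name__ == "__main__":
    old = required_positional(load_quant_old)
    new = required_positional(load_quant_new)
    # Parameters the newer revision requires that older callers do not pass.
    print([name for name in new if name not in old])
    try:
        load_quant_new("model", "checkpoint.pt", 4)  # old-style call
    except TypeError as exc:
        print(exc)
```

Comparing signatures this way is one option for spotting a mismatch between the caller and a pinned dependency revision before running a full build.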

@deece
Contributor

deece commented Mar 25, 2023 via email

@loeken
Contributor Author

loeken commented Mar 27, 2023

@deece it now uses HEAD, and I updated it to work with the new changes: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

@MarlinMr
Contributor

How about also preloading extensions into the Docker image?

@loeken
Contributor Author

loeken commented Mar 27, 2023

@MarlinMr I mapped the extensions folder (and a few others) in the docker-compose.

@MarlinMr
Contributor

Yeah, it makes sense for local configuration. But I was thinking more of pulling the dependencies for the currently supported extensions into the Docker image.

@loeken
Contributor Author

loeken commented Mar 27, 2023

@MarlinMr it now runs the pip3 installs for the extensions too, using the same caching as the others. I also added port 5000 for the API via docker-compose.

@loeken
Contributor Author

loeken commented Mar 29, 2023

@oobabooga mind merging this? It would make it easier to hop between branches and test in Docker.

@deece
Contributor

deece commented Mar 29, 2023

It might be worth squashing/refactoring the commits before merging the PR. Maybe even squashing it down to a single commit?

@deece
Contributor

deece commented Mar 29, 2023

A couple of variables are missing from the sample env file:

WARNING: The HOST_API_PORT variable is not set. Defaulting to a blank string.
WARNING: The CONTAINER_API_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.text-generation-webui.ports contains an invalid type, it should be a number, or an object
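
A check like the following could catch this before running docker-compose. It is a minimal sketch, assuming the compose file references variables in the ${NAME} or ${NAME:-default} form and the env file uses plain NAME=value lines; the function names are made up for illustration, and bare $NAME references are not handled.

```python
import re

# Matches ${NAME}, ${NAME:-default} and ${NAME-default}.
VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:?-[^}]*)?\}")


def referenced_vars(compose_text: str) -> set:
    """Variables the compose text references without a fallback default."""
    return {
        m.group(1)
        for m in VAR_REF.finditer(compose_text)
        if m.group(2) is None
    }


def defined_vars(env_text: str) -> set:
    """Variable names assigned in NAME=value lines of an env file."""
    names = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.split("=", 1)[0].strip())
    return names


def missing_vars(compose_text: str, env_text: str) -> list:
    """Sorted names that the compose text needs but the env text lacks."""
    return sorted(referenced_vars(compose_text) - defined_vars(env_text))


if __name__ == "__main__":
    # HOST_PORT is a hypothetical name; the API port names are from the warnings.
    compose = (
        "ports:\n"
        '  - "${HOST_PORT:-7860}:7860"\n'
        '  - "${HOST_API_PORT}:${CONTAINER_API_PORT}"\n'
    )
    env = "# sample env\nHOST_PORT=7860\n"
    print(missing_vars(compose, env))
```

In practice the two strings would be read from docker-compose.yml and the sample env file.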

@loeken
Contributor Author

loeken commented Mar 29, 2023

Yeah, this PR has turned into a bit of a mess. I'll close this one and create a new, clean one.

@loeken
Contributor Author

loeken commented Mar 29, 2023

#633

3 participants