Problem Description
There are two issues encountered when using ROCm 6.0.2.
The first one might be related to building a ROCm container on a machine lacking an AMD GPU. ROCm was installed with amdgpu-install -y --usecase=hiplibsdk,rocm,hip,opencl. In earlier versions this resulted in __HIP_PLATFORM_AMD__ being defined, but with 6.0.2 it is not defined. As a result, configure fails:
checking for hip/hip_runtime.h... no
configure: error: unable to find required headers
This message is uninformative; a deeper look at config.log shows
configure:4638: checking for hip/hip_runtime.h
configure:4638: gcc-12 -c -I/opt/rocm/include -I/opt/rocm/include -I/usr/include -I/usr/include conftest.c >&5
In file included from conftest.c:60:
/opt/rocm/include/hip/hip_runtime.h:66:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
66 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
| ^~~~~
In file included from /opt/rocm/include/hip/hip_runtime.h:70:
/opt/rocm/include/hip/hip_runtime_api.h:8575:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
8575 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
| ^~~~~
In file included from /opt/rocm/include/hip/hip_runtime.h:71:
/opt/rocm/include/hip/library_types.h:75:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
75 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
| ^~~~~
In file included from /opt/rocm/include/hip/hip_runtime.h:73:
/opt/rocm/include/hip/hip_vector_types.h:38:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
38 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
| ^~~~~
Fixing this is just a matter of defining the macro as a compilation argument, but previous versions did not require doing so explicitly.
The other issue is a compilation error. Because of changes made to hipPointerAttribute_t, the code no longer compiles and gives the message
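A minimal sketch of one way to pass the macro, assuming the project's configure script forwards a user-supplied CPPFLAGS to its header checks (this has not been verified against aws-ofi-rccl). The variable names HIP_PLATFORM_DEFINE and configure_cmd are illustrative only; the configure options are copied from the Docker recipe under Steps to Reproduce.

```shell
# Macro that hip_runtime.h requires, taken from the #error text in config.log.
HIP_PLATFORM_DEFINE="-D__HIP_PLATFORM_AMD__"

# Same configure options as in the Docker recipe.
RCCL_CONFIGURE_OPTIONS="--prefix=/usr --with-mpi=/usr --with-libfabric=/usr --with-hip=/opt/rocm --with-rccl=/opt/rocm CC=gcc-12 CXX=g++-12"

# Assumption: configure honors CPPFLAGS given on its command line.
configure_cmd="./configure ${RCCL_CONFIGURE_OPTIONS} CPPFLAGS=${HIP_PLATFORM_DEFINE}"

# Print the command instead of running it, so the snippet works outside the source tree.
echo "${configure_cmd}"
```

If configure overrides CPPFLAGS internally, appending the define to CC (for example CC="gcc-12 -D__HIP_PLATFORM_AMD__") may be an alternative worth trying.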
make[2]: Entering directory '/tmp/aws-ofi-rccl/src'
CC nccl_ofi_net.lo
nccl_ofi_net.c: In function 'get_cuda_device':
nccl_ofi_net.c:497:17: error: 'struct hipPointerAttribute_t' has no member named 'memoryType'
497 | if (attr.memoryType == hipMemoryTypeDevice) {
| ^
make[2]: *** [Makefile:435: nccl_ofi_net.lo] Error 1
The fix is to update this line to use attr.type instead of attr.memoryType.
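Until the source is updated, the rename could be applied as a patch step before make. The sketch below is an assumption about how one might do that, not the project's own fix: patch_memory_type is a hypothetical helper, and it is demonstrated on a scratch file containing the failing line rather than on the real src/nccl_ofi_net.c.

```shell
# Hypothetical helper: rewrite attr.memoryType to attr.type in the given file.
patch_memory_type() {
    tmp_file=$(mktemp) &&
    sed 's/attr\.memoryType/attr.type/g' "$1" > "$tmp_file" &&
    mv "$tmp_file" "$1"
}

# Demonstrate on a scratch copy of the line reported by the compiler.
demo_file=$(mktemp)
printf '%s\n' 'if (attr.memoryType == hipMemoryTypeDevice) {' > "$demo_file"
patch_memory_type "$demo_file"
patched_line=$(cat "$demo_file")
echo "$patched_line"
rm -f "$demo_file"
```

In the Docker recipe this would amount to calling the helper on src/nccl_ofi_net.c after the git clone step. Note that the patched code would then no longer compile against ROCm releases that still use memoryType.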
Operating System
Ubuntu 22.04 LTS
CPU
AMD EPYC-Rome with no GPU
GPU
AMD Instinct MI250X
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
Here is the section of the Docker recipe showing the instructions that I am running.
ARG ROCM_VERSION=6.0.2
RUN echo "Building rocm ${ROCM_VERSION}" \
&& rocm_major=$(echo ${ROCM_VERSION} | sed "s/\./ /g" | awk '{print $1}') \
&& rocm_minor=$(echo ${ROCM_VERSION} | sed "s/\./ /g" | awk '{print $2}') \
&& ROCM_INSTALLER_VERSION=$(echo ${ROCM_VERSION} | sed "s/\./0/g") \
# if rocm version does not list minor patch version number add 00 to end of installer version
&& if [ $(echo ${ROCM_VERSION} | sed "s/\./\n/g" | wc -l) -eq "2" ]; then ROCM_INSTALLER_VERSION=${ROCM_INSTALLER_VERSION}"00"; fi \
&& ROCM_INSTALLER_VERSION=${ROCM_INSTALLER_VERSION}"-1" \
&& ROCM_INSTALLER_VERSION=${rocm_major}.${rocm_minor}.${ROCM_INSTALLER_VERSION} \
&& cd /tmp/build \
# && wget https://bootstrap.pypa.io/get-pip.py \
# && python3 get-pip.py \
&& roc_url="https://repo.radeon.com/amdgpu-install/"${ROCM_VERSION}"/ubuntu/jammy/amdgpu-install_"${ROCM_INSTALLER_VERSION}"_all.deb" \
&& echo ${roc_url} \
&& wget ${roc_url} \
&& apt -y install ./amdgpu-install_${ROCM_INSTALLER_VERSION}_all.deb \
&& amdgpu-install -y --usecase=hiplibsdk,rocm,hip,opencl \
&& cd /tmp/build && rm -rf amdgpu-install_${ROCM_INSTALLER_VERSION}_all.deb \
&& echo "Done"
# Install aws-ofi-rccl
ARG RCCL_CONFIGURE_OPTIONS="--prefix=/usr --with-mpi=/usr --with-libfabric=/usr --with-hip=/opt/rocm --with-rccl=/opt/rocm CC=gcc-12 CXX=g++-12"
RUN echo "Build rccl" \
&& git clone https://github.com/ROCmSoftwarePlatform/aws-ofi-rccl.git \
&& cd aws-ofi-rccl \
&& ./autogen.sh \
&& ./configure ${RCCL_CONFIGURE_OPTIONS} \
&& make -j 16 \
&& make install \
&& cd /tmp \
&& rm -rf /tmp/build \
&& echo "Done"
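The installer version string that the recipe derives from ROCM_VERSION can be checked in isolation. The following sketch mirrors that logic outside Docker; the variable names component_count, installer_version, and deb_name are illustrative and not part of the recipe.

```shell
ROCM_VERSION=6.0.2

# Split the version into its major and minor components.
rocm_major=$(echo "${ROCM_VERSION}" | cut -d. -f1)
rocm_minor=$(echo "${ROCM_VERSION}" | cut -d. -f2)

# Replace each dot with a zero, e.g. 6.0.2 -> 60002.
installer_version=$(echo "${ROCM_VERSION}" | sed 's/\./0/g')

# If no patch number is given (two components), pad with 00.
component_count=$(echo "${ROCM_VERSION}" | awk -F. '{print NF}')
if [ "${component_count}" -eq 2 ]; then
    installer_version="${installer_version}00"
fi

# Assemble the package file name used in the download URL.
installer_version="${rocm_major}.${rocm_minor}.${installer_version}-1"
deb_name="amdgpu-install_${installer_version}_all.deb"
echo "${deb_name}"   # prints amdgpu-install_6.0.60002-1_all.deb
```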
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response