Running the simple HIP code:
#include <hip/hip_runtime.h>
#include <cstdio>  // for printf

// Report the failing call and exit with a non-zero status.
#define CHECK(Res) \
  if ((Res) != hipSuccess) { \
    printf(#Res " Failed!\n"); \
    return 1; \
  }

int main() {
  hipDevice_t Dev;
  CHECK(hipDeviceGet(&Dev, 0));
  hipCtx_t Ctx;
  CHECK(hipDevicePrimaryCtxRetain(&Ctx, Dev));
  CHECK(hipCtxSetCurrent(Ctx));
  hipEvent_t Ev;
  CHECK(hipEventCreateWithFlags(&Ev, hipEventDefault));
  CHECK(hipEventRecord(Ev, 0));
  CHECK(hipEventDestroy(Ev));
  CHECK(hipDevicePrimaryCtxRelease(Dev));
}
crashes with a segmentation fault when many instances are run in parallel:
$ cat run.sh
export AMD_LOG_LEVEL=4
for i in {1..500}; do
  {
    output_file=$(mktemp) # Create a temporary file for the output
    ./a.out &> $output_file
    if [[ $? -ne 0 ]]; then # Check if the exit status is non-zero
      cat "$output_file" > error.log # Save the output
    fi
    rm "$output_file" # Remove the temporary file
  } &
done
wait
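Note that run.sh writes every failing run to the same error.log, so only the log of whichever failure is written last survives. A minimal sketch of a variant that keeps one log per failing run might look like the following. The function name `run_parallel`, its arguments, and the `run-<index>-status-<code>.log` naming are hypothetical helpers chosen for illustration, not part of the original report.

```shell
# Sketch only: run_parallel, its arguments, and the log-file naming are
# hypothetical helpers, not part of the original report.
# Usage: run_parallel COUNT FAIL_DIR COMMAND [ARGS...]
run_parallel() (
  count=$1
  fail_dir=$2
  shift 2                         # the remaining arguments are the command
  mkdir -p "$fail_dir"
  i=1
  while [ "$i" -le "$count" ]; do
    (
      out=$(mktemp)               # per-run temporary log
      "$@" > "$out" 2>&1
      status=$?
      if [ "$status" -ne 0 ]; then
        # Keep one log per failing run instead of overwriting a shared file.
        mv "$out" "$fail_dir/run-$i-status-$status.log"
      else
        rm -f "$out"
      fi
    ) &
    i=$((i + 1))
  done
  wait                            # block until every background run finishes
)

# For the reproducer in this report it might be invoked as:
#   export AMD_LOG_LEVEL=4
#   run_parallel 500 failed-logs ./a.out
```

The exit status is encoded in the file name, so runs killed by a signal can be told apart from runs where a CHECK failed and the program returned 1.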
Here is the error.log:
$ cat error.log
:3:rocdevice.cpp :434 : 136942947233 us: 44761: [tid:0x7f8f2a317f00] Initializing HSA stack.
:3:comgrctx.cpp :33 : 136957862005 us: 44761: [tid:0x7f8f2a317f00] Loading COMGR library.
:3:rocdevice.cpp :202 : 136957862078 us: 44761: [tid:0x7f8f2a317f00] Numa selects cpu agent[3]=0x308ec0(fine=0x3090e0,coarse=0x304c40) for gpu agent=0x305930
:3:rocdevice.cpp :1635: 136957862488 us: 44761: [tid:0x7f8f2a317f00] HMM support: 1, xnack: 0, direct host access: 0
:4:rocdevice.cpp :2012: 136957864821 us: 44761: [tid:0x7f8f2a317f00] Allocate hsa host memory 0x7f8cf9200000, size 0x101000
:4:rocdevice.cpp :2012: 136957865040 us: 44761: [tid:0x7f8f2a317f00] Allocate hsa host memory 0x7f8cf9000000, size 0x101000
:3:rocdevice.cpp :202 : 136957872821 us: 44761: [tid:0x7f8f2a317f00] Numa selects cpu agent[3]=0x308ec0(fine=0x3090e0,coarse=0x304c40) for gpu agent=0x3279c0
:3:rocdevice.cpp :1635: 136957873016 us: 44761: [tid:0x7f8f2a317f00] HMM support: 1, xnack: 0, direct host access: 0
:4:rocdevice.cpp :2012: 136957873079 us: 44761: [tid:0x7f8f2a317f00] Allocate hsa host memory 0x7f8f2a324000, size 0x70
:4:rocdevice.cpp :2012: 136957873385 us: 44761: [tid:0x7f8f2a317f00] Allocate hsa host memory 0x7f8cf8e00000, size 0x101000
:4:rocdevice.cpp :2012: 136957873805 us: 44761: [tid:0x7f8f2a317f00] Allocate hsa host memory 0x7f8cf8c00000, size 0x101000
:4:runtime.cpp :83 : 136957873877 us: 44761: [tid:0x7f8f2a317f00] init
:3:hip_context.cpp :48 : 136957873881 us: 44761: [tid:0x7f8f2a317f00] Direct Dispatch: 1
:3:hip_device.cpp :169 : 136957873903 us: 44761: [tid:0x7f8f2a317f00] hipDeviceGet: Returned hipSuccess :
:3:hip_context.cpp :383 : 136957873918 us: 44761: [tid:0x7f8f2a317f00] hipDevicePrimaryCtxRetain ( 0x7ffe1b0eec00, 0 )
:3:hip_context.cpp :394 : 136957873922 us: 44761: [tid:0x7f8f2a317f00] hipDevicePrimaryCtxRetain: Returned hipSuccess :
:3:hip_context.cpp :179 : 136957873930 us: 44761: [tid:0x7f8f2a317f00] hipCtxSetCurrent ( context:0x38c670 )
:3:hip_context.cpp :193 : 136957873934 us: 44761: [tid:0x7f8f2a317f00] hipCtxSetCurrent: Returned hipSuccess :
:3:hip_event.cpp :321 : 136957873942 us: 44761: [tid:0x7f8f2a317f00] hipEventCreateWithFlags ( 0x7ffe1b0eebf8, 0 )
:3:hip_event.cpp :327 : 136957873948 us: 44761: [tid:0x7f8f2a317f00] hipEventCreateWithFlags: Returned hipSuccess : event:0x38d980
:3:hip_event.cpp :396 : 136957873955 us: 44761: [tid:0x7f8f2a317f00] hipEventRecord ( event:0x38d980, stream:<null> )
:3:rocdevice.cpp :2822: 136957873966 us: 44761: [tid:0x7f8f2a317f00] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:4:command.cpp :349 : 136959305223 us: 44761: [tid:0x7f8f2a317f00] Command (InternalMarker) enqueued: 0x38ea60
run.sh: line 4: 40305 Segmentation fault ./a.out &> $output_file
Using rocm-5.6.0.
$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 130797524(0x7cbcfd4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 130797524(0x7cbcfd4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 130797524(0x7cbcfd4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 3
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 2
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 2
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132112468(0x7dfe054) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 4
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 3
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 3
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132090580(0x7df8ad4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132090580(0x7df8ad4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132090580(0x7df8ad4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 5
*******
Name: gfx90a
Uuid: GPU-a5c82df98194e170
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 4
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29704(0x7408)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1700
BDFID: 49408
Internal Node ID: 4
Compute Unit: 110
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 2048(0x800)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 6
*******
Name: gfx90a
Uuid: GPU-01c9def4489c62b5
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 5
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29704(0x7408)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1700
BDFID: 50688
Internal Node ID: 5
Compute Unit: 110
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 2048(0x800)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
OS:
$ cat /etc/os-release
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL="https://documentation.suse.com/"