Nested Operations on "large" fields fails on Vulkan and OpenGL, works on Cuda #3544
Comments
Addendum:
I do notice with my "working properly" Vulkan targets in Taichi that if I run something complicated with low FPS, the window system (on both Windows and Linux) seems to "lock up" while kernels are running in between "frames" of compute. This is also true for OpenGL targets, but not for CUDA. Perhaps this is relevant?
Okay, I will check it later
> Hmm, @bobcao3 @g1n0st could you help take a look at this? I don't expect a plain loop with 512 iterations to be a problem in either SPIR-V or GLSL. So maybe something to do with our codegen (and Vulkan resource management)?
Sorry for the late reply... I cannot reproduce this on my 2060; the Vulkan backend just works fine.
Maybe related to resource management? @bobcao3
Since lower numbers are needed on "weaker" GPUs to avoid the problem, and the 2060 is much faster than my 1070, I wonder if you would see this if you increased the field size.
Confirming the above: on Linux with NVIDIA cards, I did not see the "GPU hung" message (either from the CLI or in dmesg) that I observed on the i915 machine.
Hi, this seems to be an issue with the driver's compiler: it tries to unroll the loop, and it's simply too much for it to handle. I don't have an immediate fix in mind; without proper compiler support for big loops there's not much we can do. Maybe try splitting it into multiple kernels? From the code snippet, it looks like you can do atomic accumulation into the resulting buffer.
For my particular use case, I was able to tile the inner loop using atomic accumulation and call the parameterized kernel multiple times from the Python context. Thanks for the suggestion.
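For readers landing here, a minimal sketch of that tiling approach (hypothetical names and sizes; the full repro posted further down uses the same pattern in its compute_chunked kernel): the inner y range becomes a kernel parameter, and the Python side launches the kernel once per chunk, accumulating into the output field.

```python
import taichi as ti

ti.init(arch=ti.vulkan)

W, H = 1024, 768
in_field = ti.field(ti.f32, shape=(W, H))
out_field = ti.field(ti.f32, shape=(W, H))


@ti.kernel
def compute_tile(y0: ti.i32, y1: ti.i32):
    # Same body as the original kernel, but the inner y range is a parameter,
    # so each launch only covers one slice of the loop.
    for px, py in in_field:
        F = 0.
        for x in range(out_field.shape[0]):
            for y in range(y0, y1):
                F += in_field[x, y]    # the original uses a periodic sampler here
        out_field[px, py] += F         # accumulate this tile's contribution


chunk = 32
for i in range(0, H, chunk):
    compute_tile(i, min(H, i + chunk))
```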
#3791 is in the latest release; can you try to update and test it out on the Vulkan backend?
🎉 One note: I now get a whole bunch of these with the Vulkan backend:
However...
I can't replicate the crash in my simple test code, but am seeing something very odd there nonetheless. Will post when I have something worth looking at.
Whew, OK. The above crash is a red herring; it turned out to be a different new bug, reported as #3857.

What I am running into that is still relevant to this issue is a situation I have not exhaustively tested, but definitely see on my resource-constrained i915 target (described more fully in the …).

Here is some repro information for the scheduling/allocation issue; note how OpenGL behaves differently. The timing in square brackets helps explain what seems to be going on. This code:

```python
import sys
import math
import taichi as ti
import numpy as np
from time import monotonic

#ti.init(arch=ti.opengl)  # fine
ti.init(arch=ti.vulkan)   # over-schedules kernels unless ti.sync() is done

fieldWidth = 1024
fieldHeight = 688
field_chunk = 32


@ti.func
def samplePeriodic(field: ti.template(), u, v):
    P = ti.Vector([int(u), int(v)])
    shape = ti.Vector(field.shape)
    P = ti.mod(P, shape)
    return field[int(P)]


@ti.kernel
def initialize():
    for x in range(in_field.shape[0]):
        for y in range(in_field.shape[1]):
            in_field[x, y] = ti.sin(x/10 * math.pi) * ti.sin(y/5 * math.pi)


@ti.kernel
def compute_chunked(yi: ti.i32, yn: ti.i32):
    for px, py in in_field:
        F = 0.
        for x in range(out_field.shape[0]):
            for y in range(yi, yn):
                Q = samplePeriodic(in_field, x, y)
                F += Q
        out_field[px, py] += F


in_field = ti.field(ti.f32, shape=(fieldWidth, fieldHeight))
out_field = ti.field(ti.f32, shape=(fieldWidth, fieldHeight))

initialize()

print("Wait...", end="")
sys.stdout.flush()
out_field.fill(0.)
numy = int(ti.ceil(out_field.shape[1]/field_chunk))
last = monotonic()
for i in range(0, numy):
    now = monotonic()
    print("{}/{}[{:#.2f}]...".format(i+1, numy, now - last), end="")
    last = now
    sys.stdout.flush()
    compute_chunked(i*field_chunk,
                    min(out_field.shape[1], (i+1)*field_chunk))
    #ti.sync()  # Vulkan fails without this
print()
ti.imshow(out_field.to_numpy())
```

When run against the OpenGL backend without ti.sync(): …

When run against the Vulkan backend with ti.sync(): …

When run against the Vulkan backend without ti.sync(): …
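One possible refinement of the workaround above (a sketch, not something tested in this thread; sync_every is an arbitrary value I am introducing for illustration): sync only every few chunks instead of after every launch, which still bounds how much work the Vulkan backend can queue while paying the synchronization cost less often.

```python
sync_every = 4  # arbitrary: how many kernel launches to allow in flight
for i in range(0, numy):
    compute_chunked(i * field_chunk,
                    min(out_field.shape[1], (i + 1) * field_chunk))
    if (i + 1) % sync_every == 0:
        ti.sync()  # wait for the queued kernels before scheduling more
ti.sync()  # final sync before reading out_field back
```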
Describe the bug
I have a set of kernels which perform nested operations over fields. When the field operations involve computed field access (in my case, periodic access) and the inner loop is "too big", Vulkan crashes with "failed to submit command buffer". OpenGL just exits with no warnings. CUDA and CPU are fine.
Here is a minimal repro case; "compute()" is what fails, with symptoms showing up right when it would have finished, seemingly in the next operations.
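The repro snippet itself is not preserved in this capture. Based on the chunked variant in the comments above and the field sizes mentioned under "To Reproduce" below, the failing kernel presumably looked roughly like this (a sketch, not the author's exact code):

```python
import taichi as ti

ti.init(arch=ti.vulkan)  # also fails on ti.opengl; ti.gpu (CUDA) and ti.cpu are fine

fieldWidth = 1024        # assumed; the chunked repro above uses 1024
fieldHeight = 768        # per "To Reproduce" below, 768 fails and 512 works

in_field = ti.field(ti.f32, shape=(fieldWidth, fieldHeight))
out_field = ti.field(ti.f32, shape=(fieldWidth, fieldHeight))


@ti.func
def samplePeriodic(field: ti.template(), u, v):
    # computed (periodic) field access; a straight field[u, v] access works everywhere
    P = ti.Vector([int(u), int(v)])
    shape = ti.Vector(field.shape)
    P = ti.mod(P, shape)
    return field[int(P)]


@ti.kernel
def compute():
    for px, py in in_field:
        F = 0.
        for x in range(out_field.shape[0]):       # the "too big" nested inner loops
            for y in range(out_field.shape[1]):
                F += samplePeriodic(in_field, x, y)
        out_field[px, py] = F
```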
To Reproduce
On Vulkan and OpenGL this fails (on an NVIDIA GTX 1070), while CUDA ("ti.gpu") works fine.
With fieldHeight changed from 768 to 512, it succeeds on all three.
Replacing samplePeriodic() with a straight field access also runs on all targets. While that would be suitable for this trivial repro case, I have an application where it is not, hence this report.

Log/Screenshots
00vulkan-fail.txt
00opengl-fail.txt
Additional comments
PS - I 🖤 Taichi