
[Vulkan] ti.sync() changes program behavior on Vulkan #3791

Closed
AmesingFlank opened this issue Dec 14, 2021 · 5 comments · Fixed by #3818
Labels: potential bug (Something that looks like a bug but not yet confirmed)

Comments

@AmesingFlank (Collaborator) commented Dec 14, 2021

An example of ti.sync() changing program behavior on Vulkan: #3790. In that example, the program only behaves correctly when ti.sync() is added.

On Vulkan, we aggressively batch consecutive kernel launches into a single command buffer. We probably need to add a buffer_barrier (or something similar) between consecutive compute dispatches. I will investigate this.
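Roughly, the idea is a buffer memory barrier between two back-to-back dispatches recorded into the same command buffer. A minimal raw-Vulkan sketch, not the actual Taichi backend code; cmd, root_buffer, and the group counts are hypothetical placeholders:

// Sketch: make the writes of dispatch #1 available and visible to dispatch #2,
// all inside a single command buffer.
VkBufferMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;                              // writes of dispatch #1
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;  // accesses of dispatch #2
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.buffer = root_buffer;   // hypothetical: the buffer both kernels touch
barrier.offset = 0;
barrier.size = VK_WHOLE_SIZE;

vkCmdDispatch(cmd, groups_x, 1, 1);                         // kernel launch #1
vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // src stage
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // dst stage
                     0,
                     0, nullptr,
                     1, &barrier,
                     0, nullptr);
vkCmdDispatch(cmd, groups_x, 1, 1);                         // kernel launch #2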

AmesingFlank added the potential bug label Dec 14, 2021
AmesingFlank self-assigned this Dec 14, 2021
@bobcao3 (Collaborator) commented Dec 14, 2021

We do have barriers between them.

cmdlist->memory_barrier();
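For reference, a compute-to-compute global memory barrier in raw Vulkan would normally look like the sketch below; whether memory_barrier() maps to exactly these stage and access masks is an assumption, not something this thread confirms.

// Sketch of the usual compute-to-compute global barrier
// (assumed mapping of memory_barrier(); cmd is a placeholder command buffer).
VkMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;

vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // src_stage_mask
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // dst_stage_mask
                     0, 1, &barrier, 0, nullptr, 0, nullptr);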

@AmesingFlank (Collaborator, Author)

Yeah, I saw it. So why is this happening :((

@AmesingFlank (Collaborator, Author) commented Dec 14, 2021

To repro:

import taichi as ti

ti.init(ti.vulkan)

# Parallel sort on field x; each sort_stage launch reads results written by the previous launch.
def parallel_sort(x):
    N = x.shape[0]

    # One stage of the sort: compare-and-swap pairs of elements that are k apart.
    @ti.kernel
    def sort_stage(x:ti.template(), N:int, p:int, k:int, invocations:int):
        for inv in range(invocations):
            j = k%p + inv * 2 * k
            for i in range(0, min(k,N-j-k)):
                a = i+j
                b = i+j+k
                if int(a / (p*2)) == int(b / (p*2)):
                    val_a = x[a]
                    val_b = x[b]
                    if val_a > val_b:
                        x[a] = val_b
                        x[b] = val_a

    p = 1
    while p < N:
        k = p
        while k >= 1:
            invocations = int((N-k-k%p) / (2*k)) + 1
            sort_stage(x,N,p,k, invocations)
            # ti.sync()  # the sort is only correct when this sync is added
            k = int(k/2)
        p = int(p * 2)
  
def test_sort():
    def test_sort_for_dtype(dtype,N):
        x = ti.field(dtype, N)

        @ti.kernel
        def fill():
            for i in x:
                x[i] = ti.random()*N
        
        fill()
        parallel_sort(x)

        x_host = x.to_numpy()
        
        for i in range(N-1):
            assert x_host[i] <= x_host[i+1]

    test_sort_for_dtype(ti.i32,1)
    test_sort_for_dtype(ti.i32,256)
    test_sort_for_dtype(ti.i32,100001)
    test_sort_for_dtype(ti.f32,1)
    test_sort_for_dtype(ti.f32,256)
    test_sort_for_dtype(ti.f32,100001)

test_sort()

@bobcao3 (Collaborator) commented Dec 14, 2021

What if we change the stage masks? Change them to src_stage_mask=BOTTOM_OF_PIPE and dst_stage_mask=TOP_OF_PIPE.
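Applied to a vkCmdPipelineBarrier call like the sketch above, the suggestion would read roughly:

// Suggested variant: widen the execution dependency to the whole pipeline.
vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,  // src_stage_mask
                     VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,     // dst_stage_mask
                     0, 1, &barrier, 0, nullptr, 0, nullptr);

(Side note: per the Vulkan spec, TOP_OF_PIPE and BOTTOM_OF_PIPE perform no memory accesses, so access masks paired with them add no memory visibility; they only order execution.)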

@AmesingFlank (Collaborator, Author)

Doesn't seem to fix it :<
