
[opt] Flatten if(0) and if(1) #1393

Merged: 3 commits merged into taichi-dev:master on Jul 4, 2020

Conversation

@xumingkuan (Contributor) requested a review from @archibate on July 4, 2020 at 05:03.
@archibate (Collaborator) left a review comment:

LGTM.

codecov bot commented on Jul 4, 2020:

Codecov Report

❗ No coverage uploaded for pull request base (master@f2bd982).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master    #1393   +/-   ##
=========================================
  Coverage          ?   66.72%           
=========================================
  Files             ?       37           
  Lines             ?     5196           
  Branches          ?      933           
=========================================
  Hits              ?     3467           
  Misses            ?     1567           
  Partials          ?      162           


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f2bd982...e486dd9.

@yuanming-hu (Member) left a review comment:

Thanks!

@@ -38,4 +38,4 @@ def func():
 ret[None] = s
 
 func()
-print(ret[None])
+assert ret[None] == 55
A reviewer (Member) commented on this change:

Thanks for fixing this ;-)

@yuanming-hu merged commit 6b75b6b into taichi-dev:master on Jul 4, 2020.
@k-ye (Member) commented on Jul 5, 2020:

At least on Metal, this broke:

if 1:
    for i in range(n):
        a, b = b, a + b
return b

I think the if 1 in that test is there to explicitly make the for loop serial, but now it has been promoted into a top-level (parallel) range-for kernel.
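For reference, here is a hedged reconstruction of the test kernel that matches both IR dumps below (the function name fib and the argument name n are my assumptions, not taken from the thread):

import taichi as ti  # ti.init(...) would be needed to actually run this

@ti.kernel
def fib(n: ti.i32) -> ti.i32:
    a, b = 0, 1
    if 1:  # intended to keep the range-for non-top-level, i.e. serial
        for i in range(n):
            a, b = b, a + b
    return b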

  • IR with this PR:
kernel {
  $0 = offloaded  
  body {
    <i32*x1> $1 = global tmp var (offset = 8 B)
    <i32 x1> $2 = const [0]
    <i32*x1> $3 : global store [$1 <- $2]
    <i32*x1> $4 = global tmp var (offset = 4 B)
    <i32 x1> $5 = const [1]
    <i32*x1> $6 : global store [$4 <- $5]
    <i32 x1> $7 = arg[0]
    <i32*x1> $8 = global tmp var (offset = 0 B)
    <i32*x1> $9 : global store [$8 <- $7]
  }
  $10 = offloaded range_for(0, tmp(offset=0B)) block_dim=adaptive  
  body {
    <i32*x1> $11 = global tmp var (offset = 4 B)
    <i32 x1> $12 = global load $11
    <i32*x1> $13 = global tmp var (offset = 8 B)
    <i32 x1> $14 = global load $13
    <i32 x1> $15 = add $14 $12
    <i32*x1> $16 : global store [$13 <- $12]
    <i32*x1> $17 : global store [$11 <- $15]
  }
  $18 = offloaded  
  body {
    <i32*x1> $19 = global tmp var (offset = 4 B)
    <i32 x1> $20 = global load $19
    <i32 x1> $21 : kernel return $20
  }
}
  • IR before this PR:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [0]
    <i32 x1> $2 = const [1]
    <i32 x1> $3 = alloca
    <i32 x1> $4 : local store [$3 <- $1]
    <i32 x1> $5 = alloca
    <i32 x1> $6 : local store [$5 <- $2]
    $7 : if $2 {
      <i32 x1> $8 = arg[0]
      $9 : for in range($1, $8) (vectorize 1) block_dim=adaptive {
        <i32 x1> $10 = local load [ [$5[0]]]
        <i32 x1> $11 = local load [ [$3[0]]]
        <i32 x1> $12 = add $11 $10
        <i32 x1> $13 : local store [$3 <- $10]
        <i32 x1> $14 : local store [$5 <- $12]
      }
    }
    <i32 x1> $15 = local load [ [$5[0]]]
    <i32 x1> $16 : kernel return $15
  }
}
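For intuition about the difference between the two dumps: after this PR, the constant-condition if is removed and its body is spliced into the enclosing block, so the range-for ends up at the top level and gets offloaded on its own. A minimal sketch of that idea, in plain Python over a toy statement representation (my own illustration, not Taichi's actual C++ pass):

def flatten_const_if(stmts):
    # stmts is a list of tuples; an if statement is ("if", cond, true_body, false_body).
    out = []
    for s in stmts:
        if s[0] == "if" and isinstance(s[1], int):
            cond, true_body, false_body = s[1], s[2], s[3]
            taken = true_body if cond else false_body
            out.extend(flatten_const_if(taken))  # splice the taken branch inline
        else:
            out.append(s)
    return out

prog = [("assign", "a", 0),
        ("if", 1, [("range_for", "i", "n", [("update", "a", "b")])], [])]
print(flatten_const_if(prog))
# -> [('assign', 'a', 0), ('range_for', 'i', 'n', [('update', 'a', 'b')])]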

@k-ye mentioned this pull request on Jul 5, 2020.
@archibate (Collaborator) commented on Jul 5, 2020:

> At least on Metal, this broke

Is this because some configs are not synced with the LLVM ones on Metal?
Sorry, I could not verify the OpenGL behavior on my ancient laptop. @yuanming-hu, could you do me a favor?

I think ultimately we should:

  1. Have an option to explicitly prevent a range-for from being parallelized, instead of the ad-hoc if 1:, e.g.:
@ti.kernel
def func():
    for i in ti.serial(range(4)):
        print(i)
  2. Make it possible to parallelize non-top-level range-fors, which could be useful in situations like:
  • a vertex shader (parallelize over vertices & faces) followed by a fragment shader (parallelize over the pixels in each face)

@k-ye (Member) commented on Jul 5, 2020:

> Is this because some configs are not synced with the LLVM ones on Metal?

I don't think so, because the IR itself already seems buggy. (I don't think CI has tests on CUDA, does it?)

> Have an option to explicitly prevent a range-for from being parallelized, instead of the ad-hoc if 1:

+1. This will be a lot less error-prone.

> Make it possible to parallelize non-top-level range-fors, which could be useful in situations like:

I feel like this is very non-trivial work. It fundamentally changes how Taichi produces the backend code. In certain cases, maybe Taichi can be made smart enough to figure out how to flatten a nested loop into the top level. Still, that will likely involve more IR analysis and transformation work.
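To make the flattening idea concrete, here is a generic sketch in plain Python (my own illustration with made-up trip counts, not anything Taichi does today) of turning a nested loop into a single top-level loop via index arithmetic, assuming both trip counts are known up front:

def process(i, j):
    print(i, j)  # stand-in for the real per-element work

N, M = 4, 3

# Nested form: only the outer loop is at the top level, so only it would be
# offloaded and parallelized under a one-offload-per-top-level-loop model.
for i in range(N):
    for j in range(M):
        process(i, j)

# Flattened form: a single loop over the whole N * M iteration space, which a
# backend could offload and parallelize directly.
for k in range(N * M):
    i, j = divmod(k, M)
    process(i, j)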
