
[opt] Flatten if(0) and if(1) #1393

Merged: 3 commits merged into taichi-dev:master on Jul 4, 2020

Conversation

@xumingkuan (Contributor) requested a review from @archibate on July 4, 2020 at 05:03.
@archibate (Collaborator) left a review comment:

LGTM.

codecov bot commented on Jul 4, 2020:

Codecov Report

❗ No coverage uploaded for pull request base (master@f2bd982).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master    #1393   +/-   ##
=========================================
  Coverage          ?   66.72%           
=========================================
  Files             ?       37           
  Lines             ?     5196           
  Branches          ?      933           
=========================================
  Hits              ?     3467           
  Misses            ?     1567           
  Partials          ?      162           


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f2bd982...e486dd9.

@yuanming-hu (Member) left a review comment:

Thanks!

@@ -38,4 +38,4 @@ def func():
 ret[None] = s
 
 func()
-print(ret[None])
+assert ret[None] == 55
A reviewer (Member) commented on this change:

Thanks for fixing this ;-)

@yuanming-hu merged commit 6b75b6b into taichi-dev:master on Jul 4, 2020.
@k-ye (Member) commented on Jul 5, 2020:

At least on Metal, this broke:

if 1:
    for i in range(n):
        a, b = b, a + b
return b

I think the if 1 in that test is there to explicitly make the for loop serial, but now it has been promoted into a top-level (parallel) range-for kernel.
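For reference, here is a hedged reconstruction of the test kernel that matches both IR dumps below (the function name fib and the argument name n are my assumptions, not taken from the thread):

import taichi as ti  # ti.init(...) would be needed to actually run this

@ti.kernel
def fib(n: ti.i32) -> ti.i32:
    a, b = 0, 1
    if 1:  # intended to keep the range-for non-top-level, i.e. serial
        for i in range(n):
            a, b = b, a + b
    return b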

  • IR with this PR:
kernel {
  $0 = offloaded  
  body {
    <i32*x1> $1 = global tmp var (offset = 8 B)
    <i32 x1> $2 = const [0]
    <i32*x1> $3 : global store [$1 <- $2]
    <i32*x1> $4 = global tmp var (offset = 4 B)
    <i32 x1> $5 = const [1]
    <i32*x1> $6 : global store [$4 <- $5]
    <i32 x1> $7 = arg[0]
    <i32*x1> $8 = global tmp var (offset = 0 B)
    <i32*x1> $9 : global store [$8 <- $7]
  }
  $10 = offloaded range_for(0, tmp(offset=0B)) block_dim=adaptive  
  body {
    <i32*x1> $11 = global tmp var (offset = 4 B)
    <i32 x1> $12 = global load $11
    <i32*x1> $13 = global tmp var (offset = 8 B)
    <i32 x1> $14 = global load $13
    <i32 x1> $15 = add $14 $12
    <i32*x1> $16 : global store [$13 <- $12]
    <i32*x1> $17 : global store [$11 <- $15]
  }
  $18 = offloaded  
  body {
    <i32*x1> $19 = global tmp var (offset = 4 B)
    <i32 x1> $20 = global load $19
    <i32 x1> $21 : kernel return $20
  }
}
  • IR before this PR:
kernel {
  $0 = offloaded  
  body {
    <i32 x1> $1 = const [0]
    <i32 x1> $2 = const [1]
    <i32 x1> $3 = alloca
    <i32 x1> $4 : local store [$3 <- $1]
    <i32 x1> $5 = alloca
    <i32 x1> $6 : local store [$5 <- $2]
    $7 : if $2 {
      <i32 x1> $8 = arg[0]
      $9 : for in range($1, $8) (vectorize 1) block_dim=adaptive {
        <i32 x1> $10 = local load [ [$5[0]]]
        <i32 x1> $11 = local load [ [$3[0]]]
        <i32 x1> $12 = add $11 $10
        <i32 x1> $13 : local store [$3 <- $10]
        <i32 x1> $14 : local store [$5 <- $12]
      }
    }
    <i32 x1> $15 = local load [ [$5[0]]]
    <i32 x1> $16 : kernel return $15
  }
}
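For intuition about the difference between the two dumps: after this PR, the constant-condition if is removed and its body is spliced into the enclosing block, so the range-for ends up at the top level and gets offloaded on its own. A minimal sketch of that idea, in plain Python over a toy statement representation (my own illustration, not Taichi's actual C++ pass):

def flatten_const_if(stmts):
    # stmts is a list of tuples; an if statement is ("if", cond, true_body, false_body).
    out = []
    for s in stmts:
        if s[0] == "if" and isinstance(s[1], int):
            cond, true_body, false_body = s[1], s[2], s[3]
            taken = true_body if cond else false_body
            out.extend(flatten_const_if(taken))  # splice the taken branch inline
        else:
            out.append(s)
    return out

prog = [("assign", "a", 0),
        ("if", 1, [("range_for", "i", "n", [("update", "a", "b")])], [])]
print(flatten_const_if(prog))
# -> [('assign', 'a', 0), ('range_for', 'i', 'n', [('update', 'a', 'b')])]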

@k-ye mentioned this pull request on Jul 5, 2020.
@archibate (Collaborator) commented on Jul 5, 2020:

> At least on Metal, this broke

Is this because some configs are not synced with the LLVM ones on Metal?
Sorry, I could not verify the OpenGL behavior on my ancient laptop. @yuanming-hu, could you do me a favor?

I think ultimately we should:

  1. Have an option to explicitly prevent a range-for from being parallelized, instead of the ad-hoc if 1:, e.g.:
@ti.kernel
def func():
    for i in ti.serial(range(4)):
        print(i)
  2. Make it possible to parallelize non-top-level range-fors, which could be useful in situations like:
  • a vertex shader (parallelize over vertices & faces) followed by a fragment shader (parallelize over the pixels in each face)

@k-ye (Member) commented on Jul 5, 2020:

> Is this because some configs are not synced with the LLVM ones on Metal?

I don't think so, because the IR itself already seems buggy. (I don't think CI has tests on CUDA, does it?)

> Have an option to explicitly prevent a range-for from being parallelized, instead of the ad-hoc if 1:

+1. This will be a lot less error-prone.

> Make it possible to parallelize non-top-level range-fors, which could be useful in situations like:

I feel like this is very non-trivial work. It fundamentally changes how Taichi produces the backend code. In certain cases, maybe Taichi can be made smart enough to figure out how to flatten a nested loop into the top level. Still, that will likely involve more IR analysis and transformation work.
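To make the flattening idea concrete, here is a generic sketch in plain Python (my own illustration with made-up trip counts, not anything Taichi does today) of turning a nested loop into a single top-level loop via index arithmetic, assuming both trip counts are known up front:

def process(i, j):
    print(i, j)  # stand-in for the real per-element work

N, M = 4, 3

# Nested form: only the outer loop is at the top level, so only it would be
# offloaded and parallelized under a one-offload-per-top-level-loop model.
for i in range(N):
    for j in range(M):
        process(i, j)

# Flattened form: a single loop over the whole N * M iteration space, which a
# backend could offload and parallelize directly.
for k in range(N * M):
    i, j = divmod(k, M)
    process(i, j)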
