[opengl] Randomly breaking down mpm128.py #633

Closed
archibate opened this issue Mar 21, 2020 · 5 comments

@archibate
Collaborator

No description provided.

@archibate added the "potential bug" label (something that looks like a bug but is not yet confirmed) Mar 21, 2020
@archibate
Collaborator Author

I surprisingly found that I can't reproduce this bug now..

@archibate
Collaborator Author

> I surprisingly found that I can't reproduce this bug now..

Maybe you should move group_size = xxx back into reset() to cause that offload fault.

@archibate self-assigned this Mar 26, 2020
@archibate changed the title from "Randomly breaking down mpm128.py in OpenGL" to "[opengl] Randomly breaking down mpm128.py" Mar 26, 2020
@yuanming-hu
Member

I built the OpenGL backend on my end and ran into this issue on mpm99.py. I also tested mpm128 and got a similar issue. Do you have any idea? :-)

python mpm99.py 
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-_zvu7wvc
[Taichi] sandbox prepared
[Taichi] version 0.5.8, cuda 10.0, commit a66eba07, python 3.6.9
[I 03/26/20 20:33:30.700] [program.cpp:materialize_layout@255] OpenGL root buffer size: 1114112 B
[W 03/26/20 20:33:30.700] [opengl_api.cpp:initialize_opengl@194] OpenGL backend currently WIP, MAY NOT WORK
[I 03/26/20 20:33:30.869] [opengl_api.cpp:initialize_opengl@223] [glsl] OpenGL 4.3.0 NVIDIA 430.26
[E 03/26/20 20:33:30.893] [opengl_api.cpp:compile@62] [glsl] error while compiling shader:
  1 #version 430 core
  2 precision highp float;
  3 #define S25 const int // place float
  4 #define S25_stride 4 // sizeof(float)
  5 #define S24_ch const int
  6 #define S24_get0(a_) (a_) // S25
  7 #define S24_ch_stride (S25_stride)
  8 #define S24 const int // dense
  9 #define S24_n 16384
 10 #define S24_stride (S24_ch_stride * S24_n)
 11 #define S24_children(a_, i) ((a_) + S24_ch_stride * (i))
 12 #define S23 const int // place float
 13 #define S23_stride 4 // sizeof(float)
 14 #define S22 const int // place float
 15 #define S22_stride 4 // sizeof(float)
 16 #define S21_ch const int
 17 #define S21_get0(a_) (a_) // S22
 18 #define S21_get1(a_) ((a_) + (S22_stride)) // S23
 19 #define S21_ch_stride (S22_stride + S23_stride)
 20 #define S21 const int // dense
 21 #define S21_n 16384
 22 #define S21_stride (S21_ch_stride * S21_n)
 23 #define S21_children(a_, i) ((a_) + S21_ch_stride * (i))
 24 #define S20 const int // place float
 25 #define S20_stride 4 // sizeof(float)
 26 #define S19_ch const int
 27 #define S19_get0(a_) (a_) // S20
 28 #define S19_ch_stride (S20_stride)
 29 #define S19 const int // dense
 30 #define S19_n 16384
 31 #define S19_stride (S19_ch_stride * S19_n)
 32 #define S19_children(a_, i) ((a_) + S19_ch_stride * (i))
 33 #define S18 const int // place int
 34 #define S18_stride 4 // sizeof(int)
 35 #define S17_ch const int
 36 #define S17_get0(a_) (a_) // S18
 37 #define S17_ch_stride (S18_stride)
 38 #define S17 const int // dense
 39 #define S17_n 16384
 40 #define S17_stride (S17_ch_stride * S17_n)
 41 #define S17_children(a_, i) ((a_) + S17_ch_stride * (i))
 42 #define S16 const int // place float
 43 #define S16_stride 4 // sizeof(float)
 44 #define S15 const int // place float
 45 #define S15_stride 4 // sizeof(float)
 46 #define S14 const int // place float
 47 #define S14_stride 4 // sizeof(float)
 48 #define S13 const int // place float
 49 #define S13_stride 4 // sizeof(float)
 50 #define S12_ch const int
 51 #define S12_get0(a_) (a_) // S13
 52 #define S12_get1(a_) ((a_) + (S13_stride)) // S14
 53 #define S12_get2(a_) ((a_) + (S13_stride + S14_stride)) // S15
 54 #define S12_get3(a_) ((a_) + (S13_stride + S14_stride + S15_stride)) // S16
 55 #define S12_ch_stride (S13_stride + S14_stride + S15_stride + S16_stride)
 56 #define S12 const int // dense
 57 #define S12_n 16384
 58 #define S12_stride (S12_ch_stride * S12_n)
 59 #define S12_children(a_, i) ((a_) + S12_ch_stride * (i))
 60 #define S11 const int // place float
 61 #define S11_stride 4 // sizeof(float)
 62 #define S10 const int // place float
 63 #define S10_stride 4 // sizeof(float)
 64 #define S9 const int // place float
 65 #define S9_stride 4 // sizeof(float)
 66 #define S8 const int // place float
 67 #define S8_stride 4 // sizeof(float)
 68 #define S7_ch const int
 69 #define S7_get0(a_) (a_) // S8
 70 #define S7_get1(a_) ((a_) + (S8_stride)) // S9
 71 #define S7_get2(a_) ((a_) + (S8_stride + S9_stride)) // S10
 72 #define S7_get3(a_) ((a_) + (S8_stride + S9_stride + S10_stride)) // S11
 73 #define S7_ch_stride (S8_stride + S9_stride + S10_stride + S11_stride)
 74 #define S7 const int // dense
 75 #define S7_n 16384
 76 #define S7_stride (S7_ch_stride * S7_n)
 77 #define S7_children(a_, i) ((a_) + S7_ch_stride * (i))
 78 #define S6 const int // place float
 79 #define S6_stride 4 // sizeof(float)
 80 #define S5 const int // place float
 81 #define S5_stride 4 // sizeof(float)
 82 #define S4_ch const int
 83 #define S4_get0(a_) (a_) // S5
 84 #define S4_get1(a_) ((a_) + (S5_stride)) // S6
 85 #define S4_ch_stride (S5_stride + S6_stride)
 86 #define S4 const int // dense
 87 #define S4_n 16384
 88 #define S4_stride (S4_ch_stride * S4_n)
 89 #define S4_children(a_, i) ((a_) + S4_ch_stride * (i))
 90 #define S3 const int // place float
 91 #define S3_stride 4 // sizeof(float)
 92 #define S2 const int // place float
 93 #define S2_stride 4 // sizeof(float)
 94 #define S1_ch const int
 95 #define S1_get0(a_) (a_) // S2
 96 #define S1_get1(a_) ((a_) + (S2_stride)) // S3
 97 #define S1_ch_stride (S2_stride + S3_stride)
 98 #define S1 const int // dense
 99 #define S1_n 16384
100 #define S1_stride (S1_ch_stride * S1_n)
101 #define S1_children(a_, i) ((a_) + S1_ch_stride * (i))
102 #define S0_ch const int
103 #define S0_get0(a_) (a_) // S1
104 #define S0_get1(a_) ((a_) + (S1_stride)) // S4
105 #define S0_get2(a_) ((a_) + (S1_stride + S4_stride)) // S7
106 #define S0_get3(a_) ((a_) + (S1_stride + S4_stride + S7_stride)) // S12
107 #define S0_get4(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride)) // S17
108 #define S0_get5(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride)) // S19
109 #define S0_get6(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride)) // S21
110 #define S0_get7(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride)) // S24
111 #define S0_ch_stride (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride + S24_stride)
112 #define S0 const int // root
113 #define S0_n 1
114 #define S0_stride (S0_ch_stride * S0_n)
115 #define S0_children(a_, i) ((a_) + S0_ch_stride * (i))
116 
117 layout(std430, binding = 0) buffer data_i32 { int _data_i32_[]; };
118 layout(std430, binding = 0) buffer data_f32 { float _data_f32_[]; };
119 layout(std430, binding = 0) buffer data_f64 { double _data_f64_[]; };
120 #define _mem_i32(x) _data_i32_[(x) >> 2]
121 #define _mem_f32(x) _data_f32_[(x) >> 2]
122 #define _mem_f64(x) _data_f64_[(x) >> 3]
123 #define _Ax_(x) x
124 #define _At_(x) _Ax_(_at_##x(x))
125 uvec4 _rand_;
126 
127 void _init_rand()
128 {
129   uint i = gl_GlobalInvocationID.x;
130   _rand_.x = 123456789 * i * 1000000007;
131   _rand_.y = 362436069;
132   _rand_.z = 521288629;
133   _rand_.w = 88675123;
134 }
135 
136 uint _rand_u32()
137 {
138   uint t = _rand_.x ^ (_rand_.x << 11);
139   _rand_.xyz = _rand_.yzw;
140   _rand_.x = _rand_.y;
141   _rand_.y = _rand_.z;
142   _rand_.z = _rand_.w;
143   _rand_.w = (_rand_.w ^ (_rand_.w >> 19)) ^ (t ^ (t >> 8));
144   return _rand_.w * 1000000007;
145 }
146 
147 float _rand_f32()
148 {
149   return float(_rand_u32()) * (1.0 / 4294967296.0);
150 }
151 
152 double _rand_f64()
153 {
154   return double(_rand_f32());
155 }
156 
157 int _rand_i32()
158 {
159   return int(_rand_u32());
160 }
161 
162 void initialize_c6_00()
163 { // range for
164   // range known at compile time
165   const int _thread_id_ = int(gl_GlobalInvocationID.x);
166   if (_thread_id_ >= 9000) return;
167   const int _it_value_ = 0 + _thread_id_ * 1;
168   const float tmp5 = _rand_f32();
169   const float tmp6 = 0.2;
170   const float tmp7 = float(tmp5 * tmp6);
171   const float tmp8 = 0.3;
172   const float tmp9 = float(tmp7 + tmp8);
173   const int tmp10 = _it_value_;
174   const int tmp11 = 3000;
175   const int tmp12 = int(tmp10 * tmp11 >= 0 ? abs(tmp10) / abs(tmp11) : sign(tmp10) * (abs(tmp10) + abs(tmp11) - 1) / tmp11);
176   const float tmp13 = 0.1;
177   const float tmp14 = float(tmp12);
178   const float tmp15 = float(tmp14 * tmp13);
179   const float tmp16 = float(tmp9 + tmp15);
180   S0 tmp19 = 0;
181   const int tmp199 = 0;
182   S0_ch tmp21 = S0_children(tmp19, tmp199);
183   S1 tmp22 = S0_get0(tmp21);
184   const int tmp23 = (((0 + tmp10) >> 0) & ((1 << 14) - 1));
185   const int tmp201 = 1;
186   const int tmp202 = int(tmp23 * tmp201);
187   const int tmp203 = int(tmp199 + tmp202);
188   S1_ch tmp25 = S1_children(tmp22, tmp203);
189   S2 tmp26 = S1_get0(tmp25);
190   #define _at_tmp26 _mem_f32
191   _At_(tmp26) = tmp16;
192   const float tmp29 = _rand_f32();
193   const float tmp30 = float(tmp29 * tmp6);
194   const float tmp31 = 0.05;
195   const float tmp32 = float(tmp30 + tmp31);
196   const float tmp33 = 0.32;
197   const float tmp34 = float(tmp14 * tmp33);
198   const float tmp35 = float(tmp32 + tmp34);
199   S3 tmp45 = S1_get1(tmp25);
200   #define _at_tmp45 _mem_f32
201   _At_(tmp45) = tmp35;
202   S17 tmp53 = S0_get4(tmp21);
203   S17_ch tmp56 = S17_children(tmp53, tmp203);
204   S18 tmp57 = S17_get0(tmp56);
205   #define _at_tmp57 _mem_i32
206   _At_(tmp57) = tmp12;
207   const float tmp61 = 0.0;
208   S4 tmp66 = S0_get1(tmp21);
209   S4_ch tmp69 = S4_children(tmp66, tmp203);
210   S5 tmp70 = S4_get0(tmp69);
211   #define _at_tmp70 _mem_f32
212   _At_(tmp70) = tmp61;
213   S6 tmp82 = S4_get1(tmp69);
214   #define _at_tmp82 _mem_f32
215   _At_(tmp82) = tmp61;
216   const float tmp86 = 1.0;
217   S12 tmp91 = S0_get3(tmp21);
218   S12_ch tmp94 = S12_children(tmp91, tmp203);
219   S13 tmp95 = S12_get0(tmp94);
220   #define _at_tmp95 _mem_f32
221   _At_(tmp95) = tmp86;
222   S14 tmp107 = S12_get1(tmp94);
223   #define _at_tmp107 _mem_f32
224   _At_(tmp107) = tmp61;
225   S15 tmp119 = S12_get2(tmp94);
226   #define _at_tmp119 _mem_f32
227   _At_(tmp119) = tmp61;
228   S16 tmp131 = S12_get3(tmp94);
229   #define _at_tmp131 _mem_f32
230   _At_(tmp131) = tmp86;
231   S19 tmp139 = S0_get5(tmp21);
232   S19_ch tmp142 = S19_children(tmp139, tmp203);
233   S20 tmp143 = S19_get0(tmp142);
234   #define _at_tmp143 _mem_f32
235   _At_(tmp143) = tmp86;
236 }
237 
238 void main()
239 {
240   _init_rand();
241   initialize_c6_00();
242 }
243 layout(local_size_x = 1792, local_size_y = 1, local_size_z = 1) in;

0(243) : error C7604: layout(layout_size_x = 1792) exceeds maximum value
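
For context on the numbers in the log above: the kernel covers 9000 iterations (the `_thread_id_ >= 9000` guard on shader line 166), while shader line 243 requests 1792 invocations per work group. Below is a minimal sketch of the dispatch arithmetic in Python. The helper names are hypothetical, and 1024 is only an assumed example limit, not a value reported in this log.

```python
def num_work_groups(n_iterations: int, group_size: int) -> int:
    """Number of work groups needed to cover n_iterations (ceiling division)."""
    if n_iterations < 0 or group_size < 1:
        raise ValueError("n_iterations must be >= 0 and group_size >= 1")
    return (n_iterations + group_size - 1) // group_size


def idle_invocations(n_iterations: int, group_size: int) -> int:
    """Invocations that hit the early-return guard because they fall past the range."""
    return num_work_groups(n_iterations, group_size) * group_size - n_iterations


if __name__ == "__main__":
    N = 9000  # loop range taken from the shader's guard
    for size in (1792, 1024):  # 1792 is from the log; 1024 is an assumed limit
        print(size, num_work_groups(N, size), idle_invocations(N, size))
```

Whatever group size is chosen, the `if (_thread_id_ >= 9000) return;` guard stays necessary, because the last group generally overshoots the range.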

@archibate
Collaborator Author

archibate commented Mar 27, 2020

Not the same issue. This one is caused by a hardcoded magic number; @archibate will look up the proper query (something like glGetInteger(GL_MAX_THREADS_PER_GROUP)) later.

Found: https://stackoverflow.com/questions/39004898/get-maximum-workgroup-size-for-compute-shaders
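
A rough sketch of the idea, in Python for brevity: clamp the requested work-group size to a device limit before emitting the `layout(...)` line. The commit that references this issue names `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS` as the limit it uses instead of 1792. Everything else here is an assumption made for illustration, not the actual backend code: the function names `clamp_group_size` and `layout_line` are hypothetical, and `ASSUMED_LIMIT` stands in for a value that a real backend would query at runtime (in C/C++, via `glGetIntegerv` with a current OpenGL context).

```python
def clamp_group_size(requested: int, max_invocations: int) -> int:
    """Return a local_size_x that does not exceed the device limit."""
    if requested < 1 or max_invocations < 1:
        raise ValueError("sizes must be positive")
    return min(requested, max_invocations)


def layout_line(local_size_x: int) -> str:
    """Format the GLSL layout qualifier for a 1-D compute kernel."""
    return (
        f"layout(local_size_x = {local_size_x}, "
        "local_size_y = 1, local_size_z = 1) in;"
    )


# Assumed value for illustration only; a real backend would obtain it from
# the driver (GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS) rather than hardcode it.
ASSUMED_LIMIT = 1024

if __name__ == "__main__":
    size = clamp_group_size(1792, ASSUMED_LIMIT)
    print(layout_line(size))
```

Clamping with `min` keeps smaller requests untouched and only lowers those that the driver would reject, which is the portability concern raised in this comment.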

@archibate
Collaborator Author

Fixed by 90055dd in #666.

yuanming-hu added a commit that referenced this issue Apr 2, 2020
* use GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS instead of 1792 for portability

* modify mpm128.py to reproduce bug #633

* Update opengl_api.cpp

* misc

* gather #define _at_{}

* [skip ci] use ptr_signat

* no #define _At_

[skip ci] fix typo

[skip ci] fix again

* attempt to fix opengl on test_loops

* [skip ci] really fix test_loops

* [skip ci] enable _GLSL_DEBUG & try improve used.atomic_float for all

* [skip ci] gtmp test

* [skip ci] fix calloc null when gtmp_size uninited

* no atan(double, double)

* [skip ci] better inform TI_ARCH

* [skip ci] Update misc/make_changelog.py

* hardcoded _GLSL_NVIDIA for built-in atomic float ops

* [skip ci] share work about stride_map_

[skip ci] really did stride_map_ test passing

* [skip ci] save my power to sleep

* [skip ci] fix const mutable by no const qua struct_compiled_

* [skip ci] also class_children_map_

* [skip ci] no use struct_compiled->source_code

* [skip ci] no macro for _earg_i32

* [skip ci] no macro like _arg_{}({})

* also make data/gtmp/extr no macroed

* use fancier short_name() to make NV GLSL compiler ridiculously faster

* no extra float(...) bracing BinaryOpStmt

* [skip ci] remove useless TI_INFO some

* auto detect GL_NV_shader_atomic_float

* [skip ci] fix typo in atomic sim

* apply reviews (thanks to @k-ye!)

* [skip ci] fix mpm88/99 bug (do we have better solution?)

* [skip ci] disable _GLSL_DEBUG

* guard short_name.cpp with TI_NAMESPACE_BEGIN/END

* [skip ci] use STR macro by k-ye for shader code

* [skip ci] enforce code format

* [skip ci] add clang-format off/on guard for STR

* [skip ci] enforce code format

* platform/opengl -> backends/opengl (like metal does)

* [skip ci] use opengl/shaders/*.glsl.h for STR(..)

* [skip ci] minor shader code adjustments

Co-authored-by: Yuanming Hu <yuanmhu@gmail.com>
Co-authored-by: Taichi Gardener <taichigardener@gmail.com>