Crash in OpenGL with from_numpy #6922

Open
whorfin opened this issue Dec 18, 2022 · 4 comments

whorfin commented Dec 18, 2022

Describe the bug
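from_numpy fails with OpenGL errors (GL_INVALID_VALUE from glDispatchCompute, followed by GL_INVALID_OPERATION from glBindBufferBase) and dumps core with the OpenGL backend for fields of certain sizes. Fields of the same size work fine with Vulkan. This is on Kaby Lake GT2 with the i915 driver.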

To Reproduce

#!/usr/bin/python3
import sys
import taichi as ti
import numpy as np

#ti.init(arch=ti.vulkan)    # works
ti.init(arch=ti.opengl)     # fails

print("[ ] allocating",end="")
sys.stdout.flush()
target_np = np.full((4096, 4096, 3), .5)    # fails w/ opengl
#target_np = np.full((2048, 2048, 3), .5)    # works w/ opengl
target = ti.Vector.field(3, ti.f32, shape=(target_np.shape[0], target_np.shape[1]))
print("\r[+")
sys.stdout.flush()

print("[ ] numpy assign astype",end="")
sys.stdout.flush()
target_np = target_np.astype(np.float32)
print("\r[+")
sys.stdout.flush()

print("[ ] taichi from_numpy",end="")
sys.stdout.flush()
target.from_numpy(target_np)    # this is what fails
print("\r[+")
sys.stdout.flush()

Log/Screenshots
With the smaller allocation or the Vulkan backend, as indicated in the comments, everything works fine.
As submitted above, it crashes:

$ python3 whorfin-testogl-submit.py
[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6
[Taichi] Starting on arch=opengl
[+] allocating
[+] numpy assign astype
[ ] taichi from_numpy[E 12/18/22 12:55:57.432 73408] [opengl_device.cpp:check_opengl_error@181] glDispatchCompute: GL_INVALID_VALUE


Traceback (most recent call last):
  File "/home/whorfin/whorfin art/taichi/electrostatic/whorfin-testogl-submit.py", line 25, in <module>
    target.from_numpy(target_np)    # this is what fails
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/util.py", line 298, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/matrix.py", line 1666, in from_numpy
    self._from_external_arr(arr)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/util.py", line 298, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/matrix.py", line 1650, in _from_external_arr
    ext_arr_to_matrix(arr, self, as_vector)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/kernel_impl.py", line 945, in wrapped
    return primal(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/kernel_impl.py", line 872, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/kernel_impl.py", line 797, in func__
    raise e from None
  File "/usr/local/lib/python3.10/dist-packages/taichi/lang/kernel_impl.py", line 794, in func__
    t_kernel(launch_ctx)
RuntimeError: [opengl_device.cpp:check_opengl_error@181] glDispatchCompute: GL_INVALID_VALUE
[E 12/18/22 12:55:57.515 73408] [opengl_device.cpp:check_opengl_error@181] glBindBufferBase: GL_INVALID_OPERATION


[E 12/18/22 12:55:57.516 73408] [opengl_device.cpp:check_opengl_error@181] glBindBufferBase: GL_INVALID_OPERATION


terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)
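
A possible reading of the first error, offered as a sketch rather than a confirmed diagnosis: the OpenGL specification has glDispatchCompute raise GL_INVALID_VALUE when a requested work group count exceeds GL_MAX_COMPUTE_WORK_GROUP_COUNT, whose guaranteed minimum is 65535 per dimension. The arithmetic below assumes a 1D dispatch with one invocation per field element and a hypothetical local size of 128. Neither assumption is confirmed by this report, and the limit actually reported by the i915 driver is not shown here.

```python
import math

# Minimum value the OpenGL spec guarantees for each dimension of
# GL_MAX_COMPUTE_WORK_GROUP_COUNT; a driver may report a larger limit.
SPEC_MIN_GROUP_COUNT = 65535

# Hypothetical local work group size. The value Taichi really uses
# is not given anywhere in this report.
ASSUMED_LOCAL_SIZE = 128


def groups_needed(shape, local_size=ASSUMED_LOCAL_SIZE):
    """Work groups for a 1D dispatch with one invocation per field element."""
    elements = math.prod(shape)
    # Integer ceiling division, so a partial last group still counts.
    return -(-elements // local_size)


for shape in [(2048, 2048), (4096, 4096)]:
    n = groups_needed(shape)
    verdict = "within" if n <= SPEC_MIN_GROUP_COUNT else "exceeds"
    print(f"{shape}: {n} groups, {verdict} the spec minimum of {SPEC_MIN_GROUP_COUNT}")
```

Under these assumptions the 2048 x 2048 field needs 32768 groups and the 4096 x 4096 field needs 131072, which would match the sizes that work and fail above. Whether Taichi's OpenGL backend dispatches this way would need checking against its source.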

Additional comments
I don't know what's going on here. It seems to be a bind failure; is it possible that the initial allocation failed but was not checked?
An error message closer to the actual error would be excellent if possible, i.e. one raised at the initial field allocation, if that is what fails.
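
To help judge whether the allocation itself is a plausible culprit, the raw payload sizes can be worked out from the shapes in the script. This is a rough sketch: buffer_bytes is a hypothetical helper, and it ignores any padding or alignment the backend may add. Note that np.full with a Python float fill value produces float64, so the NumPy array is twice the size of the f32 field until astype runs.

```python
MIB = 1024 * 1024


def buffer_bytes(shape, components, bytes_per_scalar):
    """Raw payload size of a vector field, ignoring padding (hypothetical helper)."""
    total = components * bytes_per_scalar
    for dim in shape:
        total *= dim
    return total


# ti.Vector.field(3, ti.f32, shape=...) stores 3 four-byte floats per element.
field_large = buffer_bytes((4096, 4096), components=3, bytes_per_scalar=4)
field_small = buffer_bytes((2048, 2048), components=3, bytes_per_scalar=4)

# np.full((4096, 4096, 3), .5) is float64 (8 bytes) before astype(np.float32).
numpy_f64_large = buffer_bytes((4096, 4096), components=3, bytes_per_scalar=8)

print(f"4096 x 4096 f32 field:   {field_large // MIB} MiB")
print(f"2048 x 2048 f32 field:   {field_small // MIB} MiB")
print(f"4096 x 4096 f64 ndarray: {numpy_f64_large // MIB} MiB")
```

So the failing case needs about 192 MiB for the field and the working case 48 MiB. The GL_MAX_TEXTURE_SIZE value quoted below limits texture dimensions, so it may not say much about buffer allocations of this size.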

FWIW:

$ glxinfo -l | grep GL_MAX_TEXTURE_SIZE
    GL_MAX_TEXTURE_SIZE = 16384
    GL_MAX_TEXTURE_SIZE = 16384
$ ti diagnose
[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0]
system: linux
executable: /usr/bin/python3
platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='shiv', release='5.15.0-56-generic', version='#62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022', machine='x86_64')
locale: en_US.UTF-8
PATH: /home/whorfin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH: ['/usr/local/bin', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages']

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.1 LTS
Release:	22.04
Codename:	jammy



import: <module 'taichi' from '/usr/local/lib/python3.10/dist-packages/taichi/__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
cuda: False
vulkan: True

`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'

`nvidia-smi` not available: [Errno 2] No such file or directory: 'nvidia-smi'
[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6

[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6
[Taichi] Starting on arch=x64

[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6
[Taichi] Starting on arch=opengl

[W 12/18/22 12:50:17.333 73068] [cuda_driver.cpp:load_lib@36] libcuda.so lib not found.
[W 12/18/22 12:50:17.333 73068] [misc.py:adaptive_arch_select@766] Arch=[<Arch.cuda: 5>] is not supported, falling back to CPU
[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6
[Taichi] Starting on arch=x64

[Taichi] version 1.3.0, llvm 15.0.4, commit 0f25b95e, linux, python 3.10.6

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                   TAICHI EXAMPLES                                    
 ──────────────────────────────────────────────────────────────────────────────────── 
  0: ad_gravity               24: laplace                 48: physarum                
  1: comet                    25: laplace_equation        49: print_offset            
  2: cornell_box              26: mandelbrot_zoom         50: rasterizer              
  3: diff_sph                 27: marching_squares        51: regression              
  4: euler                    28: mass_spring_3d_ggui     52: sdf_renderer            
  5: explicit_activation      29: mass_spring_game        53: simple_derivative       
  6: export_mesh              30: mass_spring_game_ggui   54: simple_texture          
  7: export_ply               31: mciso_advanced          55: simple_uv               
  8: export_videos            32: mgpcg                   56: snow_phaseField         
  9: fem128                   33: mgpcg_advanced          57: stable_fluid            
  10: fem128_ggui             34: minimal                 58: stable_fluid_ggui       
  11: fem99                   35: minimization            59: stable_fluid_graph      
  12: fractal                 36: mpm128                  60: taichi_bitmasked        
  13: fractal3d_ggui          37: mpm128_ggui             61: taichi_dynamic          
  14: fullscreen              38: mpm3d                   62: taichi_logo             
  15: game_of_life            39: mpm3d_ggui              63: taichi_ngp              
  16: gui_image_io            40: mpm88                   64: taichi_sparse           
  17: gui_widgets             41: mpm88_graph             65: texture_graph           
  18: implicit_fem            42: mpm99                   66: tutorial                
  19: implicit_mass_spring    43: mpm_lagrangian_forces   67: two_stream_instability  
  20: initial_value_problem   44: nbody                   68: vortex_rings            
  21: jacobian                45: odop_solar              69: waterwave               
  22: karman_vortex_street    46: patterns                                            
  23: keyboard                47: pbf2d                                               
 ──────────────────────────────────────────────────────────────────────────────────── 
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.42s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 7.09s

taichi-gardener moved this to Untriaged in Taichi Lang Dec 18, 2022
erizmr self-assigned this Dec 23, 2022
erizmr moved this from Untriaged to Todo in Taichi Lang Dec 23, 2022

erizmr (Contributor) commented Dec 30, 2022

Hi @whorfin, sorry for the late reply. I couldn't reproduce the issue on my own machine. Could you please provide more information about your hardware? Thanks.

[image attached]

whorfin (Author) commented Dec 30, 2022

System:
  Host: hostname Kernel: 5.15.0-56-generic x86_64 bits: 64 Desktop: LXQt 0.17.1
  Distro: Ubuntu 22.04.1 LTS (Jammy Jellyfish)
Machine:
  Type: Laptop System: Razer product: Blade Stealth v: 2.04
    serial: <superuser required>
  Mobo: Razer model: Razer serial: <superuser required> UEFI: Razer v: 8.02
    date: 02/22/2018
CPU:
  Info: dual core model: Intel Core i7-7500U bits: 64 type: MT MCP cache:
    L2: 512 KiB
  Speed (MHz): avg: 1083 min/max: 400/3500 cores: 1: 700 2: 700 3: 1023
    4: 1911
Graphics:
  Device-1: Intel HD Graphics 620 driver: i915 v: kernel

FWIW, my hosts that have NVIDIA GPUs have not shown this particular behavior.

If you change the "4096, 4096" allocation to 8K, you might see whether it reproduces for you.

erizmr (Contributor) commented Dec 30, 2022

I tried with 8192, 8192 but still failed to repro on my host with an NVIDIA GPU. I am not sure whether this is a problem specific to Intel HD Graphics 620.

whorfin (Author) commented Dec 30, 2022

I was not able to repro on NVIDIA.
