Examples calling Warp precompiled (cached) kernels directly from C++ (without Python)

erwincoumans/warp_cpp

warp_cpp

Examples calling NVIDIA Warp precompiled (cached) kernels directly from C++ (without Python)

Usage

Install Warp

pip install numpy
git clone https://github.com/NVIDIA/warp.git
cd warp
python build_lib.py --cuda_path=/usr/local/cuda
pip install -e .

example_add_float_array.py has this Warp kernel:

@wp.kernel
def add_float_arrays(dest: wp.array(dtype=wp.float32),
             a: wp.array(dtype=wp.float32),
             b: wp.array(dtype=wp.float32)):

    tid = wp.tid()
    dest[tid] = a[tid]+b[tid]

Run the Python example to JIT-compile the kernel in example_add_float_array.py:

python example_add_float_array.py
Warp 0.8.2 initialized:
   CUDA Toolkit: 11.8, Driver: 12.1
   Devices:
     "cpu"    | Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
     "cuda:0" | NVIDIA GeForce RTX 4090 Laptop GPU (sm_89)
   Kernel cache: C:\Users\erwin\AppData\Local\NVIDIA Corporation\warp\Cache\0.8.2
Module __main__ load on device 'cpu' took 15.43 ms
dest.numpy()= [100.5      101.98572  103.47143  104.95714  106.442856 107.92857
 109.41428  110.9     ]

Note the kernel cache path: it contains the compiled Warp kernel as a CPU DLL or a CUDA PTX binary. The mangled names of the compiled kernel and its methods are stored in a mangled_names_*.txt file.

Use CMake to compile the C++ examples example_add_float_array_cpu.cpp and example_add_float_array_cuda.cpp, then run them. Set the variable WARP_PATH to the location of the Warp source root. When running example_add_float_array_cpu, pass the path to the compiled Warp kernel file (.dll / .so):

cmake -DWARP_PATH=/home/ecoumans/dev/warp_cpp/warp .
cmake --build .
./example_add_float_array_cpu
a:1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8
b:100 200 300 400 500 600 700 800
Sum:101.1 202.2 303.3 404.4 505.5 606.6 707.7 808.8

You can extract the C++ signature of the Warp kernel from the C++ code that Warp generates in the gen folder of the kernel cache (gen\wp___main__.cpp):

// CPU entry points
void (*add_float_arrays_cpu_forward)(launch_bounds_t dim,
array_t<float32> var_dest,
array_t<float32> var_a,
array_t<float32> var_b);

The array_t and other Warp definitions are in the builtin.h header file.
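A function pointer of this shape has to be resolved from the compiled kernel library at run time. The following is a minimal sketch of one way to do that on Linux with the POSIX dlopen/dlsym calls; it is not necessarily how the examples in this repository do it. resolve_symbol is a hypothetical helper name, and on Windows the corresponding calls would be LoadLibrary/GetProcAddress instead.

```cpp
#include <dlfcn.h>  // POSIX dlopen / dlsym / dlerror / dlclose
#include <cstdio>

// Hypothetical helper (not part of Warp): opens a shared library and looks
// up one symbol by name. Returns nullptr on any failure. On success the
// library handle is deliberately left open so the returned pointer stays valid.
void* resolve_symbol(const char* library_path, const char* symbol_name) {
    void* handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    dlerror();  // clear any stale error state before calling dlsym
    void* symbol = dlsym(handle, symbol_name);
    if (!symbol) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is null");
        dlclose(handle);
        return nullptr;
    }
    return symbol;
}
```

With builtin.h included, the returned pointer could then be cast with reinterpret_cast to the function pointer type shown above. The symbol name to pass would be the mangled kernel name taken from the mangled_names_*.txt file in the kernel cache. On older glibc versions the program may need to be linked with -ldl.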

The CUDA version works the same way. Change device = "cpu" to device = "cuda" in the Python example so that the kernel is compiled to CUDA/PTX, then pass the path to the PTX file as the first argument:

./example_add_float_array_cuda
hello cuda world
CUDA driver version:12010
CUDA device count:1
len=22741
Sum:101.1 202.2 303.3 404.4 505.5 606.6 707.7 808.8
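Before a PTX file can be handed to the CUDA driver, it has to be read into memory. The sketch below shows one way to do that in standard C++; read_ptx_file is a hypothetical helper name, and the example in this repository may load the file differently. The CUDA driver call cuModuleLoadData accepts PTX as a NULL-terminated text string, which std::string::c_str() provides.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Warp or CUDA): reads a whole PTX file
// into a std::string. Throws std::runtime_error if the file cannot be opened.
std::string read_ptx_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open PTX file: " + path);
    }
    std::ostringstream contents;
    contents << file.rdbuf();  // copy the entire file into the buffer
    return contents.str();
}
```

The string's c_str() pointer could then be passed to cuModuleLoadData, followed by cuModuleGetFunction with the mangled kernel name. Printing the string's size is a cheap way to confirm that the intended file was found.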
