warp_cpp

Examples of calling precompiled (cached) NVIDIA Warp kernels directly from C++, without Python.

Usage

Install Warp

pip install numpy
git clone https://github.com/NVIDIA/warp.git
cd warp
python build_lib.py --cuda_path=/usr/local/cuda
pip install -e .

example_add_float_array.py has this Warp kernel:

@wp.kernel
def add_float_arrays(dest: wp.array(dtype=wp.float32),
                     a: wp.array(dtype=wp.float32),
                     b: wp.array(dtype=wp.float32)):
    # One thread per element: each thread adds a single pair of values.
    tid = wp.tid()
    dest[tid] = a[tid] + b[tid]
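
To make the kernel's semantics concrete, here is a minimal sketch in plain Python of what a launch over `len(a)` threads computes. `emulate_add_float_arrays` is a hypothetical helper and not part of Warp; it simply runs the kernel body once per thread index, one after another, and uses the standard library `array` type with code `"f"` as a stand-in for `wp.float32` storage. The input values are the ones printed by the C++ example in this README.

```python
from array import array


def emulate_add_float_arrays(a, b):
    """Sequential stand-in for launching add_float_arrays with dim=len(a)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Type code "f" stores single-precision floats, like wp.float32.
    a32 = array("f", a)
    b32 = array("f", b)
    dest = array("f", [0.0] * len(a32))
    # On a real device each tid runs in parallel; here they run in order.
    for tid in range(len(dest)):
        dest[tid] = a32[tid] + b32[tid]
    return dest


a = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8]
b = [100, 200, 300, 400, 500, 600, 700, 800]
result = emulate_add_float_arrays(a, b)
print(" ".join(f"{x:.1f}" for x in result))
# prints: 101.1 202.2 303.3 404.4 505.5 606.6 707.7 808.8
```

A reference like this can be handy for checking the values produced by the compiled kernel when it is called from C++.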

Run the Python example to JIT-compile the kernel in example_add_float_array.py:

python example_add_float_array.py
Warp 0.8.2 initialized:
   CUDA Toolkit: 11.8, Driver: 12.1
   Devices:
     "cpu"    | Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
     "cuda:0" | NVIDIA GeForce RTX 4090 Laptop GPU (sm_89)
   Kernel cache: C:\Users\erwin\AppData\Local\NVIDIA Corporation\warp\Cache\0.8.2
Module __main__ load on device 'cpu' took 15.43 ms
dest.numpy()= [100.5      101.98572  103.47143  104.95714  106.442856 107.92857
 109.41428  110.9     ]

Note the kernel cache path in the output above: it contains the compiled Warp kernel, either as a CPU shared library (.dll / .so) or as a CUDA PTX file. The mangled names of the compiled kernel and its methods are stored in a mangled_names_*.txt file.
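
As a small convenience, the mangled-names files can be located with a short script. This is a sketch under assumptions: `find_mangled_name_files` and `read_mangled_names` are hypothetical helpers, the search is recursive because the exact subfolder may differ between Warp versions, and the files are assumed to be plain text with one name per line.

```python
from pathlib import Path


def find_mangled_name_files(cache_dir):
    """Return all mangled_names_*.txt files under cache_dir, sorted by path."""
    root = Path(cache_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"kernel cache not found: {root}")
    return sorted(root.rglob("mangled_names_*.txt"))


def read_mangled_names(path):
    """Return the non-empty lines of one mangled-names file (assumed format)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Call `find_mangled_name_files` with the kernel cache path that Warp prints at startup, then pass each returned path to `read_mangled_names` to list the symbols to look up from C++.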

Use CMake to compile and run the C++ examples example_add_float_array_cpu.cpp and example_add_float_array_cuda.cpp. Set the variable WARP_PATH to the location of the Warp source root. When running example_add_float_array_cpu, pass the location of the compiled Warp kernel file (.dll / .so):

cmake -DWARP_PATH=/home/ecoumans/dev/warp_cpp/warp .
cmake --build .
./example_add_float_array_cpu
a:1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8
b:100 200 300 400 500 600 700 800
Sum:101.1 202.2 303.3 404.4 505.5 606.6 707.7 808.8

You can find the kernel's C++ signature in the C++ code that Warp generates in the gen folder of the kernel cache (gen\wp___main__.cpp):

// CPU entry points
void (*add_float_arrays_cpu_forward)(launch_bounds_t dim,
                                     array_t<float32> var_dest,
                                     array_t<float32> var_a,
                                     array_t<float32> var_b);

The array_t and other Warp definitions are in the builtin.h header file.

The CUDA version works the same way. In the Python example, change device = "cpu" to device = "cuda" so that the kernel is compiled to CUDA PTX. Pass the path to the PTX file as the first argument:

./example_add_float_array_cuda
hello cuda world
CUDA driver version:12010
CUDA device count:1
len=22741
Sum:101.1 202.2 303.3 404.4 505.5 606.6 707.7 808.8
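
The driver version in this output is a single integer. CUDA encodes versions as 1000 * major + 10 * minor, so 12010 is the same driver 12.1 that Warp reports at startup. A minimal sketch of the decoding follows; `decode_cuda_version` is a hypothetical helper name.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into parts."""
    major, rest = divmod(version, 1000)
    return major, rest // 10


major, minor = decode_cuda_version(12010)
print(f"{major}.{minor}")  # prints 12.1
```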
