Charts showing overall averages across benchmark scenes. (details below!)
- gaussian-splatting - original CUDA implementation from the gaussian-splatting paper, diff_gaussian_rasterization
- taichi(original) - original taichi implementation, taichi_3d_gaussian_splatting
- taichi-splatting(16) - our implementation with tile_size=16
- taichi-splatting(32) - our implementation with tile_size=32
The benchmark uses reconstructions from the official gaussian-splatting implementation; these can be found on the official gaussian-splatting page under "Pre-trained models". It can also be run on any other reconstructions, but this makes a good common benchmark.
The script for running these benchmarks is found in the splat-viewer repository as the splat-multi-benchmark script, so as not to pollute this repository with extra dependencies. It was run from the models folder with the arguments splat-multi-benchmark */
- All methods first convert splat data to the most convenient form so that no conversions are performed during the benchmark:
- Alpha, scale etc. activations are computed for diff_gaussian_rasterization
- Gaussian parameters are packed for taichi-splatting
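As a sketch of the first point, precomputing activations means applying the usual gaussian-splatting activation functions once up front, so the benchmark loop measures rendering only. The function and parameter names here are illustrative, not the actual benchmark code:

```python
import torch


def precompute_activations(alpha_logit, log_scale, quat):
    """Apply activation functions once, up front (illustrative sketch).

    alpha_logit : raw opacity logits
    log_scale   : log-space scales
    quat        : unnormalized rotation quaternions
    """
    alpha = torch.sigmoid(alpha_logit)                       # opacity in (0, 1)
    scale = torch.exp(log_scale)                             # strictly positive scales
    rotation = torch.nn.functional.normalize(quat, dim=-1)   # unit quaternions
    return alpha, scale, rotation
```

Doing this once avoids charging every frame of the benchmark for per-point activation work that is not part of rasterization itself.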
- Camera views for each of the original viewpoints are rendered in succession
- Camera image sizes use longest-edge resizing (the original images vary in size quite a bit); the shortest edge is scaled in proportion
- taichi_3d_gaussian_splatting images are padded to the tile size (usually 16)
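The sizing rules above can be sketched as follows; the function name is hypothetical and the padding step only applies to taichi_3d_gaussian_splatting:

```python
import math


def benchmark_image_size(width, height, longest_edge, tile_size=16):
    """Resize so the longest edge equals `longest_edge`, keeping aspect
    ratio, then pad each dimension up to a multiple of `tile_size`
    (a sketch of the sizing described above; names are illustrative).
    """
    scale = longest_edge / max(width, height)
    w, h = round(width * scale), round(height * scale)
    # pad up to the next multiple of the tile size
    pad_w = math.ceil(w / tile_size) * tile_size
    pad_h = math.ceil(h / tile_size) * tile_size
    return (w, h), (pad_w, pad_h)
```

For example, a 1000x700 image at the 1024 setting resizes to 1024x717 and pads to 1024x720 for a tile size of 16.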
- Torch synchronization is called at the end of each benchmark
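Synchronizing matters because CUDA kernel launches are asynchronous; without it, the timer would stop before the GPU has finished. A minimal sketch of the timing loop, with `render_fn` and `cameras` as illustrative stand-ins for the benchmark's real objects:

```python
import time

import torch


def time_renders(render_fn, cameras):
    """Time rendering all camera views in succession, synchronizing at
    the end so queued GPU work is included in the measurement.
    """
    start = time.perf_counter()
    for camera in cameras:
        render_fn(camera)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for all queued GPU kernels to finish
    return time.perf_counter() - start
```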
Averages are computed using a geometric mean across all the scenes (note that the 2070 average excludes a couple of results which ran out of memory due to its smaller 8GB of VRAM).
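The averaging step can be sketched as below, with out-of-memory results represented as `None` and skipped (an illustrative sketch, not the benchmark's actual aggregation code):

```python
import math


def geometric_mean(times):
    """Geometric mean of per-scene timings; OOM results are passed as
    None and excluded from the average.
    """
    valid = [t for t in times if t is not None]
    return math.exp(sum(math.log(t) for t in valid) / len(valid))
```

A geometric mean is a reasonable choice here because per-scene timings span a wide range, and it keeps one slow scene from dominating the average.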
The benchmark is run on three different machines with GPUs of different performance (Nvidia 2070, 3090, and 4090), and at three resolution levels, with the longest edge resized to 1024, 2048 and 4096. Three implementations are tested: the original taichi_3d_gaussian_splatting, labelled taichi(original); taichi-splatting at two different tile sizes, labelled taichi-splatting(16) and taichi-splatting(32); and the official diff_gaussian_rasterization, labelled gaussian-splatting.
Raw data: 2070, 3090, 4090. Tidied up: 2070, 3090, 4090.
Overall, taichi-splatting shows a small speed-up at the lowest image size, with much larger factors of improvement at higher resolution levels. The original reference implementation likely has less overhead in allocation and per-point processing. The 4090 in particular shows larger speedups, especially at high resolution, compared to the other GPUs.
A chart showing the performance on one of the individual benchmarks, and the variation across scenes:
Some example profiles showing the current relative costs of different parts of the rendering system can be seen for the bicycle scene on a 4090 GPU: the backward kernel is still the largest cost, but at a much reduced level compared to the taichi_3d_gaussian_splatting implementation which was the starting point (where it accounted for 85-90% of the time spent).