Import cpu-sparse prototype #818

raphlinus · 2025-02-18T04:12:58Z

This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api.

Quite a bit of editing to satisfy the lint monster.

There was a half-written SIMD implementation of flattening; that's removed. It should be finished and re-added, as it's a good speedup.

tomcur · 2025-02-18T13:41:37Z

vello_hybrid/src/strip.rs

+                        // Note: getting rid of this predicate might help with
+                        // auto-vectorization. That said, just getting rid of
+                        // it causes artifacts (which may be divide by zero).
+                        if dy != 0.0 {
+                            let xx0 = startx + (y0 - starty) * slope;
+                            let xx1 = startx + (y1 - starty) * slope;
+                            let xmin0 = xx0.min(xx1);
+                            let xmax = xx0.max(xx1);
+                            let xmin = xmin0.min(1.0) - 1e-6;
+                            let b = xmax.min(1.0);
+                            let c = b.max(0.0);
+                            let d = xmin.max(0.0);
+                            let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
+                            areas[x as usize][y] += a * dy;
+                        }


Removing this indeed appears to improve things, cutting the time needed to generate strips by ~50% on my platform. See: https://xi.zulipchat.com/#narrow/channel/197075-gpu/topic/CPU.20sparse.20strip.20rendering.20to.20pixels/near/500401632.

Suggested change

-                        // Note: getting rid of this predicate might help with
-                        // auto-vectorization. That said, just getting rid of
-                        // it causes artifacts (which may be divide by zero).
-                        if dy != 0.0 {
-                            let xx0 = startx + (y0 - starty) * slope;
-                            let xx1 = startx + (y1 - starty) * slope;
-                            let xmin0 = xx0.min(xx1);
-                            let xmax = xx0.max(xx1);
-                            let xmin = xmin0.min(1.0) - 1e-6;
-                            let b = xmax.min(1.0);
-                            let c = b.max(0.0);
-                            let d = xmin.max(0.0);
-                            let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
-                            areas[x as usize][y] += a * dy;
-                        }
+                        let xx0 = startx + (y0 - starty) * slope;
+                        let xx1 = startx + (y1 - starty) * slope;
+                        let xmin0 = xx0.min(xx1);
+                        let xmax = xx0.max(xx1);
+                        let xmin = xmin0.min(1.0) - 1e-6;
+                        let b = xmax.min(1.0);
+                        let c = b.max(0.0);
+                        let d = xmin.max(0.0);
+                        let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
+                        // This is (on x86 and ARM) a branchless way to set `a` to 0 if it is NaN.
+                        areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy;

x86: https://godbolt.org/z/TcEKTjPY6
ARM: https://godbolt.org/z/z7E1YGP8T

Unfortunately, just removing it causes artifacts, as some divisions by zero get through. If you look at the explicit SIMD version, there's an is_finite check. Very likely that could be adapted, but I haven't spent a lot of time on it.
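For illustration only, here is a scalar sketch of what adapting an is_finite check could look like. The helper names `finite_or_zero` and `accumulate` are hypothetical and do not appear in the PR, and whether the compiler turns this into a branchless select depends on target and optimization level.

```rust
/// Hypothetical helper (not part of the PR): pass finite values through and
/// replace NaN and +/- infinity with 0.0.
fn finite_or_zero(a: f32) -> f32 {
    if a.is_finite() {
        a
    } else {
        0.0
    }
}

/// Hypothetical accumulation loop: add `coverage[i] * dy` into `areas[i]`,
/// skipping the contribution of any non-finite coverage value.
fn accumulate(areas: &mut [f32], coverage: &[f32], dy: f32) {
    for (dst, &a) in areas.iter_mut().zip(coverage) {
        *dst += finite_or_zero(a) * dy;
    }
}
```

Unlike the `dy != 0.0` predicate, this filters on the computed value itself, so both NaN and infinite results are dropped regardless of what produced them.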

The change to areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy; should prevent NaNs branchlessly (but not infinities). I'd have to try it with this branch explicitly, but running Ghostscript_Tiger.svg through an older revision of cpu-sparse-experiments with this patch results in an identical image.

An alternative is areas[x as usize][y] += (a * dy).abs().max(0.).copysign(a * dy), which would also set the contribution to 0.0 when a is infinite and dy is 0.0, since that product is NaN.
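A small sketch comparing the two expressions may make the difference concrete. The function names `zero_nan`, `scrub_then_scale`, and `scale_then_scrub` are made up for this example. It relies on `f32::max` returning the other operand when one operand is NaN, which is the documented behavior in Rust's standard library.

```rust
/// Map NaN to zero (possibly negative zero) and leave every other value,
/// including infinities, unchanged.
fn zero_nan(v: f32) -> f32 {
    // `NaN.max(0.0)` is 0.0, so NaN collapses to zero before the sign is restored.
    v.abs().max(0.0).copysign(v)
}

/// First form from the thread: scrub `a`, then scale by `dy`.
fn scrub_then_scale(a: f32, dy: f32) -> f32 {
    zero_nan(a) * dy
}

/// Second form from the thread: scale by `dy`, then scrub the product.
fn scale_then_scrub(a: f32, dy: f32) -> f32 {
    zero_nan(a * dy)
}
```

With an infinite `a` and `dy == 0.0`, the first form yields NaN (infinity times zero) while the second yields zero. With an infinite `a` and a nonzero `dy`, both forms still yield an infinity, so neither replaces a general is_finite check.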

raphlinus marked this pull request as draft February 18, 2025 04:14

Fix lints in non-aarch64 cfg's

eb79130

tomcur reviewed Feb 18, 2025

raphlinus added 2 commits February 19, 2025 12:00

Start wiring up GPU render pipeline

23a973d

Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.

Add missing file, fix lints

cdeecc7
