Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import cpu-sparse prototype #818

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Import cpu-sparse prototype #818

wants to merge 4 commits into from

Conversation

raphlinus
Copy link
Contributor

This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api.

Quite a bit of editing to satisfy the lint monster.

There was a half-written SIMD implementation of flattening, that's removed. It should be finished and re-added, as it's a good speedup.

This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api.

Quite a bit of editing to satisfy the lint monster.

There was a half-written SIMD implementation of flattening, that's removed. It should be finished and re-added, as it's a good speedup.
@raphlinus raphlinus marked this pull request as draft February 18, 2025 04:14
Comment on lines +145 to +159
// Note: getting rid of this predicate might help with
// auto-vectorization. That said, just getting rid of
// it causes artifacts (which may be divide by zero).
if dy != 0.0 {
let xx0 = startx + (y0 - starty) * slope;
let xx1 = startx + (y1 - starty) * slope;
let xmin0 = xx0.min(xx1);
let xmax = xx0.max(xx1);
let xmin = xmin0.min(1.0) - 1e-6;
let b = xmax.min(1.0);
let c = b.max(0.0);
let d = xmin.max(0.0);
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
areas[x as usize][y] += a * dy;
}
Copy link
Member

@tomcur tomcur Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this indeed appears to improve things, cutting the time needed to generate strips by ~50% on my platform. See: https://xi.zulipchat.com/#narrow/channel/197075-gpu/topic/CPU.20sparse.20strip.20rendering.20to.20pixels/near/500401632.

Suggested change
// Note: getting rid of this predicate might help with
// auto-vectorization. That said, just getting rid of
// it causes artifacts (which may be divide by zero).
if dy != 0.0 {
let xx0 = startx + (y0 - starty) * slope;
let xx1 = startx + (y1 - starty) * slope;
let xmin0 = xx0.min(xx1);
let xmax = xx0.max(xx1);
let xmin = xmin0.min(1.0) - 1e-6;
let b = xmax.min(1.0);
let c = b.max(0.0);
let d = xmin.max(0.0);
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
areas[x as usize][y] += a * dy;
}
let xx0 = startx + (y0 - starty) * slope;
let xx1 = startx + (y1 - starty) * slope;
let xmin0 = xx0.min(xx1);
let xmax = xx0.max(xx1);
let xmin = xmin0.min(1.0) - 1e-6;
let b = xmax.min(1.0);
let c = b.max(0.0);
let d = xmin.max(0.0);
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin);
// This is (on x86 and ARM) a branchless way to set `a` to 0 if it is NaN.
areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately just removing it causes artifacts, as some divide by zeros get through. If you look at the explicit SIMD version, there's an is_finite check. Very likely that could be adapted, but I haven't spent a lot of time on it.

Copy link
Member

@tomcur tomcur Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy; should prevent NaNs branchlessly (but not infinities). I'd have to try it with this branch explicitly, but running Ghostcript_Tiger.svg through an older revision of cpu-sparse-experiments with this patch results in an identical image.

An alternative is areas[x as usize][y] += (a * dy).abs().max(0.).copysign(a * dy), which would also handle setting infinities to 0.0.

Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants