-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import cpu-sparse prototype #818
base: main
Are you sure you want to change the base?
Conversation
This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api. Quite a bit of editing to satisfy the lint monster. There was a half-written SIMD implementation of flattening, that's removed. It should be finished and re-added, as it's a good speedup.
// Note: getting rid of this predicate might help with | ||
// auto-vectorization. That said, just getting rid of | ||
// it causes artifacts (which may be divide by zero). | ||
if dy != 0.0 { | ||
let xx0 = startx + (y0 - starty) * slope; | ||
let xx1 = startx + (y1 - starty) * slope; | ||
let xmin0 = xx0.min(xx1); | ||
let xmax = xx0.max(xx1); | ||
let xmin = xmin0.min(1.0) - 1e-6; | ||
let b = xmax.min(1.0); | ||
let c = b.max(0.0); | ||
let d = xmin.max(0.0); | ||
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin); | ||
areas[x as usize][y] += a * dy; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this indeed appears to improve things, cutting the time needed to generate strips by ~50% on my platform. See: https://xi.zulipchat.com/#narrow/channel/197075-gpu/topic/CPU.20sparse.20strip.20rendering.20to.20pixels/near/500401632.
// Note: getting rid of this predicate might help with | |
// auto-vectorization. That said, just getting rid of | |
// it causes artifacts (which may be divide by zero). | |
if dy != 0.0 { | |
let xx0 = startx + (y0 - starty) * slope; | |
let xx1 = startx + (y1 - starty) * slope; | |
let xmin0 = xx0.min(xx1); | |
let xmax = xx0.max(xx1); | |
let xmin = xmin0.min(1.0) - 1e-6; | |
let b = xmax.min(1.0); | |
let c = b.max(0.0); | |
let d = xmin.max(0.0); | |
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin); | |
areas[x as usize][y] += a * dy; | |
} | |
let xx0 = startx + (y0 - starty) * slope; | |
let xx1 = startx + (y1 - starty) * slope; | |
let xmin0 = xx0.min(xx1); | |
let xmax = xx0.max(xx1); | |
let xmin = xmin0.min(1.0) - 1e-6; | |
let b = xmax.min(1.0); | |
let c = b.max(0.0); | |
let d = xmin.max(0.0); | |
let a = (b + 0.5 * (d * d - c * c) - xmin) / (xmax - xmin); | |
// This is (on x86 and ARM) a branchless way to set `a` to 0 if it is NaN. | |
areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately just removing it causes artifacts, as some divide by zeros get through. If you look at the explicit SIMD version, there's an is_finite
check. Very likely that could be adapted, but I haven't spent a lot of time on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to areas[x as usize][y] += a.abs().max(0.).copysign(a) * dy;
should prevent NaNs branchlessly (but not infinities). I'd have to try it with this branch explicitly, but running Ghostcript_Tiger.svg
through an older revision of cpu-sparse-experiments
with this patch results in an identical image.
An alternative is areas[x as usize][y] += (a * dy).abs().max(0.).copysign(a * dy)
, which would also handle setting infinities to 0.0
.
Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.
This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api.
Quite a bit of editing to satisfy the lint monster.
There was a half-written SIMD implementation of flattening, that's removed. It should be finished and re-added, as it's a good speedup.