Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve code size and compile time for local laplacian app #7927

Merged
merged 9 commits into from
Nov 21, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions apps/local_laplacian/Makefile
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
include ../support/Makefile.inc

.PHONY: build clean test
.SECONDARY:

build: $(BIN)/$(HL_TARGET)/process

Expand Down
27 changes: 21 additions & 6 deletions apps/local_laplacian/local_laplacian_generator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -81,10 +81,10 @@ class LocalLaplacian : public Halide::Generator<LocalLaplacian> {
// Reintroduce color (Connelly: use eps to avoid scaling up noise w/ apollo3.png input)
Func color;
float eps = 0.01f;
color(x, y, c) = outGPyramid[0](x, y) * (floating(x, y, c) + eps) / (gray(x, y) + eps);
color(x, y, c) = input(x, y, c) * (outGPyramid[0](x, y) + eps) / (gray(x, y) + eps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a bit-exact change? I guess it's very close and doesn't really matter (especially if it helps to avoid the boundary condition), just wanted to double-check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is not bit exact, but it made more sense to me with the change to the scaling of this term. Before it took the ratio of the input color channel to the input grayscale image, and applied that ratio to the output grayscale. Now it computes the ratio of the output grayscale to the input grayscale, and applies that as a scaling factor to the input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The only difference is which term in the numerator gets a +eps)


// Convert back to 16-bit
output(x, y, c) = cast<uint16_t>(clamp(color(x, y, c), 0.0f, 1.0f) * 65535.0f);
output(x, y, c) = cast<uint16_t>(clamp(color(x, y, c), 0.0f, 65535.0f));

/* ESTIMATES */
// (This can be useful in conjunction with RunGen and benchmarks as well
Expand Down Expand Up @@ -131,8 +131,16 @@ class LocalLaplacian : public Halide::Generator<LocalLaplacian> {

remap.compute_root();
Var yo;
output.reorder(c, x, y).split(y, yo, y, 64).parallel(yo).vectorize(x, 8);
gray.compute_root().parallel(y, 32).vectorize(x, 8);
output
.reorder(c, x, y)
.split(y, yo, y, 64)
.parallel(yo)
.vectorize(x, 8);
gray
.compute_root()
.partition(y, Partition::Never)
.parallel(y, 32)
.vectorize(x, 8);
for (int j = 1; j < 5; j++) {
inGPyramid[j]
.compute_root()
Expand All @@ -148,12 +156,19 @@ class LocalLaplacian : public Halide::Generator<LocalLaplacian> {
.store_at(output, yo)
.compute_at(output, y)
.fold_storage(y, 4)
.vectorize(x, 8);
.vectorize(x, 8, TailStrategy::RoundUp);
if (j > 1) {
// Turn off loop partitioning at higher pyramid levels. This
// shaves about 3% off code size and compile time without
// affecting performance.
inGPyramid[j].partition(x, Partition::Never);
gPyramid[j].partition(x, Partition::Never);
}
}
outGPyramid[0]
.compute_at(output, y)
.hoist_storage(output, yo)
.vectorize(x, 8);
.vectorize(x, 8, TailStrategy::RoundUp);
for (int j = 5; j < J; j++) {
inGPyramid[j].compute_root();
gPyramid[j].compute_root().parallel(k);
Expand Down
1 change: 1 addition & 0 deletions src/Generator.h
Original file line number Diff line number Diff line change
Expand Up @@ -3052,6 +3052,7 @@ class NamesInterface {
using LoopLevel = Halide::LoopLevel;
using MemoryType = Halide::MemoryType;
using NameMangling = Halide::NameMangling;
using Partition = Halide::Partition;
using Pipeline = Halide::Pipeline;
using PrefetchBoundStrategy = Halide::PrefetchBoundStrategy;
using RDom = Halide::RDom;
Expand Down