perf: Reduce compile time by trimming template expansion in IBA (#4476)
I was profiling the builds and saw that modules with lots of template
expansion dominate the compile time. For example,
imagebufalgo_pixelmath.cpp alone took 290s to compile on my 2020
MacBook Pro (!), and imagebufalgo_addsub.cpp took 84s.

This is all due to the combinatorics of expanding IBA templates via the
DISPATCH macros in imagebufalgo_util.h separately for every type that
the arguments can be. But I claim that most combinations are rarely if
ever used. I mean, how often does anybody need IBA::add() to add an int8
image to an int16 image? So this PR rewrites those macros to simplify
the cases as follows:

* The common pixel data types are float, half, uint8, and uint16.

* Specialized versions are fully expanded only when the result and input
images are one of these types. Images not of one of those types are
first automatically converted to float to make them reduce to a common
case. That makes uncommon pixel data types like (signed) int16 not
expand the template, but rather convert to and from float and use the
float specializations.

* For binary and ternary operations (those with 2 or 3 image inputs), if
the pixel types of the inputs don't match, we make sure they are all
converted to float. So, for example, we don't need a specialized version
that adds a half image to a uint16 image -- just convert them to float
and use the common case. But we do specialize if the inputs are all
the same common type, such as adding two uint16 images.

* Assume that commonly, the result image will either be float, or will
be the same pixel data type as the inputs. Other combinations trigger
assignment to a temporary float IB, then copying with conversion to the
uncommon user-supplied result buffer.

* Additionally, we cut down on a little bit more templating by moving
some "deep" methods from the type-templated ImageBuf::Iterator to its
type-generic non-templated base class IteratorBase.
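
To make the rules above concrete, here is a minimal sketch of the decision
logic for a binary operation R = op(A, B). It is one reading of the bullets,
not the actual DISPATCH macros: `PixelType`, `Plan`, `is_common()`, and
`plan_binary_op()` are hypothetical names invented for this illustration and
do not exist in imagebufalgo_util.h.

```cpp
// Hypothetical stand-in for pixel data types; not OpenImageIO's TypeDesc.
enum class PixelType { Float, Half, UInt8, UInt16, Int8, Int16, UInt32, Double };

// The four "common" types that keep their own template specializations.
constexpr bool is_common(PixelType t)
{
    return t == PixelType::Float || t == PixelType::Half
           || t == PixelType::UInt8 || t == PixelType::UInt16;
}

// What a binary operation R = op(A, B) would do under the rules above.
struct Plan {
    PixelType compute_input;   // type both inputs are processed as
    PixelType compute_result;  // type the specialized kernel writes
    bool convert_inputs;       // inputs are first converted to float
    bool temp_result;          // write a float temporary, then convert-copy to R
};

inline Plan plan_binary_op(PixelType r, PixelType a, PixelType b)
{
    // Inputs keep their own type only if they match and are a common type.
    const bool keep_inputs = (a == b) && is_common(a);
    const PixelType in     = keep_inputs ? a : PixelType::Float;
    // The result is written directly only if it is float or has the same
    // type the inputs are processed as; anything else goes through a
    // temporary float buffer.
    const bool direct = (r == PixelType::Float) || (r == in);
    return Plan { in, direct ? r : PixelType::Float, !keep_inputs, !direct };
}
```

Under this reading, adding two uint16 images into a uint16 result needs no
conversion at all, while adding a half image to a uint16 image into an int16
result converts both inputs to float and goes through a float temporary.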

The net result of all this is an awful lot less template expansion. With
this in place, my laptop compiles imagebufalgo_pixelmath.cpp in 97s (vs
290 before) and imagebufalgo_addsub.cpp in 26s (from 84). It takes a big
bite out of all the iba files, and reduces project-wide compile time by
over 10%, around 30s out of 300 for a fresh, uncached, optimized build
with 16 threads.
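
The base-class change from the last bullet in the list above follows a
general pattern: member functions that do not depend on the template
parameter are hoisted into a non-templated base, so they are compiled once
rather than once per pixel type. A toy sketch of that pattern, where
`IterBase`, `Iter<T>`, and their members are hypothetical and are not the
real ImageBuf classes:

```cpp
#include <vector>

// Non-templated base: everything here is compiled exactly once.
class IterBase {
public:
    explicit IterBase(std::vector<int>& samples) : m_samples(samples) {}
    // Type-independent methods, shared by every Iter<T> instantiation.
    void set_sample_count(int pixel, int n) { m_samples[pixel] = n; }
    int sample_count(int pixel) const { return m_samples[pixel]; }

protected:
    std::vector<int>& m_samples;  // per-pixel sample counts (toy "deep" data)
};

// Templated iterator: only code that really depends on T stays here.
template<typename T>
class Iter : public IterBase {
public:
    using IterBase::IterBase;  // inherit the base constructor
    T convert(float v) const { return static_cast<T>(v); }
};
```

Both `Iter<float>` and `Iter<unsigned char>` then share the single compiled
copy of `set_sample_count()` instead of each instantiating its own.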

Signed-off-by: Larry Gritz <lg@larrygritz.com>
lgritz authored Oct 8, 2024
1 parent 704d0db commit ff20241
Showing 2 changed files with 305 additions and 174 deletions.
48 changes: 24 additions & 24 deletions src/include/OpenImageIO/imagebuf.h
@@ -1604,6 +1604,30 @@ class OIIO_API ImageBuf {
m_nchannels);
}

/// Set the number of deep data samples at this pixel. (Only use
/// this if deep_alloc() has not yet been called on the buffer.)
void set_deep_samples(int n)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_samples(m_x, m_y, m_z,
n);
}

/// Set the deep data value of sample s of channel c. (Only use this
/// if deep_alloc() has been called.)
void set_deep_value(int c, int s, float value)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_value(m_x, m_y, m_z, c,
s, value);
}
void set_deep_value(int c, int s, uint32_t value)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_value(m_x, m_y, m_z, c,
s, value);
}

protected:
friend class ImageBuf;
friend class ImageBufImpl;
@@ -1799,30 +1823,6 @@ class OIIO_API ImageBuf {
TypeDesc::BASETYPE(m_pixeltype), m_proxydata,
m_nchannels);
}

/// Set the number of deep data samples at this pixel. (Only use
/// this if deep_alloc() has not yet been called on the buffer.)
void set_deep_samples(int n)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_samples(m_x, m_y, m_z,
n);
}

/// Set the deep data value of sample s of channel c. (Only use this
/// if deep_alloc() has been called.)
void set_deep_value(int c, int s, float value)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_value(m_x, m_y, m_z, c,
s, value);
}
void set_deep_value(int c, int s, uint32_t value)
{
ensure_writable();
return const_cast<ImageBuf*>(m_ib)->set_deep_value(m_x, m_y, m_z, c,
s, value);
}
};

