Enable Intel SSE in configure builds #576

jbowler · 2024-08-13T20:04:55Z

At present configure and general Makefile builds have the Intel SSE optimizations hardwired off by a special #define PNG_INTEL_SSE which disables the auto-detection of Intel SSE support. This was done by Glenn in 2017 here:

bef76802de97c7ba917c4d397d

Since then CMakeLists.txt has been modified in such a way that it bypasses this requirement. As a consequence cmake and configure builds produce different results.

The CMakeLists.txt changes also broke the support for Apple Universal Binaries and, possibly, for GCC multilib in general. This is fixed by #575 which will also have the side effect of bringing "no option" cmake and configure builds into line. Consequently cmake will no longer build with Intel SSE optimizations by default.

This resolves this by removing the check on PNG_INTEL_SSE in pngpriv.h; instead of switching Intel SSE off (by default) in cmake builds it is switched on (by default) in both configure and "raw" makefile builds.

The original motivation for the check was to avoid perturbing the 1.6 relesae by adding possibly broken hardware optimizations mid-cycle. Since then, however, this rule has been ignored and multiple hardware optimizations have been added that are on by default without moving to a new major release.

ctruta · 2024-09-03T13:26:41Z

Thank you. What you're saying makes sense, and yet, I have a recollection about having this tested before on the CI bots, and the test failed. Now that I fixed an unrelated failure on AppVeyor CI, I'm retesting this commit again, with the fingers crossed 🤞

ctruta · 2024-09-03T13:35:21Z

Thank you. What you're saying makes sense, and yet, I have a recollection about having this tested before on the CI bots, and the test failed. Now that I fixed an unrelated failure on AppVeyor CI, I'm retesting this commit again, with the fingers crossed 🤞

So... It failed on FreeBSD (but not on Mac or Linux), with the cmake build (but not with the configure build).
It's time to boot my FreeBSD 14.1 installation...

jbowler · 2024-09-03T15:29:37Z

The cmake fixes are in #575, this change only affects configure. Did FreeBSD work in the first place? This change removes the test:

# ifdef PNG_INTEL_SSE

Somehow CMakeLists.txt manages (without #575) to set PNG_INTEL_SSE as a #define (in addition to a cmake variable).

I just tested with absolutely no command line options (just cmake ..) on the current tip and cmake builds SSE even though I can't see where it is setting the #define.make

Indeed cmake builds the same thing so far as I can see with and without the patch on an Intel SSE machine (and this is what I expect). Specifically "size CMakeFiles/png_shared.dir/intel/*.o" produces identical output in both cases (and the same applies to the main files).

ctruta · 2024-09-03T16:32:15Z

The cmake fixes are in #575, this change only affects configure.

Please forgive me, but the correct statement is "this change should only affect configure".

Did FreeBSD work in the first place?

See

libpng/.travis.yml

Line 9 in 532fec0

- freebsd

That thing hardly ever breaks. I had headaches, now and then, with macOS, or with Windows, but not so much with FreeBSD.

Good commit: https://app.travis-ci.com/github/ctruta/libpng/builds/272153025
Bad commit: https://app.travis-ci.com/github/ctruta/libpng/builds/272154204

You will see this in the build log:

ld: error: libpng16.so.16.44.git: undefined reference to png_init_filter_functions_sse2

In the meantime, the Windows verification with MSYS2 (but not with Visual Studio) of the cmake build (but not of the configure build) also failed, with the exact same error.

https://ci.appveyor.com/project/ctruta/libpng/builds/50527132

I can reproduce the failure in my offline testing as well, but I still couldn't find the culprit.

jbowler · 2024-09-03T17:16:57Z

I thought the CI tests were run automatically; surely they should be? Up above I see "All checks have passed" but it's just running "lint".

The msys build fails on all systems with hardware optimizations doesn't it? makefile.std has NOHWOPT defined so it succeeds on the listed architectures but will fail on the ones that aren't listed. These are missing from the list:

PNG_MIPS_MMI_OPT
PNG_LOONGARCH_LSX_OPT

So this is a bug in scripts/makefile.msys (no disabling of the HW) and scripts/makefile.std (no disabling of certain untested architectures). In fact both can also be fixed by simply including all the hardware-specific files; an approach which is required by multilib as in #575.

You will see this in the build log:

So that applies to FreeBSD as well? I.e. it isn't using configure or cmake, it's using (and failing) with makefile.msys?

At present configure and general Makefile builds have the Intel SSE optimizations hardwired off by a special #define PNG_INTEL_SSE which disables the auto-detection of Intel SSE support. This was done by Glenn in 2017 here: bef76802de97c7ba917c4d397d Since then CMakeLists.txt has been modified in such a way that it bypasses this requirement. As a consequence cmake and configure builds produce different results. The CMakeLists.txt changes also broke the support for Apple Universal Binaries and, possibly, for GCC multilib in general. This is fixed by #575 which will also have the side effect of bringing "no option" cmake and configure builds into line. Consequently cmake will no longer build with Intel SSE optimizations by default. This resolves this by removing the check on PNG_INTEL_SSE in pngpriv.h; instead of switching Intel SSE off (by default) in cmake builds it is switched on (by default) in both configure and "raw" makefile builds. The original motivation for the check was to avoid perturbing the 1.6 relesae by adding possibly broken hardware optimizations mid-cycle. Since then, however, this rule has been ignored and multiple hardware optimizations have been added that are on by default without moving to a new major release. Signed-off-by: John Bowler <jbowler@acm.org>

jbowler · 2024-09-03T17:36:50Z

This is the problem, with gh pr (i.e. with the change to makefile.msys):

git clean -dfx
make -f scripts/makefile.msys

Now succeeds. Without the PR it would fail on any architecture with HW optimizations other than Intel SSE. I have limited testing capability but I verified that:

make CC=aarch64-linux-gnu-cc -f scripts/makefile.msys

Does fail without the PR and does succeed with the new version of the PR.

I don't understand why this has anything to do with cmake; the cmake in your AppVeyor logs all seem to succeed, it's only the AUTOMATION=makefiles builds which fail.

ctruta · 2024-09-03T19:20:59Z

Here's the bug:

libpng/CMakeLists.txt

Line 189 in 532fec0

if(TARGET_ARCH MATCHES "^(i[3-6]86|x86|AMD64)")

On my FreeBSD, it's amd64 instead of AMD64. There are many obvious ways in which I can fix this, and I think I'll go for normalizing (i.e. lowercase-ing) TARGET_ARCH.

This is the problem, with gh pr (i.e. with the change to makefile.msys):

My bad. They're 100% unrelated. (I thought I was looking at a CMake failure, when in fact it was a plain Makefile failure.)

I'll fix the CMake problem first, because it is a real defect, rather than a missing setting, and I'll credit you @jbowler with the discovery. After all, your patch didn't introduced an error; but rather, it discovered one ;-)

ctruta · 2024-09-03T19:28:25Z

I thought the CI tests were run automatically; surely they should be? Up above I see "All checks have passed" but it's just running "lint".

That's for GitHub Actions, which, indeed, is restricted to linting only (for now). The actual verification ci_verify_*.sh is spread over Travis CI (for Linux, Mac and FreeBSD) and AppVeyor CI (for Windows), in order to mitigate the fact that a full libpng build+test is quite long. I'd rather not exceed the CI credits.

For another TODO task of mine, admittedly of a lower priority, I'd like to start using Circle CI for Android test builds.

jbowler · 2024-09-03T20:00:40Z

On my FreeBSD, it's amd64 instead of AMD64

This is fixed by #575 isn't it?

It will probably create a merge conflict with #575 either way. #575 is the point; it fixes multilib (aka the Apple/cmake? build system).

ctruta · 2024-09-03T20:05:16Z

On my FreeBSD, it's amd64 instead of AMD64

This is fixed by #575 isn't it?

Oh yes indeed. I'm still catching up with the discussion in there. I agree with it in principle, and I'll follow up tomorrow morning. (European morning, specifically.)

jbowler · 2024-09-03T20:06:56Z

Yes: the @csparker247 fix (i.e. #515) removes the string "AMD64" from CMakeLists.txt, so it can't cause bugs anymore.

I am in favour of #515 going in first because that's the way we tested these two changes; I deliberately separated #516 out because it does something slightly different.

Because of a missing "amd64" string (in lowercase) in a regex match, the CMake build was unable to pick up the PNG_HARDWARE_OPTIMIZATIONS flag on FreeBSD/amd64 (and possibly other amd64 systems as well). Rename the target arch variable from TARGET_ARCH to a more idiomatic PNG_TARGET_ARCHITECTURE, and set it to an always-lowercase string. The follow-on checks are now simpler and easier to get right.

ctruta · 2024-09-05T16:47:37Z

The original motivation for the check was to avoid perturbing the 1.6
relesae by adding possibly broken hardware optimizations mid-cycle.
Since then, however, this rule has been ignored and multiple hardware
optimizations have been added that are on by default without moving to a
new major release.

Here here!

The bad news is that @jbowler's commit cbf24df is by no means sufficient. All makefiles (not just makefile.msys) need to be updated, or else, they will fail. It just happened that makefile.msys happened to be the odd one that's tested on the CI bots.

Now, we can modify all makefiles, no problem. But here's the rub: these makefiles aren't used by anybody: they are just a reflections of what the users do in reality, with their custom builds. And there are many custom builds out there.

For every platform, except for x86, those changes to those makefiles are needed. Not so for x86, which is, you know, only the biggest one. And, oh, don't be fooled by the presence of -DPNG_INTEL_SSE_OPT=0 in HWOPT inside those makefiles: they work with it and without it, no problem.

In conclusion: we can fix all of our makefiles, no problem. Or we can remove them from our CI verification, which is even easier. More importantly, these little hand-made makefiles are a reflection of what handmade build stuff happens to exist on the users' machine.

Having said all that: I have an idea. (Wait for it...)

ctruta · 2024-09-05T16:58:37Z

While I'm still sleeping on this, awaiting your input @jbowler and @csparker247, I will move on to apply the most recent PR #580 which couldn't have come at a better time. (Thx @ericriff.) I needed to improve the cross-platform building and testing, plus that I still have the year-and-a-half-long promise to honour, i.e., to apply the RISCV patches. So I'll do those ones, for now.

Back to this: I wonder what do you think if we published a stop-gap libpng-1.8.0, between the current libpng-1.6.x and the future (and the elusive) libpng-ng. There are a bunch of annoyances that I would like to fix, including this one, and without disrupting the build process at all. I don't plan to break any source or binary compatibility -- at least, not deliberately -- just to do some long-overdue cleaning, for various reasons, including the reason to make a smoother transition to the aforementioned libpng-ng.

What say you?

jbowler · 2024-09-05T18:12:12Z

Just do it. Create a branch and I will submit a git PR full of "git rm" and "git mv" actions.

jbowler · 2024-09-05T20:41:00Z

In conclusion: we can fix all of our makefiles, no problem.

It's a few lines to fix all the makefiles by making the "default-default" for hardware stuff "off". This is failsafe, but if we are going down the libpng-1.8 route I'd rather fix the very original problem of all the "ifdef PNG_foo_SUPPORTED" stuff; it always was completely unnecessary.

jbowler · 2024-09-05T20:50:14Z

Back to this: I wonder what do you think if we published a stop-gap libpng-1.8.0, between the current libpng-1.6.x and the future (and the elusive) libpng-ng.

In fact I can do this by submitting a PR against pnggroup/libpng/master and pulling all the pending PRs that I've reviewed in. After all this is what I've been saying should happen for months:

Freeze 1.6; no more changes other than security fixes.
Move all of the PR handling (none corresponds to any security issue) to "master".
Handle all the deprecated stuff and deprecate currently obsoleted APIs such as "png_get_eXIf" (replaced by png_get_eXIf_1 years ago.)
When it's vaguely stable make a 1.8.0beta branch out of it.

When I talked about "git mv" I meant moving the source into a sub-directory and splitting it into separate files. Glenn wouldn't do this because it broke all the makefiles (at least I think that was the reasoning, he never really explained it). Splitting the source files is the first step to getting rid of most of the #ifdef PNG_foo_SUPPORTED" stuff.

csparker247 · 2024-09-05T21:14:31Z

@ctruta Yeah, I have no problems with a new minor version. My use case is not pinned to a specific version. I'm mostly new to this project, so I'm going to defer to you on how you want to manage and version it. Just let me know what you'd like me to do with my PR (rebase or whatever).

ctruta · 2024-09-05T21:36:24Z

1. Freeze 1.6; no more changes other than security fixes.

Agreed.

2. Move all of the PR handling (none corresponds to any security issue) to "master".

Agreed.

3. Handle all the deprecated stuff and deprecate currently obsoleted APIs such as "png_get_eXIf" (replaced by png_get_eXIf_1 years ago.)

Agreed.

4. When it's vaguely stable make a 1.8.0beta branch out of it.

Agreed.

When I talked about "git mv" I meant moving the source into a sub-directory and splitting it into separate files. Glenn wouldn't do this because it broke all the makefiles (at least I think that was the reasoning, he never really explained it). Splitting the source files is the first step to getting rid of most of the #ifdef PNG_foo_SUPPORTED" stuff.

I don't care too much about our makefiles, but I do care about not breaking other users' custom-made build files. See, for example, Android and Chromium, which have their own build systems. My rule of thumb is this: if it feels awkward for us to write a plain dumb Makefile that doesn't break e.g. when you look at it in a text viewer, then it sure feels awkward to write a foreign build file for an external user who wants neither our CMake file nor our configure script for some reason in their control (or not). And, BTW, I'd rather not introduce another complication with Mesonbuild, regardless of how awesome it appears to be.

To summarize my point: I'm good with breaking makefiles now, between 1.6.x and 1.8.x, but not across 1.6.x, and not across 1.8.x. I want makefile.linux and makefile.darwin and makefile.freebsd and their other redundant friends removed, because of their zero value added. On the other hand, I still want to hang on to the generic ones, like makefile.std and makefile.gcc, and also to keep the one-offs, like makefile.riscos.

Just do it. Create a branch and I will submit a git PR full of "git rm" and "git mv" actions.

Give me a few more days, e.g. until Monday, to wrap up 1.6.44, and then I'll fork the 1.8.0 alpha from that one.

But -- why wait? Just fork it off in your own repo, and then I'll come and pick up stuff from there. I, too, am doing my work and experiments and tests and stuff in two repos, one at github.com/ctruta, and the other at bitbucket.com/ctruta. I never work directly on github.com/pnggroup/libpng

jbowler · 2024-09-05T22:09:45Z

I have no problems with a new minor version.

We're talking about a major version. Some of the pending PRs involve ABI changes (fixing stdio on Windows) and removing the deprecated functions is an API change (well, deletion.) libpng releases are numbered:

1.major.patch

"patch" includes API adds but not removal, so it could be called "minor" but it is rigorously API and ABI compatible. This is why we have APIs with really bad names like png_get_exif_1

jbowler · 2024-09-05T22:32:11Z

I just force-pushed a version which fixes all the makefiles... Well, I tested with an unmodified scripts/makefile.msys

csparker247 · 2024-09-06T00:50:43Z

We're talking about a major version.

Also fine. I was fixing this cmake issue so I could subsequently build a multiarch version of OpenCV. So for me, it's A) OpenCV isn't using deprecated functionality and libpng 1.8 works fine, or, B) I go submit a PR against OpenCV to fix the issues with libpng 1.8.

Either way, I can keep using my fork for my purposes (as I'm currently doing) until the upstream issues are resolved.

jbowler · 2024-09-06T03:03:24Z

We're talking about a major version.

Also fine. I was fixing this cmake issue so I could subsequently build a multiarch version of OpenCV. So for me, it's A) OpenCV isn't using deprecated functionality and libpng 1.8 works fine, or, B) I go submit a PR against OpenCV to fix the issues with libpng 1.8.

Either way, I can keep using my fork for my purposes (as I'm currently doing) until the upstream issues are resolved.

I've incorporated all the currently pending issues that I've reviewed in #581; that's a pull against libpng/master but it currently also applies agaiinst libpng/libpng16.

It should fix the Universal Binary problem but I don't have working Apple build kit so your input would be valuable.

There are some other issues which require non-trivial work but they are unrelated.

EDIT: but if it's going on a new branch this PR shouldn't be fixing makefiles; that's a separate issue, so I'll simplify it back to the original form and redirected it to 'master'. The makefiles and everything can be fixed by rewriting the hardware code to not use #if and making the configuration (makefile/cmake/configure) always include a single group of HW support files.

csparker247 · 2024-09-06T10:35:18Z

I've incorporated all the currently pending issues that I've reviewed in #581; that's a pull against libpng/master but it currently also applies agaiinst libpng/libpng16.

I have a macOS sandbox I'm happy to test this in. On my side, it looks like that PR has no commits. What branch/PR should I test with at this point? Or I can just wait if it's not quite ready.

jbowler · 2024-09-06T15:36:22Z

I've incorporated all the currently pending issues that I've reviewed in #581; that's a pull against libpng/master but it currently also applies agaiinst libpng/libpng16.

I have a macOS sandbox I'm happy to test this in. On my side, it looks like that PR has no commits. What branch/PR should I test with at this point? Or I can just wait if it's not quite ready.

I closed it because it just came down to a combination of the existing commits, well, that and resolving the merge conflict in #575 which is pretty simple :-) It would help if you force pushed a version of #575 with the merge conflicts resolved.

Given that I can change things in a major release I'm looking at making it all compile time with no options or none other than disabling all hardware/architecture specific optimizations.

Some systems may have to run a single (not universal) binary which will run on different instances of the same basic architecture, for example even armv7a compiled with and without -mfpu=neon. Those builds should be done by using the highest common factor; so if all armv7a targets have -mfpu=neon compile with that fpu option but if some don't compile without it. Unfortunately I suspect sometime people will expect to be able to turn it off and it's difficult to discount that. I will start with it no options and see if anyone has an issue.

csparker247 · 2024-09-06T18:08:52Z

@jbowler Conflicts with libpng16 are resolved. Does that work?

jbowler · 2024-09-06T20:21:49Z

Conflicts with libpng16 are resolved. Does that work?

I think so but I need to run my tests again. "master" and "libpng16" are currenly the same so it should apply to either.

jbowler · 2024-09-08T00:12:07Z

Obsoleted by #583

csparker247 mentioned this pull request Aug 16, 2024

libpng: Replace repository reference with official upstream source educelab/vc-deps#12

Open

csparker247 mentioned this pull request Sep 4, 2024

cmake: Let pngpriv.h control most architecture optimizations #575

Closed

ctruta mentioned this pull request Sep 5, 2024

link error on Intel macOS when building Universal Binary (arm64) using cmake #372

Open

jbowler changed the base branch from libpng16 to master September 6, 2024 10:24

jbowler mentioned this pull request Sep 8, 2024

libpng 1.8 hardware support cleanup #583

Closed

jbowler closed this Sep 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable Intel SSE in configure builds #576

Enable Intel SSE in configure builds #576

jbowler commented Aug 13, 2024

ctruta commented Sep 3, 2024

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024 •

edited

Loading

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024

jbowler commented Sep 3, 2024

ctruta commented Sep 3, 2024

ctruta commented Sep 3, 2024 •

edited

Loading

jbowler commented Sep 3, 2024 •

edited

Loading

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024

ctruta commented Sep 5, 2024

ctruta commented Sep 5, 2024

jbowler commented Sep 5, 2024

jbowler commented Sep 5, 2024

jbowler commented Sep 5, 2024

csparker247 commented Sep 5, 2024

ctruta commented Sep 5, 2024

jbowler commented Sep 5, 2024 •

edited

Loading

jbowler commented Sep 5, 2024

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024 •

edited

Loading

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024

jbowler commented Sep 8, 2024

Enable Intel SSE in configure builds #576

Enable Intel SSE in configure builds #576

Conversation

jbowler commented Aug 13, 2024

ctruta commented Sep 3, 2024

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024 • edited Loading

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024

jbowler commented Sep 3, 2024

ctruta commented Sep 3, 2024

ctruta commented Sep 3, 2024 • edited Loading

jbowler commented Sep 3, 2024 • edited Loading

ctruta commented Sep 3, 2024

jbowler commented Sep 3, 2024

ctruta commented Sep 5, 2024

ctruta commented Sep 5, 2024

jbowler commented Sep 5, 2024

jbowler commented Sep 5, 2024

jbowler commented Sep 5, 2024

csparker247 commented Sep 5, 2024

ctruta commented Sep 5, 2024

jbowler commented Sep 5, 2024 • edited Loading

jbowler commented Sep 5, 2024

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024 • edited Loading

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024

csparker247 commented Sep 6, 2024

jbowler commented Sep 6, 2024

jbowler commented Sep 8, 2024

jbowler commented Sep 3, 2024 •

edited

Loading

ctruta commented Sep 3, 2024 •

edited

Loading

jbowler commented Sep 3, 2024 •

edited

Loading

jbowler commented Sep 5, 2024 •

edited

Loading

jbowler commented Sep 6, 2024 •

edited

Loading