Move some MQC functions into a header for speed #675
Allow these hot functions to be inlined. This boosts decode performance by ~10%.
Can someone email me a copy of the JPEG 2000 standard?
@c0nk try this: http://old.jpeg.org/jpeg2000/CDs15444.html You need to pay for the actual standard doc, but you can find the final draft at this link.
+1. None of those functions were exported, so this should not change the API or ABI.
A quick test did not show any improvement at all. With gcc-4.8, a Release build, and this commit rebased on current master, running opj_decompress on a 120 MB JP2 (single band) decoded to PGM, I get:
@c0nk would you mind elaborating on the decoding performance improvements you noticed?
I'm not surprised by this. A decent compiler will inline many of these functions anyway, so this kind of easy perf improvement is illusory. To really speed things up, the project needs to
The idea of the patch is to allow those functions to be inlined into T1. The current situation is that none of the functions from MQC can be inlined into T1 functions because they are in different translation units (unless you enable link-time optimizations). However, it may be that inlining those functions doesn't really make a difference in most cases. On the other hand, it certainly doesn't hurt. The benchmarks I did were on ARMv7 running on iPad compiled with Apple's clang. I just profiled it again on ARMv7 with 4 different JP2 files:
On x86 and ARM64 there was no measurable difference. This doesn't surprise me because both of those architectures tend to be smarter than ARMv7.
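To illustrate the translation-unit point, here is a minimal sketch in C. It is not the code from this patch: `demo_bitreader_t`, `demo_read_bit` and `demo_count_set_bits` are hypothetical names invented for the example, and the bit reader is a trivial stand-in, not the real MQ decoder. The first half is what would go into a shared header, the second half is the hot loop in the caller's `.c` file.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Would live in a shared header (hypothetical: demo_bitreader_inl.h) ---- */

/* Hypothetical stand-in for a decoder state; NOT the OpenJPEG MQC struct. */
typedef struct {
    const uint8_t *bp;   /* next byte to load */
    const uint8_t *end;  /* one past the last byte */
    uint32_t c;          /* current byte */
    uint32_t ct;         /* bits still unread in c */
} demo_bitreader_t;

/* `static inline` in a header gives every translation unit that includes it
 * its own copy of the body, so the compiler can inline the call without
 * link-time optimization. */
static inline uint32_t demo_read_bit(demo_bitreader_t *r)
{
    if (r->ct == 0) {
        /* Refill one byte; past the end, feed zero bits. */
        r->c = (r->bp < r->end) ? *r->bp++ : 0u;
        r->ct = 8;
    }
    r->ct--;
    return (r->c >> r->ct) & 1u;  /* most significant bit first */
}

/* ---- Would live in the caller's .c file (the analogue of T1) ---- */

/* Hot loop: one call per bit. If demo_read_bit were defined in another .c
 * file, each iteration would pay for a real function call. */
static uint32_t demo_count_set_bits(const uint8_t *data, size_t len)
{
    demo_bitreader_t r = { data, data + len, 0u, 0u };
    uint32_t n = 0;
    for (size_t i = 0; i < len * 8; i++) {
        n += demo_read_bit(&r);
    }
    return n;
}
```

Whether the compiler actually inlines the call, and whether that is measurable, still depends on the target and optimization level, which matches the mixed results reported in this thread.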
Done in master.
Move some MQC functions into a header so they can be inlined. This increases decoding performance by ~10%.