Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move some MQC functions into a header for speed #675

Closed
wants to merge 1 commit into from

Conversation

c0nk
Copy link
Contributor

@c0nk c0nk commented Dec 27, 2015

Move some MQC functions into a header so they can be inlined. This increases decoding performance by ~10%.

Allow these hot functions to be inlined. This boosts decode performance by ~10%.
@c0nk
Copy link
Contributor Author

c0nk commented Dec 27, 2015

Can someone email me a copy of the JPEG 2000 standard?

@boxerab
Copy link
Contributor

boxerab commented Jan 2, 2016

@c0nk try this:

http://old.jpeg.org/jpeg2000/CDs15444.html

You need to pay for the actual standard doc, but you can find the final draft at this link.

@julienmalik
Copy link
Collaborator

Advertising 10% perf improvements is appealing. Probably we should consider this for the upcoming 2.1, all the more since it seems an easy one.

The same should probably be done for the encoder, but this can be the subject of another PR.

@mayeut @detonin what do you think ?

@malaterre
Copy link
Collaborator

+1

none of those functions were exported, so this should not change API / ABI.

@julienmalik
Copy link
Collaborator

A quick test did not show any improvement at all.

With gcc-4.8, Release build, and this commit rebased on current master, running opj_decompress with a 120 MB jp2 (single band) decoded to pgm I get :

  • before
decode time: 32361 ms

real    0m39.767s
user    0m37.998s
sys 0m1.694s
  • after
decode time: 32138 ms

real    0m40.734s
user    0m37.986s
sys 0m1.334s

@julienmalik
Copy link
Collaborator

@c0nk would you mind developing a bit further on the decoding performance improvements you noticed ?

@boxerab
Copy link
Contributor

boxerab commented Apr 28, 2016

I'm not surprised by this - a decent compiler will inline a lot of this methods anyways, so this kind of easy perf improvement is illusory.

To really speed things up, the project needs to

  1. add OpenMP patch that has been around for years
  2. improve SIMD routines for DWT, MCT - support AVX and AVX2, for example

@c0nk
Copy link
Contributor Author

c0nk commented Apr 30, 2016

The idea of the patch is to allow those functions to be inlined into T1. The current situation is that none of the functions from MQC can be inlined into T1 functions because they are in different translation units (unless you enable link-time optimizations).

However, it may be that inlining those functions doesn't really make a difference in most cases. On the other hand, it certainly doesn't hurt.

The benchmarks I did were on ARMv7 running on iPad compiled with Apple's clang.

I just profiled it again on ARMv7 with 4 different JP2 files:

master:       [4666.2111 ms][5470.3850 ms][4027.0701 ms][4832.6990 ms]
master+patch: [4508.6824 ms][5248.4852 ms][3790.8632 ms][4829.4773 ms]

On x86 and ARM64 there was no measurable difference. This doesn't surprise me because both those architectures tend be smarter than ARMv7.

@rouault
Copy link
Collaborator

rouault commented Jun 14, 2017

Done in master

@rouault rouault closed this Jun 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants