Move some MQC functions into a header for speed #675

c0nk · 2015-12-27T19:23:36Z

Move some MQC functions into a header so they can be inlined. This increases decoding performance by ~10%.

Allow these hot functions to be inlined. This boosts decode performance by ~10%.

c0nk · 2015-12-27T23:04:12Z

Can someone email me a copy of the JPEG 2000 standard?

boxerab · 2016-01-02T05:27:40Z

@c0nk try this:

http://old.jpeg.org/jpeg2000/CDs15444.html

You need to pay for the actual standard doc, but you can find the final draft at this link.

julienmalik · 2016-04-28T07:58:15Z

Advertising 10% perf improvements is appealing. Probably we should consider this for the upcoming 2.1, all the more since it seems an easy one.

The same should probably be done for the encoder, but this can be the subject of another PR.

@mayeut @detonin what do you think?

malaterre · 2016-04-28T08:03:09Z

+1

None of those functions were exported, so this should not change the API / ABI.

julienmalik · 2016-04-28T08:18:26Z

A quick test did not show any improvement at all.

With gcc-4.8, a Release build, and this commit rebased on current master, running opj_decompress on a 120 MB single-band JP2 decoded to PGM, I get:

before

decode time: 32361 ms

real    0m39.767s
user    0m37.998s
sys     0m1.694s

after

decode time: 32138 ms

real    0m40.734s
user    0m37.986s
sys     0m1.334s

julienmalik · 2016-04-28T08:20:03Z

@c0nk would you mind elaborating on the decoding performance improvements you noticed?

boxerab · 2016-04-28T12:38:09Z

I'm not surprised by this. A decent compiler will inline a lot of these functions anyway, so this kind of easy perf improvement is illusory.

To really speed things up, the project needs to:

- add the OpenMP patch that has been around for years
- improve the SIMD routines for DWT and MCT, for example by supporting AVX and AVX2

c0nk · 2016-04-30T16:07:19Z

The idea of the patch is to allow those functions to be inlined into T1. The current situation is that none of the functions from MQC can be inlined into T1 functions because they are in different translation units (unless you enable link-time optimizations).
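To illustrate the translation-unit point, here is a minimal sketch in C. It is not the code from this patch: `toy_mqc_t`, `toy_mqc_init`, `toy_mqc_next_bit` and `toy_t1_count_ones` are hypothetical names, and the "decoder" is a plain bit reader standing in for the real MQ decoder. The part marked as header content is what would move out of the .c file so that the caller's compiler can see the function bodies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- would live in a shared header (hypothetical toy_mqc_inl.h) ---- */

/* Hypothetical, heavily simplified decoder state; not OpenJPEG's real struct. */
typedef struct {
    const uint8_t *bp;   /* next input byte */
    const uint8_t *end;  /* one past the last input byte */
    uint32_t c;          /* current byte being consumed */
    uint32_t ct;         /* number of unread bits left in c */
} toy_mqc_t;

/* static inline: every translation unit that includes the header gets the
 * body, so the compiler may inline it at the call site without LTO. */
static inline void toy_mqc_init(toy_mqc_t *mqc, const uint8_t *data, size_t len)
{
    assert(data != NULL);
    mqc->bp = data;
    mqc->end = data + len;
    mqc->c = 0;
    mqc->ct = 0;
}

/* Return the next bit, MSB first; reads as 0 past the end of the input. */
static inline uint32_t toy_mqc_next_bit(toy_mqc_t *mqc)
{
    if (mqc->ct == 0) {
        mqc->c = (mqc->bp < mqc->end) ? *mqc->bp++ : 0;
        mqc->ct = 8;
    }
    mqc->ct--;
    return (mqc->c >> mqc->ct) & 1u;
}

/* ---- would live in the caller's translation unit (a toy t1.c) ---- */

/* Hot loop: count the set bits among the first nbits of the stream.
 * The per-bit call is the kind of call that benefits from inlining. */
uint32_t toy_t1_count_ones(const uint8_t *data, size_t len, size_t nbits)
{
    toy_mqc_t mqc;
    uint32_t ones = 0;
    toy_mqc_init(&mqc, data, len);
    for (size_t i = 0; i < nbits; i++) {
        ones += toy_mqc_next_bit(&mqc);
    }
    return ones;
}
```

If `toy_mqc_next_bit` were instead defined only in a separate .c file, the loop would have to make a real call per bit unless link-time optimization is enabled (for example `-flto` with GCC or Clang). Whether inlining actually pays off still depends on the target and compiler.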

However, it may be that inlining those functions doesn't really make a difference in most cases. On the other hand, it certainly doesn't hurt.

The benchmarks I did were on ARMv7, running on an iPad and compiled with Apple's clang.

I just profiled it again on ARMv7 with 4 different JP2 files:

master:       [4666.2111 ms][5470.3850 ms][4027.0701 ms][4832.6990 ms]
master+patch: [4508.6824 ms][5248.4852 ms][3790.8632 ms][4829.4773 ms]
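For reference, the relative gain per file can be worked out from the timings above. The small C sketch below does only that arithmetic; the helper names `percent_faster` and `mean_percent_faster` are made up for this illustration. By this calculation the four files come out roughly 3.4%, 4.1%, 5.9% and under 0.1% faster with the patch.

```c
#include <stddef.h>

/* Percentage reduction in run time going from before_ms to after_ms. */
double percent_faster(double before_ms, double after_ms)
{
    return (before_ms - after_ms) / before_ms * 100.0;
}

/* Timings copied from the ARMv7 measurements above, in milliseconds. */
static const double master_ms[4]  = {4666.2111, 5470.3850, 4027.0701, 4832.6990};
static const double patched_ms[4] = {4508.6824, 5248.4852, 3790.8632, 4829.4773};

/* Arithmetic mean of the four per-file percentage reductions. */
double mean_percent_faster(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < 4; i++) {
        sum += percent_faster(master_ms[i], patched_ms[i]);
    }
    return sum / 4.0;
}
```

With only four single runs and no variance figures, this is an indication rather than a statistically solid result.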

On x86 and ARM64 there was no measurable difference. This doesn't surprise me, because both of those architectures tend to be smarter than ARMv7.

rouault · 2017-06-14T14:23:56Z

Done in master

Move some MQC functions into a header for speed

8348c6d

c0nk force-pushed the wip-mqc-inl branch from 86f2fcf to 8348c6d Compare December 27, 2015 19:58

rouault mentioned this pull request May 23, 2016

Tier1 decoder speed optimizations #783

Merged

rouault closed this Jun 14, 2017
