kernel: add optimized versions of PROD_INTOBJS #1047

fingolfin · 2017-01-05T16:09:45Z

Add three variants of PROD_INTOBJS using portable GCC extensions such as
__builtin_smul_overflow (supported by GCC >= 5.0 and clang >= 3.4)

I took great care to verify via test programs written in that the new
implementations yields identical results to PROD_INTOBJS, at least on
64bit systems (and I see no reason why 32bit should be different).

Regarding performance: in a C micro benchmark, the new code outperforms
the old by a factor of 2 on my laptop. In GAP, the gain is not as
impressive, probably because other things add a lot of overhead. Still,
there is a gain of about 10%-15% with this micro benchmark:

foo:=function(b)
local r,i,j,x;
r:=[-b..b];;
for i in r do
for j in r do
x:=i*j;
od;
od;
return x;
end;

Old code:
gap> foo(2^12);;time;
1480
gap> foo(2^13);;time;
5559

New code:
gap> foo(2^12);;time;
1186
gap> foo(2^13);;time;
4849

For comparison, the C mini benchmark does something very similar, but is
faster by a factor 18 (for inputs 2^12, 2^13) to 40 (for input 2^14).

Disassembling the code generated by GCC and clang for the new
implementation shows that it is very close what I'd write by hand, while
the old code is a very complicated, and requires divisions in some cases
(= slow).

Please make sure that this pull request:

is submitted to the correct branch (the stable branch is only for bugfixes)
contains an accurate description of changes for the release notes below
provides new tests or relies on existing ones
correctly refers to other issues and related pull requests

Tick all what applies to this pull request

Adds new features
Improves and extends functionality
Fixes bugs that could lead to crashes
Fixes bugs that could lead to incorrect results
Fixes bugs that could lead to break loops

Write below the description of changes (for the release notes)

codecov-io · 2017-01-05T16:53:04Z

Current coverage is 49.72% (diff: 100%)

Merging #1047 into master will increase coverage by 0.06%

@@             master      #1047   diff @@
==========================================
  Files           424        424          
  Lines        223275     223191    -84   
  Methods        3429       3429          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits         110880     110986   +106   
+ Misses       112395     112205   -190   
  Partials          0          0

Powered by Codecov. Last update fd9d4e3...f96b61d

ChrisJefferson · 2017-01-05T17:05:31Z

This looks fine to me. I'll count on your changes to the autoconf being right, they look "right enough" to me. Will just see if anyone else has a comment.

fingolfin · 2017-01-06T18:24:35Z

I just slightly adjusted the autoconf code, to make it more flexible, as I want to use it in the future to check for other builtins which require a different number of arguments.

And yeah, I am pretty sure they are "right enough" ;-).

Add three variants of PROD_INTOBJS using portable GCC extensions such as __builtin_smul_overflow (supported by GCC >= 5.0 and clang >= 3.4) I took great care to verify via test programs written in that the new implementations yields identical results to PROD_INTOBJS, at least on 64bit systems (and I see no reason why 32bit should be different). Regarding performance: in a C micro benchmark, the new code outperforms the old by a factor of 2 on my laptop. In GAP, the gain is not as impressive, probably because other things add a lot of overhead. Still, there is a gain of about 10%-15% with this micro benchmark: foo:=function(b) local r,i,j,x; r:=[-b..b];; for i in r do for j in r do x:=i*j; od; od; return x; end; Old code: gap> foo(2^12);;time; 1480 gap> foo(2^13);;time; 5559 New code: gap> foo(2^12);;time; 1186 gap> foo(2^13);;time; 4849 For comparison, the C mini benchmark does something very similar, but is faster by a factor 18 (for inputs 2^12, 2^13) to 40 (for input 2^14). Disassembling the code generated by GCC and clang for the new implementation shows that it is very close what I'd write by hand, while the old code is a very complicated, and requires divisions in some cases (= slow).

fingolfin · 2017-01-08T14:33:07Z

Wouldn't a rebase have been nice for this? This, together with the merge of #960, now makes for a somewhat confusing history again. Alternatively, if you prefer a merge commit, then I'd have been happy to rebase this first, before the merge, to make for simpler history.

ChrisJefferson · 2017-01-08T15:26:01Z

I just forget all the options of github.

fingolfin added the kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements label Jan 5, 2017

fingolfin force-pushed the mh/PROD_INTOBJS branch from db211fd to 75d6a5d Compare January 6, 2017 18:23

fingolfin force-pushed the mh/PROD_INTOBJS branch from 75d6a5d to f96b61d Compare January 6, 2017 18:26

ChrisJefferson merged commit a170b87 into gap-system:master Jan 8, 2017

fingolfin deleted the mh/PROD_INTOBJS branch January 13, 2017 22:48

fingolfin mentioned this pull request Sep 7, 2017

Release notes for GAP 4.9 #1699

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: add optimized versions of PROD_INTOBJS #1047

kernel: add optimized versions of PROD_INTOBJS #1047

fingolfin commented Jan 5, 2017 •

edited

Loading

codecov-io commented Jan 5, 2017 •

edited

Loading

ChrisJefferson commented Jan 5, 2017

fingolfin commented Jan 6, 2017

fingolfin commented Jan 8, 2017

ChrisJefferson commented Jan 8, 2017

kernel: add optimized versions of PROD_INTOBJS #1047

kernel: add optimized versions of PROD_INTOBJS #1047

Conversation

fingolfin commented Jan 5, 2017 • edited Loading

Tick all what applies to this pull request

Write below the description of changes (for the release notes)

codecov-io commented Jan 5, 2017 • edited Loading

Current coverage is 49.72% (diff: 100%)

ChrisJefferson commented Jan 5, 2017

fingolfin commented Jan 6, 2017

fingolfin commented Jan 8, 2017

ChrisJefferson commented Jan 8, 2017

fingolfin commented Jan 5, 2017 •

edited

Loading

codecov-io commented Jan 5, 2017 •

edited

Loading