-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: add optimized versions of PROD_INTOBJS #1047
Conversation
Current coverage is 49.72% (diff: 100%)@@ master #1047 diff @@
==========================================
Files 424 424
Lines 223275 223191 -84
Methods 3429 3429
Messages 0 0
Branches 0 0
==========================================
+ Hits 110880 110986 +106
+ Misses 112395 112205 -190
Partials 0 0
|
This looks fine to me. I'll count on your changes to the autoconf being right, they look "right enough" to me. Will just see if anyone else has a comment. |
db211fd
to
75d6a5d
Compare
I just slightly adjusted the autoconf code, to make it more flexible, as I want to use it in the future to check for other builtins which require a different number of arguments. And yeah, I am pretty sure they are "right enough" ;-). |
Add three variants of PROD_INTOBJS using portable GCC extensions such as __builtin_smul_overflow (supported by GCC >= 5.0 and clang >= 3.4) I took great care to verify via test programs written in that the new implementations yields identical results to PROD_INTOBJS, at least on 64bit systems (and I see no reason why 32bit should be different). Regarding performance: in a C micro benchmark, the new code outperforms the old by a factor of 2 on my laptop. In GAP, the gain is not as impressive, probably because other things add a lot of overhead. Still, there is a gain of about 10%-15% with this micro benchmark: foo:=function(b) local r,i,j,x; r:=[-b..b];; for i in r do for j in r do x:=i*j; od; od; return x; end; Old code: gap> foo(2^12);;time; 1480 gap> foo(2^13);;time; 5559 New code: gap> foo(2^12);;time; 1186 gap> foo(2^13);;time; 4849 For comparison, the C mini benchmark does something very similar, but is faster by a factor 18 (for inputs 2^12, 2^13) to 40 (for input 2^14). Disassembling the code generated by GCC and clang for the new implementation shows that it is very close what I'd write by hand, while the old code is a very complicated, and requires divisions in some cases (= slow).
75d6a5d
to
f96b61d
Compare
Wouldn't a rebase have been nice for this? This, together with the merge of #960, now makes for a somewhat confusing history again. Alternatively, if you prefer a merge commit, then I'd have been happy to rebase this first, before the merge, to make for simpler history. |
I just forget all the options of github. |
Add three variants of PROD_INTOBJS using portable GCC extensions such as
__builtin_smul_overflow (supported by GCC >= 5.0 and clang >= 3.4)
I took great care to verify via test programs written in that the new
implementations yields identical results to PROD_INTOBJS, at least on
64bit systems (and I see no reason why 32bit should be different).
Regarding performance: in a C micro benchmark, the new code outperforms
the old by a factor of 2 on my laptop. In GAP, the gain is not as
impressive, probably because other things add a lot of overhead. Still,
there is a gain of about 10%-15% with this micro benchmark:
foo:=function(b)
local r,i,j,x;
r:=[-b..b];;
for i in r do
for j in r do
x:=i*j;
od;
od;
return x;
end;
Old code:
gap> foo(2^12);;time;
1480
gap> foo(2^13);;time;
5559
New code:
gap> foo(2^12);;time;
1186
gap> foo(2^13);;time;
4849
For comparison, the C mini benchmark does something very similar, but is
faster by a factor 18 (for inputs 2^12, 2^13) to 40 (for input 2^14).
Disassembling the code generated by GCC and clang for the new
implementation shows that it is very close what I'd write by hand, while
the old code is a very complicated, and requires divisions in some cases
(= slow).
Please make sure that this pull request:
Tick all what applies to this pull request
Write below the description of changes (for the release notes)