Prefer cdef to def in mpoly operands #192
Conversation
Does this make any difference to timings?
Not quite. In Cython, a subclass `cdef` method is dispatched through a virtual method table. I can't find a good place in the docs that explains this, but it is mentioned here: A long time ago I profiled a bunch of different ways of indirecting from one bit of Cython code to another to find the most efficient method of dispatch: Either way, this is not like calling a method in Python, which needs to look up the method name by string in a dict. Also worth noting that some special methods on a Cython class can be looked up faster in C because they are hard-coded into the type struct rather than needing a name lookup; e.g. I think this happens with
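As a rough Python-level analogue of the type-struct point (this is plain Python, not Cython): CPython also looks up special methods like `__add__` on the type rather than on the instance, which is what lets the interpreter bypass a per-instance dict lookup for operators.

```python
# Python-level analogue only: the + operator resolves __add__ via
# type(p), similar to how Cython can find special methods through a
# fixed slot in the type struct rather than a name lookup.
class P:
    def __add__(self, other):
        return "type add"

p = P()
# An instance attribute named __add__ is ignored by the + operator:
p.__add__ = lambda other: "instance add"

print(p + 1)         # "type add"     -- operator uses type(p).__add__
print(p.__add__(1))  # "instance add" -- explicit attribute access
```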
In a rather quick and dirty benchmark, slightly. Debug builds show similar deltas on my system.

```python
from flint import *
import timeit

ctx = fmpz_mpoly_ctx.get_context(2, Ordering.lex, 'x')

def foo():
    f = ctx.from_dict({(0, 0): 200000})
    for _ in range(100000):
        f = f - 1
    return f

timeit.timeit(stmt="foo()", globals={"foo": foo}, number=100)

# build type release on this PR
# 1.8037770350019855
# 1.775936335001461
# 1.7955783019970113

# build type release on master
# 1.99440729500202
# 2.0311293699996895
# 2.105380791999778
```
Timings for master:

```python
In [5]: ctx = fmpz_mpoly_ctx(3, Ordering.lex, ["x", "y", "z"])
In [6]: x, y, z = ctx.gens()
In [7]: p = x + y + z
In [8]: %timeit p+1
2.06 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [9]: %timeit p+p
1.76 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [10]: one = fmpz(1)
In [12]: %timeit p+one
1.74 µs ± 90.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

Timings for the PR:

```python
In [6]: %timeit p+1
1.81 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [7]: %timeit p+p
1.31 µs ± 6.16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [9]: %timeit p+one
1.74 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

That's about 15% faster on average, I guess...
Ah yep, of course, because they're still member functions, not stand-alone C functions.
That's interesting; seems like a reasonable optimisation.
Okay, looks good.
This replaces a bunch of `def`s with `cdef`s, which I believe removes all runtime reflection in operands bar the initial call. Commutative operands don't have reflected `cdef` functions. Reflected non-commutative operands just perform an argument swap before calling the normal method, except `rsub_scalar`, which just does the whole thing itself due to the scalar type and the additional negation.
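A minimal Python-level sketch of the dispatch pattern described above (the names `_sub_scalar` and `_rsub_scalar` are illustrative, not python-flint's actual internals): subtraction is not commutative, so the reflected case either swaps-and-negates or, as with the real `rsub_scalar`, handles `scalar - poly` directly in a single pass.

```python
# Illustrative sketch only; not python-flint's actual implementation.
class Poly:
    """Toy dense polynomial with integer coefficients."""
    def __init__(self, coeffs):
        self.coeffs = tuple(coeffs)

    def __sub__(self, other):           # poly - scalar
        if isinstance(other, int):
            return self._sub_scalar(other)
        return NotImplemented

    def __rsub__(self, other):          # scalar - poly (reflected operand)
        if isinstance(other, int):
            return self._rsub_scalar(other)
        return NotImplemented

    def _sub_scalar(self, c):
        # Subtract c from the constant term only.
        return Poly((self.coeffs[0] - c,) + self.coeffs[1:])

    def _rsub_scalar(self, c):
        # c - p: rather than computing -(p - c) (subtract, then negate),
        # do it in one pass: negate every coefficient and fold c into
        # the constant term.
        return Poly((c - self.coeffs[0],) + tuple(-a for a in self.coeffs[1:]))

p = Poly((1, 2, 3))        # 1 + 2*x + 3*x**2
print((p - 5).coeffs)      # (-4, 2, 3)
print((5 - p).coeffs)      # (4, -2, -3)
```

Python falls back to `__rsub__` here because `int.__sub__` returns `NotImplemented` for a `Poly` right-hand operand; the single-pass version avoids allocating the intermediate negated polynomial.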