Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer cdef to def in mpoly operands #192

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

Jake-Moss
Copy link
Contributor

This replaces a bunch of defs with cdefs which I believe removes all runtime reflection in operands bar the initial call. Commutative operands don't have reflected cdef functions. Reflected non-commutative operands just perform argument swap before calling the normal method, except rsub_scalar, which just does the whole thing itself due to the scalar type and additional negation

@oscarbenjamin
Copy link
Collaborator

Does this make any different to timings?

@oscarbenjamin
Copy link
Collaborator

This replaces a bunch of defs with cdefs which I believe removes all runtime reflection in operands bar the initial call.

Not quite. In Cython a subclass cdef method is picked using a virtual method table. I can't find a good place in the docs that explains this but it is mentioned here:
https://github.com/cython/cython/blob/master/docs/src/userguide/pyrex_differences.rst#cdef-inline

A long time ago I profiled a bunch of different ways to dispatch to find the most efficient method of indirecting from one bit of cython code to another:
https://github.com/oscarbenjamin/euler
I think I found that the v-table is generally the fastest way i.e. having a superclass call a subclass cdef method. There has to be some overhead though because it needs to do a v-table lookup and then it needs to call the other function.

This is not like calling a method in Python that needs to look up the method name by string in a dict though. Also worth noting that some special methods on a Cython class can be looked up faster in C because the methods are hard-coded into the type struct rather than needing a name lookup e.g. I think this happens with __add__ et al but not _add_.

@Jake-Moss
Copy link
Contributor Author

Jake-Moss commented Aug 20, 2024

Does this make any different to timings?

In a rather quick and dirty benchmark, slightly. Debug builds show similar deltas on my system

from flint import *
import timeit
ctx = fmpz_mpoly_ctx.get_context(2, Ordering.lex, 'x')

def foo():
    f = ctx.from_dict({(0, 0): 200000})
    for _ in range(100000):
        f = f - 1
    return f


timeit.timeit(stmt="foo()", globals={"foo": foo}, number=100)

# build type release on this PR
# 1.8037770350019855
# 1.775936335001461
# 1.7955783019970113

# build type release on master
# 1.99440729500202
# 2.0311293699996895
# 2.105380791999778

@oscarbenjamin
Copy link
Collaborator

Timings for master:

In [5]: ctx = fmpz_mpoly_ctx(3, Ordering.lex, ["x", "y", "z"])

In [6]: x, y, z = ctx.gens()

In [7]: p = x + y + z

In [8]: %timeit p+1
2.06 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [9]: %timeit p+p
1.76 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [10]: one = fmpz(1)

In [12]: %timeit p+one
1.74 µs ± 90.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Timings for the PR:

In [6]: %timeit p+1
1.81 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit p+p
1.31 µs ± 6.16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [9]: %timeit p+one
1.74 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

That's about 15% faster on average I guess...

@Jake-Moss
Copy link
Contributor Author

Not quite. In Cython a subclass cdef method is picked using a virtual method table.

Ah yep ofc because they're still member functions not stand-alone C functions

This is not like calling a method in Python that needs to look up the method name by string in a dict though. Also worth noting that some special methods on a Cython class can be looked up faster in C because the methods are hard-coded into the type struct rather than needing a name lookup e.g. I think this happens with add et al but not add.

That's interesting, seems like a reasonable optimisation

@oscarbenjamin
Copy link
Collaborator

Okay, looks good.

@oscarbenjamin oscarbenjamin merged commit 630e20c into flintlib:master Aug 21, 2024
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants