Prefer cdef to def in mpoly operands #192
Conversation
Does this make any difference to timings?
Not quite. In Cython, a subclass `cdef` method is dispatched through a virtual method table. I can't find a good place in the docs that explains this, but it is mentioned here: A long time ago I profiled a bunch of different ways of indirecting from one bit of Cython code to another to find the most efficient method of dispatch: Either way, this is not like calling a method in Python, which needs to look up the method name by string in a dict. Also worth noting that some special methods on a Cython class can be looked up faster in C because they are hard-coded into the type struct rather than needing a name lookup; e.g. I think this happens with
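As a rough Python-level analogue of the type-struct point (this is plain Python, not Cython): CPython also looks up special methods like `__add__` on the type rather than on the instance, which is what lets the interpreter bypass a per-instance dict lookup for operators.

```python
# Python-level analogue only: the + operator resolves __add__ via
# type(p), similar to how Cython can find special methods through a
# fixed slot in the type struct rather than a name lookup.
class P:
    def __add__(self, other):
        return "type add"

p = P()
# An instance attribute named __add__ is ignored by the + operator:
p.__add__ = lambda other: "instance add"

print(p + 1)         # "type add"     -- operator uses type(p).__add__
print(p.__add__(1))  # "instance add" -- explicit attribute access
```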
In a rather quick and dirty benchmark, slightly. Debug builds show similar deltas on my system.

```python
from flint import *
import timeit

ctx = fmpz_mpoly_ctx.get_context(2, Ordering.lex, 'x')

def foo():
    f = ctx.from_dict({(0, 0): 200000})
    for _ in range(100000):
        f = f - 1
    return f

timeit.timeit(stmt="foo()", globals={"foo": foo}, number=100)

# build type release on this PR
# 1.8037770350019855
# 1.775936335001461
# 1.7955783019970113

# build type release on master
# 1.99440729500202
# 2.0311293699996895
# 2.105380791999778
```
Timings for master:

```python
In [5]: ctx = fmpz_mpoly_ctx(3, Ordering.lex, ["x", "y", "z"])
In [6]: x, y, z = ctx.gens()
In [7]: p = x + y + z
In [8]: %timeit p+1
2.06 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [9]: %timeit p+p
1.76 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [10]: one = fmpz(1)
In [12]: %timeit p+one
1.74 µs ± 90.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

Timings for the PR:

```python
In [6]: %timeit p+1
1.81 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [7]: %timeit p+p
1.31 µs ± 6.16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [9]: %timeit p+one
1.74 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

That's about 15% faster on average, I guess...
Ah yep, of course, because they're still member functions, not stand-alone C functions.
That's interesting; seems like a reasonable optimisation.
Okay, looks good.
This replaces a bunch of `def`s with `cdef`s, which I believe removes all runtime reflection in operands bar the initial call. Commutative operands don't have reflected `cdef` functions. Reflected non-commutative operands just perform an argument swap before calling the normal method, except `rsub_scalar`, which just does the whole thing itself due to the scalar type and the additional negation.
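A minimal Python-level sketch of the dispatch pattern described above (the names `_sub_scalar` and `_rsub_scalar` are illustrative, not python-flint's actual internals): subtraction is not commutative, so the reflected case either swaps-and-negates or, as with the real `rsub_scalar`, handles `scalar - poly` directly in a single pass.

```python
# Illustrative sketch only; not python-flint's actual implementation.
class Poly:
    """Toy dense polynomial with integer coefficients."""
    def __init__(self, coeffs):
        self.coeffs = tuple(coeffs)

    def __sub__(self, other):           # poly - scalar
        if isinstance(other, int):
            return self._sub_scalar(other)
        return NotImplemented

    def __rsub__(self, other):          # scalar - poly (reflected operand)
        if isinstance(other, int):
            return self._rsub_scalar(other)
        return NotImplemented

    def _sub_scalar(self, c):
        # Subtract c from the constant term only.
        return Poly((self.coeffs[0] - c,) + self.coeffs[1:])

    def _rsub_scalar(self, c):
        # c - p: rather than computing -(p - c) (subtract, then negate),
        # do it in one pass: negate every coefficient and fold c into
        # the constant term.
        return Poly((c - self.coeffs[0],) + tuple(-a for a in self.coeffs[1:]))

p = Poly((1, 2, 3))        # 1 + 2*x + 3*x**2
print((p - 5).coeffs)      # (-4, 2, 3)
print((5 - p).coeffs)      # (4, -2, -3)
```

Python falls back to `__rsub__` here because `int.__sub__` returns `NotImplemented` for a `Poly` right-hand operand; the single-pass version avoids allocating the intermediate negated polynomial.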