
Make fmpq_poly.factor() return primitive factors #189

Merged (15 commits) on Aug 19, 2024

Conversation

@oscarbenjamin (Collaborator, Author)

@GiacomoPope @Jake-Moss does either of you want to review this?

The main change here is that fmpq_factor returns primitive factors rather than monic factors e.g.:

In [7]: from flint import *

In [8]: x = fmpq_poly([0, 1])

In [9]: p = (x + fmpq(1,2))*(x - fmpq(1,4))/7

In [10]: p
Out[10]: 1/7*x^2 + 1/28*x + (-1/56)

In [11]: p.factor()
Out[11]: (1/56, [(4*x + (-1), 1), (2*x + 1, 1)])

This makes fmpq_poly consistent with fmpq_mpoly. Flint itself does not seem to have an fmpq_poly_factor function, so fmpq_poly.factor() uses fmpz_poly_factor; on master it then makes the factors monic. Here I've changed that to keep the factors primitive, as returned by fmpz_poly_factor.
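For illustration, the monic-to-primitive rescaling can be sketched with the stdlib Fraction type; `primitive_from_monic` is a hypothetical helper for this comment, not python-flint API:

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def primitive_from_monic(coeffs):
    """Scale a monic rational polynomial (coefficients low-to-high)
    to the corresponding primitive integer polynomial.

    Hypothetical helper for illustration only."""
    # Clear denominators, then divide out the integer content.
    d = reduce(lcm, (c.denominator for c in coeffs), 1)
    ints = [int(c * d) for c in coeffs]
    g = reduce(gcd, (abs(c) for c in ints))
    ints = [c // g for c in ints]
    # Normalise the sign so the leading coefficient is positive.
    if ints[-1] < 0:
        ints = [-c for c in ints]
    return ints


# x - 1/4  ->  4*x - 1   and   x + 1/2  ->  2*x + 1
print(primitive_from_monic([Fraction(-1, 4), Fraction(1)]))  # [-1, 4]
print(primitive_from_monic([Fraction(1, 2), Fraction(1)]))   # [1, 2]
```

This matches the factorisation above: the monic factors x - 1/4 and x + 1/2 become the primitive factors 4*x - 1 and 2*x + 1, with the rescaling absorbed into the content 1/56.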

I also found some other issues:

fmpz_mod_poly.factor() crashes on the zero poly. This is documented for fmpz_mod_poly_factor:

Factorises a non-constant polynomial ...

There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.

The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in gh-164.

The mpoly types need to have a leading_coefficient method. I've added this but again it might be needed for new types in gh-164. It is also missing for some univariate types.

For fq_default and fq_default_poly there is no way to get the associated context from an instance e.g.:

In [6]: p = fq_default_ctx(163)

In [7]: x = p(1)

In [8]: x.ctx
---------------------------------------------------------------------------
AttributeError

I don't see any way to get the context, characteristic, etc from x. Same for fq_default_poly.

The modulus() has a different meaning for fq_default than for nmod and fmpz_mod. We probably need to add a .characteristic() method that could have the same meaning in all cases (at least the fmpz_mod context object could have this method). It also makes sense for fq_default_ctx to define .is_prime() just like fmpz_mod_ctx.

Also I have emphasised this a lot but let me just point out again that a carefully designed test like the test_factor_poly_mpoly test here can pick up on a lot of issues including ones that have been in the codebase for a long time. The test simultaneously tests all poly types and all mpoly types together and just covers a few simple and corner cases but it checks all types comprehensively and checks all outputs thoroughly. It also demonstrates clearly what the expected behaviour is supposed to be for each of the different types and where they are similar and where they are different. If we only write separate tests for different types or for univariate vs multivariate etc then we don't get the benefits of this.

@Jake-Moss (Contributor)

There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.

Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me: any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.

The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in gh-164.

Yes they will, I'll add those pending the above

@Jake-Moss (Contributor) left a comment

Other than my two comments it looks good to me

@@ -1798,6 +1798,20 @@ cdef class fmpz_mod_poly(flint_poly):
if not self.ctx.is_prime():
raise NotImplementedError("factor algorithm assumes that the modulus is prime")
@Jake-Moss (Contributor)

The new fmpz_mod_mpoly and nmod_mpoly are inconsistent with this exception type; they raise a DomainError instead. It seems this isn't consistent across the code base: we have DomainError, NotImplementedError, and ValueError for similar exception reasons. What should the preference be here?

Exported grep results:

src/flint/types/nmod_mpoly.pyx:312:            raise DomainError("division with non-prime modulus is not supported")
src/flint/types/nmod_mpoly.pyx:759:            raise DomainError("gcd with non-prime modulus is not supported")
src/flint/types/nmod_mpoly.pyx:780:            raise DomainError("square root with non-prime modulus is not supported")
src/flint/types/nmod_mpoly.pyx:811:            raise DomainError("factorisation with non-prime modulus is not supported")
src/flint/types/nmod_mpoly.pyx:853:            raise DomainError("factorisation with non-prime modulus is not supported")
src/flint/types/fmpz_mod.pyx:344:            raise NotImplementedError("algorithm assumes modulus is prime")
src/flint/types/fq_default.pyx:98:                    raise ValueError("characteristic is not prime")
src/flint/types/fq_default.pyx:108:            raise ValueError("either a prime or modulus must be passed for construction")
src/flint/types/fq_default.pyx:133:            raise ValueError("characteristic is not prime")
src/flint/types/fq_default.pyx:156:            raise ValueError("characteristic is not prime")
src/flint/types/fmpz_mod_poly.pyx:285:            raise NotImplementedError("minpoly algorithm assumes that the modulus is prime")
src/flint/types/fmpz_mod_poly.pyx:1270:            raise NotImplementedError("gcd algorithm assumes that the modulus is prime")
src/flint/types/fmpz_mod_poly.pyx:1765:            raise NotImplementedError("factor_squarefree algorithm assumes that the modulus is prime")
src/flint/types/fmpz_mod_poly.pyx:1799:            raise NotImplementedError("factor algorithm assumes that the modulus is prime")
src/flint/types/fmpz_mod_poly.pyx:1844:            raise NotImplementedError("factor algorithm assumes that the modulus is prime")
src/flint/types/fmpz_mod_mpoly.pyx:333:            raise DomainError("division with non-prime modulus is not supported")
src/flint/types/fmpz_mod_mpoly.pyx:783:            raise DomainError("gcd with non-prime modulus is not supported")
src/flint/types/fmpz_mod_mpoly.pyx:804:            raise DomainError("square root with non-prime modulus is not supported")
src/flint/types/fmpz_mod_mpoly.pyx:835:            raise DomainError("factorisation with non-prime modulus is not supported")
src/flint/types/fmpz_mod_mpoly.pyx:878:            raise DomainError("factorisation with non-prime modulus is not supported")
src/flint/test/test_all.py:3757:    assert raises(lambda: flint.fq_default_ctx(modulus=mod_not_prime), ValueError)

@oscarbenjamin (Collaborator, Author)

I think that there should be a special exception type like NonPrimeModulusError because it is such a common case in the codebase.

It is also something that much real downstream code will never have to deal with, because using a non-prime modulus is not that common, or at least if you are using one then you would already know not to use things like division.

@oscarbenjamin (Collaborator, Author)

There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.

Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me: any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.

It definitely matters whether we return an int or an nmod because they have different behaviour. For example a common thing that you might want to do with a leading coefficient is to invert it:

In [1]: from flint import *

In [2]: lc_int = 3

In [3]: lc_nmod = nmod(3, 7)

In [4]: 1/lc_int
Out[4]: 0.3333333333333333

In [5]: 1/lc_nmod
Out[5]: 5
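The nmod behaviour above can be reproduced with stdlib Python, which since 3.8 supports modular inverses via three-argument pow with a negative exponent:

```python
# Inverting 3 as a plain int gives a float, but inverting it mod 7
# gives the modular inverse, matching nmod(3, 7).
print(1 / 3)          # 0.3333...
print(pow(3, -1, 7))  # 5, since 3*5 = 15 ≡ 1 (mod 7)
```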

As for the unnecessary conversion, there needs to be a conversion either way. The Flint routine returns a ulong and we need to turn that into a Python object, whether that is a PyLong or an nmod. Cython handles it implicitly if you just return a ulong, but it still needs to create an actual PyLong, including allocating space on the heap for it.

One small difference actually is that CPython interns small integers (from -5 up to 256) to avoid a heap allocation. We could do something similar with nmod if we had a context object to store the interned values for a given modulus. At least for small enough moduli it would make sense to do that when we can intern every possible value. It could be handled transparently by a context method like ctx.nmod_from_ulong().
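A cache along those lines could look like the following sketch; NmodCtx and nmod_from_ulong are hypothetical names (not python-flint API) and plain tuples stand in for nmod objects:

```python
class NmodCtx:
    """Toy context for Z/nZ that interns element objects for small moduli.

    Hypothetical sketch only; python-flint has no such class."""

    INTERN_LIMIT = 256  # intern every residue when the modulus is small

    def __init__(self, modulus):
        self.modulus = modulus
        if modulus <= self.INTERN_LIMIT:
            # Precompute one object per residue class.
            self._cache = [("nmod", v, modulus) for v in range(modulus)]
        else:
            self._cache = None

    def nmod_from_ulong(self, v):
        v %= self.modulus
        if self._cache is not None:
            return self._cache[v]  # no allocation for interned values
        return ("nmod", v, self.modulus)


ctx = NmodCtx(7)
# The same object is returned for the same residue class.
assert ctx.nmod_from_ulong(10) is ctx.nmod_from_ulong(3)
```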

@oscarbenjamin oscarbenjamin force-pushed the pr_fmpq_poly_factor_prim branch from 3178aa5 to 01e6418 Compare August 18, 2024 12:13
@oscarbenjamin (Collaborator, Author)

It definitely matters whether we return an int or an nmod because they have different behaviour.

I just found a similar issue with nmod_mpoly and fmpz_mod_mpoly. The coefficients should be nmod and fmpz_mod but from gh-164 they are int and fmpz:

In [7]: ctx = fmpz_mod_mpoly_ctx(2, Ordering.lex, ["x", "y"], 11)

In [8]: x = ctx.gen(0)

In [9]: x.coeffs()
Out[9]: [1]

In [10]: type(x.coeffs()[0])
Out[10]: flint.types.fmpz.fmpz

In [11]: ctx = nmod_mpoly_ctx(2, Ordering.lex, ["x", "y"], 11)

In [12]: x = ctx.gen(0)

In [13]: type(x.coeffs()[0])
Out[13]: int

Likewise leading_coefficient needs to return nmod/fmpz_mod and also the constant term in the factor method.

Probably there are also inconsistencies in the factor_squarefree methods as well...

@oscarbenjamin (Collaborator, Author)

Sorry @Jake-Moss, you reviewed this and then I made it 3x bigger. If you are still up for reviewing then please do...

The changes here are:

  • The main change is that fmpq_poly returns primitive rather than monic factors, which is consistent with fmpq_mpoly, SymPy, and other libraries.
  • I have added tests to ensure compatibility of different types and that operations work with all types.
  • I have added sqrt to all scalar types so all scalar and polynomial types have sqrt.
  • I have added factor_squarefree to all poly types that did not have it so that all poly/mpoly types can be assumed to have it.
  • I have added leading_coefficient to all poly types and now all poly/mpoly types can be assumed to have it.
  • Various types are made more consistent and various raised exceptions are changed or made consistent. A bigger discussion needs to be had about exception types.
  • I've added docstrings for some methods that did not have them.

What I have not fixed here is the type of the coefficients for nmod_poly and fmpz_mod_poly, which are int and fmpz when they should be nmod and fmpz_mod. That is something that needs to be fixed before the next release.

@Jake-Moss (Contributor)

No worries, will do tonight

@Jake-Moss (Contributor)

There were inconsistent types in some places e.g. nmod_poly.leading_coefficient() returns int rather than nmod. I fixed that here.

Is there a particular reason for this? Does something depend on this behaviour? It seems like an unnecessary conversion to me: any use with another python-flint object will result in a conversion back to an fmpz or similar almost immediately.

It definitely matters whether we return an int or an nmod because they have different behaviour. For example a common thing that you might want to do with a leading coefficient is to invert it:

In [1]: from flint import *

In [2]: lc_int = 3

In [3]: lc_nmod = nmod(3, 7)

In [4]: 1/lc_int
Out[4]: 0.3333333333333333

In [5]: 1/lc_nmod
Out[5]: 5

As for the unnecessary conversion, there needs to be a conversion either way. The Flint routine returns a ulong and we need to turn that into a Python object, whether that is a PyLong or an nmod. Cython handles it implicitly if you just return a ulong, but it still needs to create an actual PyLong, including allocating space on the heap for it.

One small difference actually is that CPython interns small integers (from -5 up to 256) to avoid a heap allocation. We could do something similar with nmod if we had a context object to store the interned values for a given modulus. At least for small enough moduli it would make sense to do that when we can intern every possible value. It could be handled transparently by a context method like ctx.nmod_from_ulong().

I'm sorry, I quoted the wrong section of your message. The above makes complete sense, but I was looking at this change where the exponent is created as an fmpz and then immediately converted to an int.

The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int. I have fixed this for fmpz_mpoly and fmpq_mpoly but it might also need to be fixed for the new mpoly types in #164.

https://github.com/flintlib/python-flint/pull/189/files#diff-b5a069a771eae54c8982ca6dae36c6a291b19e7bb79effc990091b0f2304fdc4R959-R963

@oscarbenjamin (Collaborator, Author)

I was looking at this change where the exponent is created as an fmpz, then immediately converted to an int.

The mpoly.factor() methods return fmpz as the multiplicity of a factor unlike other poly types that use int.

It's important that the types are consistent and we don't get into a situation where we don't know what they are supposed to be any more. If it is supposed to be an int then it needs to be an int. I recently ran into a situation where having an fmpz in a place that would usually be an int caused a bug:
python/cpython#120950

I'm a bit surprised that an fmpz is used at all for the multiplicity in the factor struct. It suggests great ambition about the sizes of polynomials that can be factorised.

The nmod_poly code just casts the exponent to an integer which in principle would be incorrect if the multiplicity is greater than 2^(FLINT_BITS-2):

exp = fac.exp[i]
res[i] = (u, exp)

On a 64-bit system it is inconceivable that an nmod_poly could have a factor of multiplicity greater than 2^62. On a 32-bit system could it have greater than 2^30? Maybe at a push for GF(p) with very small p...

I suppose you can have something like:

In [14]: from flint import *

In [15]: ctx = fmpz_mpoly_ctx(2, Ordering.lex, ["x", "y"])

In [16]: x, y = ctx.gens()

In [17]: ((x*y)**(2**64)).factor()
Out[17]: (1, [(x, 18446744073709551616), (y, 18446744073709551616)])

It's hard to imagine that a nontrivial polynomial could be factorised and have a multiplicity that does not fit in an slong/ulong.

For dense polynomials it is perhaps impossible:

In [18]: p = nmod_poly(0, 11)

In [19]: p[2**32] = 1
Flint exception (General error):
    Unable to allocate memory (34359738376).
Aborted (core dumped)

(Ideally we would have a way of preventing that core dump...)

In any case though we need the types to be consistent. The multiplicity is never going to be large enough to need to be anything other than an int and all other types already return int. Factorisation is an extremely expensive operation so I doubt very much that the cost of converting the exponents is significant.

@Jake-Moss (Contributor) left a comment

Again all looks good to me

),
]


def test_mpolys():
for P, get_context, S, is_field in _all_mpolys():
for P, get_context, S, is_field, characteristic in _all_mpolys():
@Jake-Moss (Contributor)

This new argument is unused in this test function; should the characteristic_zero check be replaced with a test on this instead? Functionally it's no different, but it is inconsistent with the new test_factor_poly_mpoly.

@oscarbenjamin (Collaborator, Author)

Oh, good point. I changed the division_not_supported to:

composite_characteristic = characteristic != 0 and not characteristic.is_prime()

That makes more sense because things like factor etc also depend on it.
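As a sketch of how that flag behaves, with naive trial division standing in for fmpz.is_prime (not the actual test code):

```python
def is_prime(n):
    # Naive trial division, a stand-in for fmpz.is_prime.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def composite_characteristic(characteristic):
    # Characteristic 0 and prime characteristic are fine; only
    # composite moduli hit the unsupported division/factor cases.
    return characteristic != 0 and not is_prime(characteristic)


print([composite_characteristic(c) for c in (0, 7, 9)])  # [False, False, True]
```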

other = self.ctx.any_as_fmpz_mod_poly(other)
if other is NotImplemented:
raise TypeError(f"Cannot interpret {other} as a polynomial")

if not self.ctx.is_prime():
raise NotImplementedError("gcd algorithm assumes that the modulus is prime")
@Jake-Moss (Contributor)

Looks like this one was missed

Suggested change
raise NotImplementedError("gcd algorithm assumes that the modulus is prime")
raise DomainError("gcd algorithm assumes that the modulus is prime")

@oscarbenjamin (Collaborator, Author)

Oh, yeah. Good point.

I've added this case to the tests and fixed it.

assert P([1, 1]) // 2 == P([0, 0])
assert P([1, 1]) % 2 == P([1, 1])
elif nmod_poly_will_crash:
pass
@Jake-Moss (Contributor)

Presumably these passes will be replaced when #179 is merged

@oscarbenjamin (Collaborator, Author)

Yeah, I'd rather not have a variable called nmod_poly_will_crash in the test code :)

Fact is it does crash a lot though...

@Jake-Moss (Contributor)

Jake-Moss commented Aug 19, 2024

In any case though we need the types to be consistent. The multiplicity is never going to be large enough to need to be anything other than an int and all other types already return int. Factorisation is an extremely expensive operation so I doubt very much that the cost of converting the exponents is significant.

Thanks, makes sense; more than reasonable to have that.

Comment on lines 909 to +912
c = fmpz.__new__(fmpz)
fmpz_set((<fmpz>c).val, &fac.exp[i])

res[i] = (u, c)
res[i] = (u, int(c))
@oscarbenjamin (Collaborator, Author)

We could likely get away with using fmpz_get_ui here although it would be fiddly to do correctly.

Probably not worth it.
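In Python terms the fiddly part is the range check that an fmpz_get_ui-based conversion would need; a sketch assuming 64-bit FLINT_BITS, where checked_ulong is a hypothetical helper:

```python
ULONG_MAX = 2**64 - 1  # assuming FLINT_BITS == 64


def checked_ulong(n):
    """Convert an arbitrary-precision integer to a ulong-sized int,
    raising instead of silently truncating (hypothetical helper)."""
    if not 0 <= n <= ULONG_MAX:
        raise OverflowError(f"{n} does not fit in a ulong")
    return n


print(checked_ulong(5))   # 5
# checked_ulong(2**64) would raise OverflowError
```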

Comment on lines +670 to +674
elif factor_type == 'irreducible':
lead = nmod_poly_factor(fac, self.val)
elif factor_type == 'squarefree':
nmod_poly_factor_squarefree(fac, self.val)
lead = (<nmod>self.leading_coefficient()).val
@oscarbenjamin (Collaborator, Author)

I'm not 100% sure about this.

The doc says:

void nmod_poly_factor_squarefree(nmod_poly_factor_t res, const nmod_poly_t f)

Sets res to a square-free factorization of f.

Whereas nmod_poly_factor returns the leading coefficient, the docs for nmod_poly_factor_squarefree do not mention anything about it.

The factorisation seems to be just all monic factors though, so I presume we just need to take the leading coefficient like this:

In [2]: x = nmod_poly([0, 1], 7)

In [3]: (3*(x + 1)*(x + 4)**2).factor_squarefree()
Out[3]: (3, [(x + 1, 1), (x + 4, 2)])
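The leading coefficient can be double-checked with a tiny dense polynomial multiplication over GF(7); a sketch with low-to-high coefficient lists, not python-flint code:

```python
def polymul_mod(a, b, p):
    # Dense polynomial multiplication mod p, coefficients low-to-high.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


p = 7
x_plus_1 = [1, 1]
x_plus_4 = [4, 1]
# 3*(x + 1)*(x + 4)**2 over GF(7)
prod = polymul_mod(x_plus_1, polymul_mod(x_plus_4, x_plus_4, p), p)
prod = [(3 * c) % p for c in prod]
# The monic squarefree factors times the leading coefficient 3
# reproduce the original polynomial.
assert prod[-1] == 3
```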

Comment on lines 7 to +11
ctypedef struct nmod_poly_factor_struct:
nmod_poly_struct *p
long *exp
long num
long alloc
slong *exp
slong num
slong alloc
@oscarbenjamin (Collaborator, Author)

This was out of date compared to what I see in the Flint code.

On 64-bit Windows at least, long and slong are not equivalent, and this has been a common cause of Windows-specific crashes.

I think over time Flint itself has cleaned up its use of slong in place of long for many interfaces. Some of python-flint's code predates Flint cleaning these things up though, so we have some things like this that are out of date.
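The size mismatch is easy to observe from Python with ctypes: c_long follows the platform's long, which is 4 bytes on 64-bit Windows (LLP64), while pointers, and FLINT's slong, are 8 bytes there:

```python
import ctypes

# On 64-bit Windows (LLP64): c_long is 4 bytes, c_void_p is 8 bytes.
# On 64-bit Linux/macOS (LP64): both are 8 bytes.
print(ctypes.sizeof(ctypes.c_long), ctypes.sizeof(ctypes.c_void_p))
```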

It is likely that almost all of these 635 lines should change from long to slong:

$ git grep ' long ' | wc -l
635

Certainly these are out of date because prec should be slong:

$ git grep long src/flint/flintlib/arb.pxd | head
src/flint/flintlib/arb.pxd:from flint.flintlib.flint cimport ulong, slong, flint_rand_t
src/flint/flintlib/arb.pxd:    void arb_set_round(arb_t z, const arb_t x, long prec)
src/flint/flintlib/arb.pxd:    void arb_neg_round(arb_t x, const arb_t y, long prec)
src/flint/flintlib/arb.pxd:    void arb_set_si(arb_t x, long y)
src/flint/flintlib/arb.pxd:    void arb_set_ui(arb_t x, ulong y)
src/flint/flintlib/arb.pxd:    void arb_set_round_fmpz_2exp(arb_t y, const fmpz_t x, const fmpz_t exp, long prec)
src/flint/flintlib/arb.pxd:    void arb_set_round_fmpz(arb_t y, const fmpz_t x, long prec)
src/flint/flintlib/arb.pxd:    void arb_printd(const arb_t x, long digits)
src/flint/flintlib/arb.pxd:    void arb_mul_2exp_si(arb_t y, const arb_t x, long e)
src/flint/flintlib/arb.pxd:    void arb_get_abs_ubound_arf(arf_t u, const arb_t x, long prec)

https://flintlib.org/doc/arb.html

@oscarbenjamin (Collaborator, Author)

I see that gh-190 is waiting on this so I'll merge once tests pass.

@oscarbenjamin oscarbenjamin merged commit f2bda3c into flintlib:master Aug 19, 2024
32 checks passed
@oscarbenjamin oscarbenjamin deleted the pr_fmpq_poly_factor_prim branch August 19, 2024 15:07
@oscarbenjamin (Collaborator, Author)

Thanks for the reviews!
