Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modules/fp_graded/morphism.py test failure #33794

Closed
collares opened this issue May 4, 2022 · 17 comments
Closed

modules/fp_graded/morphism.py test failure #33794

collares opened this issue May 4, 2022 · 17 comments

Comments

@collares
Copy link
Contributor

collares commented May 4, 2022

Happened when doctesting with --long --all and 64 threads, both on x86_64 and on aarch64. Possibly related to #30680.

sage -t --long --random-seed=339659929997122211039564167058187187237 /nix/store/bfvc3kipmpmapbikj4zjlgdnglv5hrx2-sage-src-9.6.rc3/src/sage/modules/fp_graded/morphism.py
**********************************************************************
File "/nix/store/bfvc3kipmpmapbikj4zjlgdnglv5hrx2-sage-src-9.6.rc3/src/sage/modules/fp_graded/morphism.py", line 1571, in sage.modules.fp_graded.morphism.FPModuleMorphism.image
Failed example:
    K.is_injective()  # long time
Exception raised:
    Traceback (most recent call last):
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/doctest/forker.py", line 695, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/doctest/forker.py", line 1101, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.modules.fp_graded.morphism.FPModuleMorphism.image[11]>", line 1, in <module>
        K.is_injective()  # long time
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/morphism.py", line 122, in is_injective
        finite_algebra = SteenrodAlgebra_generic(algebra.prime(), profile=self.profile())
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/morphism.py", line 90, in profile
        return enveloping_profile_elements(elements,
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/profile.py", line 132, in enveloping_profile_elements
        profiles = [profile_elt(x) for x in alist if x != 0]
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/profile.py", line 132, in <listcomp>
        profiles = [profile_elt(x) for x in alist if x != 0]
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/profile.py", line 82, in profile_elt
        minprofile = [max(0, n.exact_log(char) + 1) for n in elt]
      File "/nix/store/s6xgkv83xziazw85fvay1zqigfd972jb-python3-3.9.12-env/lib/python3.9/site-packages/sage/modules/fp_graded/steenrod/profile.py", line 82, in <listcomp>
        minprofile = [max(0, n.exact_log(char) + 1) for n in elt]
    AttributeError: 'int' object has no attribute 'exact_log'

CC: @jhpalmieri @tscrim

Component: algebra

Author: John Palmieri

Branch/Commit: 0b48601

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/33794

@collares collares added this to the sage-9.7 milestone May 4, 2022
@fchapoton
Copy link
Contributor

comment:1

works fine for me on ubuntu 20.04:

Doctesting 1 file.
sage -t --long --random-seed=339659929997122211039564167058187187237 src/sage/modules/fp_graded/morphism.py
    [432 tests, 17.66 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

@jhpalmieri
Copy link
Member

comment:2

We can make this change, just to be on the safe side:

diff --git a/src/sage/modules/fp_graded/steenrod/profile.py b/src/sage/modules/fp_graded/steenrod/profile.py
index 5dd91e1b26..0e30b4b413 100755
--- a/src/sage/modules/fp_graded/steenrod/profile.py
+++ b/src/sage/modules/fp_graded/steenrod/profile.py
@@ -79,7 +79,7 @@ def profile_elt(elt, char=2):
         pass
 
     if char == 2:
-        minprofile = [max(0, n.exact_log(char) + 1) for n in elt]
+        minprofile = [max(0, ZZ(n).exact_log(char) + 1) for n in elt]
         return find_min_profile(minprofile, char)
 
     # odd primes:

n should already be a Sage integer, but it doesn't hurt to make sure.

@jhpalmieri
Copy link
Member

Branch: u/jhpalmieri/trac33794

@jhpalmieri
Copy link
Member

Commit: 0b48601

@jhpalmieri
Copy link
Member

Author: John Palmieri

@jhpalmieri
Copy link
Member

New commits:

0b48601trac 33794: in modules/fp_graded/steenrod/profile.py, explicitly

@tscrim
Copy link
Collaborator

tscrim commented May 5, 2022

Reviewer: Travis Scrimshaw

@tscrim
Copy link
Collaborator

tscrim commented May 5, 2022

comment:5

My guess is that it is not machine dependent but order dependent on the doctests: Something is getting cached with an int Instead of an integer, which makes its way into that part of the code. While it doesn’t hurt to do this, it might cover up a slightly deeper problem with not converting inputs to the proper parent (it might also be possible to make this, say, an element of QQ instead).

Anyways, for now, green bot => positive review.

@jhpalmieri
Copy link
Member

comment:6

It might be a sign of a deeper problem, but without more information, for example a repeatable failure, I have no idea how to diagnose it.

@collares
Copy link
Contributor Author

collares commented May 5, 2022

comment:7

The initial failure happened on NixOS CI on builder that has 64 cores. I reproduced this on the first try locally (on x86_64) by running sage -t --long -p 64 --all, even though my laptop only has 8 cores; 16GB RAM was enough to run 64 threads until the morphism.py failure.

@collares

This comment has been minimized.

@collares collares changed the title modules/fp_graded/morphism.py test failure on aarch64 modules/fp_graded/morphism.py test failure May 5, 2022
@tscrim
Copy link
Collaborator

tscrim commented May 6, 2022

comment:8

Well, since I cannot reproduce the original error, I can only say that this definitely should fix it. Tests pass for me, so I am going to set a positive review.

comment:7: Strange. Perhaps it is related to the number of cores somehow? That would be surprising to me. Do you get the failure when running it more locally, say, on this file or with the folder specifically?

@collares
Copy link
Contributor Author

collares commented May 9, 2022

comment:9

comment:8: I unfortunately have no good way of reproducing this. Running the test suite just on the modules directory is not enough to reproduce it. I don't know if this is to be trusted, since I don't believe the bug is 100% deterministic.

I noticed something while browsing the code: modules/fp_graded/module.py:vector_presentation and modules/fp_graded/free_module.py:vector_presentation both have the @cached_method decorator. This is not concerning per se, but in both cases there's __getitem__ = vector_presentation line in the corresponding files. Python does lookups differently for special methods, and misc/cachefunc.pyx has to handle them separately because of this (https://github.com/sagemath/sagetrac-mirror/blob/develop/src/sage/misc/cachefunc.pyx?h=develop&id=0b4860129e0f686594667be1ad3a3e1073478a56#n3092); I worry that the __getitem__ = vector_presentation line bypasses the check done by the decorator. I know nothing about the surrounding code, so I don't know how plausible the above theory is. In any case, I will try removing the @cached_method decorator for those two methods and see if it helps on NixOS CI.

(A similar problem might affect the line __bool__ = is_zero in modules/fp_graded/morphism.py, but this shouldn't be relevant for the test failure because this __bool__ instance is unused in the testsuite.)

@jhpalmieri
Copy link
Member

comment:10

For what it's worth, __getitem__ barely gets used. I changed the line __getitem__ = vector_presentation to

    def __getitem__(self, n, verbose=False):
        print('hello')
        return self.vector_presentation(n, verbose)

and there was only one doctest failure when I tested modules/fp_graded/:

File "src/sage/modules/fp_graded/module.py", line 826, in sage.modules.fp_graded.module.FPModule.vector_presentation
Failed example:
    M[4].dimension()
Expected:
    3
Got:
    hello
    3

@tscrim
Copy link
Collaborator

tscrim commented May 10, 2022

comment:11

Replying to @collares:

comment:8: I unfortunately have no good way of reproducing this. Running the test suite just on the modules directory is not enough to reproduce it. I don't know if this is to be trusted, since I don't believe the bug is 100% deterministic.

That makes it seem like it is some kind of caching issue with an int getting leaked in somehow through some other code path. Anyways, not serious enough to warrant a deep dive into the code to find it IMO.

Python does lookups differently for special methods, and misc/cachefunc.pyx has to handle them separately because of this (https://github.com/sagemath/sagetrac-mirror/blob/develop/src/sage/misc/cachefunc.pyx?h=develop&id=0b4860129e0f686594667be1ad3a3e1073478a56#n3092); I worry that the __getitem__ = vector_presentation line bypasses the check done by the decorator.

I don’t think it bypasses it, but this is easy to verify similar to John’s test in comment:10.

I know nothing about the surrounding code, so I don't know how plausible the above theory is. In any case, I will try removing the @cached_method decorator for those two methods and see if it helps on NixOS CI.

I don’t think that is relevant to the bugs at hand as I cannot see any relation. There could potentially be a separate issue with the caching not being done as it should, but that is a performance issue and not a true bug IMO.

@collares
Copy link
Contributor Author

comment:12

You two are absolutely right, of course! Thanks for the explanation. Besides, making such a specific guess without having a way to reproduce the failure was clearly a bad idea.

Fortunately, I now have a way to do reproduce the bug! Applying the following patch, the bug shows up every time when running sage -t --long src/sage/modules/fp_graded/morphism.py:

diff --git a/src/sage/modules/fp_graded/steenrod/morphism.py b/src/sage/modules/fp_graded/steenrod/morphism.py
index d55b51faf4..b0cf86c587 100755
--- a/src/sage/modules/fp_graded/steenrod/morphism.py
+++ b/src/sage/modules/fp_graded/steenrod/morphism.py
@@ -82,11 +82,16 @@ class SteenrodFPModuleMorphism(FPModuleMorphism):
         def _flatten(f):
             return [c for value in f for c in value.dense_coefficient_list()]
 
+        import gc
+        n = gc.collect()
+
         elements = (_flatten(self.domain().relations())
                     + _flatten(self.codomain().relations())
                     + _flatten(self.values()))
         elements = [a for a in elements if a not in (0, 1)]
 
         return enveloping_profile_elements(elements,
                                            char=self.base_ring().characteristic())
 
diff --git a/src/sage/structure/unique_representation.py b/src/sage/structure/unique_representation.py
index 5d8d4ad758..4afccaec88 100644
--- a/src/sage/structure/unique_representation.py
+++ b/src/sage/structure/unique_representation.py
@@ -989,7 +989,7 @@ class CachedRepresentation(metaclass=ClasscallMetaclass):
     :meth:`__init__<object.__init__>`.
     """
 
-    @weak_cached_function(cache=128)  # automatically a staticmethod
+    @weak_cached_function(cache=0)  # automatically a staticmethod
     def __classcall__(cls, *args, **options):
         """
         Construct a new object of this class or reuse an existing one.

@vbraun
Copy link
Member

vbraun commented May 22, 2022

Changed branch from u/jhpalmieri/trac33794 to 0b48601

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants