
implement van der Hoeven's algorithm for relaxed multiplication of power series #34616

mantepse opened this issue Sep 29, 2022 · 17 comments

@mantepse (Collaborator)

This is an experimental implementation of the algorithm presented in Section 4.2 of van der Hoeven's "Relax, but don't be too lazy".

CC: @tscrim @fchapoton

Component: combinatorics

Keywords: LazyPowerSeries

Author: Martin Rubey

Branch/Commit: u/mantepse/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series @ 1de1e6d

Issue created by migration from https://trac.sagemath.org/ticket/34616

@mantepse added this to the sage-9.8 milestone Sep 29, 2022
@mantepse (Collaborator, Author)
comment:2

Unfortunately, the code does not work yet, because I am having trouble with some details in the article.


Last 10 new commits:

52ac7af pyflakes observations
06ec3d0 Merge branch 'develop' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
4172688 Merge branch 'u/mantepse/implement_arithmetic_product_of_lazy_symmetric_functions' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
41ca99b Merge branch 'u/mantepse/replace_lazy_power_series-32367' of trac.sagemath.org:sage into t/34470/categories_lazy_series-34470
fb3a5cd Merge branch 'u/mantepse/categories_lazy_series-34470' of trac.sagemath.org:sage into t/34552/lazy_series_test_suite-34552
855d2bf remove unused variable
75c275c add documentation and doctests for _approximate_order
5393242 fixes for pycodestyle and pyflakes
081d0e5 Merge branch 'u/mantepse/lazy_series_test_suite-34552' of trac.sagemath.org:sage into t/34616/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series
87d9db1 non-working first attempt

@mantepse (Collaborator, Author)

Author: Martin Rubey


@mantepse (Collaborator, Author)

Changed keywords from none to LazyPowerSeries

@mantepse (Collaborator, Author)

Commit: 87d9db1

@mantepse (Collaborator, Author)

mantepse commented Oct 3, 2022

comment:3

I think the mistake is that the caching is not implemented correctly. (References below are to van der Hoeven's paper.)

Sec. 2.2, pg. 484 says that Series_Rep has an attribute n, which records up to which degree the values in the cache are already correct: "The order of φ is allowed to exceed n in order to anticipate future computations. Moreover, the coefficients φ_0, …, φ_{k−1} must be computed before computing φ_k."

In Sec. 4.2.1, pg. 502, the definition of DAC_Rep does not specify n initially, so it is probably meant to be 0. However, in Sec. 4.2.3 the definition of DAC_Rep does specify n := N/2, which the current branch does not take into account.

The implementation of van der Hoeven's algorithm should really use the dense version of streams. It probably makes sense to adapt the framework slightly.
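The cache contract from Sec. 2.2 can be sketched as follows. This is a toy model with invented names, not Sage's actual Stream API: _phi may hold speculative entries beyond _n, but only _phi[:_n] is guaranteed correct, and those entries are filled strictly in order.

```python
# Toy model (invented names, not Sage's Stream API) of the cache
# contract from Sec. 2.2: ``_phi`` may hold speculative entries beyond
# ``_n``, but only ``_phi[:_n]`` is guaranteed correct, and those
# entries are filled strictly in order.

class SeriesRepSketch:
    def __init__(self, coefficient):
        self._coefficient = coefficient  # function k -> k-th coefficient
        self._phi = []   # cache; may extend past the validated prefix
        self._n = 0      # phi_0, ..., phi_{n-1} are known to be correct

    def reserve(self, size):
        # "anticipate future computations": extend phi with placeholders
        if len(self._phi) < size:
            self._phi.extend([0] * (size - len(self._phi)))

    def __getitem__(self, k):
        # phi_0, ..., phi_{k-1} must be computed before phi_k
        while self._n <= k:
            self.reserve(self._n + 1)
            self._phi[self._n] = self._coefficient(self._n)
            self._n += 1
        return self._phi[k]
```

Here reserve() plays the role of the speculative extension of φ, while __getitem__ maintains the invariant that all coefficients below the requested one are computed first.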

@sagetrac-git (Mannequin)

sagetrac-git mannequin commented Oct 4, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

1de1e6d ugly, but working

@sagetrac-git (Mannequin)

sagetrac-git mannequin commented Oct 4, 2022

Changed commit from 87d9db1 to 1de1e6d

@mantepse (Collaborator, Author)

mantepse commented Oct 4, 2022

comment:5

Although the implementation is quite ugly in some details, I think we can learn enough to make some decisions.

  • the code (without any optimizations) begins to be faster for integer multiplication once we want all of the first 270 or so coefficients of the product:
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(270)]
CPU times: user 40.7 ms, sys: 0 ns, total: 40.7 ms
Wall time: 40.7 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(270)]
CPU times: user 43.1 ms, sys: 0 ns, total: 43.1 ms
Wall time: 43.1 ms

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(2000)]
CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 712 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(2000)]
CPU times: user 1.85 s, sys: 0 ns, total: 1.85 s
Wall time: 1.85 s
  • if we are only interested in a single coefficient of the product, I don't think that we can even come close to the naive algorithm, and here the difference is, unfortunately, huge:
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = h1[1000]
CPU times: user 233 ms, sys: 0 ns, total: 233 ms
Wall time: 233 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = h2[1000]
CPU times: user 2.43 ms, sys: 0 ns, total: 2.43 ms
Wall time: 2.44 ms
  • I am guessing that the most interesting application would be in the computation of the composition and the plethysm of power series. There, we always compute contiguous segments of coefficients of products. For example:
sage: from sage.data_structures.stream import Stream_function, Stream_cauchy_compose
sage: f = Stream_function(lambda n: n, True, 1)
sage: g = Stream_function(lambda n: n^2, True, 1)
sage: h = Stream_cauchy_compose(f, g)
sage: h[20]
289074264180
sage: [(len(x._cache), min(x._cache), max(x._cache)) for x in h._pos_powers[1:]]
[(20, 1, 20),
 (19, 2, 20),
 (18, 3, 20),
 (17, 4, 20),
 (16, 5, 20),
 (15, 6, 20),
 (14, 7, 20),
 (13, 8, 20),
 (12, 9, 20),
 (11, 10, 20),
 (10, 11, 20),
 (9, 12, 20),
 (8, 13, 20),
 (7, 14, 20),
 (6, 15, 20),
 (5, 16, 20),
 (4, 17, 20),
 (3, 18, 20),
 (2, 19, 20),
 (1, 20, 20)]
  • For the dense setting, I think that it is clear that we should be using van der Hoeven's algorithm. Of course, some polishing is needed. For example, the current implementation duplicates the caching mechanism:
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)
sage: h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5)
sage: l1 = [h1[i] for i in range(10)]
sage: len(h1._h._phi)
64
sage: h1._h._phi[:15]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 0, 0, 0, 0, 0]
sage: h1._cache
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
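The composition example above can be modeled in a few lines. This is a toy quadratic-time sketch with invented names, not Sage's Stream_cauchy_compose: f(g(x)) = Σ_k f_k g(x)^k, and every power g^k is needed as a contiguous block of coefficients, which is exactly the access pattern visible in the _pos_powers caches.

```python
# Toy model of composition (invented names, quadratic-time, not Sage's
# Stream_cauchy_compose): f(g(x)) = sum_k f_k * g(x)^k, where every
# power g^k is consumed as a contiguous segment of coefficients.

def multiply(a, b, n):
    """Coefficients 0..n of the product of two truncated series."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n + 1)]

def compose(f, g, n):
    """Coefficients 0..n of f(g(x)), assuming g has valuation >= 1."""
    gs = [g(i) for i in range(n + 1)]
    assert gs[0] == 0, "g must have valuation >= 1"
    result = [0] * (n + 1)
    power = [1] + [0] * n                  # g^0
    for k in range(n + 1):                 # g^k has valuation k, so k <= n suffices
        fk = f(k)
        for j in range(n + 1):
            result[j] += fk * power[j]
        power = multiply(power, gs, n)
    return result
```

With f_n = n and g_n = n², this reproduces the value h[20] = 289074264180 computed above.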

@mantepse (Collaborator, Author)

mantepse commented Oct 7, 2022

comment:6

Somewhat orthogonal to the ticket: I think the decision whether a Stream_XXX class uses a dense or a sparse cache should not depend on the input streams.

For example, for Stream_add it is actually irrelevant whether the input streams left and right are dense or sparse.

For Stream_zero and Stream_exact, a dense cache never makes sense.
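A minimal sketch of this point, with toy classes rather than Sage's: the n-th coefficient of a sum needs only left[n] and right[n], so whether the inputs cache densely or sparsely is invisible to the output stream.

```python
# Toy streams (invented names): the cache layout of the inputs (dense
# list vs sparse dict) is invisible to a sum, since coefficient n of
# left + right needs exactly left[n] and right[n].

class DenseStream:
    def __init__(self, f):
        self._f = f
        self._cache = []                 # dense: list indexed by degree

    def __getitem__(self, n):
        while len(self._cache) <= n:
            self._cache.append(self._f(len(self._cache)))
        return self._cache[n]

class SparseStream:
    def __init__(self, f):
        self._f = f
        self._cache = {}                 # sparse: degree -> coefficient

    def __getitem__(self, n):
        if n not in self._cache:
            self._cache[n] = self._f(n)
        return self._cache[n]

def add_coefficient(left, right, n):
    # works for any mix of dense and sparse inputs
    return left[n] + right[n]
```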

@mantepse (Collaborator, Author)

mantepse commented Oct 7, 2022

comment:7

See #34636

@tscrim (Collaborator)

tscrim commented Oct 11, 2022

comment:8

Thank you for doing this.

+1 on using van der Hoeven's algorithm for the dense case. We might want to add some more documentation explaining when one version is preferred over the other.

@mantepse (Collaborator, Author)

comment:9

Thank you for looking at it :-) - and, more generally, for all the reviews!

The thing I am somewhat stuck with here is the class hierarchy. Consider:

class Stream_cauchy_mul_DAC():
    def __init__(self, left, right, phi, N, threshold):
        self._left = [left[k] for k in range(N)]
        self._right = [right[k] for k in range(N)]
        if phi is None:
            self._phi = [0]*(2*N)
            self._lo = None
            self._n = ZZ.zero()
        else:
            # TODO: the first N/2 entries of self._phi are already
            # computed, the computation of the next N/2 is initiated
            # by the next line.  Could / should this be done lazily?
            self._phi = [phi[k] for k in range(N)] + [0]*N
            self._lo = phi
            self._n = ZZ(N / 2)

Currently, this class does not inherit from anything. However, it shares functionality with Stream_inexact by providing a cache. Moreover, self._phi[:n] is actually the same as the cache, except that we (recursively) produce many copies of it, via get_coefficient below.

class Stream_cauchy_mul_fast(Stream_binary):
    def __init__(self, left, right, threshold=2 ** 1):
        super().__init__(left, right, False)
        self._threshold = threshold
        self._h = Stream_cauchy_mul_DAC(left, right, None,
                                        self._threshold, self._threshold)

    def get_coefficient(self, n):
        if n >= self._threshold and is_power_of_two(n):
            self._h = Stream_cauchy_mul_DAC(self._left, self._right, self._h,
                                            2*n, self._threshold)
        return self._h[n]

In summary: to my eyes, this is a complete mess. I think it would be more beautiful if the caching mechanism provided by Stream_inexact were reused, possibly slightly generalized.

I do not understand the algorithm well enough to see how much of self._phi could actually be shared. Ideally, I'd like a single cache (provided by Stream_inexact or Stream_cauchy_mul) which is manipulated by Stream_cauchy_mul_DAC.
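One possible direction, sketched with invented names and a naive convolution standing in for the divide-and-conquer products: keep a single cache that is grown in place at the power-of-two restarts, so the already-validated prefix is never copied between instances.

```python
# Sketch of the power-of-two restart driver (invented names; a naive
# Cauchy product stands in for the DAC multiplications): one shared
# cache is grown in place at each restart, so the validated prefix is
# never copied.

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

class SharedPrefixProduct:
    def __init__(self, left, right, threshold=4):
        self._left, self._right = left, right   # coefficient functions
        self._threshold = threshold
        self._phi = [0] * (2 * threshold)       # the single shared cache
        self._n = 0                             # validated prefix length

    def _extend(self, size):
        if len(self._phi) < size:
            self._phi.extend([0] * (size - len(self._phi)))

    def __getitem__(self, n):
        if n >= self._threshold and is_power_of_two(n):
            self._extend(2 * n)                 # "restart" = grow in place
        while self._n <= n:
            k = self._n
            # naive convolution; the real class would run the
            # divide-and-conquer products into the same buffer
            self._phi[k] = sum(self._left(i) * self._right(k - i)
                               for i in range(k + 1))
            self._n += 1
        return self._phi[n]
```

For f_n = n and g_n = 1 this reproduces the coefficients 0, 1, 3, 6, … shown in h1._cache above.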

@tscrim (Collaborator)

tscrim commented Oct 12, 2022

comment:10

Is there a reason why these need to be two separate classes? It seems like you really just want to have one class that inherits from Stream_binary. I don't quite understand the obstruction to this from a quick look.

@mantepse (Collaborator, Author)

comment:11

Stream_cauchy_mul_fast._h has its own state (i.e., _phi). In fact, this is precisely the question: can the _phi attributes of the various Stream_cauchy_mul_DAC instances share memory?

(I should have documented this: DAC stands for divide-and-conquer.)

@tscrim (Collaborator)

tscrim commented Oct 14, 2022

comment:12

It is certainly possible by passing the same list around and mutating it. However, each instance would then have to know which block is its responsibility, which makes the code structure a bit more complicated and likely harder to debug. How often do the lists need to be (re)constructed, or are they short-lived transient objects?
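The shared-list idea could look like the following toy sketch (invented names): each instance records the half-open block [lo, hi) it is responsible for, and an assertion guards against writes outside it.

```python
# Toy sketch (invented names) of sharing one list among DAC instances:
# each writer owns the half-open block [lo, hi) of the shared list, and
# an assertion guards against writes outside its block.

class BlockWriter:
    def __init__(self, shared, lo, hi):
        self._shared = shared    # the single list passed around
        self._lo, self._hi = lo, hi

    def accumulate(self, offset, value):
        k = self._lo + offset
        assert self._lo <= k < self._hi, "write outside owned block"
        self._shared[k] += value

shared = [0] * 8
left = BlockWriter(shared, 0, 4)    # responsible for entries 0..3
right = BlockWriter(shared, 4, 8)   # responsible for entries 4..7
left.accumulate(2, 5)
right.accumulate(1, 7)
```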
