
implement van der Hoeven's algorithm for relaxed multiplication of power series #34616

mantepse opened this issue Sep 29, 2022 · 17 comments

@mantepse (Collaborator)

This is an experimental implementation of the algorithm presented in Section 4.2 of van der Hoeven's "Relax, but don't be too lazy".

CC: @tscrim @fchapoton

Component: combinatorics

Keywords: LazyPowerSeries

Author: Martin Rubey

Branch/Commit: u/mantepse/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series @ 1de1e6d

Issue created by migration from https://trac.sagemath.org/ticket/34616

@mantepse added this to the sage-9.8 milestone Sep 29, 2022
@mantepse (Collaborator, Author)
comment:2

Unfortunately, the code does not work yet, because I am having trouble with some details in the article.


Last 10 new commits:

52ac7af pyflakes observations
06ec3d0 Merge branch 'develop' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
4172688 Merge branch 'u/mantepse/implement_arithmetic_product_of_lazy_symmetric_functions' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
41ca99b Merge branch 'u/mantepse/replace_lazy_power_series-32367' of trac.sagemath.org:sage into t/34470/categories_lazy_series-34470
fb3a5cd Merge branch 'u/mantepse/categories_lazy_series-34470' of trac.sagemath.org:sage into t/34552/lazy_series_test_suite-34552
855d2bf remove unused variable
75c275c add documentation and doctests for _approximate_order
5393242 fixes for pycodestyle and pyflakes
081d0e5 Merge branch 'u/mantepse/lazy_series_test_suite-34552' of trac.sagemath.org:sage into t/34616/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series
87d9db1 non-working first attempt

@mantepse (Collaborator, Author)

Author: Martin Rubey


@mantepse (Collaborator, Author)

Changed keywords from none to LazyPowerSeries

@mantepse (Collaborator, Author)

Commit: 87d9db1

@mantepse (Collaborator, Author)

mantepse commented Oct 3, 2022

comment:3

I think the mistake is that the caching is not implemented correctly. (References below are to van der Hoeven's paper.)

Sec. 2.2, pg. 484 says that Series_Rep has an attribute n, which records up to which degree the values in the cache are already correct: "The order of φ is allowed to exceed n in order to anticipate future computations. Moreover, the coefficients φ_0, …, φ_{k−1} must be computed before computing φ_k."

In Sec. 4.2.1, pg. 502, the definition of DAC_Rep does not specify n initially, so it is probably meant to be 0. However, in Sec. 4.2.3 the definition of DAC_Rep does specify n := N/2, which the current branch does not take into account.

The implementation of van der Hoeven's algorithm should really use the dense version of streams. It probably makes sense to adapt the framework slightly.
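The cache contract from Sec. 2.2 can be sketched as follows. This is a toy model with invented names, not Sage's actual Stream API: _phi may hold speculative entries beyond _n, but only _phi[:_n] is guaranteed correct, and those entries are filled strictly in order.

```python
# Toy model (invented names, not Sage's Stream API) of the cache
# contract from Sec. 2.2: ``_phi`` may hold speculative entries beyond
# ``_n``, but only ``_phi[:_n]`` is guaranteed correct, and those
# entries are filled strictly in order.

class SeriesRepSketch:
    def __init__(self, coefficient):
        self._coefficient = coefficient  # function k -> k-th coefficient
        self._phi = []   # cache; may extend past the validated prefix
        self._n = 0      # phi_0, ..., phi_{n-1} are known to be correct

    def reserve(self, size):
        # "anticipate future computations": extend phi with placeholders
        if len(self._phi) < size:
            self._phi.extend([0] * (size - len(self._phi)))

    def __getitem__(self, k):
        # phi_0, ..., phi_{k-1} must be computed before phi_k
        while self._n <= k:
            self.reserve(self._n + 1)
            self._phi[self._n] = self._coefficient(self._n)
            self._n += 1
        return self._phi[k]
```

Here reserve() plays the role of the speculative extension of φ, while __getitem__ maintains the invariant that all coefficients below the requested one are computed first.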

@sagetrac-git (Mannequin)

sagetrac-git mannequin commented Oct 4, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

1de1e6d ugly, but working

@sagetrac-git (Mannequin)

sagetrac-git mannequin commented Oct 4, 2022

Changed commit from 87d9db1 to 1de1e6d

@mantepse (Collaborator, Author)

mantepse commented Oct 4, 2022

comment:5

Although the implementation is quite ugly in some details, I think we can learn enough to make some decisions.

  • the code (without any optimizations) begins to be faster for integer multiplication once we want all of the first 270 or so coefficients of the product:
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(270)]
CPU times: user 40.7 ms, sys: 0 ns, total: 40.7 ms
Wall time: 40.7 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(270)]
CPU times: user 43.1 ms, sys: 0 ns, total: 43.1 ms
Wall time: 43.1 ms

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(2000)]
CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 712 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(2000)]
CPU times: user 1.85 s, sys: 0 ns, total: 1.85 s
Wall time: 1.85 s
  • if we are only interested in a single coefficient of the product, I don't think that we can even come close to the naive algorithm, and here the difference is, unfortunately, huge:
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = h1[1000]
CPU times: user 233 ms, sys: 0 ns, total: 233 ms
Wall time: 233 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = h2[1000]
CPU times: user 2.43 ms, sys: 0 ns, total: 2.43 ms
Wall time: 2.44 ms
  • I am guessing that the most interesting application would be in the computation of the composition and the plethysm of power series. There, we always compute contiguous segments of coefficients of products. For example:
sage: from sage.data_structures.stream import Stream_function, Stream_cauchy_compose
sage: f = Stream_function(lambda n: n, True, 1)
sage: g = Stream_function(lambda n: n^2, True, 1)
sage: h = Stream_cauchy_compose(f, g)
sage: h[20]
289074264180
sage: [(len(x._cache), min(x._cache), max(x._cache)) for x in h._pos_powers[1:]]
[(20, 1, 20),
 (19, 2, 20),
 (18, 3, 20),
 (17, 4, 20),
 (16, 5, 20),
 (15, 6, 20),
 (14, 7, 20),
 (13, 8, 20),
 (12, 9, 20),
 (11, 10, 20),
 (10, 11, 20),
 (9, 12, 20),
 (8, 13, 20),
 (7, 14, 20),
 (6, 15, 20),
 (5, 16, 20),
 (4, 17, 20),
 (3, 18, 20),
 (2, 19, 20),
 (1, 20, 20)]
  • For the dense setting, I think that it is clear that we should be using van der Hoeven's algorithm. Of course, some polishing is needed. For example, the current implementation duplicates the caching mechanism:
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)
sage: h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5)
sage: l1 = [h1[i] for i in range(10)]
sage: len(h1._h._phi)
64
sage: h1._h._phi[:15]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 0, 0, 0, 0, 0]
sage: h1._cache
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
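The composition example above can be modeled in a few lines. This is a toy quadratic-time sketch with invented names, not Sage's Stream_cauchy_compose: f(g(x)) = Σ_k f_k g(x)^k, and every power g^k is needed as a contiguous block of coefficients, which is exactly the access pattern visible in the _pos_powers caches.

```python
# Toy model of composition (invented names, quadratic-time, not Sage's
# Stream_cauchy_compose): f(g(x)) = sum_k f_k * g(x)^k, where every
# power g^k is consumed as a contiguous segment of coefficients.

def multiply(a, b, n):
    """Coefficients 0..n of the product of two truncated series."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n + 1)]

def compose(f, g, n):
    """Coefficients 0..n of f(g(x)), assuming g has valuation >= 1."""
    gs = [g(i) for i in range(n + 1)]
    assert gs[0] == 0, "g must have valuation >= 1"
    result = [0] * (n + 1)
    power = [1] + [0] * n                  # g^0
    for k in range(n + 1):                 # g^k has valuation k, so k <= n suffices
        fk = f(k)
        for j in range(n + 1):
            result[j] += fk * power[j]
        power = multiply(power, gs, n)
    return result
```

With f_n = n and g_n = n², this reproduces the value h[20] = 289074264180 computed above.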

@mantepse (Collaborator, Author)

mantepse commented Oct 7, 2022

comment:6

Somewhat orthogonal to the ticket: I think the decision whether a Stream_XXX class uses a dense or a sparse cache should not depend on the input streams.

For example, for Stream_add it is actually irrelevant whether the input streams left and right are dense or sparse.

For Stream_zero and Stream_exact, a dense cache never makes sense.
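A minimal sketch of this point, with toy classes rather than Sage's: the n-th coefficient of a sum needs only left[n] and right[n], so whether the inputs cache densely or sparsely is invisible to the output stream.

```python
# Toy streams (invented names): the cache layout of the inputs (dense
# list vs sparse dict) is invisible to a sum, since coefficient n of
# left + right needs exactly left[n] and right[n].

class DenseStream:
    def __init__(self, f):
        self._f = f
        self._cache = []                 # dense: list indexed by degree

    def __getitem__(self, n):
        while len(self._cache) <= n:
            self._cache.append(self._f(len(self._cache)))
        return self._cache[n]

class SparseStream:
    def __init__(self, f):
        self._f = f
        self._cache = {}                 # sparse: degree -> coefficient

    def __getitem__(self, n):
        if n not in self._cache:
            self._cache[n] = self._f(n)
        return self._cache[n]

def add_coefficient(left, right, n):
    # works for any mix of dense and sparse inputs
    return left[n] + right[n]
```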

@mantepse (Collaborator, Author)

mantepse commented Oct 7, 2022

comment:7

See #34636

@tscrim (Collaborator)

tscrim commented Oct 11, 2022

comment:8

Thank you for doing this.

+1 on using van der Hoeven's algorithm for the dense case. We might want to add some more documentation explaining when one version is preferred over the other.

@mantepse (Collaborator, Author)

comment:9

Thank you for looking at it :-) - and, more generally, for all the reviews!

The thing I am somewhat stuck with here is the class hierarchy. Consider:

class Stream_cauchy_mul_DAC():
    def __init__(self, left, right, phi, N, threshold):
        self._left = [left[k] for k in range(N)]
        self._right = [right[k] for k in range(N)]
        if phi is None:
            self._phi = [0]*(2*N)
            self._lo = None
            self._n = ZZ.zero()
        else:
            # TODO: the first N/2 entries of self._phi are already
            # computed, the computation of the next N/2 is initiated
            # by the next line.  Could / should this be done lazily?
            self._phi = [phi[k] for k in range(N)] + [0]*N
            self._lo = phi
            self._n = ZZ(N / 2)

Currently, this class does not inherit from anything. However, it shares functionality with Stream_inexact by providing a cache. Moreover, self._phi[:n] is actually the same as the cache, except that we (recursively) produce many copies of it, via get_coefficient below.

class Stream_cauchy_mul_fast(Stream_binary):
    def __init__(self, left, right, threshold=2 ** 1):
        super().__init__(left, right, False)
        self._threshold = threshold
        self._h = Stream_cauchy_mul_DAC(left, right, None,
                                        self._threshold, self._threshold)

    def get_coefficient(self, n):
        if n >= self._threshold and is_power_of_two(n):
            self._h = Stream_cauchy_mul_DAC(self._left, self._right, self._h,
                                            2*n, self._threshold)
        return self._h[n]

In summary: to my eyes, this is a complete mess. I think it would be more beautiful if the caching mechanism provided by Stream_inexact were reused, possibly slightly generalized.

I do not understand the algorithm well enough to see how much of self._phi could actually be shared. Ideally, I'd like a single cache (provided by Stream_inexact or Stream_cauchy_mul) which is manipulated by Stream_cauchy_mul_DAC.
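One possible direction, sketched with invented names and a naive convolution standing in for the divide-and-conquer products: keep a single cache that is grown in place at the power-of-two restarts, so the already-validated prefix is never copied between instances.

```python
# Sketch of the power-of-two restart driver (invented names; a naive
# Cauchy product stands in for the DAC multiplications): one shared
# cache is grown in place at each restart, so the validated prefix is
# never copied.

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

class SharedPrefixProduct:
    def __init__(self, left, right, threshold=4):
        self._left, self._right = left, right   # coefficient functions
        self._threshold = threshold
        self._phi = [0] * (2 * threshold)       # the single shared cache
        self._n = 0                             # validated prefix length

    def _extend(self, size):
        if len(self._phi) < size:
            self._phi.extend([0] * (size - len(self._phi)))

    def __getitem__(self, n):
        if n >= self._threshold and is_power_of_two(n):
            self._extend(2 * n)                 # "restart" = grow in place
        while self._n <= n:
            k = self._n
            # naive convolution; the real class would run the
            # divide-and-conquer products into the same buffer
            self._phi[k] = sum(self._left(i) * self._right(k - i)
                               for i in range(k + 1))
            self._n += 1
        return self._phi[n]
```

For f_n = n and g_n = 1 this reproduces the coefficients 0, 1, 3, 6, … shown in h1._cache above.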

@tscrim (Collaborator)

tscrim commented Oct 12, 2022

comment:10

Is there a reason why these need to be two separate classes? It seems like you really just want to have one class that inherits from Stream_binary. I don't quite understand the obstruction to this from a quick look.

@mantepse (Collaborator, Author)

comment:11

Stream_cauchy_mul_fast._h has its own state (i.e., _phi). In fact, this is precisely the question: can the _phi attributes of the various Stream_cauchy_mul_DAC instances share memory?

(I should have documented this: DAC stands for divide-and-conquer.)

@tscrim (Collaborator)

tscrim commented Oct 14, 2022

comment:12

It is certainly possible by passing the same list around and mutating it. However, each instance would then have to know which block is its responsibility, which makes the code structure a bit more complicated and likely harder to debug. How often do the lists need to be (re)constructed, or are they short-lived transient objects?
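The shared-list idea could look like the following toy sketch (invented names): each instance records the half-open block [lo, hi) it is responsible for, and an assertion guards against writes outside it.

```python
# Toy sketch (invented names) of sharing one list among DAC instances:
# each writer owns the half-open block [lo, hi) of the shared list, and
# an assertion guards against writes outside its block.

class BlockWriter:
    def __init__(self, shared, lo, hi):
        self._shared = shared    # the single list passed around
        self._lo, self._hi = lo, hi

    def accumulate(self, offset, value):
        k = self._lo + offset
        assert self._lo <= k < self._hi, "write outside owned block"
        self._shared[k] += value

shared = [0] * 8
left = BlockWriter(shared, 0, 4)    # responsible for entries 0..3
right = BlockWriter(shared, 4, 8)   # responsible for entries 4..7
left.accumulate(2, 5)
right.accumulate(1, 7)
```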
