
Use wheel factorization for prime numbers #337

Merged
mpusz merged 21 commits into mpusz:master on Mar 21, 2022

Conversation

chiphogg
Collaborator

@chiphogg chiphogg commented Mar 10, 2022

Certain existing units in the library require very large prime
numbers---so large, in fact, that our naive trial division hits the
iteration limit for constexpr loops. We don't want to force users
to provide a compiler option override, so we'd better find another way.

The solution is to use the "wheel factorization" algorithm:
https://en.wikipedia.org/wiki/Wheel_factorization

This lets us skip most composite numbers in our trial division. The
implementation presented here is configurable in terms of the size of
the "basis" of primes we use. Bigger bases let us skip more composite
candidates, but at the cost of storing more numbers. Fortunately, it turns
out that N=4 is good enough for most of our purposes.

However, we find we sometimes need to handle prime numbers beyond even
what wheel factorization can handle in constexpr time. As a
workaround, we provide a mechanism, known_first_factor, whereby developers
can supply a direct answer to a factorization the compiler gives up on. A
few prime numbers in the library already require this treatment on some
compiler configurations.

The downside is that by giving up on the ability to compute primality
for all inputs, we also can't programmatically exclude all ill-formed
Magnitude instances. I think this is fine because forming Magnitudes
manually is not part of our end user interface, and we can still exclude
a very wide variety, weighted heavily towards the most common errors.

That said, it would be great if we could someday get a primality
algorithm that could handle all 64-bit inputs in all compilers we
support!
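
For intuition, here is a minimal sketch of the idea (illustrative only; the
real implementation is the configurable wheel_factorizer reviewed below, with
the basis and the first wheel's coprimes precomputed rather than filtered on
the fly):

```cpp
#include <array>
#include <cstddef>

// First factor of n (n >= 2) by trial division, skipping every candidate that
// shares a factor with the basis {2, 3, 5, 7}. Only roughly a fifth of the
// numbers in each wheel of size 210 survive the coprimality filter.
constexpr std::size_t first_factor_sketch(std::size_t n) {
  constexpr std::array<std::size_t, 4> basis{2, 3, 5, 7};
  for (const auto p : basis) {
    if (p * p > n) { return n; }  // no factor up to sqrt(n): n is prime
    if (n % p == 0) { return p; }
  }
  constexpr std::size_t wheel = 2 * 3 * 5 * 7;  // 210
  for (std::size_t start = 0;; start += wheel) {
    for (std::size_t c = 1; c < wheel; c += 2) {                 // skip multiples of 2...
      if (c % 3 == 0 || c % 5 == 0 || c % 7 == 0) { continue; }  // ...and of 3, 5, and 7
      const auto k = start + c;
      if (k < 2) { continue; }      // ignore the trivial divisor 1
      if (k * k > n) { return n; }  // candidates up to sqrt(n) exhausted: n is prime
      if (n % k == 0) { return k; }
    }
  }
}

static_assert(first_factor_sketch(121) == 11);
static_assert(first_factor_sketch(331) == 331);  // 331 is prime
```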

// 1 | 1 | 50.0 %
// 2 | 3 | 33.3 %
// 3 | 10 | 26.7 %
// 4 | 46 | 20.5 %
Collaborator Author

Note: this number disagrees with the figure given on Wikipedia. I get 43/210 for 20.5%, but they give 48/210 for 22.9%.

I think I am right---and I find that the size of primes_in_first_wheel for N=4 is indeed 42! But I did want to call out that this comment differs from what one finds on Wikipedia, in case anyone else sees some mistake I made.

Collaborator Author

Actually, I think I messed up the algorithm. I'm taking this back to draft for now.

Collaborator Author

The root of the discrepancy is that it's not "prime numbers in the first wheel"; rather, it's "numbers coprime with the basis in the first wheel". This means for N=4, 121 is inside the first wheel, and my current implementation would neglect it because it's not prime. But that's inappropriate, because (n * 210 + 121) is not guaranteed to be composite for all n (e.g., 331 = 210 + 121 is prime), so we can't skip that residue class when generating trial divisors.

Since we're using N=3, the current version actually "works"! But I'm not going to try to land an incorrect algorithm. 😅

@chiphogg chiphogg marked this pull request as draft March 10, 2022 23:48
This line shouldn't _actually_ ever be reachable, but I can't fault the
compiler for not figuring that out.
Comment on lines 40 to 64
constexpr std::size_t num_primes_between(std::size_t start, std::size_t end) {
  std::size_t n = 0;

  for (auto k = start; k < end; ++k) {
    if (is_prime_by_trial_division(k)) { ++n; }
  }

  return n;
}

template<std::size_t Start, std::size_t End>
constexpr auto primes_between() {
  std::array<std::size_t, num_primes_between(Start, End)> results;
  auto iter = std::begin(results);

  for (auto k = Start; k < End; ++k) {
    if (is_prime_by_trial_division(k)) {
      *iter = k;
      ++iter;
    }
  }

  assert(iter == std::end(results));
  return results;
}
Collaborator

This looks like a good use case for https://github.com/lefticus/tools/blob/main/include/lefticus/tools/static_views.hpp#L81, presented at https://www.youtube.com/watch?v=ABg4_EV5L3w.
Rather than doing the work twice (once for the size, once more for the actual population), you can define a primes_between that returns an oversized_array<std::size_t> (until we can use std::vector), and to_span([] { return primes_between(...); }) will take care of shrinking its size to fit efficiently.

Collaborator Author

Looks appealing! But doesn't that rely on consteval? I tried using consteval in another PR, and I found that we're stuck supporting a compiler which doesn't support it.

Fortunately, I think the impact of doing the work twice is minimal, since the parts we do twice are generally going to be pretty small for practical purposes.

Owner

If I'm not mistaken, this technique does not depend on consteval. It should work just as well with constexpr on older compilers.
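
To make the suggestion concrete, a rough sketch of the single-pass shape
(names like oversized_primes and primes_between_oversized are illustrative,
not the lefticus/tools API; is_prime_by_trial_division is the helper from
this PR):

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity buffer filled in a single constexpr pass; MaxSize only needs
// to be an upper bound (e.g. End - Start). A follow-up step can copy the first
// `size` elements into a right-sized std::array.
template<std::size_t MaxSize>
struct oversized_primes {
  std::array<std::size_t, MaxSize> data{};
  std::size_t size{};
};

template<std::size_t MaxSize>
constexpr oversized_primes<MaxSize> primes_between_oversized(std::size_t start, std::size_t end) {
  oversized_primes<MaxSize> result{};
  for (auto k = start; k < end; ++k) {
    if (is_prime_by_trial_division(k)) { result.data[result.size++] = k; }
  }
  return result;
}
```

A shrink-to-fit step along the lines of to_span() can then copy data[0..size)
into an array of exactly the right length.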

Comment on lines 88 to 90
std::size_t product = 1;
for (const auto &v : values) { product *= v; }
return product;
Collaborator

Suggested change
- std::size_t product = 1;
- for (const auto &v : values) { product *= v; }
- return product;
+ return std::accumulate(values.begin(), values.end(), std::size_t{1}, std::multiplies{});

Owner

or even better:

return std::reduce(values.begin(), values.end(), std::size_t{1}, std::multiplies{});
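
For reference, a self-contained example of the suggested form (this assumes
C++20, where the <numeric> algorithms are constexpr; the 4-element array is
just an illustration of a prime basis):

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

constexpr std::size_t product(const std::array<std::size_t, 4>& values) {
  return std::reduce(values.begin(), values.end(), std::size_t{1}, std::multiplies{});
}

static_assert(product({2, 3, 5, 7}) == 210);  // the wheel size for a 4-prime basis
```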

#include <type_traits>
#include <utility>

namespace units::detail
Collaborator

Suggested change
- namespace units::detail
+ using namespace units::detail;
+ namespace

Collaborator Author

Learning opportunity for me here: what's the advantage of doing it this way? I've seen both ways (top-level anonymous namespace; or, putting the tests in the same namespace). Are there advantages/disadvantages for each?

Collaborator

@JohelEGP JohelEGP Mar 11, 2022

I suggested it because that's just how other tests do it. As it stands, my suggestion might not be enough since the tests could be using names in ::units.

Suggested change
- namespace units::detail
+ using namespace units;
+ using namespace units::detail;
+ namespace

Are there advantages/disadvantages for each?

Probably. I don't remember. You certainly have more opportunities to mess up if you reopen a namespace you don't own rather than using using declarations or using directives. See https://quuxplusone.github.io/blog/2021/10/27/dont-reopen-namespace-std/#don-t-do-this-instead-prefer-to, which is related to std and thus probably more dangerous than reopening units::detail.

Owner

Putting all the tests in the same named namespace may end up as an ODR violation if someone copy-pastes some test fixtures or just creates a type with the same name. An anonymous namespace guarantees internal linkage, so no ODR violations between TUs are possible.

This is why I use a detail subnamespace only in header files, where the usage of an anonymous one is a problem. For the implementation details of a *.cpp file I always prefer anonymous namespaces.

Collaborator Author

Makes sense. Wouldn't the best approach then be to enclose an anonymous namespace inside of units::detail? It seems like that would preserve the benefit of internal linkage, while avoiding the need for a separate using directive for every level of the namespace hierarchy.
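
For illustration, the proposed layout would look roughly like this (file and
fixture names are hypothetical):

```cpp
#include <units/bits/prime.h>  // hypothetical header under test

namespace units::detail {
namespace {  // internal linkage: no ODR clashes across TUs, detail names stay in scope

struct prime_test_fixture {
  // test helpers would go here
};

}  // namespace
}  // namespace units::detail
```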

//
// [1] https://en.wikipedia.org/wiki/Wheel_factorization
template<std::size_t BasisSize>
struct WheelFactorizer {
Owner

This seems to be the first class with such a naming convention. Please use standard_case.

Collaborator Author

Sorry: day-job habits rearing their head once again. I'll fix this.

namespace detail
{
// Higher numbers use fewer trial divisions, at the price of more storage space.
using Factorizer = WheelFactorizer<4>;
Owner

standard_case naming for aliases as well.


We are well into a regime of diminishing returns, but we'd better start
by seeing if the easy thing works.  Besides, setting this to 7 trips the
step limit in _generating_ the algorithm!
Collaborator Author

@chiphogg chiphogg left a comment

Current status on this PR: it seems likely that even the efficiency gains from moving to wheel factorization may not be enough to satisfy clang-13, which is failing due to its own constexpr step limit. Same deal with MSVC.

In googling around, I got the impression that other people had encountered similar situations (i.e., clang hitting step limits, but gcc not hitting them). Somebody suggested that gcc was better at memoization than clang, and that manual memoization might produce better results.

Is this feasible? I don't yet know. There are 47,461 prime numbers less than sqrt(334524384739). This means we would need to memoize the first 47,462 primes. We could fit them in 32-bit integers and cast when computing, which would mean 189,848 B of storage. Or, we could just store them as 64-bit integers and use 379,696 B (I don't know if casting counts as a "step" against our limit). This amount of memory strikes me as "a lot, but workable". (Assuming I can figure out how to generate and use such a list of numbers.)

We might also try Pollard's rho algorithm. I don't have any experience with this (just as I didn't with wheel factorization before this). From reading the description, it sounds like it is relatively liberal with its steps, which is the opposite of what we want. Also, it seems like it would only work for composite numbers, but 334524384739 is prime, and we need to support it.


Basically, I think we're blocked on finding a way to compute prime factorizations, at compile time, which can handle "large" numbers.

If there's some way to tell individual compilers, in code, "ignore your step limit and keep going!", then maybe that could work. Alternatively, maybe there's something clever we can do with memoization; that's what I'll try next. I'm also open to other suggestions!



@JohelEGP
Collaborator

Is this feasible? I don't yet know. There are 47,461 prime numbers less than sqrt(334524384739). This means we would need to memoize the first 47,462 primes. We could fit them in 32-bit integers and cast when computing, which would mean 189,848 B of storage. Or, we could just store them as 64-bit integers and use 379,696 B (I don't know if casting counts as a "step" against our limit). This amount of memory strikes me as "a lot, but workable". (Assuming I can figure out how to generate and use such a list of numbers.)

Rather than an algorithm that computes all the way to the result, you could use a coroutine-like struct that saves the current progress and have the algorithm operate on that. Then you could choose what to memoize.
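
A very rough sketch of that idea, using plain trial division and illustrative
names (nothing here is part of the PR):

```cpp
#include <cstdint>

// Stores where trial division left off, so the work can be resumed in bounded
// chunks (each chunk staying under a compiler's constexpr step limit) and the
// caller can decide which intermediate states are worth memoizing.
struct trial_division_state {
  std::uintmax_t n;                 // number being factored
  std::uintmax_t next_divisor = 2;  // where the next chunk resumes
  std::uintmax_t first_factor = 0;  // 0 until a factor (or primality) is established

  constexpr void advance(std::uintmax_t steps) {
    while (steps-- > 0 && first_factor == 0) {
      if (next_divisor * next_divisor > n) { first_factor = n; }        // no divisor found: n is prime
      else if (n % next_divisor == 0) { first_factor = next_divisor; }  // found the first factor
      else { ++next_divisor; }
    }
  }
};
```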

@chiphogg
Collaborator Author

Ah---I just had an idea! We could cut this gordian knot by adding a mechanism to directly specify the first factor for a given number.

  • Find a number too big to handle; compiler complains
  • Compute the first factor offline, by some other means
  • Define an inline variable, say, first_factor<334524384739> = 334524384739
  • Explicitly defined first-factor variables take precedence, so we skip the offending computation

If a user supplies an incorrect fact, the program behaviour is undefined.

This will unblock progress for the Magnitude migration without getting bogged down in compile time prime factorizations. The need for this should be extremely rare, so I don't think this will be a big deal.

@JohelEGP, @mpusz: how does this solution sound?
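
For concreteness, the mechanism eventually merged is the known_first_factor
variable template (see the commit list below); a rough sketch, whose details
may differ from the final code:

```cpp
#include <cstdint>
#include <optional>

// Primary template: no externally supplied answer.
template<std::uintmax_t N>
inline constexpr std::optional<std::uintmax_t> known_first_factor = std::nullopt;

// A developer supplies the answer for a number the compiler cannot factor
// within its constexpr step limit. 334524384739 is prime, so its first factor
// is itself; supplying a wrong value is undefined behaviour by contract.
template<>
inline constexpr std::optional<std::uintmax_t> known_first_factor<334524384739> = 334524384739;
```

The factorization routine would consult known_first_factor<N> first and fall
back to wheel factorization only when no override is provided.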

@mpusz
Owner

mpusz commented Mar 14, 2022

I am not sure I fully understand the issue here, but the suggested workaround can be a good starting point to continue the work. If needed, we can come back to this later...

chiphogg added 10 commits March 19, 2022 13:21
This reverts commit 8707385.
It didn't fix the problem, and it caused some new ones.  We need a
different approach.
We introduce the `known_first_factor` variable template.
I verified that we hit GCC 10's constexpr limit with
`wheel_factorizer<1>`, but pass with `wheel_factorizer<4>`.  I hope this
number is sufficiently smaller than the square root of the previous value
that the other compilers will be able to handle it.  If not: we'll go lower.
Apparently, the constexpr depth which clang and MSVC can handle is too
shallow for me to write a unit test that works on all supported
compilers.
Perhaps this will also satisfy Apple's Clang 13?  Since `reduce` is
newer, it may be more likely to be `constexpr` compatible.
If _this_ isn't `constexpr` compatible, I'm going to propose removing
support for the MacOS clang build.
@chiphogg
Collaborator Author

Status, for clarity:

We now have the first passing build. That's an encouraging milestone!

I will need to update my local "convert to Magnitude" branch to adapt to the extensive upstream changes from clang-format. That will take some time, but I've done this kind of thing before. Once I've verified that this solution is actually part of a passing migration build, then I'll take it out of "draft" state and make it ready for review. (Of course, if you folks want to leave review comments in the meantime, I'll take them into account!)

🚀

Comment on lines +156 to +160
for (const auto& p : coprimes_in_first_wheel) {
  if (const auto k = first_factor_maybe(n, wheel + p)) {
    return *k;
  }
}
Owner

Same here.

Comment on lines +139 to +143
for (const auto& p : basis) {
  if (const auto k = first_factor_maybe(n, p)) {
    return *k;
  }
}
Owner

Please use detail::find_if() and provide a TODO comment. See more in: e1f7266.

Collaborator Author

If I do that, how will I know which value to return without calling first_factor_maybe() again? Granted, that's not very expensive, but it seems a little inelegant.

In more detail: I guess this would look like:

const auto basis_iter = find_if(basis.begin(), basis.end(),
                                [n](const auto& p) { return first_factor_maybe(n, p).has_value(); });
if (basis_iter != basis.end()) {
  return *first_factor_maybe(n, *basis_iter);
}

Are we sure that's better?

In the existing code, I can see the nuisance of repeating the if block four times. The best solution that occurs to me would be to separate the value generation from the evaluation---maybe have some kind of iterator interface which returns the basis primes in order, then first-wheel coprimes (except 1), then coprimes in order for each wheel indefinitely. I assume that's doable with something like ranges or coroutines, although I'm not very well versed in either. I think it would be fine to leave as a future refactoring.

That said---I did realize in grappling with this comment that one of the four repetitions was redundant! It stems from my earlier mistake of using primes; 1 is not a prime, but it is coprime, so we were checking it twice per wheel. That's fixed now.

Let me know your thoughts on how to proceed here: status quo, or a find_if solution.

Collaborator Author

Also, to be clear about why "which value to return" is even an issue: first_factor_maybe can return either k itself, or n (if k * k exceeds n).
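
For readers following along, a hedged reconstruction of first_factor_maybe()
based on this thread (the PR's exact types and naming may differ):

```cpp
#include <cstddef>
#include <optional>

// Tries a single candidate divisor k: returns n itself once k * k exceeds n
// (no factor was found, so n is prime), returns k if k divides n, and returns
// nullopt if the caller should keep trying larger candidates.
constexpr std::optional<std::size_t> first_factor_maybe(std::size_t n, std::size_t k) {
  if (k * k > n) { return n; }
  if (n % k == 0) { return k; }
  return std::nullopt;
}
```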

Owner

Right, initially I thought it would be easier to simplify... Sorry for causing confusion.

Owner

However, it looks like the same pattern appears in all cases, so maybe a custom algorithm could be provided to cover them?

Collaborator Author

I want that too, but I think first_factor_maybe() is that algorithm---at least, as much of it as I was able to pull out. I think the next level of refactoring would be to separate out the trial number generation, probably placing it behind some kind of iterator interface. Then we could just keep taking values until we find one that works, and we'd have only a single call to first_factor_maybe() (or more likely, we'd just get rid of that function at that point).

If you have a ready made solution for this, I'd love to add it! Otherwise, I think the code is reasonably clean and efficient for a first pass.
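
To make that possible refactoring concrete, one shape the trial-number
generator could take (purely illustrative, not part of this PR; basis,
coprimes_in_first_wheel, and wheel_size stand in for the factorizer's
members):

```cpp
#include <cstddef>

// Yields the basis primes first, then each wheel's residues that are coprime
// to the basis, in increasing order. The caller keeps pulling candidates until
// first_factor_maybe() produces an answer.
template<typename Basis, typename Coprimes>
struct trial_divisor_generator {
  const Basis& basis;
  const Coprimes& coprimes_in_first_wheel;
  std::size_t wheel_size;

  std::size_t wheel = 0;  // 0 while still emitting the basis primes
  std::size_t pos = 0;

  constexpr std::size_t next() {
    if (wheel == 0) {
      if (pos < basis.size()) { return basis[pos++]; }
      wheel = 1;
      pos = 1;  // skip 1 in the first wheel: coprime, but useless as a divisor
    }
    if (pos == coprimes_in_first_wheel.size()) {  // roll over into the next wheel
      ++wheel;
      pos = 0;
    }
    return (wheel - 1) * wheel_size + coprimes_in_first_wheel[pos++];
  }
};
```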

Comment on lines +145 to +149
for (auto it = std::next(std::begin(coprimes_in_first_wheel)); it != std::end(coprimes_in_first_wheel); ++it) {
  if (const auto k = first_factor_maybe(n, *it)) {
    return *k;
  }
}
Owner

Same here.

This stems from an earlier mistake where I was using primes in the first
wheel, rather than coprimes-to-the-basis.  1 is not prime, so we used to
need to handle it separately (in an implementation which was, to be
clear, wrong).  It _is_ coprime, so now we get it for free!
@chiphogg chiphogg marked this pull request as ready for review March 19, 2022 22:12
@chiphogg
Collaborator Author

I got the builds to pass in the Great ratio -> Magnitude Migration! So this should be ready for review.

@mpusz mpusz merged commit 3729a9f into mpusz:master Mar 21, 2022
@JohelEGP
Collaborator

Ah---I just had an idea! We could cut this gordian knot by adding a mechanism to directly specify the first factor for a given number.

* Find a number too big to handle; compiler complains

* Compute the first factor offline, by some other means

* Define an inline variable, say, `first_factor<334524384739> = 334524384739`

* Explicitly defined first-factor variables take precedence, so we skip the offending computation

If a user supplies an incorrect fact, the program behaviour is undefined.

This will unblock progress for the Magnitude migration without getting bogged down in compile time prime factorizations. The need for this should be extremely rare, so I don't think this will be a big deal.

@JohelEGP, @mpusz: how does this solution sound?

Perhaps you could use something like what's exposed at https://www.youtube.com/watch?v=wFRlUNSMK8s at strategic points. However, they don't show how it affects RAM consumption during compilation, so I don't know if it's feasible in this case.

@chiphogg chiphogg deleted the chiphogg/prime-wheel branch May 1, 2022 17:09
@chiphogg
Collaborator Author

chiphogg commented May 1, 2022

Perhaps you could use something like what's exposed at https://www.youtube.com/watch?v=wFRlUNSMK8s at strategic points. However, they don't show how it affects RAM consumption during compilation, so I don't know if it's feasible in this case.

Explicit memoization is an appealing idea, although I'm not sure how I would structure the computation in a way that breaks it up and avoids the step limit. One way or another, we have to test every prime number smaller than sqrt(N).

We might be able to use "blocks" of divisors, with the block result memoized: say, DivisibleByAnyElementOfBlock<N, BlockNum>. We could pick a block size that's comfortably within the step limit. However, it's not clear to me that this would save us anything, since the total number of computations for each N would be the same for blocked and un-blocked strategies. And the first time we try computing for a given value of N is also the first time we would compute any blocks for N: thus, there won't be any previously memoized result we can look up for a given block.

I don't plan to attempt tackling this problem in the foreseeable future, but I hope some strategy can be made to work. It definitely feels like we should be able to compute prime factors at compile time, even for big numbers!
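
For what it's worth, the blocked idea might look something like this
(hypothetical and never implemented here); each instantiation is a separate
constant expression, so each stays under the per-evaluation step limit, even
though, as noted above, the total amount of work is unchanged:

```cpp
#include <cstdint>

// True if any candidate divisor in the half-open block
// [BlockNum * BlockSize, (BlockNum + 1) * BlockSize) divides N.
template<std::uintmax_t N, std::uintmax_t BlockNum, std::uintmax_t BlockSize = 10'000>
inline constexpr bool divisible_by_any_element_of_block = [] {
  for (std::uintmax_t k = BlockNum * BlockSize; k < (BlockNum + 1) * BlockSize; ++k) {
    if (k >= 2 && k * k <= N && N % k == 0) { return true; }
  }
  return false;
}();
```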

@JohelEGP JohelEGP mentioned this pull request Oct 12, 2023