Aerobus
is a C++-20 pure header library for general algebra on polynomials, discrete rings and associated structures.
Everything in Aerobus
is expressed as types.
We say that again as it is the most fundamental characteristic of Aerobus
:
Everything is expressed as types
The library serves two main purposes :
- Express algebra structures and associated operations in type arithmetic, compile-time;
- Provide portable and fast evaluation functions for polynomials.
It is designed to be 'quite easily' extensible.
Given these functions are "generated" at compile time and do not rely on inline assembly, they are actually platform independent, yielding exact same results if processors have same capabilities (such as Fused-Multiply-Add instructions).
- Clone or download the repository somewhere, or just download
aerobus.h
- In your code, add :
#include "aerobus.h"
- Compile with -std=c++20 (at least) -I<install_location>
Aerobus
provides a definition for low-degree (up to 997) Conway polynomials. To use them, define AEROBUS_CONWAY_IMPORTS
before including aerobus.h
.
Install Cmake
Install a recent compiler (supporting c++20), such as MSVC, G++ or Clang++
Move to the top directory then :
cmake -S . -B build
cmake --build build
cd build && ctest
Terminal should write :
100% tests passed, 0 tests failed out of 48
Alternate way :
make tests
From top directory.
Benchmarks are written for Intel CPUs having AVX512f and AVX512vl flags, they work only on Linux operating system using g++.
In addition of Cmake
and compiler, install OpenMP
.
And Google's Benchmark library.
Then move to top directory :
rm -rf build
mkdir build
cd build
cmake ..
make benchmarks
./benchmarks
Aerobus
predefines several simple euclidean domains, such as :
aerobus::i32
: integers (32 bits)aerobus::i64
: integers (64 bits)aerobus::zpz<p>
: integers modulo p (prime number) on 32 bits
All these types represent the Ring, meaning the algebraic structure. They have a nested type val<i>
where i
is a scalar native value (int32_t or int64_t) to represent actual values in the ring.
They have the following "operations", required by the IsEuclideanDomain concept :
add_t
: a type (specialization of val), representing addition between two valuessub_t
: a type (specialization of val), representing subtraction between two valuesmul_t
: a type (specialization of val), representing multiplication between two valuesdiv_t
: a type (specialization of val), representing division between two valuesmod_t
: a type (specialization of val), representing modulus between two values
and the following "elements" :
- one : the neutral element for multiplication, val<1>
- zero : the neutral element for addition, val<0>
Aerobus
defines polynomials as a variadic template structure, with coefficient in an arbitrary discrete euclidean domain. As i32
or i64
, they are given same operations and elements, which make them a euclidean domain by themselves. Similarly, aerobus::polynomial
represents the algebraic structure, actual values are in aerobus::polynomial::val
.
In addition, values have an evaluation function :
template<typename valueRing> static constexpr valueRing eval(const valueRing& x) {...}
Which can be used at compile time (constexpr evaluation) or runtime.
Aerobus
predefines some well known families of polynomials, such as Hermite or Bernstein :
using B23 = aerobus::known_polynomials::bernstein<2, 3>; // 3X^2(1-X)
constexpr float x = B32::eval(2.0F); // -12
They have their coefficients either in aerobus::i64
or aerobus::q64
.
Complete list is (but is meant to be extended):
chebyshev_T
chebyshev_U
laguerre
hermite_prob
hermite_phys
bernstein
legendre
bernoulli
When the tag AEROBUS_CONWAY_IMPORTS
is defined at compile time (-DAEROBUS_CONWAY_IMPORTS
), aerobus
provides definition for all Conway polynomials CP(p, n)
for p
up to 997 and low values for n
(usually less than 10).
They can be used to construct finite fields of order
using F2 = zpz<2>;
using PF2 = polynomial<F2>;
using F4 = Quotient<PF2, ConwayPolynomial<2, 2>::type>;
Aerobus
provides definition for Taylor expansion of known functions. They are all templates in two parameters, degree of expansion (size_t
) and Integers (typename
). Coefficients then live in FractionField<Integers>
.
They can be used and evaluated:
using namespace aerobus;
using aero_atanh = atanh<i64, 6>;
constexpr float val = aero_atanh::eval(0.1F); // approximation of arctanh(0.1) using taylor expansion of degree 6
Exposed functions are:
exp
-
expm1
$e^x - 1$ -
lnp1
$\ln(x+1)$ -
geom
$\frac{1}{1-x}$ sin
cos
tan
sh
cosh
tanh
asin
acos
acosh
asinh
atanh
Having the capacity of specifying the degree is very important, as users may use other formats than float64
or float32
which require higher or lower degree to achieve correct or acceptable precision.
It's possible to define Taylor expansion by implementing a coeff_at
structure which must meet the following requirement :
- Being template in Integers (
typename
) and index (size_t
); - Exposing a type alias
type
, some specialization ofFractionField<Integers>::val
.
For example, to define the serie
template<typename Integers, size_t i>
struct my_coeff_at {
using type = typename FractionField<Integers>::one;
};
template<typename Integers, size_t degree>
using my_serie = taylor<Integers, my_coeff_at, degree>;
static constexpr double x = my_serie<i64, 3>::eval(3.0);
On x86-64 and CUDA platforms at least, using proper compiler directives, these functions yield very performant assembly, similar or better than standard library implementation in fast math. For example, this code:
double compute_expm1(const size_t N, double* in, double* out) {
using V = aerobus::expm1<aerobus::i64, 13>;
for (size_t i = 0; i < N; ++i) {
out[i] = V::eval(in[i]);
}
}
Yields this assembly (clang 17, -mavx2 -O3
) where we can see a pile of Fused-Multiply-Add vector instructions, generated because we unrolled completely the Horner evaluation loop:
compute_expm1(unsigned long, double const*, double*):
lea rax, [rdi-1]
cmp rax, 2
jbe .L5
mov rcx, rdi
xor eax, eax
vxorpd xmm1, xmm1, xmm1
vbroadcastsd ymm14, QWORD PTR .LC1[rip]
vbroadcastsd ymm13, QWORD PTR .LC3[rip]
shr rcx, 2
vbroadcastsd ymm12, QWORD PTR .LC5[rip]
vbroadcastsd ymm11, QWORD PTR .LC7[rip]
sal rcx, 5
vbroadcastsd ymm10, QWORD PTR .LC9[rip]
vbroadcastsd ymm9, QWORD PTR .LC11[rip]
vbroadcastsd ymm8, QWORD PTR .LC13[rip]
vbroadcastsd ymm7, QWORD PTR .LC15[rip]
vbroadcastsd ymm6, QWORD PTR .LC17[rip]
vbroadcastsd ymm5, QWORD PTR .LC19[rip]
vbroadcastsd ymm4, QWORD PTR .LC21[rip]
vbroadcastsd ymm3, QWORD PTR .LC23[rip]
vbroadcastsd ymm2, QWORD PTR .LC25[rip]
.L3:
vmovupd ymm15, YMMWORD PTR [rsi+rax]
vmovapd ymm0, ymm15
vfmadd132pd ymm0, ymm14, ymm1
vfmadd132pd ymm0, ymm13, ymm15
vfmadd132pd ymm0, ymm12, ymm15
vfmadd132pd ymm0, ymm11, ymm15
vfmadd132pd ymm0, ymm10, ymm15
vfmadd132pd ymm0, ymm9, ymm15
vfmadd132pd ymm0, ymm8, ymm15
vfmadd132pd ymm0, ymm7, ymm15
vfmadd132pd ymm0, ymm6, ymm15
vfmadd132pd ymm0, ymm5, ymm15
vfmadd132pd ymm0, ymm4, ymm15
vfmadd132pd ymm0, ymm3, ymm15
vfmadd132pd ymm0, ymm2, ymm15
vfmadd132pd ymm0, ymm1, ymm15
vmovupd YMMWORD PTR [rdx+rax], ymm0
add rax, 32
cmp rcx, rax
jne .L3
mov rax, rdi
and rax, -4
vzeroupper
Given a set (type) satisfies the IsEuclideanDomain
concept, Aerobus
allows to define its field of fractions.
This new type is again a euclidean domain, especially a field, and therefore we can define polynomials over it.
For example, integers modulo p
is not a field when p
is not prime. We then can define its field of fraction and polynomials over it this way:
using namespace aerobus;
using ZmZ = zpz<8>;
using Fzmz = FractionField<ZmZ>;
using Pfzmz = polynomial<Fzmz>;
The same operation would stand for any set that users would have implemented in place of ZmZ
.
For example, we can easily define rational functions by taking the ring of fractions of polynomials:
using namespace aerobus;
using RF64 = FractionField<polynomial<q64>>;
Which also have an evaluation function, as polynomial do.
Given a ring R
, Aerobus
provides automatic implementation for quotient ring R
is commutative (and we assume it is).
For example, if we want R
to be aerobus::i64
, we can express arithmetic modulo 17 using:
using namespace aerobus;
using ZpZ = Quotient<i64, i64::val<17>>;
As we could have using zpz<17>
.
This is mainly used to define finite fields of order
Aerobus
gives an implementation for continued fractions. It can be used this way:
using namespace aerobus;
using T = ContinuedFraction<1,2,3,4>;
constexpr double x = T::val;
As practical examples, aerobus
gives continued fractions of
constexpr double A_SQRT3 = aerobus::SQRT3_fraction::val; // 1.7320508075688772935
When compiled with nvcc
and the flag WITH_CUDA_FP16
, Aerobus
provides some support of 16 bits integers and floats (aka __half
).
Unfortunately, NVIDIA did not put enough constexpr in its cuda_fp16.h
header, so we had to implement our own constexpr static_cast from int16_t to __half
to make integers polynomials work with __half
. See this bug.
More, it's (at this time), not easily possible to make it work for __half2
because of another bug.
A workaround is to modify cuda_fp16.h
and add a constexpr modifier to line 5039. This works but only tested on Linux with CUDA 16.1.
Once done, nvcc generates splendid assembly, same as for double
or float
:
HFMA2.MMA R5, R6, RZ, 0.0013885498046875, 0.0013885498046875 ;
HFMA2 R5, R6, R5, 0.008331298828125, 0.008331298828125 ;
HFMA2.MMA R5, R6, R5, 0.041656494140625, 0.041656494140625 ;
HFMA2 R5, R6, R5, 0.1666259765625, 0.1666259765625 ;
HFMA2.MMA R5, R6, R5, 0.5, 0.5 ;
HFMA2 R5, R6, R5, 1, 1 ;
HFMA2.MMA R5, R6, R5, RZ ;
HFMA2 R7, R5, RZ.H0_H0, 0.0013885498046875, 0.0013885498046875 ;
HFMA2.MMA R7, R5, R7, 0.008331298828125, 0.008331298828125 ;
HFMA2 R7, R5, R7, 0.041656494140625, 0.041656494140625 ;
HFMA2.MMA R7, R5, R7, 0.1666259765625, 0.1666259765625 ;
HFMA2 R7, R5, R7, 0.5, 0.5 ;
HFMA2.MMA R7, R5, R7, 1, 1 ;
HFMA2 R7, R5, R7, RZ.H0_H0 ;
Please push to make these bug fixed by NVIDIA.