
Integer representation of floating point arithmetic suitable for FPGA designs

Author: Yuri Gershtein Date: March 2018

Functionality:

Note: all integers are assumed to be signed.

All variables have units, stored in a map<string,int> where the string is a unit (e.g. "phi") and the int is its power. The key "2" is always present in the map, and its power is referred to as the 'shift'. Units are properly combined and propagated through calculations. Adding or subtracting variables with different units throws an exception; adding or subtracting variables with different shifts is allowed and handled correctly.
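A minimal sketch of the unit bookkeeping described above, assuming a plain std::map keyed by unit name. The names `Units`, `multiply_units` and `check_add_units` are hypothetical and are not the imath API; the sketch only shows how powers (including the "2" shift) combine under multiplication and why addition rejects mismatched units but tolerates different shifts.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified unit map: unit name -> integer power.
// The key "2" holds the power of two, i.e. the variable's 'shift'.
using Units = std::map<std::string, int>;

// Multiplication adds powers (phi^1 * phi^1 = phi^2); the shifts add too,
// because 2^a * 2^b = 2^(a+b).
Units multiply_units(const Units& a, const Units& b) {
  assert(a.count("2") && b.count("2"));  // "2" is always present
  Units out = a;
  for (const auto& kv : b) out[kv.first] += kv.second;
  // Drop units whose powers cancelled, but always keep "2" (the shift).
  for (auto it = out.begin(); it != out.end();) {
    if (it->second == 0 && it->first != "2")
      it = out.erase(it);
    else
      ++it;
  }
  return out;
}

// Addition/subtraction needs identical units; only the shift may differ.
void check_add_units(Units a, Units b) {
  a.erase("2");
  b.erase("2");
  if (a != b) throw std::invalid_argument("adding variables with different units");
}
```

Different shifts pass `check_add_units` because they only change where the binary point sits; the integers can be aligned before adding.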

The Verilog_print() method takes a vector of the output variables and produces a complete Verilog module.

The calculate() method recalculates the variable's double and int values from its operands. It returns false in case of overflows and/or mismatches between the double and int calculations.

The maximum and minimum values that a variable assumes are stored and updated each time calculate() is called. If IMATH_ROOT is defined, all values are also stored in a histogram.

var_def (string name, string units, double fmax, double K):

               define a variable with bit value K (fval = K*ival) and maximum absolute value fmax.
               calculates nbins on its own.
               one can assign a value to it using the set_ methods.
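How a bit length can follow from fmax and K: with fval = K*ival, the integer must hold |ival| up to fmax/K, plus a sign bit. The helper names below are hypothetical and this is only a sketch of the arithmetic; imath's own rule for choosing the width may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: smallest signed width that holds |ival| <= fmax / K,
// where fval = K * ival. A signed nbits value spans [-2^(nbits-1), 2^(nbits-1) - 1].
int signed_bits_for(double fmax, double K) {
  assert(fmax > 0 && K > 0);
  const double max_ival = std::ceil(fmax / K);
  int nbits = 1;  // the sign bit alone
  while (std::ldexp(1.0, nbits - 1) - 1 < max_ival) ++nbits;
  return nbits;
}

// Quantize a floating point value to its integer representation.
int64_t to_int(double fval, double K) { return std::llround(fval / K); }
```

For example, fmax = 1.0 with K = 2^-10 needs |ival| up to 1024, which takes 12 signed bits.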

var_param (string name, double fval, int nbits):

               define a parameter. K is calculated based on fval and nbits.
               one can assign a value to it using the set_ methods.

var_param (string name, string units, double fmax, double K):

               define a parameter with bit value K (fval = K*ival).
               calculates nbins on its own.
               one can assign a value to it using the set_ methods.
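One plausible way to derive K from fval and nbits is sketched below, assuming K is restricted to powers of two and chosen as fine as possible while round(|fval|/K) still fits into nbits signed bits. The function name `param_K` and the power-of-two restriction are assumptions for illustration, not a description of imath's internals.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch for a parameter: choose the finest power-of-two K = 2^e
// for which round(|fval| / K) still fits into a signed nbits-wide integer.
double param_K(double fval, int nbits) {
  assert(fval != 0.0 && nbits >= 2 && nbits < 63);
  const double imax = std::ldexp(1.0, nbits - 1) - 1;  // largest positive signed value
  int e = std::ilogb(std::fabs(fval)) + 1;             // |fval| / 2^e < 1: trivially fits
  // Keep halving K while the rounded integer still fits.
  while (std::round(std::fabs(fval) / std::ldexp(1.0, e - 1)) <= imax) --e;
  return std::ldexp(1.0, e);
}
```

With fval = 3.14159 and nbits = 8 this gives K = 2^-5, so the parameter is stored as ival = 101 (101/32 = 3.15625).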

var_add (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

var_subtract (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

               add/subtract variables. Bit length increases by 1, but is capped at nmax.
               if range>0 is specified, the bit length is decreased to drop unnecessary high bits.
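Adding values whose shifts differ, sketched under the assumption that each value is an integer times 2^shift with units already checked. `Fixed`, `add` and `subtract` are hypothetical names. The coarser operand is scaled up to the finer LSB, which is exact, and the integers are then added.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical fixed-point value without units: fval = ival * 2^shift.
struct Fixed {
  int64_t ival;
  int shift;
};

// Add two values with different shifts: scale the coarser operand up to the
// finer LSB (exact, no rounding), then add the integers.
Fixed add(Fixed a, Fixed b) {
  const int s = std::min(a.shift, b.shift);
  assert(a.shift - s < 62 && b.shift - s < 62);
  // Multiply by 2^k rather than using <<, which is undefined for negative values before C++20.
  const int64_t ia = a.ival * (int64_t{1} << (a.shift - s));
  const int64_t ib = b.ival * (int64_t{1} << (b.shift - s));
  return {ia + ib, s};
}

Fixed subtract(Fixed a, Fixed b) { return add(a, Fixed{-b.ival, b.shift}); }

double to_double(Fixed x) { return std::ldexp(static_cast<double>(x.ival), x.shift); }
```

For example, 0.75 = 3*2^-2 plus 0.3125 = 5*2^-4 becomes 12 + 5 = 17 at shift -4, i.e. 1.0625. Aligning can widen the operands, which is why the result's bit length needs the nmax cap.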

var_mult (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

               multiplication. Bit length is the sum of the operands' lengths, but capped at nmax.
               if range>0 is specified, the bit length is decreased to drop unnecessary high bits, or the post-shift is reduced.
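A sketch of the width cap in multiplication, assuming the post-shift is a plain arithmetic right shift that drops low bits and coarsens K accordingly. The struct and function are hypothetical; the `range` logic is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point value: fval = ival * 2^shift, held in nbits signed bits.
struct Fixed {
  int64_t ival;
  int shift;
  int nbits;
};

// Multiply: integers multiply, shifts add, and the widths add. If the width
// exceeds nmax, the excess low bits are dropped with a right shift (the
// "post-shift") and the shift, i.e. K, grows by the same amount.
Fixed multiply(Fixed a, Fixed b, int nmax) {
  assert(nmax > 1 && a.nbits + b.nbits < 64);
  Fixed p{a.ival * b.ival, a.shift + b.shift, a.nbits + b.nbits};
  if (p.nbits > nmax) {
    const int post = p.nbits - nmax;
    p.ival >>= post;  // arithmetic shift for negatives: guaranteed since C++20, usual before
    p.shift += post;
    p.nbits = nmax;
  }
  return p;
}
```

Here 100*2^-7 times -50*2^-6 gives -5000*2^-13 in 15 bits; capping at 12 bits drops 3 low bits, leaving -625*2^-10.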

var_timesC (string name, var_base *p1, double cF, int ps = 17):

               multiplication by a constant. Bit length stays the same.
               ps defines the number of bits used to represent the constant.
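Multiplication by a constant with ps bits of precision, sketched assuming |cF| < 1 so the result keeps the operand's K and width. The function `times_constant` is hypothetical; how imath handles constants of magnitude 1 or more is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: multiply ival by a real constant cF. The constant is
// quantized to cI = round(cF * 2^ps), i.e. ps fractional bits, and the product
// is shifted right by ps so the result keeps the operand's K. With |cF| < 1 the
// magnitude cannot grow, so the bit length can stay the same.
int64_t times_constant(int64_t ival, double cF, int ps = 17) {
  assert(std::fabs(cF) < 1.0 && ps > 0 && ps < 31);
  const int64_t cI = std::llround(std::ldexp(cF, ps));
  return (ival * cI) >> ps;  // floor division by 2^ps (arithmetic shift)
}
```

With ps = 17, 0.3 becomes 39322/2^17, and 1000 * 0.3 evaluates to 300. A larger ps lowers the constant's quantization error at the cost of a wider multiplier input.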

var_DSP_postadd (string name, var_base *p1, var_base *p2, var_base *p3, double range = -1, int nmax = 18):

               explicit instantiation of the 3-clock DSP post-addition: p1*p2+p3
               range and nmax have the same meaning as for var_mult.
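The arithmetic of p1*p2+p3 can be sketched as a product followed by an aligned add, assuming p3's LSB is no finer than the product's. The names are hypothetical, and neither the 3-clock pipeline nor the range/nmax handling is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point value without units: fval = ival * 2^shift.
struct Fixed {
  int64_t ival;
  int shift;
};

// p1*p2 + p3 in one expression, mirroring a DSP slice's multiplier followed by
// its post-adder. p3 is aligned to the product's LSB before the add.
Fixed dsp_postadd(Fixed p1, Fixed p2, Fixed p3) {
  const int s = p1.shift + p2.shift;  // LSB exponent of the product
  assert(p3.shift >= s && p3.shift - s < 62);  // sketch assumes p3 is no finer than the product
  const int64_t addend = p3.ival * (int64_t{1} << (p3.shift - s));
  return {p1.ival * p2.ival + addend, s};
}
```

For 0.75 * 2.5 + 1 the product is 15 at shift -3, p3 is aligned to 8, and the result is 23*2^-3 = 2.875.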

var_shift (string name, var_base *p1, int shift):

               shifts the variable right by shift bits (equivalent to multiplication by pow(2, -shift)).
               Units stay the same; nbits are adjusted.
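A sketch of the right shift, assuming K and the units are unchanged so the integer itself is shifted and the width shrinks by the number of dropped bits. `Shifted` and `shift_right` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result: shifted integer plus its new width.
struct Shifted {
  int64_t ival;
  int nbits;
};

// f -> f * 2^-shift with K unchanged: shift the integer right, dropping
// `shift` low bits (floor rounding), and reduce the width to match.
Shifted shift_right(int64_t ival, int nbits, int shift) {
  assert(shift >= 0 && shift < nbits);
  return {ival >> shift, nbits - shift};
}
```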

var_neg (string name, var_base *p1):

               multiplies the variable by -1

var_inv (string name, var_base *p1, double offset, int nbits, int n, unsigned int shift, mode m, int nbaddr=-1):

               LUT-based inversion: f = 1./(offset + f1) and i = 2^n / (offsetI + i1)
               nbits is the width of the LUT (signed)
               m is from enum mode {pos, neg, both} and refers to the possible sign values of f
                        for pos and neg, the most significant bit of p1 (i.e. the sign bit) is ignored
               shift is the shift applied in i1<->address conversions (used to reduce the size of the LUT)
               nbaddr: if not specified, it is taken to be equal to p1->get_nbits()
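A sketch of how such a LUT could be filled for the `pos` mode only, assuming the table entry for each address is evaluated at the centre of the block of 2^shift inputs sharing that address and saturated to the signed nbits range. `build_inv_lut` and `lookup_inv` are hypothetical; imath's exact sampling point and the `neg`/`both` modes are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LUT for i = 2^n / (offsetI + i1) in "pos" mode: i1 >= 0, so the
// sign bit of the nbaddr-bit input is ignored, and the low `shift` bits of i1 are
// dropped to form the address, making the table 2^shift times smaller.
std::vector<int64_t> build_inv_lut(int64_t offsetI, int n, int nbaddr, int shift, int nbits) {
  assert(offsetI > 0 && shift >= 0 && shift < nbaddr - 1 && nbits > 1 && nbits < 63);
  const int64_t imax = (int64_t{1} << (nbits - 1)) - 1;  // largest signed nbits value
  const int entries = 1 << (nbaddr - 1 - shift);         // non-negative i1 values / 2^shift
  std::vector<int64_t> lut(entries);
  for (int addr = 0; addr < entries; ++addr) {
    // Evaluate at the centre of the block of 2^shift i1 values sharing this address.
    const double i1 = static_cast<double>(addr << shift) + ((1 << shift) - 1) / 2.0;
    const double v = std::ldexp(1.0, n) / (static_cast<double>(offsetI) + i1);
    lut[addr] = std::min<int64_t>(std::llround(v), imax);  // saturate rather than overflow
  }
  return lut;
}

// Look up 2^n / (offsetI + i1) for a non-negative i1.
int64_t lookup_inv(const std::vector<int64_t>& lut, int64_t i1, int shift) {
  return lut.at(static_cast<std::size_t>(i1 >> shift));
}
```

With offsetI = 1024, n = 20, nbaddr = 11 and shift = 2, the table has 256 entries instead of 1024, running from 1023 down to 513.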

var_nounits (string name, var_base *p1, int ps = 17):

               convert a number with units to a plain number, needed for trig function expansions (e.g. 1 - 0.5*phi^2)
               ps is the number of bits used to represent the unit conversion constant
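A sketch of stripping units, assuming the conversion multiplies ival by K, with K quantized to a ps-bit integer mantissa times a power of two. `strip_units` is a hypothetical name and the representation is an illustration, not imath's internal form.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical unitless value: value = ival * 2^shift.
struct Fixed {
  int64_t ival;
  int shift;
};

// Strip units from ival (LSB K) by multiplying with K, where K is quantized
// to a ps-bit integer mantissa cI times a power of two.
Fixed strip_units(int64_t ival, double K, int ps = 17) {
  assert(K > 0 && ps > 1 && ps < 31);
  int e = 0;
  const double m = std::frexp(K, &e);                  // K = m * 2^e, m in [0.5, 1)
  const int64_t cI = std::llround(std::ldexp(m, ps));  // m ~= cI / 2^ps
  return {ival * cI, e - ps};
}
```

For phi stored as 512 with K = 2^-10 rad, the unitless value is exactly 0.5, which can then enter an expansion such as 1 - 0.5*phi^2. When K is not a power of two, the ps-bit mantissa introduces a small conversion error.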

var_adjustK (string name, var_base *p1, double Knew, double epsilon = 1e-5, bool do_assert = false, int nbits = -1):

               adjust the variable's shift so that K is as close to Knew as possible (needed for bit length adjustments)
               if do_assert is true, throw an exception if Knew/Kold is not a power of two
               epsilon is the comparison precision; nbits forces the bit length (possibly discarding MSBs)
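A sketch of the K adjustment, assuming epsilon is applied to log2(Knew/Kold) and the integer is shifted by the nearest whole power of two. The function `adjust_K` is hypothetical, and the `nbits` option is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

struct Adjusted {
  int64_t ival;
  double K;
};

// Move the LSB from Kold to the power-of-two multiple of Kold closest to Knew,
// shifting the integer to match. Optionally reject ratios that are not 2^s.
Adjusted adjust_K(int64_t ival, double Kold, double Knew, double epsilon = 1e-5,
                  bool do_assert = false) {
  assert(Kold > 0 && Knew > 0);
  const double l = std::log2(Knew / Kold);
  const int s = static_cast<int>(std::lround(l));
  assert(s > -62 && s < 64);
  if (do_assert && std::fabs(l - s) > epsilon)
    throw std::invalid_argument("Knew/Kold is not a power of two");
  // Coarser K: drop low bits. Finer K: scale up exactly.
  const int64_t out = s >= 0 ? (ival >> s) : ival * (int64_t{1} << -s);
  return {out, std::ldexp(Kold, s)};
}
```

Moving from K = 2^-10 to 2^-8 turns 1000 into 250, while a requested ratio of 3 throws when do_assert is set.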

bool calculate (int debug_level):

                 runs through the entire formula tree, recalculating both integer and floating point values

                 returns true if everything is OK, and false if obvious problems with the calculation exist, i.e.:
                              -  the integer value does not fit into the allotted number of bins
                              -  the integer value is more than 10% or more than 2 away from fval_/K_
                 debug_level:  0 - no warnings
                               1 - limited warnings
                               2 - as 1, but also include explicit warnings when a LUT was used outside its range
                               3 - maximum complaints level
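A sketch of the per-variable checks, including the min/max tracking done on each calculate() call. `VarState` and `check_value` are hypothetical names. The sketch reads the drift rule as flagging a value only when it is off by more than both 10% of fval/K and 2 LSBs; the wording above could also be read as either condition alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical per-variable state updated after each recalculation.
struct VarState {
  double K;
  int nbits;
  double minval = std::numeric_limits<double>::infinity();
  double maxval = -std::numeric_limits<double>::infinity();
};

// Returns false if ival overflows nbits or drifts too far from fval / K.
// Assumption: drift is a problem only if it exceeds both 10% and 2 LSBs.
bool check_value(VarState& v, double fval, int64_t ival) {
  assert(v.K > 0 && v.nbits > 1 && v.nbits < 63);
  v.minval = std::min(v.minval, fval);
  v.maxval = std::max(v.maxval, fval);
  const int64_t lo = -(int64_t{1} << (v.nbits - 1));
  const int64_t hi = (int64_t{1} << (v.nbits - 1)) - 1;
  if (ival < lo || ival > hi) return false;  // does not fit into nbits
  const double expected = fval / v.K;
  const double tol = std::max(2.0, 0.1 * std::fabs(expected));
  return std::fabs(static_cast<double>(ival) - expected) <= tol;
}
```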
