gershtein/imath

Integer representation of floating point arithmetic suitable for FPGA designs

Author: Yuri Gershtein Date: March 2018

Functionality:

Note: all integers are assumed to be signed.

All variables have units, stored in a map <string,int>, where the string is a unit name (e.g. "phi") and the int is its power. The key "2" is always present in the map, and its int value is referred to as the 'shift'. Units are combined and propagated through calculations. Adding or subtracting variables with different units throws an exception. Adding or subtracting variables with different shifts is allowed and is handled correctly.
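The unit bookkeeping described above can be sketched as follows. This is a minimal illustration, not the library's code: the names Units, multiply_units, physical_units and same_units are hypothetical, and the rule that multiplication adds the powers of every key (including "2") is an assumption based on the description.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the unit map: unit name -> power.
// The key "2" holds the shift and is expected to be present.
using Units = std::map<std::string, int>;

// Assumed rule: multiplying two quantities adds the powers of every key,
// so the shifts (key "2") add up as well.
Units multiply_units(const Units& a, const Units& b) {
  assert(a.count("2") == 1 && b.count("2") == 1);
  Units out = a;
  for (const auto& kv : b) out[kv.first] += kv.second;
  return out;
}

// Physical units only: drop the shift and any unit raised to power zero.
Units physical_units(Units u) {
  u.erase("2");
  for (auto it = u.begin(); it != u.end();) {
    if (it->second == 0) it = u.erase(it); else ++it;
  }
  return u;
}

// Addition and subtraction need matching physical units; shifts may differ.
bool same_units(const Units& a, const Units& b) {
  return physical_units(a) == physical_units(b);
}
```

With these helpers, a caller would check same_units() before an addition and raise an error when it returns false, while a product simply takes the map returned by multiply_units().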

The Verilog_print() method takes a vector of the outputs and produces a proper Verilog module.

The calculate() method recalculates the variable's double and int values based on its operands. It returns false in case of overflows and/or mismatches between the double and int calculations.

The maximum and minimum values that the variable assumes are stored and updated each time calculate() is called. If IMATH_ROOT is defined, all values are also stored in a histogram.

var_def (string name, string units, double fmax, double K):

               define a variable whose floating point value is fval = K*ival, with maximum absolute value fmax.
               calculates nbits on its own.
               a value can be assigned to it using the set_ methods.
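The relation fval = K*ival and the automatic choice of bit length can be illustrated with a short sketch. The sizing rule below (smallest signed width whose range reaches fmax) is an assumption, and the function names are hypothetical; the library may round differently.

```cpp
#include <cassert>
#include <cmath>

// Assumption: a signed word of nbits covers |ival| up to 2^(nbits-1),
// so pick the smallest nbits with K * 2^(nbits-1) >= fmax.
int bits_for_range(double fmax, double K) {
  assert(fmax > 0 && K > 0);
  int nbits = 1;     // sign bit only
  double reach = K;  // equals K * 2^(nbits-1)
  while (reach < fmax) {
    reach *= 2;
    ++nbits;
  }
  return nbits;
}

// Conversions between the floating point and integer representations.
long to_int(double fval, double K) { return std::lround(fval / K); }
double to_float(long ival, double K) { return K * ival; }
```

For example, with K = 1/1024 and fmax = 1 this rule gives 11 bits, and the value 0.25 is stored as the integer 256.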

var_param (string name, double fval, int nbits):

               define a parameter. K is calculated based on fval and nbits.
               a value can be assigned to it using the set_ methods.

var_param (string name, string units, double fmax, double K):

               define a parameter whose floating point value is fval = K*ival.
               calculates nbits on its own.
               a value can be assigned to it using the set_ methods.

var_add (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

var_subtract (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

               add/subtract variables. Bit length increases by 1, but capped at nmax.
               if range>0 specified, bit length is decreased to drop unnecessary high bits.
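How an addition of operands with different shifts can work is sketched below. This is an assumption about the mechanics, not the library's implementation: the struct Fixed and the functions floor_shift and add_fixed are hypothetical, the result is aligned to the finer of the two shifts, and the nmax cap is applied by dropping low bits. The range argument is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fixed point value: value = ival * 2^shift (unit constant omitted).
struct Fixed {
  std::int64_t ival;
  int shift;
  int nbits;
};

// Right shift that rounds toward minus infinity, also for negative values.
std::int64_t floor_shift(std::int64_t v, int n) {
  assert(n >= 0 && n < 62);
  std::int64_t d = std::int64_t(1) << n;
  std::int64_t q = v / d;
  if (v % d != 0 && v < 0) --q;
  return q;
}

Fixed add_fixed(Fixed a, Fixed b, int nmax = 18) {
  // Align both operands to the finer granularity (the smaller shift).
  int s = std::min(a.shift, b.shift);
  std::int64_t ia = a.ival * (std::int64_t(1) << (a.shift - s));
  std::int64_t ib = b.ival * (std::int64_t(1) << (b.shift - s));
  int na = a.nbits + (a.shift - s);
  int nb = b.nbits + (b.shift - s);
  // The sum needs one more bit than the wider aligned operand.
  Fixed out{ia + ib, s, std::max(na, nb) + 1};
  // Cap the bit length at nmax by dropping low bits.
  if (out.nbits > nmax) {
    int drop = out.nbits - nmax;
    out.ival = floor_shift(out.ival, drop);
    out.shift += drop;
    out.nbits = nmax;
  }
  return out;
}
```

Adding 3 (shift 0, 4 bits) to 5 (shift -2, 5 bits, i.e. 1.25) gives 17 at shift -2, which is 4.25. With nmax = 5 the two lowest bits are dropped and the result becomes 4 at shift 0.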

var_mult (string name, var_base *p1, var_base *p2, double range = -1, int nmax = 18):

               multiplication. Bit length is the sum of the lengths of the operands, but capped at nmax.
               if range>0 specified, bit length is decreased to drop unnecessary high bits or post-shift is reduced.
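The bit growth of a product and the post-shift that keeps it within nmax can be sketched as follows. The names Fixed, floor_shift and mult_fixed are hypothetical, and the sketch assumes that the cap is enforced purely by a post-shift; the range argument is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed point value: value = ival * 2^shift (unit constant omitted).
struct Fixed {
  std::int64_t ival;
  int shift;
  int nbits;
};

// Right shift that rounds toward minus infinity, also for negative values.
std::int64_t floor_shift(std::int64_t v, int n) {
  assert(n >= 0 && n < 62);
  std::int64_t d = std::int64_t(1) << n;
  std::int64_t q = v / d;
  if (v % d != 0 && v < 0) --q;
  return q;
}

Fixed mult_fixed(Fixed a, Fixed b, int nmax = 18) {
  // Full precision product: bit lengths add, and so do the shifts.
  Fixed out{a.ival * b.ival, a.shift + b.shift, a.nbits + b.nbits};
  if (out.nbits > nmax) {
    int ps = out.nbits - nmax;  // post-shift: number of low bits dropped
    out.ival = floor_shift(out.ival, ps);
    out.shift += ps;
    out.nbits = nmax;
  }
  return out;
}
```

Multiplying two 10-bit operands yields a 20-bit product, so with the default nmax = 18 a post-shift of 2 is applied and the shift of the result grows by 2.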

var_timesC (string name, var_base *p1, double cF, int ps = 17):

               multiplication by a constant. Bit length stays the same.
               ps defines number of bits used to represent the constant

var_DSP_postadd (string name, var_base *p1, var_base *p2, var_base *p3, double range = -1, int nmax = 18):

               explicit instantiation of the 3-clock DSP postaddition: p1*p2+p3
               range and nmax have the same meaning as for the var_mult.

var_shift (string name, var_base *p1, int shift):

               shifts the variable right by shift (equivalent to multiplication by pow(2, -shift)).
               Units stay the same, nbits are adjusted.

var_neg (string name, var_base *p1):

               multiplies the variable by -1

var_inv (string name, var_base *p1, double offset, int nbits, int n, unsigned int shift, mode m, int nbaddr=-1):

               LUT-based inversion: f = 1./(offset + f1) in floating point, and i = 2^n / (offsetI + i1) in integers
               nbits is the width of the LUT (signed)
               m is from enum mode {pos, neg, both} and refers to possible sign values of f
                        for pos and neg, the most significant bit of p1 (i.e. the sign bit) is ignored
               shift is a shift applied in i1<->address conversions (used to reduce size of LUT)
               nbaddr: if not specified, it is taken to be equal to p1->get_nbits()
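A rough sketch of the LUT idea for the pos mode is given below. It is not the library's implementation: build_inv_lut and lut_invert are hypothetical names, nbaddr is taken here to mean the number of address bits after the shift has been applied, each entry is computed for the lowest input that maps to its address, and clipping of the entries to the LUT width nbits is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill a table with round(2^n / (offsetI + i1)) for every address.
// Assumption: address = i1 >> shift, so the entry uses i1 = address << shift.
std::vector<std::int64_t> build_inv_lut(int nbaddr, int n, unsigned shift,
                                        std::int64_t offsetI) {
  assert(nbaddr > 0 && nbaddr < 20);
  std::size_t size = std::size_t(1) << nbaddr;
  std::vector<std::int64_t> lut(size);
  for (std::size_t addr = 0; addr < size; ++addr) {
    std::int64_t i1 = static_cast<std::int64_t>(addr) << shift;
    std::int64_t denom = offsetI + i1;
    // A zero denominator has no inverse; store 0 as a placeholder.
    lut[addr] = denom != 0 ? std::llround(std::ldexp(1.0, n) / denom) : 0;
  }
  return lut;
}

// Look up the inverse of a non-negative input (pos mode only).
std::int64_t lut_invert(const std::vector<std::int64_t>& lut, std::int64_t i1,
                        unsigned shift) {
  assert(i1 >= 0);
  return lut.at(static_cast<std::size_t>(i1 >> shift));
}
```

A larger shift makes the table smaller at the cost of precision, because 2^shift neighbouring inputs share one entry.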

var_nounits (string name, var_base *p1, int ps = 17):

               convert a number with units to a unitless number; needed for trig function expansions (e.g. 1 - 0.5*phi^2)
               ps is a number of bits to represent the unit conversion constant

var_adjustK (string name, var_base *p1, double Knew, double epsilon = 1e-5, bool do_assert = false, int nbits = -1):

               adjust the variable's shift so that K is as close to Knew as possible (needed for bit length adjustments)
               if do_assert is true, throw an exception if Knew/Kold is not a power of two
               epsilon is a comparison precision, nbits forces the bit length (possibly discarding MSBs)
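The power-of-two test on Knew/Kold can be sketched as follows. The names KAdjust and adjust_shift are hypothetical, and the way epsilon is applied (as a bound on the relative deviation from the nearest power of two) is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Result of the sketch: the shift change and whether the ratio is a power of two.
struct KAdjust {
  int lr;      // nearest integer to log2(Knew / Kold)
  bool exact;  // true if Knew / Kold equals 2^lr within epsilon
};

KAdjust adjust_shift(double Kold, double Knew, double epsilon = 1e-5) {
  assert(Kold > 0 && Knew > 0);
  double ratio = Knew / Kold;
  int lr = static_cast<int>(std::lround(std::log2(ratio)));
  // Relative deviation of the ratio from the nearest power of two.
  double residual = std::fabs(ratio / std::ldexp(1.0, lr) - 1.0);
  return {lr, residual < epsilon};
}
```

In this picture, do_assert = true would correspond to raising an error whenever exact is false.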

bool calculate (int debug_level):

                 runs through the entire formula tree, recalculating both integer and floating point values

                 returns true if everything is OK, false if obvious problems with the calculation exist, i.e.
                              -  integer value does not fit into the allotted number of bits
                              -  integer value is more than 10% or more than 2 away from fval_/K_
                 debug_level:  0 - no warnings
                               1 - limited warning
                               2 - as 1, but also include explicit warnings when LUT was used out of its range
                               3 - maximum complaints level
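The two checks listed above can be sketched as follows. The function names are hypothetical, and the sketch reads the tolerance rule literally, flagging a mismatch when either the 10% or the 2-count limit is exceeded; the library may combine the two conditions differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// True if ival fits into a signed word of nbits.
bool fits_signed(std::int64_t ival, int nbits) {
  assert(nbits > 0 && nbits < 64);
  std::int64_t lim = std::int64_t(1) << (nbits - 1);
  return ival >= -lim && ival < lim;
}

// True if the integer value agrees with the floating point value fval,
// where the expected integer is fval / K.
bool values_agree(std::int64_t ival, double fval, double K) {
  double expected = fval / K;
  double diff = std::fabs(static_cast<double>(ival) - expected);
  bool off_by_fraction = diff > 0.10 * std::fabs(expected);  // more than 10%
  bool off_by_counts = diff > 2.0;                           // more than 2 counts
  return !(off_by_fraction || off_by_counts);
}
```

A calculate()-style routine would return false as soon as either fits_signed() or values_agree() fails for any node of the formula tree.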
