Math library #25

suarezvictor · 2021-10-01T17:29:18Z

suarezvictor
Oct 1, 2021

I got a float function, "1/sqrt(x)", computed both in plain C and through PipelineC, and the results match. So we are now ready to implement more complex functions.

$ ./vtop 
Verilator float 0.499154, uint32 0x3EFF9110, C float 0.499154, uint32 0x3EFF910F: PASS
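
A small sketch of how the two hex words in that output could be compared as a distance in units in the last place (ULPs). The helper name is hypothetical, and it assumes both values are finite and have the same sign, as they do here:

```c
#include <stdint.h>

/* Distance in ULPs between two IEEE-754 single-precision bit patterns.
   Assumption: both patterns are finite and share the same sign bit,
   so the integer difference of the raw words equals the ULP distance. */
static uint32_t ulp_distance(uint32_t a_bits, uint32_t b_bits)
{
    return (a_bits > b_bits) ? (a_bits - b_bits) : (b_bits - a_bits);
}
```

Called as ulp_distance(0x3EFF9110u, 0x3EFF910Fu), this gives a distance of 1 for the two results printed above.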

As shown, the results agree to within rounding error (only the least significant bits of the respective mantissas differ).
The commands to achieve this are as follows:

$ ./src/pipelinec ./examples/llvm/rsqrtf.c --sim_comb --edaplay
$ cd [..]/all_vhdl_files
$ ghdl -i --std=08 `cat ../vhdl_files.txt`
$ ghdl -m --std=08 top
$ yosys -m ghdl
> ghdl --std=08 top
> proc; opt; fsm; opt; memory; opt; write_verilog top.v
> exit

Note that VHDL-2008 (--std=08) is required.

Source of function implementation is as follows (note that it can be compiled with a regular C/C++ compiler):

#include "uintN_t.h"
#define FLOAT float
#define llvm_dis_float_rsqrt_K0 0.5
#define llvm_dis_float_rsqrt_K1 1.5
#define llvm_dis_float_rsqrt_K2 1597463007
#define LOAD(x) x
#define BITCAST_I32(x) float_31_0(x)
#define BITCAST_FLOAT(x) float_uint32(x)

#define MUL(a, b) ((a)*(b)) //may use instead a primitive optimized for using FPGA multipliers

float llvm_dis_Z11float_rsqrtf( FLOAT a0)
{
  FLOAT a2;
  FLOAT a3;
  FLOAT a4;
  uint32_t a5;
  uint32_t a6;
  uint32_t a7;
  uint32_t a8;
  FLOAT a9;
  FLOAT a10;
  FLOAT a11;
  FLOAT a12;
  FLOAT a13;
  a2 = LOAD(llvm_dis_float_rsqrt_K0); // %2 = load float, float* @float_rsqrt_K0, align 4, !tbaa !3
  a3 = MUL(a2, a0); //a2 * a0; // %3 = fmul float %2, %0
  a4 = LOAD(llvm_dis_float_rsqrt_K1); // %4 = load float, float* @float_rsqrt_K1, align 4, !tbaa !3
  a5 = BITCAST_I32(a0); // %5 = bitcast float %0 to i32
  a6 = LOAD(llvm_dis_float_rsqrt_K2); // %6 = load i32, i32* @float_rsqrt_K2, align 4, !tbaa !7
  a7 = a5 >> 1; // %7 = lshr i32 %5, 1
  a8 = a6 - a7; // %8 = sub i32 %6, %7
  a9 = BITCAST_FLOAT(a8); // %9 = bitcast i32 %8 to float
  a10 = MUL(a3, a9); //a3 * a9; // %10 = fmul float %3, %9
  a11 = MUL(a10, a9); //a10 * a9; // %11 = fmul float %10, %9
  a12 = a4 - a11; // %12 = fsub float %4, %11
  a13 = MUL(a12, a9); //a12 * a9; // %13 = fmul float %12, %9
  return a13; // ret float %13
}

#pragma MAIN app
float app()
{
  float x = 4.0;
  return llvm_dis_Z11float_rsqrtf(x);
}

The optimized version of the * operator is not used; the code relies on PipelineC's default * operator. The function as shown is a translation of a normal C function, produced using LLVM and a C code generator.
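
For readers following the generated code, here is a hand-written sketch of what the a2..a13 sequence computes: an integer-based initial guess followed by one Newton-Raphson step. The function and helper names are hypothetical and this is not the original C source that was fed to LLVM; the memcpy helpers stand in for float_31_0 and float_uint32.

```c
#include <stdint.h>
#include <string.h>

/* Bit casts between float and uint32_t (same role as float_31_0 / float_uint32). */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical hand-written equivalent of llvm_dis_Z11float_rsqrtf. */
static float rsqrt_sketch(float x)
{
    float half_x = 0.5f * x;                       /* a3 = K0 * x                  */
    uint32_t i = 1597463007u - (bits_of(x) >> 1);  /* a8; 1597463007 == 0x5F3759DF */
    float y = float_of(i);                         /* a9: initial guess            */
    return (1.5f - half_x * y * y) * y;            /* a10..a13: one Newton step    */
}
```

For x = 4.0 this lands near 0.49915 rather than exactly 0.5, which is in line with the 0.499154 printed at the top of the thread.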

The program that checks the simulated result against the result of the compiled C code is as follows:

#if 0
#include "pipelinec_verilator.h"
#include "verilated.h"
#include "Vtop.h" 
#else

#include "verilated.cpp" //verilator library
#include "obj_dir/Vtop.cpp" //generated by verilator
#include "obj_dir/Vtop__Syms.cpp" //generated by verilator

#endif

inline uint32_t float_31_0(float a) { union _noname { float f; uint32_t i;} conv; conv.f = a; return conv.i; }
inline float float_uint32(uint32_t a) { union _noname { float f; uint32_t i;} conv; conv.i = a; return conv.f; }
#include "../../examples/llvm/rsqrtf.c"

#define clk clk_None

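#include <cmath>   // std::fabs
#include <cstdio>  // printf
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
  Vtop* g_top = new Vtop;

  // Evaluate the model with the clock low, then high
  g_top->clk = 0;
  g_top->eval();

  g_top->clk = 1;
  g_top->eval();

  float verilator_return = float_uint32(g_top->app_return_output);
  float c_return = app();
  // std::fabs keeps the comparison in floating point; an unqualified abs can resolve to the integer version
  bool pass = std::fabs(verilator_return - c_return) < 1e-6;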

  printf("Verilator float %f, uint32 0x%08X, C float %f, uint32 0x%08X: %s\n",
    verilator_return, float_31_0(verilator_return),
    c_return, float_31_0(c_return),
    pass ? "PASS":"FAIL");

  return !pass;
}

For compilation and execution, these commands are needed:

$ verilator -Wall -cc top.v
$ touch uintN_t.h
$ clang++ -O3 -I. -I/usr/share/verilator/include main.cpp -o vtop

Lots of improvements can be made. The first one, I think, would be to connect the function argument to the simulator instead of using a constant value, so that better tests can be run.
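
As a rough sketch of where such test inputs could come from, the helper below draws random finite floats by reinterpreting random 32-bit words. The PRNG choice (xorshift32) and both function names are assumptions for illustration; wiring the values into the simulated design is not shown.

```c
#include <stdint.h>
#include <string.h>

/* One step of the xorshift32 generator; *state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Draw a random finite float by reinterpreting random bits.
   Patterns with an all-ones exponent (NaN or infinity) are skipped. */
static float random_finite_float(uint32_t *state)
{
    uint32_t bits;
    float f;
    do {
        bits = xorshift32(state);
    } while (((bits >> 23) & 0xFFu) == 0xFFu);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Drawing from bit patterns rather than from a numeric range covers every exponent, including very small and very large magnitudes, which is where approximate functions tend to misbehave.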

JulianKemmerer · 2021-10-03T04:40:56Z

JulianKemmerer
Oct 3, 2021
Maintainer

I am working on an easy-to-automate/replicate version of this. I may have hit a Verilator bug; we'll see.


suarezvictor · 2021-10-03T12:50:53Z

suarezvictor
Oct 3, 2021
Author

Did my example work for you? I had no issues

2 replies

JulianKemmerer Oct 3, 2021
Maintainer

In an adapted version of your example I was getting timing loops in Verilator.
I just pulled the nightly OSS CAD Suite build to try,
and am now getting some kind of weird assertion failure from ghdl (maybe catching the Verilator issue earlier, in ghdl)...
Will share soon I guess; I was hoping to get it working.

JulianKemmerer Oct 3, 2021
Maintainer

See below @suarezvictor 😒

JulianKemmerer · 2021-10-03T16:18:24Z

JulianKemmerer
Oct 3, 2021
Maintainer

I just committed some example test structure for 'math_pkg'

If you run
./src/pipelinec ./examples/verilator/math_pkg/rsqrtf/rsqrtf.c --sim_comb --verilator --main_cpp ./examples/verilator/math_pkg/rsqrtf/test.cpp

You should get a verilator compile failure like so

%Warning-MULTITOP: /home/julian/pipelinec_output/verilator/../top/top.v:1518:8: Multiple top level modules
                                                                              : ... Suggest see manual; fix the duplicates, or use --top-module to select top.
                   ... For warning description see https://verilator.org/warn/MULTITOP?v=4.213
                   ... Use "/* verilator lint_off MULTITOP */" and lint_on around source to disable this message.
                                                                              : ... Top module 'bin_op_eq_uint1_t_uint5_t_uint1_t_0clk_de264c78'
  190 | module bin_op_eq_uint1_t_uint5_t_uint1_t_0clk_de264c78(left, right, return_output);
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                              : ... Top module 'top'
 1518 | module top(clk_None, x_DEBUG_INPUT_MAIN_val, result_DEBUG_OUTPUT_MAIN_return_output);
      |        ^~~
%Warning-UNOPTFLAT: /home/julian/pipelinec_output/verilator/../top/top.v:404:15: Signal unoptimizable: Feedback to clock or circular logic: 'top.test_bench_0clk_4ca9d0ec.llvm_dis_z11float_rsqrtf_rsqrtf_c_l97_c10_7ff1.bin_op_minus_rsqrtf_c_l47_c9_07b8.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_70bd_return_output'
  404 |   wire [25:0] bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_70bd_return_output;
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    /home/julian/pipelinec_output/verilator/../top/top.v:404:15:      Example path: top.test_bench_0clk_4ca9d0ec.llvm_dis_z11float_rsqrtf_rsqrtf_c_l97_c10_7ff1.bin_op_minus_rsqrtf_c_l47_c9_07b8.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_70bd_return_output
                    /home/julian/pipelinec_output/verilator/../top/top.v:601:28:      Example path: ASSIGNW
                    /home/julian/pipelinec_output/verilator/../top/top.v:404:15:      Example path: top.test_bench_0clk_4ca9d0ec.llvm_dis_z11float_rsqrtf_rsqrtf_c_l97_c10_7ff1.bin_op_minus_rsqrtf_c_l47_c9_07b8.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_70bd_return_output
%Error: Exiting due to 2 warning(s)

And lolz
I hadn't seen the top part of the message about MULTITOP - that seems like an easier, dumber mistake than "Feedback to clock or circular logic" (maybe the first causes the second).


JulianKemmerer · 2021-10-03T16:20:36Z

JulianKemmerer
Oct 3, 2021
Maintainer

In trying to narrow this down to bin_op_minus_float_float_float from above, I had then also run
./src/pipelinec ./examples/verilator/math_pkg/fp32sub/fp32sub.c --sim_comb --verilator --main_cpp ./examples/verilator/math_pkg/fp32sub/test.cpp

And I get just the original circular logic error - darn - nothing silly about MULTITOP to blame

%Warning-UNOPTFLAT: /home/julian/pipelinec_output/verilator/../top/top.v:262:15: Signal unoptimizable: Feedback to clock or circular logic: 'top.test_bench_0clk_97e9cc1c.fp32sub_fp32sub_c_l34_c10_e66f.bin_op_minus_fp32sub_c_l8_c10_6202.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_df61_return_output'
  262 |   wire [25:0] bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_df61_return_output;
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ... For warning description see https://verilator.org/warn/UNOPTFLAT?v=4.213
                    ... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
                    /home/julian/pipelinec_output/verilator/../top/top.v:262:15:      Example path: top.test_bench_0clk_97e9cc1c.fp32sub_fp32sub_c_l34_c10_e66f.bin_op_minus_fp32sub_c_l8_c10_6202.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_df61_return_output
                    /home/julian/pipelinec_output/verilator/../top/top.v:459:28:      Example path: ASSIGNW
                    /home/julian/pipelinec_output/verilator/../top/top.v:262:15:      Example path: top.test_bench_0clk_97e9cc1c.fp32sub_fp32sub_c_l34_c10_e66f.bin_op_minus_fp32sub_c_l8_c10_6202.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_df61_return_output
%Error: Exiting due to 1 warning(s)


JulianKemmerer · 2021-10-03T16:22:32Z

JulianKemmerer
Oct 3, 2021
Maintainer

It's supposed to run like the u24mult demo:
./src/pipelinec ./examples/verilator/math_pkg/u24mult/u24mult.c --sim_comb --verilator --main_cpp ./examples/verilator/math_pkg/u24mult/test.cpp

================== Doing Verilator Simulation ================================
Compiling...
Starting simulation...
100 inputs checked.
Test passed!


JulianKemmerer · 2021-10-03T16:29:35Z

JulianKemmerer
Oct 3, 2021
Maintainer

Oh, a final data point, like I mentioned: a ghdl error now when using last night's OSS CAD Suite build tarball

./src/pipelinec ./examples/verilator/math_pkg/fp32sub/fp32sub.c --sim_comb --verilator --main_cpp ./examples/verilator/math_pkg/fp32sub/test.cpp

ERROR: Assert `n.id != 0' failed in frontends/ghdl/ghdl.cc:204.


JulianKemmerer · 2021-10-03T18:12:47Z

JulianKemmerer
Oct 3, 2021
Maintainer

Per the above, my thinking is that it's a problem in the ghdl plugin for yosys, before jumping to it being a Verilator issue.


JulianKemmerer · 2021-10-03T18:46:38Z

JulianKemmerer
Oct 3, 2021
Maintainer

Also, I set up an example that tries to mimic even more closely your original working setup from the start of this discussion.

./src/pipelinec examples/llvm/rsqrtf.c --sim_comb --verilator --main_cpp ./examples/llvm/main.cpp

On my (slightly older) local build this gets a Verilator error like the one above

%Warning-UNOPTFLAT: /home/julian/pipelinec_output/verilator/../top/top.v:414:15: Signal unoptimizable: Feedback to clock or circular logic: 'top.app_0clk_cc987c38.llvm_dis_z11float_rsqrtf_rsqrtf_c_l52_c10_a68e.bin_op_minus_rsqrtf_c_l43_c9_ec93.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_2f13_return_output'
  414 |   wire [25:0] bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_2f13_return_output;
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ... For warning description see https://verilator.org/warn/UNOPTFLAT?v=4.213
                    ... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
                    /home/julian/pipelinec_output/verilator/../top/top.v:414:15:      Example path: top.app_0clk_cc987c38.llvm_dis_z11float_rsqrtf_rsqrtf_c_l52_c10_a68e.bin_op_minus_rsqrtf_c_l43_c9_ec93.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_2f13_return_output
                    /home/julian/pipelinec_output/verilator/../top/top.v:611:28:      Example path: ASSIGNW
                    /home/julian/pipelinec_output/verilator/../top/top.v:414:15:      Example path: top.app_0clk_cc987c38.llvm_dis_z11float_rsqrtf_rsqrtf_c_l52_c10_a68e.bin_op_minus_rsqrtf_c_l43_c9_ec93.bin_op_minus_bin_op_minus_float_float_float_c_l143_c18_2f13_return_output
%Error: Exiting due to 1 warning(s)

And on my latest OSS CAD Suite binaries it gets the ghdl assertion we've also seen

ERROR: Assert `n.id != 0' failed in frontends/ghdl/ghdl.cc:204.


JulianKemmerer · 2021-10-03T19:19:18Z

JulianKemmerer
Oct 3, 2021
Maintainer

Actually, reading about the Verilator warning, it might be OK:
https://verilator.org/guide/latest/warnings.html#cmdoption-arg-UNOPTFLAT

Often UNOPTFLAT is caused by logic that isn’t truly circular as viewed by synthesis which analyzes interconnection per-bit, but is circular to simulation which analyzes per-bus. ... If UNOPTFLAT is suppressed the code may get a DIDNOTCONVERGE error.

That explains why it synthesizes fine. We may want to disable this warning and just watch out for DIDNOTCONVERGE?

I am trying to see whether this really shows up as circular Verilog or not...


JulianKemmerer · 2021-10-03T19:20:45Z

JulianKemmerer
Oct 3, 2021
Maintainer

Also, this VHDL, which was related to our last issue with the GHDL plugin for yosys:

PipelineC/src/VHDL.py

Line 3047 in 85c7bca

    
           rv += " " + " " + "-- Some tools dont like if read_pipe is never fully driven, dummy drive\n"

Commenting or uncommenting it changes the behavior from the GHDL assertion to the Verilator circular logic error and back.


suarezvictor · 2021-10-03T23:10:56Z

suarezvictor
Oct 3, 2021
Author

I receive Verilator warnings that I just ignored, since the code was generated anyway. Just checking: did my sqrt example work for you? I've got it running. Maybe that's the case; please just confirm.

1 reply

JulianKemmerer Oct 3, 2021
Maintainer

I am having some better luck - will explain soon

JulianKemmerer · 2021-10-03T23:26:33Z

JulianKemmerer
Oct 3, 2021
Maintainer

I just checked in a fix to the generated VHDL that should help with circular logic problems, and set Verilator to ignore the UNOPTFLAT warning.

Right now the ~math library tests (10 random numbers each) all pass. I'm just happy they compile :-p

TESTS: fp32sub  rsqrtf  u24add  u24mult

./src/pipelinec ./examples/verilator/math_pkg/$TEST/$TEST.c --sim_comb --verilator --main_cpp ./examples/verilator/math_pkg/$TEST/test.cpp

I swore I saw something weird in fp32sub - will try running it with increasingly more random numbers.

But feel free to give it a go yourself.

What do ya think @suarezvictor ?


suarezvictor · 2021-10-04T00:02:36Z

suarezvictor
Oct 4, 2021
Author

I'm truly inspired by the progress, now that we can write an algorithm and easily test that it behaves correctly. The next step will be (I hope) to give some estimate of timing performance and resource usage.


JulianKemmerer · 2021-10-04T00:07:30Z

JulianKemmerer
Oct 4, 2021
Maintainer

I just updated the above tests to run more test cases.
The u24 add and mult tests pass 10^8 random test cases (10^9 was longer than I was willing to wait at the moment). They seem fine.

I think we need to prescribe a range for this approximate rsqrtf function. It seems to fail for very small inputs (large outputs).
Out of 1000 random floats, these were too far off:

x: 1.25234e-38 c_result: 8.93592e+18 result: 1.31415e+19 err: 4.20562e+18 allowed_err: 1.78718e+16
FAILED
x: 9.72432e-39 c_result: 1.01408e+19 result: 1.40952e+19 err: 3.95445e+18 allowed_err: 2.02815e+16
FAILED
x: 1.07689e-38 c_result: 9.6364e+18 result: 1.36578e+19 err: 4.02138e+18 allowed_err: 1.92728e+16
FAILED
x: 1.62758e-38 c_result: 7.83844e+18 result: 1.20374e+19 err: 4.19901e+18 allowed_err: 1.56769e+16
FAILED
x: 1.807e-38 c_result: 7.43912e+18 result: 1.15095e+19 err: 4.0704e+18 allowed_err: 1.48782e+16
FAILED
x: 3.38687e-39 c_result: 1.71831e+19 result: 1.78247e+19 err: 6.4158e+17 allowed_err: 3.43662e+16
FAILED
x: 1.71803e-38 c_result: 7.62929e+18 result: 1.17713e+19 err: 4.14199e+18 allowed_err: 1.52586e+16
FAILED
x: 1.52756e-38 c_result: 8.09096e+18 result: 1.23317e+19 err: 4.24075e+18 allowed_err: 1.61819e+16
FAILED
x: 2.1005e-38 c_result: 6.89983e+18 result: 1.06459e+19 err: 3.74608e+18 allowed_err: 1.37997e+16
FAILED
x: 1.6355e-38 c_result: 7.81942e+18 result: 1.20141e+19 err: 4.1947e+18 allowed_err: 1.56388e+16
FAILED
x: 8.50958e-39 c_result: 1.08404e+19 result: 1.481e+19 err: 3.96963e+18 allowed_err: 2.16808e+16
FAILED
1000 outputs checked.
Test failed!
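
Every failing x in the list above is below 2 * FLT_MIN (about 2.35e-38), so one hypothetical way to prescribe a range is to require that both x and 0.5 * x be positive normal numbers. The cutoff and the function name below are assumptions read off this log; the thread does not establish the cause of the failures.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical input-range check for the approximate rsqrtf.
   Assumption: inputs are accepted only when x and 0.5 * x are both
   positive, finite, normal floats. NaN compares false and is rejected. */
static bool rsqrt_input_in_range(float x)
{
    return x >= 2.0f * FLT_MIN && x <= FLT_MAX;
}
```

A test bench could use such a predicate to skip or separately report inputs outside the supported range, instead of counting them as failures.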

But there is definitely an issue with the fp32 sub. Just 100 tests show a bunch of failures (passing cases are included here too):

x: 3.36737e+26 y: -1.39159e+22 c_result: 3.36751e+26 result: 3.36751e+26 err: 0 allowed_err: 3.36751e+21
x: -1.12099e+32 y: 4.13144e-13 c_result: -1.12099e+32 result: -1.12099e+32 err: 1.354e+26 allowed_err: 1.12099e+27
x: -1.54089e-17 y: -0.0379264 c_result: 0.0379264 result: 0.0379263 err: 6.33299e-08 allowed_err: 3.79264e-07
x: -8.06074e-11 y: 1.47113e+35 c_result: -1.47113e+35 result: -1.47113e+35 err: 1.9807e+28 allowed_err: 1.47113e+30
x: 5.98616e+10 y: 5.04974e-28 c_result: 5.98616e+10 result: 5.98616e+10 err: 0 allowed_err: 598616
x: 1.10303e-37 y: -2.28461e-28 c_result: 2.28461e-28 result: 2.28461e-28 err: 0 allowed_err: 2.28461e-33
x: 7.17222e-27 y: -6.19151e-29 c_result: 7.23413e-27 result: 7.23413e-27 err: 0 allowed_err: 7.23413e-32
x: -nan y: 20.9711 c_result: -nan result: -nan err: nan allowed_err: nan
x: 9.85586e+08 y: -8.33829e+10 c_result: 8.43685e+10 result: 8.43684e+10 err: 8192 allowed_err: 843685
x: 0.678622 y: 2.09199e-16 c_result: 0.678622 result: 0.678621 err: 8.9407e-07 allowed_err: 6.78622e-06
x: -236.379 y: -4.34675e-08 c_result: -236.379 result: -49.688 err: 186.691 allowed_err: 0.00236379
FAILED
x: -4.24265e-17 y: 1.07442e+24 c_result: -1.07442e+24 result: -1.08886e+24 err: 1.4437e+22 allowed_err: 1.07442e+19
FAILED
x: 1.43085e-33 y: 211.816 c_result: -211.816 result: -211.816 err: 0.000106812 allowed_err: 0.00211816
x: 3.99278e-24 y: -2.40796e+30 c_result: 2.40796e+30 result: 2.40797e+30 err: 5.7424e+24 allowed_err: 2.40796e+25
x: 5.49991e-34 y: -0.00114418 c_result: 0.00114418 result: 0.00118776 err: 4.35747e-05 allowed_err: 1.14418e-08
FAILED
x: -1.35503e-20 y: -1.99302e+31 c_result: 1.99302e+31 result: 1.99104e+31 err: 1.98034e+28 allowed_err: 1.99302e+26
FAILED
x: -2.19691e-18 y: -4.90335e-10 c_result: 4.90335e-10 result: 4.90335e-10 err: 0 allowed_err: 4.90335e-15
x: -19.3449 y: -558.719 c_result: 539.374 result: 539.374 err: 0 allowed_err: 0.00539374
x: -3.15925e-05 y: -3.34204e+33 c_result: 3.34204e+33 result: 3.34204e+33 err: 0 allowed_err: 3.34204e+28
x: 4.93803e-17 y: 1.26071e+11 c_result: -1.26071e+11 result: -1.26071e+11 err: 0 allowed_err: 1.26071e+06
x: 6.15032e+14 y: 5.08654e-05 c_result: 6.15032e+14 result: -3.2327e+14 err: 9.38302e+14 allowed_err: 6.15032e+09
FAILED
x: 4.93427e+25 y: -2.73736e-21 c_result: 4.93427e+25 result: 4.93427e+25 err: 0 allowed_err: 4.93427e+20
x: -6.15621e-12 y: 3.0836e+35 c_result: -3.0836e+35 result: -3.0836e+35 err: 0 allowed_err: 3.0836e+30
x: 2.79239e+12 y: 4.66107e-05 c_result: 2.79239e+12 result: 2.79239e+12 err: 0 allowed_err: 2.79239e+07
x: 2.97991e-08 y: -3.26495e+14 c_result: 3.26495e+14 result: 3.27044e+14 err: 5.49689e+11 allowed_err: 3.26495e+09
FAILED
x: 0.0948505 y: 1.05059e+35 c_result: -1.05059e+35 result: -1.05059e+35 err: 0 allowed_err: 1.05059e+30
x: -1.45481e-27 y: 1.53469e+11 c_result: -1.53469e+11 result: -1.53469e+11 err: 0 allowed_err: 1.53469e+06
x: 3.47184e+09 y: 2.29797e+22 c_result: -2.29797e+22 result: -2.29648e+22 err: 1.49114e+19 allowed_err: 2.29797e+17
FAILED
x: 0.000175854 y: -1.66838e-24 c_result: 0.000175854 result: 0.00020663 err: 3.07762e-05 allowed_err: 1.75854e-09
FAILED
x: -47500.5 y: -7.3382e-26 c_result: -47500.5 result: -41686.6 err: 5813.92 allowed_err: 0.475005
FAILED
x: -5.56866e-31 y: -0.91684 c_result: 0.91684 result: 0.872721 err: 0.0441195 allowed_err: 9.16841e-06
FAILED
x: -4.66368e+37 y: -3.42815e+14 c_result: -4.66368e+37 result: -4.66305e+37 err: 6.32304e+33 allowed_err: 4.66368e+32
FAILED
x: -3.58115e+19 y: 2.8978e-16 c_result: -3.58115e+19 result: -3.58115e+19 err: 2.19902e+13 allowed_err: 3.58115e+14
x: -6.83276e+06 y: -3.26926e-27 c_result: -6.83276e+06 result: -6.8325e+06 err: 259 allowed_err: 68.3276
FAILED
x: -2.83125e-29 y: 2.61103e-26 c_result: -2.61387e-26 result: -2.61387e-26 err: 3.08149e-33 allowed_err: 2.61387e-31
x: -3.32686e-33 y: -6.96942e-35 c_result: -3.25717e-33 result: -3.25717e-33 err: 3.67342e-40 allowed_err: 3.25717e-38
x: -2.29528e-25 y: 1.86825e+23 c_result: -1.86825e+23 result: -1.86825e+23 err: 0 allowed_err: 1.86825e+18
x: -2.37559e+12 y: -1.16915e+11 c_result: -2.25868e+12 result: -2.25868e+12 err: 0 allowed_err: 2.25868e+07
x: -9.52058e-32 y: -75.1218 c_result: 75.1218 result: 75.1142 err: 0.00753784 allowed_err: 0.000751218
FAILED
x: -1.46065e-33 y: -0.00219461 c_result: 0.00219461 result: 0.00207889 err: 0.000115725 allowed_err: 2.19461e-08
FAILED
x: 927764 y: 6.24164e-35 c_result: 927764 result: 906524 err: 21239.2 allowed_err: 9.27764
FAILED
x: nan y: 2.31808e+30 c_result: nan result: nan err: nan allowed_err: nan
x: nan y: 7.36111e+28 c_result: nan result: 1.72673e+38 err: nan allowed_err: nan
x: 1.25234e-38 y: -3.47757e+07 c_result: 3.47757e+07 result: 3.47757e+07 err: 4 allowed_err: 347.757
x: -2.2769e-26 y: 1.40746e+32 c_result: -1.40746e+32 result: 4.08493e+31 err: 1.81595e+32 allowed_err: 1.40746e+27
FAILED
x: -2.80621e-11 y: 2.79777e+06 c_result: -2.79777e+06 result: -2.79777e+06 err: 0 allowed_err: 27.9777
x: -5.54839e+14 y: -483.032 c_result: -5.54839e+14 result: -5.52764e+14 err: 2.0746e+12 allowed_err: 5.54839e+09
FAILED
x: -1.68715e-15 y: -1.78741e+06 c_result: 1.78741e+06 result: 1.75629e+06 err: 31122.2 allowed_err: 17.8741
FAILED
x: -2.65482e+19 y: 3.21315e-36 c_result: -2.65482e+19 result: -2.65482e+19 err: 4.39805e+12 allowed_err: 2.65482e+14
x: -1.01284e+26 y: -1.23395e+29 c_result: 1.23294e+29 result: 1.23294e+29 err: 9.44473e+21 allowed_err: 1.23294e+24
x: 1.84511e+37 y: -2.5592e+31 c_result: 1.84512e+37 result: 1.84512e+37 err: 0 allowed_err: 1.84512e+32
x: 1.51413e+13 y: 2.55367e-26 c_result: 1.51413e+13 result: 6.45159e+12 err: 8.6897e+12 allowed_err: 1.51413e+08
FAILED
x: 1.99133e+30 y: 7.2842e+11 c_result: 1.99133e+30 result: 1.99133e+30 err: 0 allowed_err: 1.99133e+25
x: -1.97019e+10 y: 6.11909e+15 c_result: -6.11911e+15 result: -6.11911e+15 err: 5.36871e+08 allowed_err: 6.11911e+10
x: -1.32664e-13 y: -3.0587e-33 c_result: -1.32664e-13 result: -7.62408e-14 err: 5.6423e-14 allowed_err: 1.32664e-18
FAILED
x: -1.06391e-21 y: -13.8849 c_result: 13.8849 result: 13.8652 err: 0.0196247 allowed_err: 0.000138849
FAILED
x: 2.34639e-17 y: -1.63247e+17 c_result: 1.63247e+17 result: 1.63249e+17 err: 1.85543e+12 allowed_err: 1.63247e+12
FAILED
x: 0.0132794 y: 3.62512e+34 c_result: -3.62512e+34 result: -3.62512e+34 err: 0 allowed_err: 3.62512e+29
x: 5.029e-13 y: -1.88508e-05 c_result: 1.88508e-05 result: 1.88508e-05 err: 0 allowed_err: 1.88508e-10
x: 2.3745e-12 y: -1.01378e+26 c_result: 1.01378e+26 result: 1.01378e+26 err: 0 allowed_err: 1.01378e+21
x: 6.76191e+13 y: -0.0859732 c_result: 6.76191e+13 result: 6.76195e+13 err: 3.69099e+08 allowed_err: 6.76191e+08
x: 4.66335e+16 y: 5.09659e+09 c_result: 4.66335e+16 result: 4.66335e+16 err: 0 allowed_err: 4.66335e+11
x: 4.7203e-07 y: -5.37527e+13 c_result: 5.37527e+13 result: 6.24601e+13 err: 8.70741e+12 allowed_err: 5.37527e+08
FAILED
x: 3.38635e-12 y: -5.18376e+08 c_result: 5.18376e+08 result: -4.92899e+08 err: 1.01127e+09 allowed_err: 5183.76
FAILED
x: 1.22984e+18 y: 7.15845e+18 c_result: -5.92861e+18 result: -5.92861e+18 err: 0 allowed_err: 5.92861e+13
x: -5.6923e+32 y: 1.32091e+12 c_result: -5.6923e+32 result: -5.93597e+32 err: 2.43666e+31 allowed_err: 5.6923e+27
FAILED
x: 668.103 y: -7.31714e-19 c_result: 668.103 result: 681.601 err: 13.4977 allowed_err: 0.00668103
FAILED
x: 4.63866e+14 y: 2.34538e-30 c_result: 4.63866e+14 result: 4.63865e+14 err: 7.71752e+08 allowed_err: 4.63866e+09
x: -1.71209e+06 y: 1.44123e+13 c_result: -1.44123e+13 result: -1.44123e+13 err: 1.04858e+06 allowed_err: 1.44123e+08
x: -3.76943e+21 y: -1.5784e-13 c_result: -3.76943e+21 result: -3.76941e+21 err: 1.23849e+16 allowed_err: 3.76943e+16
x: 4.24711e+37 y: 3.50517e+25 c_result: 4.24711e+37 result: 4.23205e+37 err: 1.50544e+35 allowed_err: 4.24711e+32
FAILED
x: -1.95126e-24 y: -4.68906e+35 c_result: 4.68906e+35 result: 4.56658e+35 err: 1.22483e+34 allowed_err: 4.68906e+30
FAILED
x: 5.33358e+20 y: -1.27814e-29 c_result: 5.33358e+20 result: 5.52038e+20 err: 1.86801e+19 allowed_err: 5.33358e+15
FAILED
x: 2.34875e-09 y: -2.95156e+34 c_result: 2.95156e+34 result: 2.95164e+34 err: 7.97233e+29 allowed_err: 2.95156e+29
FAILED
x: 1.63486e+28 y: 8.88674e+22 c_result: 1.63485e+28 result: 1.63485e+28 err: 0 allowed_err: 1.63485e+23
x: 4.91093e-16 y: 7.57993e+31 c_result: -7.57993e+31 result: -7.57993e+31 err: 0 allowed_err: 7.57993e+26
x: -1.52485 y: 2.1529e-26 c_result: -1.52485 result: -1.52485 err: 3.57628e-07 allowed_err: 1.52485e-05
x: 2.63872e+28 y: 0.00331545 c_result: 2.63872e+28 result: 2.61245e+28 err: 2.62677e+26 allowed_err: 2.63872e+23
FAILED
x: 0.985107 y: -4.91278e-25 c_result: 0.985107 result: 0.985116 err: 9.05991e-06 allowed_err: 9.85107e-06
x: 7.22232e+17 y: -1.87349e+08 c_result: 7.22232e+17 result: -7.78953e+17 err: 1.50118e+18 allowed_err: 7.22232e+12
FAILED
x: -2.41739e+26 y: 3.3524e-25 c_result: -2.41739e+26 result: -2.42229e+26 err: 4.89946e+23 allowed_err: 2.41739e+21
FAILED
x: -1.62252 y: -2.80515e-28 c_result: -1.62252 result: -1.62252 err: 0 allowed_err: 1.62252e-05
x: 6.54036e-14 y: -6.37708e+31 c_result: 6.37708e+31 result: 6.37708e+31 err: 1.93428e+25 allowed_err: 6.37708e+26
x: -1.9001e+26 y: 0.997526 c_result: -1.9001e+26 result: -1.9001e+26 err: 0 allowed_err: 1.9001e+21
x: 1.45119e-08 y: 5.03051e+08 c_result: -5.03051e+08 result: -5.03051e+08 err: 32 allowed_err: 5030.51
x: 2.77053e-27 y: -1.2694e-30 c_result: 2.7718e-27 result: 2.7718e-27 err: 0 allowed_err: 2.7718e-32
x: -6.2932e-11 y: 2.08589e-13 c_result: -6.31406e-11 result: -6.31406e-11 err: 6.93889e-18 allowed_err: 6.31406e-16
x: 5.32492e-06 y: 5.84684e+06 c_result: -5.84684e+06 result: -5.82397e+06 err: 22870 allowed_err: 58.4684
FAILED
x: -3.6045e+12 y: -0.22266 c_result: -3.6045e+12 result: -3.60355e+12 err: 9.56301e+08 allowed_err: 3.6045e+07
FAILED
x: 6.70186e+29 y: -49.119 c_result: 6.70186e+29 result: 6.70186e+29 err: 0 allowed_err: 6.70186e+24
x: 56869.4 y: 5.6651e+25 c_result: -5.6651e+25 result: -5.5602e+25 err: 1.04905e+24 allowed_err: 5.6651e+20
FAILED
x: -4.19507e-20 y: 9.89309e+17 c_result: -9.89309e+17 result: -9.89309e+17 err: 0 allowed_err: 9.89309e+12
x: -4.76491e+37 y: 9.00249e+30 c_result: -4.76491e+37 result: -4.76491e+37 err: 5.0706e+30 allowed_err: 4.76491e+32
x: -2.76974e-07 y: 1.43578e+22 c_result: -1.43578e+22 result: -1.43578e+22 err: 0 allowed_err: 1.43578e+17
x: -2.62365e-16 y: 3.1234e+28 c_result: -3.1234e+28 result: -3.12341e+28 err: 8.73638e+22 allowed_err: 3.1234e+23
x: 3.81717e-35 y: -6.53164e+37 c_result: 6.53164e+37 result: 6.53174e+37 err: 1.02426e+33 allowed_err: 6.53164e+32
FAILED
x: -1.05139e-22 y: -2.9441e+28 c_result: 2.9441e+28 result: 2.92873e+28 err: 1.53661e+26 allowed_err: 2.9441e+23
FAILED
x: 2.93295e-31 y: -1.11143e-15 c_result: 1.11143e-15 result: 1.11143e-15 err: 1.16467e-21 allowed_err: 1.11143e-20
x: 1.36415e+24 y: 5.71097e-35 c_result: 1.36415e+24 result: 1.00567e+24 err: 3.58483e+23 allowed_err: 1.36415e+19
FAILED
x: -0.00114765 y: 1.31907e+15 c_result: -1.31907e+15 result: -1.31907e+15 err: 0 allowed_err: 1.31907e+10
100 outputs checked.
Test failed!

1 reply

JulianKemmerer Oct 4, 2021
Maintainer

Actually, for rsqrtf it makes more sense to do as you did and compare against the simulated LLVM func implementation, instead of against 1/sqrt as I am doing above.

suarezvictor · 2021-10-04T00:50:02Z

suarezvictor
Oct 4, 2021
Author

I agree that the comparison shouldn't be with the C math library for fast implementations like the provided one. But the comparison with the verilated version should match within a rounding error. Said another way, even without conversion to logic, rsqrt is not the same as 1/sqrt as calculated by the standard implementation. In this special case, a model of the error should be applied to the calculated tolerance (which already exists), but for the moment we can move forward without tests that deep.

2 replies

JulianKemmerer Oct 4, 2021
Maintainer

I switched over to comparing against the LLVM rsqrtf func. It still seems to fail in the way I described above (for small inputs). But I want to fix the fp32 sub (which is used inside rsqrtf) and then return to see whether it is still broken.

JulianKemmerer Oct 4, 2021
Maintainer

I fixed some fp32sub errors last night but some remain. I think I have narrowed the remaining issue down to a bug in i25sub (part of fp32sub). I added another math_pkg test case and have issues in front of me that I plan to dig into after work.

suarezvictor · 2021-10-06T10:37:01Z

suarezvictor
Oct 6, 2021
Author

May main idea for floating point opwrations mayor diverge a bit of the current course. What I would to is to copy the implementation tricks of a proven library and port ir to pipelineC. See for example this one that seems very clean: https://github.com/LiraNuna/soft-ieee754/blob/master/includes/ieee754.hpp Subatraction and adding operations are based on two interesting functions called "renormalize" and "from unsigned". If a once the " BITS(x, 5, 3)"-like macros are implemented, then the same you can also copy the template-constant operations to get also double precision with a single implementation, by calculating compile-time constants appropiately. El mié., 6 oct. 2021 07:10, Victor Suarez Rovere ***@***.***> escribió:

Maybe a silly proposal, but what about using the add function and flipping the sign bit of the second operand? Only one function to debug, and it may use fewer resources too.

On Wed, Oct 6, 2021 at 06:51, Victor Suarez Rovere wrote:
> Another way of testing it is to use the bit-exact class in "bitregs.h" and run a debugger on that - another advantage of C compatibility. I can try that route (but please be patient until I finish my consulting tasks)
>
> On Wed, Oct 6, 2021 at 01:43, Julian Kemmerer wrote:
>> OK up to 1 million test cases for fp32sub - but getting a handful of interesting failures
>>
>> x: float -1.734040e-15, uint32 0xA6F9E6C9 y: float -1.777618e-15, uint32 0xA7001746 c_result: float 4.357847e-17, uint32 0x2448F860 result: float 4.357857e-17, uint32 0x2448F880 err: 1.05879e-22 allowed_err: 4.35785e-23 FAILED
>> x: float 1.822899e+13, uint32 0x5584A225 y: float 1.729264e+13, uint32 0x557BA419 c_result: float 9.363543e+11, uint32 0x535A0310 result: float 9.363553e+11, uint32 0x535A0320 err: 1.04858e+06 allowed_err: 936354 FAILED
>> x: float 4.330267e+12, uint32 0x547C0E03 y: float 4.572113e+12, uint32 0x548510E6 c_result: float -2.418459e+11, uint32 0xD2613C90 result: float -2.418462e+11, uint32 0xD2613CA0 err: 262144 allowed_err: 241846 FAILED
>> x: float -5.907621e-11, uint32 0xAE81E8F3 y: float -5.666673e-11, uint32 0xAE793911 c_result: float -2.409479e-12, uint32 0xAC298D50 result: float -2.409482e-12, uint32 0xAC298D60 err: 3.46945e-18 allowed_err: 2.40948e-18 FAILED
>> x: float -4.415078e+06, uint32 0xCA86BCCB y: float -4.192051e+06, uint32 0xCA7FDCCB c_result: float -2.230268e+05, uint32 0xC859CCB0 result: float -2.230270e+05, uint32 0xC859CCC0 err: 0.25 allowed_err: 0.223027 FAILED
>> x: float -1.395268e+11, uint32 0xD201F1C5 y: float -1.337852e+11, uint32 0xD1F931C1 c_result: float -5.741552e+09, uint32 0xCFAB1C90 result: float -5.741560e+09, uint32 0xCFAB1CA0 err: 8192 allowed_err: 5741.55 FAILED
>> x: float -5.685577e+17, uint32 0xDCFC7D87 y: float -5.888213e+17, uint32 0xDD02BE9E c_result: float 2.026362e+16, uint32 0x5A8FFB50 result: float 2.026366e+16, uint32 0x5A8FFB60 err: 3.43597e+10 allowed_err: 2.02636e+10 FAILED
>> 1000000 outputs checked.
>> Test failed!
>>
>> The first one
>>
>> x: float -1.734040e-15, uint32 0xA6F9E6C9
>> y: float -1.777618e-15, uint32 0xA7001746
>> c_result: float 4.357847e-17, uint32 0x2448F860
>> result: float 4.357857e-17, uint32 0x2448F880
>> err: 1.05879e-22 allowed_err: 4.35785e-23 FAILED
>>
>> Similar to other cases the least significant 4 bits of the mantissa (rightmost hex char) are zeros but then the first non zero bits are slightly off ... maybe in 'rounding' sort of way idk
>>
>> Any thoughts on this kinda of mismatch?
>>
>> I am going to confirm I see this in modelsim too for sanity - easy enough to do after this many tries

0 replies

JulianKemmerer · 2021-10-06T22:38:49Z

JulianKemmerer
Oct 6, 2021
Maintainer

OK the FP32 add shows similar issues, so I think the same problem as in fp32sub exists here

fp32 add

x: float 4.015406e+03, uint32 0x457AF67D y: float -4.258397e+03, uint32 0xC585132D c_result: float -2.429915e+02, uint32 0xC372FDD0 result: float -2.429917e+02, uint32 0xC372FDE0 err: 0.000244141 allowed_err: 0.000242991 FAILED
x: float 3.079372e-02, uint32 0x3CFC431B y: float -3.176257e-02, uint32 0xBD021978 c_result: float -9.688530e-04, uint32 0xBA7DFAA0 result: float -9.688549e-04, uint32 0xBA7DFAC0 err: 1.86265e-09 allowed_err: 9.68853e-10 FAILED
x: float 6.835042e+35, uint32 0x7B03A35C y: float -6.513701e+35, uint32 0xFAFAE60D c_result: float 3.213411e+34, uint32 0x78C60AB0 result: float 3.213415e+34, uint32 0x78C60AC0 err: 3.96141e+28 allowed_err: 3.21341e+28 FAILED
x: float -1.684015e+07, uint32 0xCB807AEB y: float 1.606630e+07, uint32 0x4B7526FD c_result: float -7.738490e+05, uint32 0xC93CED90 result: float -7.738500e+05, uint32 0xC93CEDA0 err: 1 allowed_err: 0.773849 FAILED
x: float -9.023538e+15, uint32 0xDA003B71 y: float 8.817388e+15, uint32 0x59FA9AF1 c_result: float -2.061504e+14, uint32 0xD73B7E20 result: float -2.061509e+14, uint32 0xD73B7E40 err: 5.36871e+08 allowed_err: 2.0615e+08 FAILED
x: float -6.069733e-05, uint32 0xB87E9543 y: float 6.428654e-05, uint32 0x3886D193 c_result: float 3.589212e-06, uint32 0x3670DE30 result: float 3.589215e-06, uint32 0x3670DE40 err: 3.63798e-12 allowed_err: 3.58921e-12 FAILED
x: float 5.333232e+08, uint32 0x4DFE4EEF y: float -5.494575e+08, uint32 0xCE03003A c_result: float -1.613430e+07, uint32 0xCB7630A0 result: float -1.613434e+07, uint32 0xCB7630C0 err: 32 allowed_err: 16.1343 FAILED
x: float 6.323657e+26, uint32 0x6C02C529 y: float -6.097366e+26, uint32 0xEBFC2E5F c_result: float 2.262910e+25, uint32 0x6995BF30 result: float 2.262914e+25, uint32 0x6995BF40 err: 3.68935e+19 allowed_err: 2.26291e+19 FAILED
x: float 3.810096e-06, uint32 0x367FB0F5 y: float -3.834480e-06, uint32 0xB680A9EF c_result: float -2.438378e-08, uint32 0xB2D17480 result: float -2.438401e-08, uint32 0xB2D17500 err: 2.27374e-13 allowed_err: 2.43838e-14 FAILED
x: float -5.382599e+36, uint32 0xFC8194D4 y: float 5.219303e+36, uint32 0x7C7B4CDF c_result: float -1.632965e+35, uint32 0xF9FB9920 result: float -1.632968e+35, uint32 0xF9FB9940 err: 3.16913e+29 allowed_err: 1.63297e+29 FAILED
x: float -3.024259e-02, uint32 0xBCF7BF4F y: float 3.187624e-02, uint32 0x3D0290A8 c_result: float 1.633646e-03, uint32 0x3AD62010 result: float 1.633648e-03, uint32 0x3AD62020 err: 1.86265e-09 allowed_err: 1.63365e-09 FAILED
x: float 5.217902e+05, uint32 0x48FEC7C7 y: float -5.445391e+05, uint32 0xC904F1B2 c_result: float -2.274891e+04, uint32 0xC6B1B9D0 result: float -2.274894e+04, uint32 0xC6B1B9E0 err: 0.03125 allowed_err: 0.0227489 FAILED
x: float -1.727113e-12, uint32 0xABF311CF y: float 1.822396e-12, uint32 0x2C003D60 c_result: float 9.528326e-14, uint32 0x29D68F10 result: float 9.528337e-14, uint32 0x29D68F20 err: 1.0842e-19 allowed_err: 9.52833e-20 FAILED
x: float 1.492741e-08, uint32 0x328039B6 y: float -1.454009e-08, uint32 0xB279CBFB c_result: float 3.873177e-10, uint32 0x2FD4EE20 result: float 3.873186e-10, uint32 0x2FD4EE40 err: 8.88178e-16 allowed_err: 3.87318e-16 FAILED
x: float -4.595889e+15, uint32 0xD9829F7E y: float 4.434701e+15, uint32 0x597C1563 c_result: float -1.611882e+14, uint32 0xD7129990 result: float -1.611885e+14, uint32 0xD71299A0 err: 2.68435e+08 allowed_err: 1.61188e+08 FAILED
1000000 outputs checked.
Test failed!
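
When reading logs like the one above, it can help to compare the two uint32 columns directly as a distance in units in the last place (ULPs). The allowed_err column looks like roughly 1e-6 times |c_result|; that is an inference from the printed numbers, not something confirmed from the test bench source. A minimal sketch in plain C, with hypothetical helper names that are not part of PipelineC:

```c
#include <stdint.h>

// Distance in ULPs between two fp32 bit patterns that have the same sign bit.
// Hypothetical helper for reading the logs, not part of the test bench.
uint32_t ulp_distance_same_sign(uint32_t a_bits, uint32_t b_bits) {
    return a_bits > b_bits ? a_bits - b_bits : b_bits - a_bits;
}

// Relative-tolerance check. A rel_tol of 1e-6 is an assumption inferred
// from the allowed_err column in the log.
int within_tolerance(double expected, double actual, double rel_tol) {
    double err = actual - expected;
    double mag = expected;
    if (err < 0) err = -err;
    if (mag < 0) mag = -mag;
    return err <= rel_tol * mag;
}
```

For the first fp32 add failure, ulp_distance_same_sign(0xC372FDD0, 0xC372FDE0) is 16, and for the worst case in the list (0xB2D17480 vs 0xB2D17500) it is 128, so the mismatches are well beyond a last-bit rounding difference.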

float BIN_OP_PLUS_float_float_float(float left, float right)
{
  // Get exponent for left and right
  uint8_t left_exponent;
  left_exponent = float_30_23(left);
  uint8_t right_exponent;
  right_exponent = float_30_23(right);
    
  float x;
  float y;
  // Step 1: Copy inputs so that left's exponent >= than right's.
  // ?????????MAYBE TODO: 
  //    Is this only needed for shift operation that takes unsigned only?
  //    ALLOW SHIFT BY NEGATIVE?????
  //    OR NO since that looses upper MSBs of mantissa which not acceptable? IDK too many drinks
  if ( right_exponent > left_exponent ) // Lazy switch to GT
  {
     x = right;  
     y = left;
  }
  else
  { 
     x = left;
     y = right;
  }
  
  // Step 2: Break apart into S E M
  // X
  uint23_t x_mantissa; 
  x_mantissa = float_22_0(x);
  uint8_t x_exponent;
  x_exponent = float_30_23(x);
  uint1_t x_sign;
  x_sign = float_31_31(x);
  // Y
  uint23_t y_mantissa;
  y_mantissa = float_22_0(y);
  uint8_t y_exponent;
  y_exponent = float_30_23(y);
  uint1_t y_sign;
  y_sign = float_31_31(y);
  
  // Mantissa needs +3b wider
  //  [sign][overflow][hidden][23 bit mantissa]
  // Put 0's in overflow bit and sign bit
  // Put a 1 hidden bit if exponent is non-zero.
  // X
  // Determine hidden bit
  uint1_t x_hidden_bit;
  if(x_exponent == 0) // lazy switch to ==
  {
    x_hidden_bit = 0;
  }
  else
  {
    x_hidden_bit = 1;
  }
  // Apply hidden bit
  uint24_t x_mantissa_w_hidden_bit; 
  x_mantissa_w_hidden_bit = uint1_uint23(x_hidden_bit, x_mantissa);
  // Y
  // Determine hidden bit
  uint1_t y_hidden_bit;
  if(y_exponent == 0) // lazy switch to ==
  {
    y_hidden_bit = 0;
  }
  else
  {
    y_hidden_bit = 1;
  }
  // Apply hidden bit
  uint24_t y_mantissa_w_hidden_bit; 
  y_mantissa_w_hidden_bit = uint1_uint23(y_hidden_bit, y_mantissa);

  // Step 3: Un-normalize Y (including hidden bit) so that xexp == yexp.
  // Already swapped left/right based on exponent
  // diff will be >= 0
  uint8_t diff;
  diff = x_exponent - y_exponent;
  // Shift y by diff (bit manip pipelined function)
  uint24_t y_mantissa_w_hidden_bit_unnormalized;
  y_mantissa_w_hidden_bit_unnormalized = y_mantissa_w_hidden_bit >> diff;
  
  // Step 4: If necessary, negate mantissas (twos comp) such that add makes sense
  // STEP 2.B moved here
  // Make wider for twos comp/sign
  int25_t x_mantissa_w_hidden_bit_sign_adj;
  int25_t y_mantissa_w_hidden_bit_sign_adj;
  if(x_sign) //if(x_sign == 1)
  {
    x_mantissa_w_hidden_bit_sign_adj = uint24_negate(x_mantissa_w_hidden_bit); //Returns +1 wider signed, int25t
  }
  else
  {
    x_mantissa_w_hidden_bit_sign_adj = x_mantissa_w_hidden_bit;
  }
  if(y_sign) // if(y_sign == 1)
  {
    y_mantissa_w_hidden_bit_sign_adj = uint24_negate(y_mantissa_w_hidden_bit_unnormalized);
  }
  else
  {
    y_mantissa_w_hidden_bit_sign_adj = y_mantissa_w_hidden_bit_unnormalized;
  }
  
  // Step 5: Compute sum 
  int26_t sum_mantissa;
  sum_mantissa = x_mantissa_w_hidden_bit_sign_adj + y_mantissa_w_hidden_bit_sign_adj;

  // Step 6: Save sign flag and take absolute value of sum.
  uint1_t sum_sign;
  sum_sign = int26_25_25(sum_mantissa);
  uint26_t sum_mantissa_unsigned;
  sum_mantissa_unsigned = int26_abs(sum_mantissa);

  // Step 7: Normalize sum and exponent. (Three cases.)
  uint1_t sum_overflow;
  sum_overflow = uint26_24_24(sum_mantissa_unsigned);
  uint8_t sum_exponent_normalized;
  uint23_t sum_mantissa_unsigned_normalized;
  if (sum_overflow) //if ( sum_overflow == 1 )
  {
     // Case 1: Sum overflow.
     //         Right shift significand by 1 and increment exponent.
     sum_exponent_normalized = x_exponent + 1;
     sum_mantissa_unsigned_normalized = uint26_23_1(sum_mantissa_unsigned);
  }
  else if(sum_mantissa_unsigned == 0) // lazy switch to ==
  {
     //
     // Case 3: Sum is zero.
     sum_exponent_normalized = 0;
     sum_mantissa_unsigned_normalized = 0;
  }
  else
  {
     // Case 2: Sum is nonzero and did not overflow.
     // Dont waste zeros at start of mantissa
     // Find position of first non-zero digit from left
     // Know bit25(sign) and bit24(overflow) are not set
     // Hidden bit is [23], can narrow down to 24b wide including hidden bit 
     uint24_t sum_mantissa_unsigned_narrow;
     sum_mantissa_unsigned_narrow = sum_mantissa_unsigned;
     uint5_t leading_zeros; // width = ceil(log2(len(sumsig)))
     leading_zeros = count0s_uint24(sum_mantissa_unsigned_narrow); // Count from left/msbs downto, uintX_count0s counts from right
     // NOT CHECKING xexp < adj
     // Case 2b: Adjust significand and exponent.
     sum_exponent_normalized = x_exponent - leading_zeros;
     sum_mantissa_unsigned_normalized = sum_mantissa_unsigned_narrow << leading_zeros;
  }
  
  // Declare the output portions
  uint23_t z_mantissa;
  uint8_t z_exponent;
  uint1_t z_sign;
  z_sign = sum_sign;
  z_exponent = sum_exponent_normalized;
  z_mantissa = sum_mantissa_unsigned_normalized;
  // Assemble output  
  return float_uint1_uint8_uint23(z_sign, z_exponent, z_mantissa);
}
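
For debugging on a desktop, the sign/exponent/mantissa split used in Step 2 can be mirrored in plain C. This is only a sketch: it assumes that float_31_31, float_30_23 and float_22_0 select those bit ranges of the float (inferred from their names and usage above), and the struct and function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      // bit 31
    uint32_t exponent;  // bits 30..23, biased by 127
    uint32_t mantissa;  // bits 22..0, hidden bit not included
} fp32_fields;

// Reinterpret a float as its 32-bit pattern and back (memcpy avoids aliasing issues).
uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

// Split a bit pattern into S, E, M.
fp32_fields fp32_unpack(uint32_t bits) {
    fp32_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

// Reassemble S, E, M into a bit pattern.
uint32_t fp32_pack(fp32_fields f) {
    return (f.sign << 31) | ((f.exponent & 0xFFu) << 23) | (f.mantissa & 0x7FFFFFu);
}
```

For example, fp32_unpack(0x457AF67D) gives sign 0, exponent 138 and mantissa 0x7AF67D, which makes it easy to step through a failing case from the log by hand.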

0 replies

suarezvictor · 2021-10-06T23:37:48Z

suarezvictor
Oct 6, 2021
Author

Did you see the header-only library I posted? Aren't there good tricks to copy? In that library everything is built around an interesting normalization function that seems to simplify everything.

1 reply

JulianKemmerer Oct 7, 2021
Maintainer

I did see ieee754.hpp but its a little difficult to translate to C - looking at it more.

It seems like I am dropping lsbs from the mantissa but idk where/how.

suarezvictor · 2021-10-07T00:54:37Z

suarezvictor
Oct 7, 2021
Author

Hopefully in a couple of days I can provide an implementation.

1 reply

JulianKemmerer Oct 7, 2021
Maintainer

I think that I just am not using enough fractional bits for the mantissa.

x: float 4.015406e+03, uint32 0x457AF67D 
y: float -4.258397e+03, uint32 0xC585132D 
c_result: float -2.429915e+02, uint32 0xC372FDD0     11000011011100101111110111010000
result: float -2.429917e+02,   uint32 0xC372FDE0     11000011011100101111110111100000
err: 0.000244141 allowed_err: 0.000242991 FAILED

I think I am losing some lsbs from the y mantissa when shifting it to the right to match the x exponent - idk where else the lsbs would be coming from

uint8_t diff;
  diff = x_exponent - y_exponent;
  uint24_t y_mantissa_w_hidden_bit_unnormalized;
  y_mantissa_w_hidden_bit_unnormalized = y_mantissa_w_hidden_bit >> diff;

And no worries take your time

This stuff drains me so I'll probably be needing some time myself too
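
The suspicion above can be checked by hand on that test case. After the swap, the operand with the smaller exponent is 0x457AF67D (exponent 138, versus 139 for 0xC585132D), so its 24-bit significand 0xFAF67D is shifted right by 1 and its lowest bit falls off. A small sketch in plain C, with hypothetical helper names, that reports the bits lost in the alignment shift (normal inputs and shifts below 32 assumed):

```c
#include <stdint.h>

// 24-bit significand (hidden bit included) of a normal fp32 bit pattern.
uint32_t significand24(uint32_t bits) { return (bits & 0x7FFFFFu) | 0x800000u; }

// Biased 8-bit exponent of an fp32 bit pattern.
uint32_t exponent8(uint32_t bits) { return (bits >> 23) & 0xFFu; }

// Bits that fall off the right end when sig is shifted right by shift (shift < 32).
uint32_t dropped_bits(uint32_t sig, uint32_t shift) {
    return sig & ((1u << shift) - 1u);
}
```

Here dropped_bits(0xFAF67D, 1) is 1, i.e. half a unit of the aligned significand is lost. The truncated difference is 0x85132D - 0x7D7B3E = 0x797EF, and normalizing it needs a left shift by 5, which scales that half-unit error up to 16 ULPs. That matches the 0xC372FDE0 vs 0xC372FDD0 mismatch and explains why the low 4 bits of the failing results are zero.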

suarezvictor · 2021-10-07T16:19:44Z

suarezvictor
Oct 7, 2021
Author

I hope you solve the issue, but computing floating point with just integers is an old problem with many proven implementations. If it keeps being hard, maybe the best solution is to copy another implementation that has already faced such problems, and maybe others not yet discovered.

1 reply

JulianKemmerer Oct 7, 2021
Maintainer

That makes a lot of sense - I was about to give up and start fresh from, for example, Berkeley softfloat

JulianKemmerer · 2021-10-07T23:10:57Z

JulianKemmerer
Oct 7, 2021
Maintainer

But then, inspired by looking at softfloat code and seeing they pad their mantissa on the right with 6 bits - I did the same - giving myself more lsbs.

And now the fp32add is passing tests - and another 10x more random tests after that too. Seems good. So I am going to get that checked in tonight. Finally getting over this bug it seems.
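
To see the effect of the extra lsbs in isolation, here is a simplified plain-C model of the add with the number of right-pad bits as a parameter. It is a sketch under stated assumptions, not the PipelineC code: the function name is hypothetical, it handles only normal finite inputs, it truncates instead of rounding, and it shifts the magnitude before applying the sign.

```c
#include <stdint.h>

// Simplified model of the fp32 add discussed in this thread (hypothetical name).
// guard_bits = 0 models the failing version, guard_bits = 6 the padded one.
uint32_t fp32_add_model(uint32_t a, uint32_t b, int guard_bits) {
    // Make x the operand with the larger exponent.
    uint32_t x = a, y = b;
    if (((b >> 23) & 0xFFu) > ((a >> 23) & 0xFFu)) { x = b; y = a; }
    uint32_t x_exp = (x >> 23) & 0xFFu;
    uint32_t y_exp = (y >> 23) & 0xFFu;

    // 24-bit significands (hidden bit set), padded on the right with zeros.
    int64_t x_sig = (int64_t)((x & 0x7FFFFFu) | 0x800000u) << guard_bits;
    int64_t y_sig = (int64_t)((y & 0x7FFFFFu) | 0x800000u) << guard_bits;

    // Align y to x's exponent; bits shifted past the guard bits are lost.
    uint32_t diff = x_exp - y_exp;
    y_sig = (diff < 63u) ? (y_sig >> diff) : 0;

    // Apply signs, add, then split into sign and magnitude.
    if (x >> 31) x_sig = -x_sig;
    if (y >> 31) y_sig = -y_sig;
    int64_t sum = x_sig + y_sig;
    uint32_t sign = sum < 0;
    uint64_t mag = (uint64_t)(sum < 0 ? -sum : sum);
    if (mag == 0) return 0;

    int hidden = 23 + guard_bits;  // bit position of the hidden bit
    uint32_t exp = x_exp;
    if ((mag >> (hidden + 1)) & 1u) {
        // Overflow: shift right by one and increment the exponent.
        mag >>= 1;
        exp = x_exp + 1u;
    } else {
        // Shift left until the hidden bit position is set (exponent underflow not handled).
        while (!((mag >> hidden) & 1u)) { mag <<= 1; exp--; }
    }
    uint32_t mantissa = (uint32_t)(mag >> guard_bits) & 0x7FFFFFu;
    return (sign << 31) | ((exp & 0xFFu) << 23) | mantissa;
}
```

With the first failing case from the log, fp32_add_model(0x457AF67D, 0xC585132D, 0) returns 0xC372FDE0 (the failing result), while the same call with guard_bits = 6 returns 0xC372FDD0 (the C reference result).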

0 replies

suarezvictor · 2021-10-08T00:57:40Z

suarezvictor
Oct 8, 2021
Author

This is quite good Julian, could you show the softfloat code you have used?


2 replies

JulianKemmerer Oct 8, 2021
Maintainer

The Berkeley code is in softfloat.c.txt: see around line ~706; lines ~717 and ~718 are what I noticed.
I'll share the PipelineC version soon.

JulianKemmerer Oct 8, 2021
Maintainer

float BIN_OP_PLUS_float_float_float(float left, float right)
{
  // Get exponent for left and right
  uint8_t left_exponent;
  left_exponent = float_30_23(left);
  uint8_t right_exponent;
  right_exponent = float_30_23(right);
    
  float x;
  float y;
  // Step 1: Copy inputs so that left's exponent >= than right's.
  //    Is this only needed for shift operation that takes unsigned only?
  //    ALLOW SHIFT BY NEGATIVE?????
  if ( right_exponent > left_exponent ) // Lazy switch to GT
  {
     x = right;  
     y = left;
  }
  else
  { 
     x = left;
     y = right;
  }
  
  // Step 2: Break apart into S E M
  // X
  uint23_t x_mantissa; 
  x_mantissa = float_22_0(x);
  uint8_t x_exponent;
  x_exponent = float_30_23(x);
  uint1_t x_sign;
  x_sign = float_31_31(x);
  // Y
  uint23_t y_mantissa;
  y_mantissa = float_22_0(y);
  uint8_t y_exponent;
  y_exponent = float_30_23(y);
  uint1_t y_sign;
  y_sign = float_31_31(y);
  
  // Mantissa needs +3b wider
  //  [sign][overflow][hidden][23 bit mantissa]
  // Put 0's in overflow bit and sign bit
  // Put a 1 hidden bit if exponent is non-zero.
  // X
  // Determine hidden bit
  uint1_t x_hidden_bit;
  if(x_exponent == 0) // lazy switch to ==
  {
    x_hidden_bit = 0;
  }
  else
  {
    x_hidden_bit = 1;
  }
  // Apply hidden bit
  uint24_t x_mantissa_w_hidden_bit; 
  x_mantissa_w_hidden_bit = uint1_uint23(x_hidden_bit, x_mantissa);
  // Y
  // Determine hidden bit
  uint1_t y_hidden_bit;
  if(y_exponent == 0) // lazy switch to ==
  {
    y_hidden_bit = 0;
  }
  else
  {
    y_hidden_bit = 1;
  }
  // Apply hidden bit
  uint24_t y_mantissa_w_hidden_bit; 
  y_mantissa_w_hidden_bit = uint1_uint23(y_hidden_bit, y_mantissa);

  // Step 4: If necessary, negate mantissas (twos comp) such that add makes sense
  // STEP 2.B moved here
  // Make wider for twos comp/sign
  int25_t x_mantissa_w_hidden_bit_sign_adj;
  int25_t y_mantissa_w_hidden_bit_sign_adj;
  if(x_sign) //if(x_sign == 1)
  {
    x_mantissa_w_hidden_bit_sign_adj = uint24_negate(x_mantissa_w_hidden_bit); //Returns +1 wider signed, int25t
  }
  else
  {
    x_mantissa_w_hidden_bit_sign_adj = x_mantissa_w_hidden_bit;
  }
  if(y_sign) // if(y_sign == 1)
  {
    y_mantissa_w_hidden_bit_sign_adj = uint24_negate(y_mantissa_w_hidden_bit);
  }
  else
  {
    y_mantissa_w_hidden_bit_sign_adj = y_mantissa_w_hidden_bit;
  }
  
  // Padd both x and y on right with zeros (shift left) such that 
  // when y is shifted to the right it doesnt drop mantissa lsbs (as much)
  int31_t x_mantissa_w_hidden_bit_sign_adj_rpad = int25_uint6(x_mantissa_w_hidden_bit_sign_adj, 0);
  int31_t y_mantissa_w_hidden_bit_sign_adj_rpad = int25_uint6(y_mantissa_w_hidden_bit_sign_adj, 0);

  // Step 3: Un-normalize Y (including hidden bit) so that xexp == yexp.
  // Already swapped left/right based on exponent
  // diff will be >= 0
  uint8_t diff;
  diff = x_exponent - y_exponent;
  // Shift y by diff (bit manip pipelined function)
  int31_t y_mantissa_w_hidden_bit_sign_adj_rpad_unnormalized;
  y_mantissa_w_hidden_bit_sign_adj_rpad_unnormalized = y_mantissa_w_hidden_bit_sign_adj_rpad >> diff;
  
  // Step 5: Compute sum 
  int32_t sum_mantissa;
  sum_mantissa = x_mantissa_w_hidden_bit_sign_adj_rpad + y_mantissa_w_hidden_bit_sign_adj_rpad_unnormalized;

  // Step 6: Save sign flag and take absolute value of sum.
  uint1_t sum_sign;
  sum_sign = int32_31_31(sum_mantissa);
  uint31_t sum_mantissa_unsigned;
  sum_mantissa_unsigned = int32_abs(sum_mantissa);

  // Step 7: Normalize sum and exponent. (Three cases.)
  uint1_t sum_overflow;
  sum_overflow = uint31_30_30(sum_mantissa_unsigned);
  uint8_t sum_exponent_normalized;
  uint23_t sum_mantissa_unsigned_normalized;
  if (sum_overflow) //if ( sum_overflow == 1 )
  {
     // Case 1: Sum overflow.
     //         Right shift significand by 1 and increment exponent.
     sum_exponent_normalized = x_exponent + 1;
     sum_mantissa_unsigned_normalized = uint31_29_7(sum_mantissa_unsigned);
  }
  else if(sum_mantissa_unsigned == 0) // lazy switch to ==
  {
     //
     // Case 3: Sum is zero.
     sum_exponent_normalized = 0;
     sum_mantissa_unsigned_normalized = 0;
  }
  else
  {
     // Case 2: Sum is nonzero and did not overflow.
     // Dont waste zeros at start of mantissa
     // Find position of first non-zero digit from left
     // Know bit30 (overflow) is not set; the sign was already removed by abs
     // With the 6b right pad the hidden bit is [29], can narrow down to 30b wide including hidden bit
     uint30_t sum_mantissa_unsigned_narrow;
     sum_mantissa_unsigned_narrow = sum_mantissa_unsigned;
     uint5_t leading_zeros; // width = ceil(log2(len(sumsig)))
     leading_zeros = count0s_uint30(sum_mantissa_unsigned_narrow); // Count from left/msbs downto, uintX_count0s counts from right
     // NOT CHECKING xexp < adj
     // Case 2b: Adjust significand and exponent.
     sum_exponent_normalized = x_exponent - leading_zeros;
     uint30_t sum_mantissa_unsigned_normalized_rpad = sum_mantissa_unsigned_narrow << leading_zeros;
     sum_mantissa_unsigned_normalized = uint30_28_6(sum_mantissa_unsigned_normalized_rpad);
  }
  
  // Declare the output portions
  uint23_t z_mantissa;
  uint8_t z_exponent;
  uint1_t z_sign;
  z_sign = sum_sign;
  z_exponent = sum_exponent_normalized;
  z_mantissa = sum_mantissa_unsigned_normalized;
  // Assemble output  
  return float_uint1_uint8_uint23(z_sign, z_exponent, z_mantissa);
}

float BIN_OP_MINUS_float_float_float(float left, float right)
{
  uint1_t right_sign;
  right_sign = float_31_31(right);
  uint31_t right_everythingelse = float_30_0(right);
  uint32_t negated_right_unsigned = uint1_uint31(!right_sign, right_everythingelse);
  float negated_right = float_uint32(negated_right_unsigned);
  return left + negated_right;
}
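The subtraction above works by inverting only the sign bit of the right operand and reusing the adder. A minimal plain-C sketch of the same idea follows; it is not PipelineC code. The helper names (`float_to_bits`, `bits_to_float`, `negate_by_sign_flip`, `sub_via_add`) are hypothetical, `memcpy` stands in for the bit-slicing built-ins, and a 32-bit IEEE 754 `float` is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Reinterpret a float's bits as a uint32_t (assumes a 32-bit IEEE 754 float).
static uint32_t float_to_bits(float f) {
  uint32_t u;
  assert(sizeof(float) == sizeof(uint32_t));
  memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a uint32_t bit pattern as a float.
static float bits_to_float(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical helper: negate by inverting bit 31 only;
// the exponent and mantissa bits [30:0] pass through unchanged.
static float negate_by_sign_flip(float x) {
  return bits_to_float(float_to_bits(x) ^ 0x80000000u);
}

// Subtraction expressed as addition of the negated right operand.
static float sub_via_add(float left, float right) {
  return left + negate_by_sign_flip(right);
}
```

For example, `sub_via_add(1.5f, 0.5f)` gives `1.0f`, the same as `1.5f - 0.5f`. The point of the design is that subtraction needs no datapath of its own, just one inverted bit in front of the existing adder.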

JulianKemmerer · 2021-10-08T01:33:39Z

JulianKemmerer
Oct 8, 2021
Maintainer

All of the math_pkg https://github.com/JulianKemmerer/PipelineC/tree/master/examples/verilator/math_pkg functions should be passing tests now 👍

Thinking about what to do next...
Might spend a little time on VGA stuff

1 reply

JulianKemmerer Oct 8, 2021
Maintainer

oh well what I should really do is make sure to report that GHDL assertion failure

JulianKemmerer · 2021-10-09T00:54:14Z

JulianKemmerer
Oct 9, 2021
Maintainer

I started an issue:
ghdl/ghdl-yosys-plugin#159

0 replies

JulianKemmerer · 2021-10-09T03:05:33Z

JulianKemmerer
Oct 9, 2021
Maintainer

I want to put some thought into next steps for math package

I am picturing something as simple as a github wiki page with a bunch of tables.
One table for each FPGA part/family (similar enough that '#LUTS', for example, is meaningful to compare).

Table header looks like ex.

FPGA Part	FuncName	FMAX	Latency	LUTS	DSPs	Download

Where last column has link to download some .zip of the source files.

Which then makes me wonder - how to go about packaging these vhdl outputs - folks are probably going to use more than one of these 'cores' at a time too - can't interfere with each other.

2 replies

JulianKemmerer Oct 9, 2021
Maintainer

oh and another column for link to the repo C source/tests that produced the output

JulianKemmerer Oct 9, 2021
Maintainer

Also, any opinions on where and how to publish this? I was picturing on GitHub? Via the PipelineC wiki?

suarezvictor · 2021-10-09T13:28:35Z

suarezvictor
Oct 9, 2021
Author

I like your tables on a GitHub wiki. What I'd suggest is that we make a new repo for libraries and projects "made with PipelineC", like most other projects do. That would encourage additions by others.

0 replies

suarezvictor · 2021-10-09T13:31:59Z

suarezvictor
Oct 9, 2021
Author

Another download column can have Verilog sources translated with yosys.

1 reply

JulianKemmerer Oct 9, 2021
Maintainer

I am thinking about generating a single VHDL file multi-module output - like the big single file top.v you get generated by yosys.

That way in both cases the user has one VHDL file or one Verilog file to include in their project. If the user is using a bunch of these cores I wonder if the multiple copies of modules in these big files will interfere hmmm...

suarezvictor · 2021-10-10T22:36:27Z

suarezvictor
Oct 10, 2021
Author

Page 203 of the Intel HLS reference manual lists the supported math functions (math.h-like functions and others, such as fixed-point support): https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/hls/mnl-hls-reference.pdf
It would be useful to know fmax and area metrics

1 reply

JulianKemmerer Oct 11, 2021
Maintainer

Hmm thats a nice list of functionality we can try to copy 👍

JulianKemmerer · 2021-10-11T17:03:56Z

JulianKemmerer
Oct 11, 2021
Maintainer

I bet it's illegal or against some terms to, for example, run Intel HLS and publicly share the output HDL, i.e. doing what we are doing.
Seems like a unique opportunity we have here using PipelineC to do the pipelining instead of some big vendor HLS tool

0 replies

suarezvictor · 2021-10-11T19:16:47Z

suarezvictor
Oct 11, 2021
Author

By no means am I proposing to use Intel HLS; I just posted the list of supported functions to offer an alternative. Regarding Xilinx code, I've seen many sources published under the Apache license. It seems, for example, that a template type called ap_int<> is normally used to express integers of different widths (both Intel and Xilinx use this convention), so we could follow the same convention.

1 reply

JulianKemmerer Oct 12, 2021
Maintainer

Oh interesting - I didnt notice that shared use of ap_int<>
