
More i686 testing issues #8812

Closed
staticfloat opened this issue Oct 25, 2014 · 15 comments
Labels
system:32-bit Affects only 32-bit systems

Comments

@staticfloat
Member

I'd really like to get the MARCH=i686 builds working, but there is definitely something funky going on with the floating point arithmetic. Many things that should be "precise" in our floating-point math are only approximate once you compile for i686. I've got some fixes pushed to this branch (which may themselves be a little suspect), but in any case we should probably try to get these fixed, since we need to compile 32-bit binaries as i686 to allow for older AMD processors.

staticfloat added the system:32-bit (Affects only 32-bit systems) label on Oct 25, 2014
@staticfloat
Member Author

The reason I'm opening this issue is because I'd like to see if there's something we can do to fix floating point behavior systematically. The floating point inaccuracy is causing all sorts of problems:

julia> sind(30)
0.49999999999999994

julia> rationalize(Int32, -2.7, tol=0)
-27//10

julia> rationalize(Int32, -2.7, tol=0) == -2.7
false

I'm not sure if I should make the equivalence check here an approximate equivalence check, or whether we really can get good floating point precision when compiling against i686.
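
For reference, a quick sketch of what an approximate check could look like, using Base's isapprox with its default tolerances (the result shown is just what I'd expect on a correctly-rounding build):

julia> isapprox(rationalize(Int32, -2.7, tol=0), -2.7)
true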

@ViralBShah
Member

Someone who knows what defaults LLVM uses should chime in here. Are any of these, especially fast-math, enabled?

http://llvm.org/docs/LangRef.html#fastmath
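
For reference, fast-math is expressed in the IR as per-instruction flags rather than a global default, e.g. (illustrative IR only):

%x = fadd fast double %a, %b   ; all fast-math flags set on this add
%y = fadd double %a, %b        ; plain IEEE semantics, no fast-math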

@eschnett
Contributor

The problem on non-SSE architectures (such as i686) is not a lack of precision, but excess precision. This is still inconvenient, since results differ depending on where "too much" precision is used.

The flag -ffloat-store prevents this excess precision. It also slows down execution a bit.
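
A small C sketch of the effect (illustrative only; the variable names are made up, and whether the difference actually shows up depends on optimization level and register allocation):

/* Built 32-bit for x87, e.g. `gcc -m32 -march=i686 -O2 excess.c`:
 * y*y overflows a 64-bit double but still fits in the 80-bit x87 register,
 * so the in-register result can compare unequal to the value forced
 * through memory. Adding -ffloat-store spills every assignment, so both
 * sides round to double and the comparison succeeds. */
#include <stdio.h>

volatile double y = 1e200;            /* volatile so the multiplies aren't constant-folded */

int main(void) {
    double a = y * y;                 /* may stay in an 80-bit register */
    volatile double spilled = y * y;  /* forced out to a 64-bit memory slot (rounds to inf) */
    printf("%d\n", a == spilled);     /* can print 0 on x87, 1 with -ffloat-store */
    return 0;
}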

@tkelman
Contributor

tkelman commented Oct 26, 2014

I guess even i686 uses legacy 80-bit x87 floating point math then? If we can't come up with a systematic solution then we do need to draw the line somewhere regarding how old of a processor we can realistically support. pentium4 with SSE allows us to not worry about the x87 floating point issues, but would rule out at least 1 real user.

@eschnett
Contributor

Yes, i686 may not have SSE2 instructions. I would simply add the -ffloat-store for this architecture, as this is the accepted way to obtain reproducible math results there. Bonus points for auto-detecting SSE2, and using -fsse2-math (sp?) instead in this case.

@tkelman
Contributor

tkelman commented Oct 27, 2014

I would simply add the -ffloat-store for this architecture, as this is the accepted way to obtain reproducible math results there.

Worth trying. How many different places will we need it? Just openlibm, or across all deps?

Bonus points for auto-detecting SSE2, and using -fsse2-math (sp?) instead in this case.

We don't currently have any compile-time processor feature detection, do we? Keying this off of MARCH might be simplest.
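
Something like this in the build files might be enough to try it (a sketch only; -ffloat-store is the real GCC flag, but treat the variable names as assumptions modeled on Make.inc):

ifeq ($(MARCH),i686)
# force every FP assignment back through a 64-bit memory slot on x87-only targets
JCFLAGS += -ffloat-store
JCXXFLAGS += -ffloat-store
endif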

@eschnett
Contributor

-ffloat-store should be necessary everywhere that may perform floating-point operations. So probably everywhere. It may also be necessary when calling LLVM to generate code.

@staticfloat
Member Author

@eschnett Unfortunately, adding -ffloat-store to the CFLAGS of LLVM, Julia, and openlibm doesn't work for me. I still get the same errors.

@eschnett
Contributor

@staticfloat You would also need to ensure that Julia uses the equivalent of this flag when generating code by calling LLVM. I don't think there is a flag for this -- it is rather a code-generation issue. It may even be necessary to modify Julia's code generator to store and immediately re-load every floating-point value after performing a floating-point operation.
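
Roughly, at the IR level that would mean turning every floating-point result into a spill/reload pair, something like (illustrative IR, not what the code generator emits today):

%slot = alloca double
%sum = fadd double %a, %b
store double %sum, double* %slot
%sum.rounded = load double* %slot   ; use %sum.rounded wherever %sum would have been used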

@staticfloat
Member Author

@vtjnash @Keno Do either of you know how I might go about doing this?

@vtjnash
Member

vtjnash commented Oct 30, 2014

The most direct approach is usually to check out a source copy of clang and see what it translates the flag to.

@staticfloat
Member Author

Unfortunately, clang doesn't support -ffloat-store. I found the option -mlimit-float-precision, but that looks like it maps to LimitFloatPrecision in LLVM-land, which I believe is meant for much lower-precision floating-point operations?

@eschnett
Contributor

Apparently clang removed this option some time in the past two years. Yes, LimitFloatPrecision seems to be for something else.

I investigated a bit, and am quite surprised at how difficult this is. Apparently, C99 mandates that an assignment rounds away this excess precision, so that e.g.

a = b+d+c;

may have too much precision, while

tmp = b+d;
a = tmp+c;

will not. GCC agrees with this, Clang doesn't -- and I also don't know whether this is Clang or LLVM's optimizer.

One way out I found is described here: http://stackoverflow.com/questions/17663780/is-there-a-document-describing-how-clang-handles-excess-floating-point-precision.

One can set the i387 control word to round to a particular precision, e.g. double or float. Setting it once at startup would be fine; however, one then has to choose between single and double precision, so to get correct rounding for both Float32 and Float64 one would need to switch it before every floating-point operation. Setting the control word is probably expensive.
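
For reference, doing that once at startup would look roughly like this with glibc's fpu_control.h (a sketch that picks 53-bit/double rounding, which is exactly the single-vs-double trade-off mentioned above; note the exponent range stays extended, so overflow and underflow can still double-round):

#include <fpu_control.h>

static void x87_round_to_double(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* clear the precision-control bits, set 53-bit */
    _FPU_SETCW(cw);
}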

Another option would be to store and re-load floating point values to memory after each operation. This is probably cheaper than changing the i387 control word. If LLVM doesn't support this out of the box, then it may be necessary to implement this in Julia's code generator. This should be straightforward, but tedious. This would essentially implement the effects of -ffloat-store explicitly.

Taking a step back: The older AMD processors you mention, do they support SSE2 intrinsics? If so, it would be much easier to force LLVM to use SSE2 for math instead of i387.
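
If they do, the two code paths are easy to compare with llc (standard llc options; the input file name here is hypothetical):

llc -march=x86 -mattr=+sse2 -o fp_sse2.s kernel.ll      # scalar FP lowered to SSE2
llc -march=x86 -mattr=-sse,-sse2 -o fp_x87.s kernel.ll  # no SSE at all, falls back to x87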

@staticfloat
Member Author

No, the whole problem here is that the SSE2 instructions are not supported on that architecture. Reimplementing -ffloat-store is way too much work. An LLVM option to set this would be acceptable, but I have no idea how to do that. We shouldn't lose sleep over this; it's more of a completeness and correctness thing than anything else. I doubt anyone truly cares about this, since all remotely modern hardware has SSE2. I'll leave this open a little longer, and then if we can't come up with a good solution, I'll just provide an i686 build on demand.
-E


@nalimilan
Member

Agreed, it's not worth spending too much time on it.

If there's a simple solution that requires little work, even at the cost of very slow execution, then go for it, and ship a special i686 build with a big warning at startup. People using old CPUs like that are not going to ask for speed anyway.
