^(::Float64, ::Integer) incorrect subnormal results #19872
Comments
This shows the cause of the problem pretty clearly:
Basically, it's doing:
function foo(x)
    for i = 1:10
        x = x*x
    end
    return 1 / x
end
The last iteration of the loop overflows to give
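The overflow can be reproduced outside Julia. The C++ sketch below mirrors the `foo` loop; the function names and the input 2.0 are illustrative choices rather than anything taken from the thread, and it assumes IEEE 754 doubles evaluated in double precision (for example x86-64 with SSE2 and no fast-math flags).

```cpp
#include <cmath>

// Mirrors the loop above: ten squarings compute x^1024, and the
// reciprocal is only taken at the very end.
double pow_1024_then_invert(double x) {
    for (int i = 0; i < 10; ++i) {
        x = x * x;  // for x = 2.0 the tenth squaring is 2^1024, which overflows to +inf
    }
    return 1.0 / x;  // 1 / inf is 0, although 2^-1024 is representable as a subnormal
}

// Reference value 2^-1024, built by exponent scaling so nothing overflows.
double exact_two_pow_minus_1024() {
    return std::ldexp(1.0, -1024);
}
```

Under those assumptions, `pow_1024_then_invert(2.0)` returns zero while `exact_two_pow_minus_1024()` is a small positive subnormal, which is the kind of discrepancy the issue title describes.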
Even without overflow, we still have the issue with intermediate rounding:
i.e. the
Basically, if we want this to be accurate to < 1 ulp (which is a typical reasonable standard), the LLVM approach is fine for exponents between -1 and
We could also try using
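To make the intermediate-rounding concern concrete, here is a rough C++ sketch of a repeated-squaring integer power together with a helper that measures how far it drifts from the library `pow`. The loop is modeled on compiler-rt's `__powidf2` (the file linked later in this thread); the names `powi_by_squaring` and `relative_difference` are hypothetical, and treating `std::pow` as the reference is an assumption of this sketch.

```cpp
#include <cmath>

// Integer power by repeated squaring, modeled on compiler-rt's __powidf2.
// Every multiplication rounds, so the error grows with the exponent, and
// the reciprocal for negative exponents is taken only at the end.
double powi_by_squaring(double a, int b) {
    const bool recip = b < 0;
    double r = 1.0;
    while (true) {
        if (b & 1) r *= a;  // fold in the current bit of the exponent
        b /= 2;             // truncating division moves to the next bit
        if (b == 0) break;
        a *= a;             // square the base for the next bit
    }
    return recip ? 1.0 / r : r;
}

// Relative difference between the repeated-squaring result and std::pow,
// which is assumed here to be the more accurate of the two.
double relative_difference(double x, int n) {
    const double ref = std::pow(x, static_cast<double>(n));
    return std::fabs(powi_by_squaring(x, n) - ref) / std::fabs(ref);
}
```

Calling `relative_difference` over a range of exponents is one way to see where the drift becomes noticeable; the exact figures depend on the platform's `pow`, so none are quoted here.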
To get back the exact v0.4 implementation, basically, we just need to do:
diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..0125c79 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,7 +1456,7 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
x = FP(x);
y = JL_INT(y);
Type *tx = x->getType(); // TODO: LLVM expects this to be i32
-#if JL_LLVM_VERSION >= 30600
+#if 0
Type *ts[1] = { tx };
Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
ArrayRef<Type*>(ts));
If we want to conditionally use
diff --git a/src/codegen.cpp b/src/codegen.cpp
index 83a80f0..2df6ae3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -389,10 +389,8 @@ static Function *expect_func;
static Function *jldlsym_func;
static Function *jlnewbits_func;
static Function *jltypeassert_func;
-#if JL_LLVM_VERSION < 30600
static Function *jlpow_func;
static Function *jlpowf_func;
-#endif
//static Function *jlgetnthfield_func;
static Function *jlgetnthfieldchecked_func;
//static Function *jlsetnthfield_func;
@@ -5950,7 +5948,6 @@ static void init_julia_llvm_env(Module *m)
"jl_gc_diff_total_bytes", m);
add_named_global(diff_gc_total_bytes_func, *jl_gc_diff_total_bytes);
-#if JL_LLVM_VERSION < 30600
Type *powf_type[2] = { T_float32, T_float32 };
jlpowf_func = Function::Create(FunctionType::get(T_float32, powf_type, false),
Function::ExternalLinkage,
@@ -5968,7 +5965,7 @@ static void init_julia_llvm_env(Module *m)
&pow,
#endif
false);
-#endif
+
std::vector<Type*> array_owner_args(0);
array_owner_args.push_back(T_pjlvalue);
jlarray_data_owner_func =
diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..82e1a09 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,20 +1456,26 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
x = FP(x);
y = JL_INT(y);
Type *tx = x->getType(); // TODO: LLVM expects this to be i32
+ Function *powi = (tx == T_float64 ? jlpow_func : jlpowf_func);
#if JL_LLVM_VERSION >= 30600
- Type *ts[1] = { tx };
- Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
- ArrayRef<Type*>(ts));
+ if (ConstantInt *cy = dyn_cast<ConstantInt>(y)) {
+ if (cy->isMinusOne() || !cy->uge(5)) {
+ powi = Intrinsic::getDeclaration(jl_Module,
+ Intrinsic::powi,
+ makeArrayRef(tx));
+ }
+ }
+#endif
+ // issue #6506
+ if (!powi->isIntrinsic()) {
+ powi = static_cast<Function*>(prepare_call(powi));
+ y = builder.CreateSIToFP(y, tx);
+ }
#if JL_LLVM_VERSION >= 30700
return builder.CreateCall(powi, {x, y});
#else
return builder.CreateCall2(powi, x, y);
#endif
-#else
- // issue #6506
- return builder.CreateCall2(prepare_call(tx == T_float64 ? jlpow_func : jlpowf_func),
- x, builder.CreateSIToFP(y, tx));
-#endif
}
case sqrt_llvm_fast: {
x = FP(x);
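The condition in the second patch, `cy->isMinusOne() || !cy->uge(5)`, is easy to misread because `uge` is an unsigned comparison. The standalone sketch below restates it with plain integers; `use_powi_intrinsic` is a hypothetical helper name, and the 32-bit exponent width is an assumption of this sketch rather than something the patch fixes.

```cpp
#include <cstdint>

// Restates the patch's test on a constant exponent y.
// !uge(5) means "unsigned value below 5", so it accepts 0, 1, 2, 3 and 4.
// Negative values look huge when viewed as unsigned and are rejected,
// which is why -1 needs its own check.
bool use_powi_intrinsic(std::int32_t y) {
    return y == -1 || static_cast<std::uint32_t>(y) < 5u;
}
```

Under that reading, only the constant exponents -1 through 4 would take the `powi` path, and every other exponent (including all non-constant ones) would fall through to the `pow`/`powf` call.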
Nice, that patch looks like a good way to go.
Alternatively, if we want to preserve all of the constant-folding abilities, we can do effectively the same, but as an LLVM pass scheduled near the end of our pipeline.
Note this is indeed the actual implementation (https://llvm.org/svn/llvm-project/compiler-rt/tags/Apple/Libcompiler_rt-14/lib/powidf2.c), not just similar to it. It's also defined this way regardless of compiler optimization. Per the spec for this function:
Built-in Function: double __builtin_powi (double, int)
We could also do this branch at runtime, if that sounds worthwhile:
switch %p powf [-1 inv, 0 one, 1 unity, 2 square, 3 triple ]
inv:
    1 / x
one:
    1
unity:
    x
square:
    x*x
triple:
    x*x*x
powf:
    powf(x, float(p))
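The runtime branch sketched above could look roughly like this in C++. It is only an illustration of the idea: the function name `pow_small_int_dispatch` is hypothetical, and it uses `double` with `std::pow` where the pseudocode writes `powf`.

```cpp
#include <cmath>

// Runtime dispatch on the integer exponent: a handful of tiny exponents are
// expanded inline, and everything else goes to the library pow.
double pow_small_int_dispatch(double x, int p) {
    switch (p) {
        case -1: return 1.0 / x;
        case 0:  return 1.0;
        case 1:  return x;
        case 2:  return x * x;
        case 3:  return x * x * x;  // two roundings instead of one
        default: return std::pow(x, static_cast<double>(p));
    }
}
```

The trade-off is an extra branch on every call in exchange for keeping the cheap cases cheap without giving up the library's accuracy on larger exponents.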
Finally, I'll just note that the
Scratch that. We can't handle this until we can infer function purity. However, it turns out that LLVM already does all of the valid optimizations I proposed above, as long as you don't explicitly tell it to give you the low precision answer (as we do now). I'll put up a PR.
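As a small illustration of that last point, the C++ sketch below passes constant small exponents to the ordinary library `pow`. An optimizing compiler built on LLVM may rewrite such calls (for instance `pow(x, 2.0)` into `x * x`), but whether it does depends on the compiler, flags and optimization level, so treat that as an expectation rather than a guarantee. The function names are hypothetical.

```cpp
#include <cmath>

// Plain library calls with constant exponents. Nothing here requests a
// lower-precision answer; any rewrite into multiplications or a division
// is left entirely to the optimizer.
double square_via_pow(double x) {
    return std::pow(x, 2.0);
}

double inverse_via_pow(double x) {
    return std::pow(x, -1.0);
}
```

Inspecting the generated assembly or IR for these two functions at different optimization levels is a direct way to check the claim on a given toolchain.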
The powi intrinsic optimization over calling powf is that it is inaccurate. We don't need that. When it is equally accurate (e.g. tiny constant powers), LLVM will already recognize and optimize any call to a function named `powf`, and produce the same speedup. fix #19872
For some reason `powi` seems to be flushing some subnormals to zero:
From 0.4 (correct behaviour):
On 0.5 (incorrect):
On master (incorrect):
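For readers unfamiliar with the term, the C++ sketch below shows one way to tell whether a `double` falls in the subnormal range that the report is about. The helper names are hypothetical, and the bounds in the comments assume IEEE 754 binary64.

```cpp
#include <cmath>
#include <limits>

// True when v is nonzero but smaller in magnitude than the smallest normal double.
bool is_subnormal(double v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Smallest positive normal double, 2^-1022 for binary64.
double smallest_normal() {
    return std::numeric_limits<double>::min();
}

// Smallest positive subnormal double, 2^-1074 for binary64.
double smallest_subnormal() {
    return std::numeric_limits<double>::denorm_min();
}
```

A correct power function should return a subnormal for any true result between these two bounds, whereas a result that has been flushed to zero classifies as `FP_ZERO`.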