[perf] Fix Taichi CPU backend compile parameter to pair performance with Numba. (#7731)

Issue: #7442

### Brief Summary

In this issue, Numba is markedly faster than Taichi (roughly 2x in the benchmark below) because Taichi's generated code was not being automatically vectorized. The root cause is that the fast-math flags (`fast_flags`) were not passed correctly to the LLVM IR builder. To fix this, the flags are now set on the builder during initialization of the CPU codegen whenever `compile_config.fast_math` is enabled. Numba and Taichi now show comparable performance.

Performance comparison:

- numba: 13052.542478 MFlops
- taichi (master): 6544.274409 MFlops
- taichi (this PR): 12778.240179 MFlops

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
taichi-dev · Apr 13, 2023 · 4eea1ec
1 parent 0d26ffa
Showing 1 changed file with 7 additions and 0 deletions.
diff --git a/taichi/codegen/llvm/codegen_llvm.cpp b/taichi/codegen/llvm/codegen_llvm.cpp
@@ -2542,6 +2542,13 @@ void TaskCodeGenLLVM::initialize_context() {
   TI_ASSERT(tlctx != nullptr);
   llvm_context = tlctx->get_this_thread_context();
   builder = std::make_unique<llvm::IRBuilder<>>(*llvm_context);
+  if (compile_config.fast_math) {
+    llvm::FastMathFlags fast_flags;
+    fast_flags.setNoInfs();
+    fast_flags.setNoSignedZeros();
+    fast_flags.setAllowReassoc();
+    builder->setFastMathFlags(fast_flags);
+  }
 }
 
 llvm::Value *TaskCodeGenLLVM::get_arg(int i) {