sync branches: develop and infrt (#39509)
* [Pten] Adjust the Empty dev_api (#39143)

* adjust the Empty dev_api

* fix merge conflict

* fix sparse_utils_kernel

* Fix code conflict of empty dev_api (#39430)

* fix code conflict

* clear cache

* just try

* [PluggableDevice] custom kernel supports multi cpp_dtype registering (#39385)

* [PTen] Add standard kernel suffix set (#39404)

* add standard_suffix_set_and_remove_reshape_with_xshape

* revert reshape change

* polish reduce name

* [pten] update isnan registration (#39419) (usage example below)

* update isnan registration

* fix compile
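
For context, a minimal sketch of the isnan op whose kernel registration is updated here (public paddle Python API, shown only for illustration):

    import paddle

    x = paddle.to_tensor([1.0, float("nan"), 3.0])
    # isnan marks NaN entries with True
    print(paddle.isnan(x))  # [False, True, False]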

* [bf16] add bf16 kernel: dropout & reshape & slice (#39395)

* add dropout

* add reshape

* add slice

* refine slice unittest

* refine slice unittest

* add cpu bf16 kernel

* [bf16] add bf16 kernel: squeeze & unsqueeze & stack (#39402)

* add squeeze unsqueeze stack

* add unittest

* add cpu kernel

* Modify the unsqueeze dimension of input data in conv1d NCL and NLC format (#38425)

* optimize conv1d forward

* add conv opt

* Optimize memory copy

* delete share data with

* set num_filters=512

* add nlc optimize

* Optimize num_filter=512 data on A100 and V100

* Fix the workspace_size size setting of filter

* [Pten] Refactor C++ API code-gen (#39408)

* refactor C++ API code-gen

* fix windows problem of C++ API

* Refactored Python-C Attributes Parsing Functions (#39328)

* Add _get_parameter method to Lamb optimizer (#39416) (rough sketch below)

* add _get_parameter func to lamb

* remove duplicate code
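
A rough usage sketch of the new helper; the exact signature of _get_parameter is not shown in this log, so the name-based lookup below is an assumption:

    import paddle

    linear = paddle.nn.Linear(4, 4)
    opt = paddle.optimizer.Lamb(learning_rate=0.01, parameters=linear.parameters())

    loss = linear(paddle.rand([2, 4])).mean()
    loss.backward()
    opt.step()

    # Assumed interface: fetch an optimizer-held copy of a parameter
    # (e.g. a master weight) by the parameter's name.
    p = opt._get_parameter(linear.weight.name)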

* mkldnn layout issue fix (#39422)

* mkldnn conv fix

* definition

* fix compile error on jetson (#39441)

* move masked_select to pten (#39193) (usage sketch below)

* move masked select cpu kernel

* add masked selected gpu kernel; test=develop

* fix bugs; test=develop

* bug fix; test=develop

* bug fix; test=develop

* add namespace to set mask array; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* fix ddim bug; test=develop

* fix npu op bug; test=develop

* fix xpu dependency bug; test=develop

* move kernel args to sig.cc; test=develop
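
A minimal usage sketch of the op being migrated (public paddle API, for illustration only):

    import paddle

    x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0]])
    mask = paddle.to_tensor([[True, False], [False, True]])
    # masked_select gathers the elements where mask is True into a 1-D tensor
    print(paddle.masked_select(x, mask))  # [1.0, 4.0]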

* [PaddlePaddle Hackathon] 31. Add Java frontend for Paddle Inference (#37162)

* fix check error of ResetHolder (#39439)

* Added python-c code generation for final state Eager Dygraph (#39233)

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Added python-c code generation for final state Eager Dygraph

* Fixed minor issue

* Fixed yaml.load() method failure

* Fixed minor issues

* Refactored Python-C Attributes Parsing Functions

* Fixed minor issue with Python-C AddFunctions

* Fixed issues from merge

* Fixed merge issues

* change dtype of pooling mask to 'int32' for Paddle2ONNX (#39314) (see sketch below)

* change dtype of pooling mask to 'int32' for Paddle2ONNX

* empty commit to rerun ci

* fix format
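
A minimal sketch of the affected path: max pooling with return_mask=True, whose mask this change stores as int32 so Paddle2ONNX can export it:

    import paddle
    import paddle.nn.functional as F

    x = paddle.rand([1, 3, 8, 8])
    out, mask = F.max_pool2d(x, kernel_size=2, return_mask=True)
    # The mask dtype is what this PR switches to int32 for ONNX export
    print(mask.dtype)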

* share MemOptVarInfos of external variables into cinn_launch subgraph (#39209)

* add a graph pass to share MemOptVarInfos of external variables into subgraph

* update pass name

* fix compile failed

* add share_mem_opt_info_to_subgraph_pass test

* share_mem_opt_info_to_subgraph_pass_test pass

* modify some code for better style and robustness

* update cmake

* [NPU] add reduce_min (#39019)

[NPU] add reduce_min

* [MLU] add mlu kernel for accuracy op (#39337)

* [MLU] add mlu kernel for accuracy op

* fix license format

* fix error message

* [Dy2St] Handle `a, b = paddle.shape(x)` in Static Analysis (#39245) (see sketch below)

* refine Assign

* add UT
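
The pattern this Dy2St fix handles, as a minimal sketch:

    import paddle

    @paddle.jit.to_static
    def swap_axes(x):
        # Tuple-unpacking paddle.shape(x) is the case the static analysis
        # now resolves correctly.
        n, c = paddle.shape(x)
        return paddle.reshape(x, [c, n])

    print(swap_axes(paddle.rand([2, 3])).shape)  # [3, 2]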

* [Pten] Auto-Generate InterMeta register (#39436)

* fix code conflict

* generate inter_meta register

* clear cache

* just try

* add sign c++ api

* polish some code

* Support different dtypes of inputs for elementwise ops (#38859)

* improve backward performance

* support different dtypes for elementwise ops

* Add profiler node tree implementation (#39316)

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

* add print pten kernel tool (#39371)

* test=document_fix;add print pten kernel tool

* test=document_fix

* test=document_fix

* test=document_fix

* test=document_fix

* add print_pten_kernels tool

* add print_pten_kernels tool

* fix windows compile

* notest,test=rocm_ci

* add merge tool

* add comments

* [new-exec] set type of op-kernel op by place (#39458)

* Add log for executor (#39459)

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173.

* add log for Executor

Co-authored-by: liutiexing <liutiexing@google.com>

* [Paddle Inference] support ernie quant model with interleaved (#39424)

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* Unify PS development - Python (#39431)

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

* [PTen] Move grad GetExpectedPtenKernelArgs into pten (#39418)

* move grad get expected pten kernel args

* fix reduce sum error

* fix element_sub_grad failed

* revert kernel judge change

* fix compilation warning on mac (#39438)

* get build time (#39368)

* fix prelu trt convert (#39389)

* Optimize bilinear interpolation forward (#39243) (see sketch below)

* bilinear_fw init

* optimize code

* pre-compute linear_interp input index
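
A minimal sketch of the forward path being optimized (public paddle API):

    import paddle
    import paddle.nn.functional as F

    x = paddle.rand([1, 3, 32, 32])
    # Bilinear upsampling; this PR optimizes the forward kernel for this path
    y = F.interpolate(x, size=[64, 64], mode="bilinear", align_corners=False)
    print(y.shape)  # [1, 3, 64, 64]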

* Optimize performance of softmax_bwd when axis!=-1 (#38609) (see sketch below)

* Optimize performance of softmax_bwd when axis!=-1

* fix

* fix

* fix

* fix
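
A minimal sketch of the case this PR speeds up: softmax over a non-last axis, followed by a backward pass:

    import paddle
    import paddle.nn.functional as F

    x = paddle.rand([128, 1000])
    x.stop_gradient = False
    y = F.softmax(x, axis=0)  # axis != -1 is the optimized case
    y.sum().backward()
    print(x.grad.shape)  # [128, 1000]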

* [PTen] Remove pten core's dependency on fluid xxx_info.h (#39401)

* remove xxx_info include

* fix namespace error

* resolve conflict

* skip xpu context in registry

* fix macro error

* resolve conflict

* resolve conflict

* revert xpu convert

* remove trans to fluid place

* remove useless headers

* [Pten] move operators/math/math_function_* to pten/kernels/func (#39300)

* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`

* [MLU] add pool2d and pool2d_grad mlu kernel (#39453)

* [MLU]support c_gen_cncl_id_op run on MLU device (#39336)

Co-authored-by: zhangna <zhangna@cambricon.com>

* [bf16] add bf16 kernel: transpose & unbind (#39457)

* add transpose unbind

* add unittest

* refine transpose unittest

* uniform_random op for mlu (#39450)

* [MLU] add pool2d pytest (#39454)

* Added shape (U)INT8/BF16/FP32 oneDNN kernel (#36033)

* added shape oneDNN kernel

* removed unnecessary import from test

* added skipping tests for GPU

* refactoring

* refactored shape kernel

* added tests in new framework

* removed one line

* minor change

* added newline at EOF

* added formatting

* added attributes as extra

* move memcpy.h into cc file (#39469)

* Add TensorRT inspector into Paddle-TRT (#38362)

* Fix add profiler node tree implementation cmake error (#39474)

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

* fix dependency error

* unify naming style (#39481)

* [Pten] Generate Wrapped InferMeta by Yaml (#39482)

* generate wrapped_infer_meta

* add test for wrapped_infer_meta

* Update test_meta_fn_utils.cc

* change the dir of generated file

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: Chen Weihang <chenwhpro@163.com>

* Adjusted python-level trace_op to accommodate final state Eager Dygraph (#39319)

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Added python-c code generation for final state Eager Dygraph

* Fixed minor issue

* Fixed yaml.load() method failure

* Fixed minor issues

* Refactored Python-C Attributes Parsing Functions

* Fixed minor issue with Python-C AddFunctions

* Adjusted python-level trace_op to accommodate final state Eager Dygraph

* Added Logs for final state Eager Dygraph

* Fixed merge issues

* Fixed minor issue

* Fixed get_tensor method for EagerTensor (#39414)

* Enabled Eager OpTest #1

* Enabled Eager OpTest #1

* Fixed get_tensor method for EagerTensor

* [Approver Update] update check approver of qili93, test=document_fix (#39483)

* [MLU] add mlu kernel for c_broadcast op (#39470)

* update xpu test build script and fix get_test_cover_info, *test=kunlun (#39235)

* fix gather_nd, *test=kunlun (#39283)

* [pten] add split kernel (#39060) (usage sketch below)

* add split kernel

* add split kernel signature

* fix split bug

* modify MakePtenScalarArrayFromVarList

* modify MakePtenScalarArrayFromVarList

* fix split windows register error

* add test case for split kernel

* replace raw split kernel with pten kernel

* fix makeScalar/ScalarArray bug

* remove debug log

* remove int64_t type in buildPtcontext

* update by code review

* fix split dev test failed

* change DenseTensorMeta to MetaTensor

* change split api code from auto gen to manual

* split cuda kernel support bfloat16 type

* fix conflict

* rm raw split kernel

* merge develop branch

* change to pten::errors
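
A minimal usage sketch of the split op whose kernel moves to pten here (public paddle API):

    import paddle

    x = paddle.rand([3, 9])
    # Split the last axis into three equal pieces
    a, b, c = paddle.split(x, num_or_sections=3, axis=-1)
    print(a.shape, b.shape, c.shape)  # [3, 3] [3, 3] [3, 3]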

* new way of test cases, *test=kunlun (#39444)

* new way of test cases, *test=kunlun

* new way of test cases, *test=kunlun

* new way of test cases, *test=kunlun

* [PTen] Add HasAttr for ArgumentMappingContext (#39464)

* add has_attr for arg map context

* skip useless attr now

* skip attr if not exists

* fix typo

* [ROCm] fix missing dcu kernel in operator.cmake, test=develop (#39480)

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: Aganlengzi <aganlengzi@gmail.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: crystal <62974595+Zjq9409@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: wenbin <wang3323032@qq.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>
Co-authored-by: chenyanlann <62465397+chenyanlann@users.noreply.github.com>
Co-authored-by: Wei Shengyu <weisy11@163.com>
Co-authored-by: TeFeng Chen <ctfeng66@163.com>
Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com>
Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Zhang Ting <zhangting_2017@163.com>
Co-authored-by: chenjian <chenjian26@baidu.com>
Co-authored-by: Shang Zhizhou <shangzhizhou@baidu.com>
Co-authored-by: liutiexing <74819124+liutiexing@users.noreply.github.com>
Co-authored-by: liutiexing <liutiexing@google.com>
Co-authored-by: Wangzheee <634486483@qq.com>
Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com>
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
Co-authored-by: zhangchunle <clzhang_cauc@163.com>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Lijunhui <1578034415@qq.com>
Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: zn <96479180+kangna-qi@users.noreply.github.com>
Co-authored-by: zhangna <zhangna@cambricon.com>
Co-authored-by: joeqiao12 <45232181+joeqiao12@users.noreply.github.com>
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Chen Weihang <chenwhpro@163.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: maxhuiy <1508399706@qq.com>
Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com>
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
Co-authored-by: helen88 <z8hanghuan@126.com>
Showing 578 changed files with 14,110 additions and 4,711 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ paddle/pten/api/lib/api.cc
paddle/pten/api/backward/backward_api.h
paddle/pten/api/lib/backward_api.cc
paddle/pten/include/*
paddle/pten/infermeta/generated.*
paddle/pten/extension.h
paddle/fluid/eager/api/generated/*

1 change: 1 addition & 0 deletions AUTHORS.md
@@ -83,3 +83,4 @@
| jeng1220 | Bai-Cheng(Ryan) Jeng (NVIDIA) |
| mingxu1067 | Ming Huang (NVIDIA) |
| zlsh80826 | Reese Wang (NVIDIA) |
| leo0519 | Leo Chen (NVIDIA) |
10 changes: 9 additions & 1 deletion CMakeLists.txt
@@ -242,6 +242,15 @@ option(NEW_RELEASE_CUBIN "PaddlePaddle next-level release strategy for pypi cu
option(NEW_RELEASE_JIT "PaddlePaddle next-level release strategy for backup jit package" OFF)
option(WITH_ASCEND_INT64 "Compile with int64 kernel for ascend NPU" OFF)
option(WITH_POCKETFFT "Compile with pocketfft support" ON)
option(WITH_RECORD_BUILDTIME "Compile PaddlePaddle with record all targets build time" OFF)

if(WITH_RECORD_BUILDTIME)
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${CMAKE_CURRENT_SOURCE_DIR}/tools/get_build_time.sh")
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "${CMAKE_CURRENT_SOURCE_DIR}/tools/get_build_time.sh")
else()
include(ccache) # set ccache for compilation ; if WITH_RECORD_BUILDTIME=ON can't use ccache
endif()
unset(WITH_RECORD_BUILDTIME CACHE)

# PY_VERSION
if(NOT PY_VERSION)
@@ -382,7 +391,6 @@ if(WITH_PROFILER)
add_definitions(-DWITH_GPERFTOOLS)
endif()

include(ccache) # set ccache for compilation
include(util) # set unittest and link libs
include(version) # set PADDLE_VERSION
include(coveralls) # set code coverage
1 change: 0 additions & 1 deletion cmake/cuda.cmake
@@ -2,7 +2,6 @@ if(NOT WITH_GPU)
return()
endif()


if(WITH_NV_JETSON)
add_definitions(-DWITH_NV_JETSON)
set(paddle_known_gpu_archs "53 62 72")
11 changes: 11 additions & 0 deletions cmake/operators.cmake
@@ -335,6 +335,17 @@ function(op_library TARGET)
endif()
endforeach()

# pybind USE_OP_DEVICE_KERNEL for ROCm
list (APPEND hip_srcs ${hip_cc_srcs})
# message("hip_srcs ${hip_srcs}")
foreach(hip_src ${hip_srcs})
set(op_name "")
find_register(${hip_src} "REGISTER_OP_CUDA_KERNEL" op_name)
if(NOT ${op_name} EQUAL "")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(${op_name}, CUDA);\n")
set(pybind_flag 1)
endif()
endforeach()

# pybind USE_OP_DEVICE_KERNEL for CUDNN/MIOPEN
list(APPEND cudnn_cu_srcs ${cudnn_cu_cc_srcs})
@@ -35,12 +35,12 @@ limitations under the License. */
#include "paddle/fluid/framework/variable.h"
#include "paddle/fluid/framework/variable_helper.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/math/selected_rows_functor.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/string/split.h"
#include "paddle/pten/kernels/funcs/math_function.h"

#include "paddle/fluid/distributed/ps/service/ps_client.h"

@@ -180,7 +180,7 @@ inline void MergeVars(const std::string &var_name,

// set output tensor to 0.
paddle::platform::CPUDeviceContext cpu_ctx;
paddle::operators::math::SetConstant<paddle::platform::CPUDeviceContext, T>
pten::funcs::SetConstant<paddle::platform::CPUDeviceContext, T>
constant_functor;
constant_functor(cpu_ctx, out_t, static_cast<T>(0));
// sum all vars to out
@@ -38,9 +38,10 @@
#include "paddle/fluid/distributed/ps/service/ps_service/service.h"
#include "paddle/fluid/distributed/ps/service/sendrecv.pb.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/string/printf.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace paddle {
namespace distributed {
class GraphPyService {
3 changes: 1 addition & 2 deletions paddle/fluid/distributed/test/brpc_service_dense_sgd_test.cc
@@ -21,8 +21,8 @@ limitations under the License. */
#include "paddle/fluid/distributed/ps/service/brpc_ps_server.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace paddle {
namespace distributed {
@@ -42,7 +42,6 @@ class DenseTensor;
namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace operators = paddle::operators;
namespace math = paddle::operators::math;
namespace memory = paddle::memory;
namespace distributed = paddle::distributed;

@@ -22,8 +22,8 @@ limitations under the License. */
#include "paddle/fluid/distributed/ps/service/brpc_ps_server.h"
#include "paddle/fluid/distributed/ps/service/env.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace paddle {
namespace distributed {
@@ -43,7 +43,6 @@ class DenseTensor;
namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace operators = paddle::operators;
namespace math = paddle::operators::math;
namespace memory = paddle::memory;
namespace distributed = paddle::distributed;

9 changes: 4 additions & 5 deletions paddle/fluid/distributed/test/brpc_utils_test.cc
@@ -17,7 +17,7 @@ limitations under the License. */
#include "gtest/gtest.h"

#include "paddle/fluid/distributed/ps/service/brpc_utils.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace paddle {
namespace framework {
@@ -28,7 +28,6 @@ class Variable;
namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace operators = paddle::operators;
namespace math = paddle::operators::math;
namespace memory = paddle::memory;
namespace distributed = paddle::distributed;

@@ -42,7 +41,7 @@ void CreateVarsOnScope(framework::Scope* scope, platform::Place* place,
lod1.push_back(framework::Vector<size_t>({1, 3, 8}));
tensor1->set_lod(lod1);
tensor1->mutable_data<float>(*place);
math::set_constant(ctx, tensor1, 31.9);
pten::funcs::set_constant(ctx, tensor1, 31.9);

// var 2
framework::Variable* var2 = scope->Var("x2");
@@ -52,7 +51,7 @@
lod2.push_back(framework::Vector<size_t>({1, 1}));
tensor2->set_lod(lod2);
tensor2->mutable_data<int>(*place);
math::set_constant(ctx, tensor2, 100);
pten::funcs::set_constant(ctx, tensor2, 100);

// var 3
framework::Variable* var3 = scope->Var("x3");
@@ -62,7 +61,7 @@
auto* rows = slr->mutable_rows();
tensor3->Resize(framework::make_ddim({564, 128}));
tensor3->mutable_data<float>(*place);
math::set_constant(ctx, tensor3, 32.7);
pten::funcs::set_constant(ctx, tensor3, 32.7);
for (int i = 0; i < 564; ++i) rows->push_back(i);
}

3 changes: 1 addition & 2 deletions paddle/fluid/distributed/test/graph_node_split_test.cc
@@ -36,14 +36,13 @@ limitations under the License. */
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/framework/variable.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/string/printf.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace operators = paddle::operators;
namespace math = paddle::operators::math;
namespace memory = paddle::memory;
namespace distributed = paddle::distributed;

3 changes: 1 addition & 2 deletions paddle/fluid/distributed/test/graph_node_test.cc
@@ -36,14 +36,13 @@ limitations under the License. */
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/framework/variable.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/string/printf.h"
#include "paddle/pten/kernels/funcs/math_function.h"

namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace operators = paddle::operators;
namespace math = paddle::operators::math;
namespace memory = paddle::memory;
namespace distributed = paddle::distributed;

@@ -24,3 +24,13 @@ add_custom_target(eager_final_state_codegen
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${tmp_nodes_h_path} ${nodes_h_path}
VERBATIM
)

set(tmp_python_c_output_path "${PADDLE_SOURCE_DIR}/paddle/fluid/pybind/tmp_eager_final_state_op_function_impl.h")
set(python_c_output_path "${PADDLE_SOURCE_DIR}/paddle/fluid/pybind/eager_final_state_op_function_impl.h")
add_custom_target(eager_final_state_python_c_codegen
COMMAND "${PYTHON_EXECUTABLE}" "${PADDLE_SOURCE_DIR}/paddle/fluid/eager/auto_code_generator/final_state_generator/python_c_gen.py"
"--api_yaml_path=${api_yaml_path}"
"--output_path=${tmp_python_c_output_path}"
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${tmp_python_c_output_path} ${python_c_output_path}
VERBATIM
)