fix format (#5)
zhuhaozhe authored Jun 8, 2023
1 parent 216e1f5 commit 2203acb
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions intermediate_source/inductor_debug_cpu.py
@@ -462,8 +462,10 @@ def trace_handler(p):
 # From the profiling table of the eager model, we can see the most time consumption ops are [aten::addmm, aten::add, aten::copy_, aten::mul, aten::clamp_min, aten::bmm].
 # Comparing with the inductor model profiling table, we notice there are ``mkl::_mkl_linear`` and fused kernel called ``graph_0_cpp_fused_*``. They are the major
 # optimization that the inductor model is doing. Let us discuss them separately.
-# (1) Regard to ``mkl::_mkl_linear```: You may notice the number of calls to this kernel is 362, which is exactly the same as ``aten::linear``` in the eager model profiling table.
-# The CPU total of ``aten::linear`` is 376.888ms, at the mean time it is 231.573ms for ``mkl::_mkl_linear``. This suggests inductor model speed up ~1.63x for the "linear" part.
+#
+# (1) Regard to ``mkl::_mkl_linear``: You may notice the number of calls to this kernel is 362, which is exactly the same as ``aten::linear`` in the eager model profiling table.
+# The CPU total of ``aten::linear`` is 376.888ms, at the mean time it is 231.573ms for ``mkl::_mkl_linear``. This suggests inductor model speed up ~1.63x for the "linear" part.
+#
 # (2) Regarding non-linear part: The end-to-end latency for the eager/inductor model is 802/339ms. The speed up for the non-linear part is ~3.94x.
 # Let's read the generated code to understand how the inductor achieves this impressive optimization. You are able to find the generated code by
 # searching ``cpp_fused__mkl_linear_add_mul_relu_151`` in ``output_code.py``
@@ -553,8 +555,8 @@ def func(x0, x1, x3, x5, x7):
 # eager use: 5.780875144992024 ms/iter
 # inductor use: 0.9588955780491233 ms/iter
 # speed up ratio: 6.0286805751604735
-
-
+#
+#
 # This is just an example. The profiling table shows all element-wise op are fused within the inductor automatically in this model. You can read more kernels in
 # `output_code.py`

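The speedup figures quoted in the tutorial comments can be re-derived from the timings they cite. The plain-Python sketch below does this. It assumes the end-to-end latency splits cleanly into "linear part" plus "everything else", which appears to be how the text arrives at its non-linear figure. The helper name `speedup` is ours, not part of the tutorial.

```python
# Re-derive the speedup ratios quoted in the tutorial comments.
# All timings are copied from the text. The linear/non-linear split is an
# assumption: end-to-end time = linear part + everything else.

def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster `optimized_ms` is than `baseline_ms`."""
    return baseline_ms / optimized_ms

# Profiler CPU totals for the linear ops (ms)
eager_linear = 376.888     # aten::linear, eager model
inductor_linear = 231.573  # mkl::_mkl_linear, inductor model

# End-to-end latency (ms), as quoted (rounded to whole ms in the text)
eager_total = 802.0
inductor_total = 339.0

linear_speedup = speedup(eager_linear, inductor_linear)

# Non-linear time is taken as end-to-end time minus the linear part
nonlinear_speedup = speedup(eager_total - eager_linear,
                            inductor_total - inductor_linear)

# Micro-benchmark timings printed in the second hunk (ms/iter)
micro_speedup = speedup(5.780875144992024, 0.9588955780491233)

print(f"linear:     ~{linear_speedup:.2f}x")
print(f"non-linear: ~{nonlinear_speedup:.2f}x")
print(f"micro:      ~{micro_speedup:.2f}x")
```

The linear ratio (~1.63x) and the micro-benchmark ratio (~6.03x) match the text. With the rounded 802/339 ms totals, the non-linear ratio comes out near 3.96x rather than the quoted ~3.94x. The gap is presumably due to rounding of those totals.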

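The comments suggest locating kernels by searching `output_code.py` for names such as `cpp_fused__mkl_linear_add_mul_relu_151`. The sketch below performs that search as plain text. It assumes each generated C++ kernel appears as a top-level assignment `cpp_fused_<ops>_<n> = ...` in that file; this is an assumption about the generated layout, not something this commit states. `list_fused_kernels` and `kernel_line` are hypothetical helpers, and the sample string is a synthetic stand-in rather than real Inductor output.

```python
import re
from typing import List, Optional

# Assumption: each kernel in output_code.py is defined at column 0 as
#     cpp_fused_<ops>_<n> = ...
# This only searches text; it never imports or executes the generated file.
KERNEL_DEF = re.compile(r"^(cpp_fused_\w+)\s*=", re.MULTILINE)

def list_fused_kernels(source: str) -> List[str]:
    """Return the names of cpp_fused_* kernels defined in `source`, in order."""
    return KERNEL_DEF.findall(source)

def kernel_line(source: str, name: str) -> Optional[int]:
    """Return the 1-based line number where kernel `name` is defined, or None."""
    pattern = re.compile(re.escape(name) + r"\s*=")
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.match(line):
            return lineno
    return None

# Synthetic stand-in for output_code.py (NOT real generated code)
sample = "\n".join([
    "# synthetic stand-in, not real Inductor output",
    "cpp_fused_add_mul_relu_0 = async_compile.cpp('''...''')",
    "cpp_fused__mkl_linear_add_mul_relu_151 = async_compile.cpp('''...''')",
])

print(list_fused_kernels(sample))
# ['cpp_fused_add_mul_relu_0', 'cpp_fused__mkl_linear_add_mul_relu_151']
print(kernel_line(sample, "cpp_fused__mkl_linear_add_mul_relu_151"))
# 3
```

To apply it to a real run, read the generated file into a string first. For example, use `source = open(path).read()`, where `path` points at your own `output_code.py`.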