[Infra] Add "ti.profiler" (PythonProfiler) for intuitive Python-scope profiling #1493

Merged 18 commits on Jul 31, 2020
275 changes: 257 additions & 18 deletions docs/profiler.rst
Profiler
========

Taichi's profilers can help you analyze the run-time cost (i.e. the performance) of your program.

Currently there are three profiling systems in Taichi:

- ``PythonProfiler``
- ``KernelProfiler``
- ``ScopedProfiler`` (for developers)


PythonProfiler
--------------

``PythonProfiler`` measures the time spent between ``start()`` and ``stop()`` using
the built-in Python function ``time.time()``.
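
As a rough mental model only (this is a simplified sketch, not Taichi's actual implementation),
the ``start()``/``stop()`` mechanism can be pictured like this:

.. code-block:: python

    import time

    # Illustrative stand-ins for the profiler's internal bookkeeping:
    _started = {}   # name -> time.time() when start() was called
    _elapsed = {}   # name -> list of elapsed times recorded so far

    def start(name):
        _started[name] = time.time()

    def stop(name):
        _elapsed.setdefault(name, []).append(time.time() - _started[name])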

Profiling APIs
**************

There are three ways to use this profiler:

1. ``ti.profiler.start()`` and ``ti.profiler.stop()`` are the most fundamental APIs:
   they measure the time elapsed between ``start`` and ``stop`` and save the result
   under the given name, e.g.:

.. code-block:: python

    from time import sleep
    import taichi as ti
    ti.init()

    def do_something_A():
        sleep(0.01)

    def do_something_B():
        sleep(0.1)

    ti.profiler.start('A')
    do_something_A()
    ti.profiler.stop('A')

    ti.profiler.start('B')
    do_something_B()
    ti.profiler.stop('B')

    ti.profiler.print()


.. code-block:: none

    min     | avg     | max     | num | total   | name
    0.100s  | 0.100s  | 0.100s  | 1x  | 0.100s  | B
    10.10ms | 10.10ms | 10.10ms | 1x  | 10.10ms | A


2. ``with ti.profiler()`` makes the code cleaner and more readable: it automatically
   invokes ``stop`` when the indented block is exited.
   For more details about the ``with`` statement in Python,
   see `this tutorial <https://www.pythonforbeginners.com/files/with-statement-in-python>`_.

.. code-block:: python

    from time import sleep
    import taichi as ti
    ti.init()

    def do_something_A():
        sleep(0.01)

    def do_something_B():
        sleep(0.1)

    with ti.profiler('A'):
        do_something_A()
    # automatically invoke stop('A')

    with ti.profiler('B'):
        do_something_B()
    # automatically invoke stop('B')

    ti.profiler.print()


.. code-block:: none

    min     | avg     | max     | num | total   | name
    0.100s  | 0.100s  | 0.100s  | 1x  | 0.100s  | B
    10.10ms | 10.10ms | 10.10ms | 1x  | 10.10ms | A


3. ``@ti.profiler.timed`` is very intuitive for profiling kernels: it measures the time
   spent in the decorated function, i.e. ``start`` is invoked when entering the function,
   ``stop`` when leaving it, and the record is named after the function.

.. code-block:: python

    from time import sleep
    import taichi as ti
    ti.init()

    @ti.profiler.timed
    def do_something_A():
        sleep(0.01)

    @ti.profiler.timed
    def do_something_B():
        sleep(0.1)

    do_something_A()
    do_something_B()

    ti.profiler.print()


.. code-block:: none

    min     | avg     | max     | num | total   | name
    0.100s  | 0.100s  | 0.100s  | 1x  | 0.100s  | do_something_B
    10.10ms | 10.10ms | 10.10ms | 1x  | 10.10ms | do_something_A


.. warning::

    When combining ``@ti.profiler.timed`` with other decorators like ``@ti.kernel``,
    ``@ti.profiler.timed`` should be put **above** them, e.g.:

    .. code-block:: python

        @ti.profiler.timed
        @ti.kernel
        def substep():
            ...

    Decorators are applied bottom-up, so this order lets ``@ti.profiler.timed`` wrap
    the compiled Taichi kernel rather than the other way around.


Recording multiple entries
**************************

When the same **name** is used multiple times, the corresponding records are merged into one entry, e.g.:

.. code-block:: python

    from time import sleep
    import taichi as ti
    ti.init()

    def do_something_A():
        sleep(0.01)

    def do_something_B():
        sleep(0.1)

    ti.profiler.start('A')
    do_something_A()
    ti.profiler.stop('A')

    ti.profiler.start('A')
    do_something_B()
    ti.profiler.stop('A')

    ti.profiler.start('B')
    do_something_B()
    ti.profiler.stop('B')

    ti.profiler.print()

This will produce:

.. code-block:: none

    min     | avg     | max     | num | total   | name
    10.10ms | 55.12ms | 0.100s  | 2x  | 0.110s  | A
    0.100s  | 0.100s  | 0.100s  | 1x  | 0.100s  | B


- ``min`` is the minimum time in records.
- ``avg`` is the average time of records.
- ``max`` is the maximum time in records.
- ``num`` is the number of record entries.
- ``total`` is the total time summed over all records.
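
As a rough illustration (the ``records`` list below is hypothetical, not Taichi's internal data),
the columns relate to the raw per-call timings like this:

.. code-block:: python

    # Two hypothetical timings recorded under the name 'A' (in seconds):
    records = [0.0101, 0.1001]

    num = len(records)        # "num"   -> 2x
    total = sum(records)      # "total" -> ~0.110s
    avg = total / num         # "avg"   -> ~0.055s
    lo, hi = min(records), max(records)   # "min" / "max"

    print(f"{lo:.4f}s | {avg:.4f}s | {hi:.4f}s | {num}x | {total:.4f}s | A")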


Profiler options
****************

Due to Taichi's JIT mechanism, a kernel is **compiled** on its first invocation.
Therefore the first record will be much longer than the following ones, since it
**involves both compile time and execution time**, e.g.:

.. code-block:: none

    min     | avg     | max     | num   | total  | name
    2.37ms  | 3.79ms  | 1.615s  | 1900x | 7.204s | substep

.. code-block:: none

    min     | avg     | max     | num   | total  | name
    2.37ms  | 2.95ms  | 12.70ms | 1895x | 5.592s | substep


As you can see, this makes the result inaccurate, especially in the ``max`` column.

To avoid this, you may specify a ``warmup`` option to ``ti.profiler``, e.g.:

.. code-block:: python

    @ti.profiler.timed(warmup=5)
    @ti.kernel
    def substep():
        ...


Setting ``warmup=5``, for example, will **discard** the first 5 record entries,
i.e. discard the kernel compile time and possible TLB (Translation Lookaside Buffer) and cache misses on start up.
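
For completeness, here is a minimal, self-contained sketch of using ``warmup`` (the field ``x``,
the kernel body and the loop count are illustrative, not taken from a real benchmark):

.. code-block:: python

    import taichi as ti
    ti.init(arch=ti.cpu)

    x = ti.var(ti.f32, shape=8)

    @ti.profiler.timed(warmup=5)
    @ti.kernel
    def substep():
        for i in x:
            x[i] += 1.0

    for _ in range(100):
        substep()  # the first 5 records are discarded from the statistics

    ti.profiler.print()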


Check out `misc/mpm99_timed.py <https://github.com/taichi-dev/taichi/blob/master/misc/mpm99_timed.py>`_ for a summary example.

.. warning::

    ``ti.profiler``, i.e. ``PythonProfiler``, **only works in Python-scope**, e.g.::

        @ti.func
        def substep():
            ti.profiler.start('hello')  # won't work as you expected...
            ...
            ti.profiler.stop('hello')

        @ti.profiler.timed  # won't work as you expected...
        @ti.func
        def hello():
            ...

    To do profiling **inside Taichi-scope**, please see the ``KernelProfiler`` section below.


KernelProfiler
--------------

``KernelProfiler`` records the costs of Taichi kernels on devices.

To enable this profiler, please initialize Taichi using ``ti.init(kernel_profiler=True)``.

Call ``ti.kernel_profiler_print()`` to show the kernel profiling result. For example:

.. code-block:: python
    :emphasize-lines: 3, 13

    # ... (example code collapsed in this diff view) ...

The outputs would be:
[ 22.73%] jit_evaluator_0_kernel_0_serial min 0.001 ms avg 0.001 ms max 0.001 ms total 0.000 s [ 1x]
[ 0.00%] jit_evaluator_1_kernel_1_serial min 0.000 ms avg 0.000 ms max 0.000 ms total 0.000 s [ 1x]
[ 77.27%] compute_c4_0_kernel_2_serial min 0.004 ms avg 0.004 ms max 0.004 ms total 0.000 s [ 1x]
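

Since the example code above is collapsed in this diff view, here is a minimal sketch of the same
workflow under assumed names (the field ``x`` and the kernel ``fill`` are illustrative, not the
collapsed original):

.. code-block:: python

    import taichi as ti
    ti.init(arch=ti.cpu, kernel_profiler=True)

    x = ti.var(ti.f32, shape=1024)

    @ti.kernel
    def fill():
        for i in x:
            x[i] = i * 0.5

    fill()
    ti.kernel_profiler_print()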


ScopedProfiler
--------------

``ScopedProfiler`` measures time spent on the **host tasks** hierarchically.

This profiler is automatically on.

To show its results, call ``ti.print_profile_info()``. For example:

.. code-block:: python

    import taichi as ti

    ti.init(arch=ti.cpu)
    var = ti.var(ti.f32, shape=1)


    @ti.kernel
    def compute():
        var[0] = 1.0
        print("Setting var[0] =", var[0])


    compute()
    ti.print_profile_info()


``ti.print_profile_info()`` prints profiling results in a hierarchical format.

.. Note::

    ``ScopedProfiler`` is a C++ class in the core of Taichi. It is not exposed to Python users.