Polishing the cpu profiling doc (#6116)
abhinavarora authored and wangkuiyi committed Nov 30, 2017
1 parent 0d40a4d commit 6dc5b34
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions doc/howto/optimization/cpu_profiling.md
@@ -1,13 +1,13 @@
This tutorial introduces techniques we used to profile and tune the
This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages
`cProfile` and `yep`, and Google `perftools`.
`cProfile` and `yep`, and Google's `perftools`.

Profiling is the process that reveals the performance bottlenecks,
Profiling is the process that reveals performance bottlenecks,
which could be very different from what's in the developers' mind.
Performance tuning is to fix the bottlenecks. Performance optimization
Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternately.

PaddlePaddle users program AI by calling the Python API, which calls
PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so`, written in C++. In this tutorial, we focus on
the profiling and tuning of

@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:

We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will
explain how to profile C++ code in the next section. At the right
explain how to profile C++ code in the next section. At this
moment, let's look into the third function `sync_with_cpp`, which is a
Python function. We can click it to understand more about it:
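
The sorting step can also be reproduced without the web viewer. The following is a minimal sketch using only the standard library's `cProfile` and `pstats`; `busy_loop` and `profile_sorted_by_tottime` are hypothetical names standing in for the real training script and are not part of PaddlePaddle.

```python
import cProfile
import io
import pstats


def busy_loop(n):
    """Stand-in workload (hypothetical); a real run would profile the training script."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_sorted_by_tottime(func, *args, top=5):
    """Profile func(*args) and return the stats report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the time spent in a function itself, excluding its callees.
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


report = profile_sorted_by_tottime(busy_loop, 100000)
print(report)
```

Sorting by `tottime` rather than cumulative time surfaces the functions that burn CPU themselves, which is usually where tuning should start.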

@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
`main.py.prof`.

Please be aware of the `-v` command line option, which prints the
analysis results after generating the profiling file. By taking a
glance at the print result, we'd know that if we stripped debug
analysis results after generating the profiling file. By examining the
printed result, we'd know whether we stripped debug
information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable:

@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
  variable `OMP_NUM_THREADS=1` to prevent OpenMP from automatically
starting multiple threads.
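
As a rough illustration of the hints above, the sketch below assembles a profiling run with `OMP_NUM_THREADS=1` set in the child environment. It assumes `yep` is invoked as `python -m yep -v <script>`; the helper name `build_yep_run` is hypothetical, and the code only builds the command rather than executing it.

```python
import os
import sys


def build_yep_run(script, num_threads=1):
    """Return (command, env) for profiling `script` with a fixed OpenMP thread count."""
    # Assumption: yep runs as a module; -v prints the analysis after profiling.
    command = [sys.executable, "-m", "yep", "-v", script]
    env = dict(os.environ)
    # A single OpenMP thread keeps the sampled call stacks easy to read.
    env["OMP_NUM_THREADS"] = str(num_threads)
    return command, env


command, env = build_yep_run("main.py")
print(" ".join(command[1:]))
# To actually run it (requires yep and gperftools to be installed):
# import subprocess; subprocess.run(command, env=env, check=True)
```

Passing the environment explicitly keeps the thread limit scoped to the profiled process instead of the whole shell session.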

### Look into the Profiling File
### Examining the Profiling File

The tool we used to look into the profiling file generated by
The tool we use to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`.
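
A minimal sketch of how the web GUI might be launched from Python follows. It assumes the `pprof` binary is on `PATH` and uses its `-http` option; the host, port, and helper name `build_pprof_command` are illustrative choices, and the code only constructs the command.

```python
import sys


def build_pprof_command(profile, binary=sys.executable, address="localhost:8080"):
    """Return the argument list that serves `profile` in pprof's web UI."""
    # pprof takes the profiled binary (here the Python interpreter) and the profile file.
    return ["pprof", "-http=" + address, binary, profile]


pprof_command = build_pprof_command("main.py.prof")
print(" ".join(pprof_command))
# To actually serve the UI (requires pprof to be installed):
# import subprocess; subprocess.run(pprof_command, check=True)
```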

@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`.

`pprof` would mark performance critical parts of the program in
red. It's a good idea to follow the hint.
red. It's a good idea to follow the hints.
