diff --git a/docs/static_site/src/_sass/minima/_docs.scss b/docs/static_site/src/_sass/minima/_docs.scss index 09924f3166e7..f628740862ae 100644 --- a/docs/static_site/src/_sass/minima/_docs.scss +++ b/docs/static_site/src/_sass/minima/_docs.scss @@ -67,7 +67,9 @@ } .docs-faq { - background-color: white; + background-color: $grey-color-light; + padding-top: 20px; + padding-bottom: 20px; } .docs-architecture { @@ -76,4 +78,10 @@ margin-bottom: 20px; padding-top: 20px; padding-bottom: 20px; -} \ No newline at end of file +} + +.docs-dev-guide { + background-color: white; + padding-top: 20px; + padding-bottom: 20px; +} diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_1.png b/docs/static_site/src/assets/img/dev_guide_profilling_1.png new file mode 100644 index 000000000000..7c8248e38213 Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_1.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_2.png b/docs/static_site/src/assets/img/dev_guide_profilling_2.png new file mode 100644 index 000000000000..dbb55a1cceca Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_2.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_3.png b/docs/static_site/src/assets/img/dev_guide_profilling_3.png new file mode 100644 index 000000000000..3f276cf9ab3a Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_3.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_4.png b/docs/static_site/src/assets/img/dev_guide_profilling_4.png new file mode 100644 index 000000000000..73003b075569 Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_4.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_5.png b/docs/static_site/src/assets/img/dev_guide_profilling_5.png new file mode 100644 index 000000000000..893ce2e4b81e Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_5.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_6.png b/docs/static_site/src/assets/img/dev_guide_profilling_6.png new file mode 100644 index 000000000000..5682d6a6a1cf Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_6.png differ diff --git a/docs/static_site/src/assets/img/dev_guide_profilling_7.png b/docs/static_site/src/assets/img/dev_guide_profilling_7.png new file mode 100644 index 000000000000..936262d9e414 Binary files /dev/null and b/docs/static_site/src/assets/img/dev_guide_profilling_7.png differ diff --git a/docs/static_site/src/pages/api/api.html b/docs/static_site/src/pages/api/api.html index 6f87cf654fca..87841d8bd0b0 100644 --- a/docs/static_site/src/pages/api/api.html +++ b/docs/static_site/src/pages/api/api.html @@ -173,6 +173,18 @@

Deep Learning System Design Concepts

+
+
+

Developer Guide

+ +
+

FAQ

diff --git a/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md b/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md
new file mode 100644
index 000000000000..93cc916f7b0f
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md
@@ -0,0 +1,193 @@
---
layout: page_category
title: GitHub contribution and PR verification tips
category: Developer Guide
permalink: /api/dev-guide/github_contribution_and_PR_verification_tips
---

# GitHub contribution and PR verification tips

Use this page for general git workflow tips.

## Setup and configure

It is recommended that you fork the MXNet repo and then set the original repo as an upstream remote.

Fork [https://github.com/apache/incubator-mxnet](https://github.com/apache/incubator-mxnet) then:

```
git clone --recursive https://github.com/your_username/incubator-mxnet
cd incubator-mxnet
git remote add upstream https://github.com/apache/incubator-mxnet
```

Once `upstream` has been added, create a branch for your contribution:

```
git branch your-contribution-branch
```

Note that you can incorporate changes from `upstream` into any of your local branches during or after development via:

```
git fetch upstream
git rebase upstream/master
```

See [this stackoverflow discussion](https://stackoverflow.com/questions/3357122/git-pull-vs-git-fetch-vs-git-rebase) for more details about the differences between `git pull`, `git rebase` and `git merge`.

Since Apache MXNet uses third-party git submodules, you can update them on your branch after a rebase with:

```
git submodule update --recursive
```

## Save your local changes for later

During development, you can stash the current changes in your branch before committing anything, for example to switch to another branch and do something else:

```
git stash save
```

To restore the changes so that they can be added to a commit, use:

```
git stash pop
```

To drop the changes, use:

```
git stash drop
```

## Reset

Sometimes, if you want to wipe out the changes you have made, you can use:

```
git reset --hard
```

Be very careful: a hard reset discards all of your changes and takes you back to the HEAD commit. To reset to a particular commit, use `git reset --hard commit-SHA`; you can also reset relative to HEAD, for example `git reset --hard HEAD~2` to drop the two commits on top of HEAD.

However, sometimes it is useful to keep the changed files staged when moving HEAD, which can be done via `git reset --soft`. All of the files changed between the original HEAD and the target commit remain staged.

In [summary](https://stackoverflow.com/a/50022436),

* **`--soft`**: **uncommit** changes, changes are left staged (*index*).
* **`--mixed`** *(default)*: **uncommit + unstage** changes, changes are left in the *working tree*.
* **`--hard`**: **uncommit + unstage + delete** changes, nothing left.

## Recover a previous commit after reset

Sometimes you might mistakenly reset a branch to the wrong commit. When that happens, you can use the following command to show the list of recent commits:

```
git reflog
```

Once you find the right commit hash, you can use `git reset` again to move HEAD to that commit.
## How to resolve conflicts with master

Sometimes when rebasing onto the most recent master as explained above, git may report conflicts that it cannot resolve, and the conflicting changes will not be merged. For example, suppose your file `conflict.py` has some conflicts with the master branch. You then need to:

* Manually edit the file to resolve the conflict.
* After you have resolved the conflict, mark it as resolved:

```
git add conflict.py
```

* Then continue the rebase:

```
git rebase --continue
```

* Finally, push to your fork; you may need to **force push** here:

```
git push --force
```

**Note** that a force push is okay when it is on your own branch and you are the only one using that branch. Otherwise, it can have bad consequences, as it rewrites the history.

## How to group multiple commits into one

Sometimes your PR may have accumulated many related commits that are best grouped into one meaningful, atomic commit, for example when later commits are only fixes to earlier ones.
If you have not configured your default git editor, do the following once:

```
git config core.editor the-editor-you-like
```

Assume we want to squash the last 3 commits:

```
git rebase -i HEAD~3
```

1. A text editor will pop up. Keep the **first commit as `pick`** and **change the later ones to `squash`**.
2. After you save the file, another editor will pop up and ask you to edit the combined commit message.
3. Push the changes to your fork; you need to force push:

```
git push --force
```

**Note** that a force push is okay when it is on your own branch and you are the only one using that branch. Otherwise, it can have bad consequences, as it rewrites the history.

## Apply only the k latest commits onto master

Sometimes it is useful to apply only your k latest changes on top of master. This usually happens when you have m earlier commits that were already merged before these k commits. Rebasing directly against master might cause merge conflicts on those first m commits (which can safely be discarded).

You can instead use the following command:

```
# k is a concrete number. Use HEAD~2 to apply the last 2 commits.
git rebase --onto upstream/master HEAD~k
```

You can then force push your branch with `git push --force`. Note that the command above replays only the last k commits onto `upstream/master`; the commits before them are discarded from your branch.

## What is the consequence of force push

The last three tips require a force push because we altered the commit history. **It is fine to force push to your own fork, as long as the changed commits are only yours.** If multiple collaborators use your branch, there is a safer option: `git push --force-with-lease`.

## PR verification

When sending a pull request, remember to add some tests. During development, you can set `MXNET_TEST_COUNT=1000` (or `10000`) to test on a set of randomly selected test cases, which makes the testing and development cycle faster. Moreover, some test results might change with the seed of the pseudo-random number generator; to fix the seed during testing, set `MXNET_TEST_SEED=your_seed_number`.
diff --git a/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md b/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md
new file mode 100644
index 000000000000..7f53bdb73bbc
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md
@@ -0,0 +1,59 @@
---
layout: page_category
title: Debugging and performance optimization tips
category: Developer Guide
permalink: /api/dev-guide/debugging_and_performance_optimization_tips
---

# Debugging and performance optimization tips

The general workflow when defining your network with the Gluon API is to either:

* build it sequentially using `nn.Sequential` or `nn.HybridSequential`, or
* inherit from `nn.Block` or `nn.HybridBlock`

## Debugging

When debugging your MXNet code, remember the following:

**Do NOT hybridize for debugging**

The difference between [imperative style (Gluon non-hybridized) and symbolic style (Gluon hybridized)]({{ "/versions/1.2.1/architecture/program_model.html" | relative_url }}) is:

* *imperative style* is _define-by-run_
* *symbolic style* is _define-then-run_

Basically, this means the execution path changes when you call `hybridize` on a network inherited from `HybridBlock` or `HybridSequential` (note that inheriting directly from `Block` is the same as not hybridizing your network). For efficiency, symbolic execution does not keep intermediate results, so it is hard to debug and examine intermediate outputs. Therefore, if you want to *examine intermediate results for debugging, do NOT hybridize*. Once everything works as expected, you can `hybridize` and enjoy the speed-up.

Please check out the [d2l](http://d2l.ai/chapter_computational-performance/hybridize.html?highlight=hybridize#hybrid-programming) book for more details about the hybrid programming model.

## Use the naive engine

It is also useful to set the environment variable `MXNET_ENGINE_TYPE='NaiveEngine'` prior to running your (end-to-end) code. This setting disables multi-threading and makes the execution engine synchronous, so you can examine backtraces more easily. Remember to change it back afterwards to the default `'ThreadedEnginePerDevice'` (or `'ThreadedEngine'`).

For more details, here is a comprehensive tutorial on interactive debugging on [YouTube](https://www.youtube.com/watch?v=6-dOoJVw9_0).

## Performance optimization

Following up on using the environment variable `MXNET_ENGINE_TYPE` for debugging, here are the [available environment variables]({{ "/api/faq/env_var" | relative_url }}) that affect the performance of your code.

Please refer to [this presentation](https://www.slideshare.net/ThomasDelteil1/debugging-and-performance-tricks-for-mxnet-gluon) for more information on debugging and performance optimization.
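To make the debug-first workflow above concrete, here is a minimal sketch (the network, layer sizes, and print statements are purely illustrative, not part of any MXNet API): run the block imperatively and inspect intermediate outputs, then hybridize once the numbers look right.

```
import os
# Assumption for this sketch: the engine type is read when mxnet is imported,
# so set it first. Remove this line (or restore 'ThreadedEnginePerDevice')
# once you are done debugging.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

import mxnet as mx
from mxnet.gluon import nn

class MyNet(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(MyNet, self).__init__(**kwargs)
        with self.name_scope():
            self.fc1 = nn.Dense(16, activation='relu')
            self.fc2 = nn.Dense(4)

    def hybrid_forward(self, F, x):
        y = self.fc1(x)
        # While the block is NOT hybridized, y is a regular NDArray that can be
        # printed and checked; after hybridize() it becomes a symbol.
        print('intermediate output:', y)
        return self.fc2(y)

net = MyNet()
net.initialize()
out = net(mx.nd.ones((2, 8)))  # imperative run: easy to inspect and debug

# Once the intermediate values look right, hybridize for speed.
net.hybridize()
out = net(mx.nd.ones((2, 8)))
```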
+ diff --git a/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md b/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md new file mode 100644 index 000000000000..cc468037de16 --- /dev/null +++ b/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md @@ -0,0 +1,163 @@ +--- +layout: page_category +title: Examine forward results with hooks +category: Developer Guide +permalink: /api/dev-guide/examine_forward_results_with_hooks +--- + + + + + + + + + + + + + + + + + +# Examine forward results with hooks + +There are currently three ways to register a function in an MXNet Gluon Block for execution: + +* before `forward` via [register_forward_pre_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_forward_pre_hook" | relative_url }}) +* after `forward` via [register_forward_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_forward_hook" | relative_url }}) +* as a callback via [register_op_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_op_hook" | relative_url }}) + +## Pre-forward hook + +To register a hook prior to forward execution, the requirement is that the registered operation **should not modify the input or output**. For example: `hook(block, input) -> None`. This is useful to get a summary before execution. + +``` +import mxnet as mx +from mxnet.gluon import nn + +block = nn.Dense(10) +block.initialize() +print("{}".format(block)) +# Dense(None -> 10, linear) + +def pre_hook(block, input) -> None: # notice it has two arguments, one block and one input + print("{}".format(block)) + return + +# register +pre_handle = block.register_forward_pre_hook(pre_hook) +input = mx.nd.ones((3, 5)) +print(block(input)) + +# Dense(None -> 10, linear) +# [[ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106] +# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106] +# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106]] +# +``` + +We can `detach` a hook from a block: + + +``` +pre_handle.detach() +print(block(input)) + +# [[ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106] +# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106] +# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495 +# -0.07610667 0.1562067 0.14192852 0.04463106]] +# +``` + +Notice `Dense(None -> 10, linear)` is not displayed anymore. 
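Building on the example above, here is a small sketch of a practical pre-forward hook (the hook body and the shapes are illustrative): logging the input shapes of a block without modifying the data.

```
import mxnet as mx
from mxnet.gluon import nn

# Illustrative hook: report the shapes flowing into a block, without touching the data.
def shape_logger(block, inputs):
    print("{} called with input shapes {}".format(block.name, [x.shape for x in inputs]))

net = nn.Dense(10)
net.initialize()
handle = net.register_forward_pre_hook(shape_logger)

net(mx.nd.ones((3, 5)))  # prints something like: dense0 called with input shapes [(3, 5)]

handle.detach()          # remove the hook once you are done inspecting
```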
## Post-forward hook

Registering a hook after forward execution is very similar to the pre-forward hook (as explained above), with the difference that the hook signature should be `hook(block, input, output) -> None`, where the **hook should not modify the input or output.** Continuing with the `block` and `input` defined in the pre-forward hook example above:

```
def post_hook(block, input, output) -> None:
    print("{}".format(block))
    return

post_handle = block.register_forward_hook(post_hook)
print(block(input))

# Dense(5 -> 10, linear)
# [[ 0.11254273  0.11162187  0.02200389 -0.04842059  0.09531345  0.00880495
#   -0.07610667  0.1562067   0.14192852  0.04463106]
#  [ 0.11254273  0.11162187  0.02200389 -0.04842059  0.09531345  0.00880495
#   -0.07610667  0.1562067   0.14192852  0.04463106]
#  [ 0.11254273  0.11162187  0.02200389 -0.04842059  0.09531345  0.00880495
#   -0.07610667  0.1562067   0.14192852  0.04463106]]
#
```

Notice the difference between the `pre_hook` and `post_hook` results: the input shape has been inferred once `forward` is done executing, so the block now prints as `Dense(5 -> 10, linear)`.

## Callback hook

We can register a callback to monitor all operators that are called by a `HybridBlock` **after hybridization** with `register_op_hook(callback, monitor_all=False)`, where the callback signature should be:

```
callback(node_name: str, opr_name: str, arr: NDArray) -> None
```

where `node_name` is the name of the tensor being inspected (str), `opr_name` is the name of the operator producing or consuming that tensor (str), and `arr` is the tensor being inspected (NDArray).

```
import mxnet as mx
from mxnet.gluon import nn

def mon_callback(node_name, opr_name, arr):
    print("{}".format(node_name))
    print("{}".format(opr_name))
    return

model = nn.HybridSequential(prefix="dense_")
with model.name_scope():
    model.add(mx.gluon.nn.Dense(2))

model.initialize()
model.hybridize()
model.register_op_hook(mon_callback, monitor_all=True)
print(model(mx.nd.ones((2, 3, 4))))

# b'dense_dense0_fwd_data'
# b'FullyConnected'
# b'dense_dense0_fwd_weight'
# b'FullyConnected'
# b'dense_dense0_fwd_bias'
# b'FullyConnected'
# b'dense_dense0_fwd_output'
# b'FullyConnected'
# [[-0.05979988 -0.16349721]
#  [-0.05979988 -0.16349721]]
#
```

Setting `monitor_all=False` will print only the output:

```
# b'dense_dense0_fwd_output'
# b'FullyConnected'
# [[-0.05979988 -0.16349721]
#  [-0.05979988 -0.16349721]]
#
```

# Exception handling and custom error types

Apache MXNet v1.7 added support for custom error types. `MXNetError` now inherits from `RuntimeError`, so it is possible to register a custom error type in the backend and prepend its name to an error message; the frontend then throws an exception of the registered error type.

For example, suppose we want the `transpose` operator defined in the C++ backend to throw a `ValueError` in the Python frontend. In the C++ backend we can add this check:

```
CHECK_EQ(axes_set.size(), axes.ndim()) << "ValueError: Repeated axis in transpose."
                                       << " param.axes = "
                                       << param.axes;
```

so that on the frontend, when a problematic `transpose` call is made, such as:

```
from mxnet import np

dat = np.random.normal(0, 1, (3, 4, 5))
dat.transpose((0, 0, 1))
```

the following traceback will be produced:

```
ValueError                                Traceback (most recent call last)
 in
----> 1 dat.transpose((0, 0, 1))

~/mxnet-distro/mxnet-build/python/mxnet/numpy/multiarray.py in transpose(self, *axes)
   1460         elif axes[0] is None:
   1461             axes = None
-> 1462         return _mx_np_op.transpose(self, axes=axes)
   1463
   1464     def flip(self, *args, **kwargs):

~/mxnet-distro/mxnet-build/python/mxnet/ndarray/register.py in transpose(a, axes, out, name, **kwargs)

~/mxnet-distro/mxnet-build/python/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out, is_np_op, output_is_list)
    105             c_str_array(keys),
    106             c_str_array([str(s) for s in vals]),
--> 107             ctypes.byref(out_stypes)))
    108
    109     create_ndarray_fn = _np_ndarray_cls if is_np_op else _ndarray_cls

~/mxnet-distro/mxnet-build/python/mxnet/base.py in check_call(ret)
    271     """
    272     if ret != 0:
--> 273         raise get_last_ffi_error()
    274
    275

ValueError: Traceback (most recent call last):
  File "src/operator/numpy/np_matrix_op.cc", line 77

ValueError: Check failed: axes_set.size() == axes.ndim() (2 vs. 3) : Repeated axis in transpose. param.axes = [0,0,1]
```

Note that as of writing this document, the following Python error types are supported:

* `ValueError`
* `TypeError`
* `AttributeError`
* `IndexError`
* `NotImplementedError`

Check [this resource](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/error.py) for more details about the Python error types that MXNet supports.

## How to register a custom error type

Here is how to register a custom error type in the Python frontend:

```
import mxnet as mx

@mx.error.register
class MyError(mx.MXNetError):
    def __init__(self, msg):
        super().__init__(msg)
```

Then, in the C++ backend, you can refer to `MyError` via:

`LOG(FATAL) << "MyError: this is a custom error message"`

diff --git a/docs/static_site/src/pages/api/developer_guide/profiling.md b/docs/static_site/src/pages/api/developer_guide/profiling.md
new file mode 100644
index 000000000000..841c00891b6b
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/profiling.md
@@ -0,0 +1,279 @@
---
layout: page_category
title: Profiling
category: Developer Guide
permalink: /api/dev-guide/profiling
---

# Profiling

Apache MXNet provides a [profiler]({{"/api/python/docs/api/mxnet/profiler/index.html" | relative_url }}) that lets you see what is happening under the hood at runtime. A common scenario is to profile your hybridized model and visualize the output via `chrome://tracing`. Here are the steps:

1. Configure the profiler
2. Call `set_state('run')` before the model is defined
3. Add `mx.nd.waitall()` to enforce synchronization after you are done with some computation (for example, as part of training)
4. Then call `set_state('stop')`
5.
Finally, `dump` the profiling results

Here is a simple example:

```
import mxnet as mx
from mxnet.gluon import nn
from mxnet import profiler

def enable_profiler(profile_filename, run=True, continuous_dump=False, aggregate_stats=False):
    profiler.set_config(profile_symbolic=True,
                        profile_imperative=True,
                        profile_memory=True,
                        profile_api=True,
                        filename=profile_filename,
                        continuous_dump=continuous_dump,
                        aggregate_stats=aggregate_stats)
    if run:
        profiler.set_state('run')

enable_profiler(profile_filename='test_profiler.json', run=True, continuous_dump=True)

model = nn.HybridSequential(prefix='net_')
with model.name_scope():
    model.add(nn.Dense(128, activation='tanh'))
    model.add(nn.Dropout(0.5))
    model.add(nn.Dense(64, activation='tanh'),
              nn.Dense(32, in_units=64))
    model.add(nn.Activation('relu'))
model.initialize(ctx=mx.cpu())
model.hybridize()

with mx.autograd.record():
    out = model(mx.nd.zeros((16, 10), ctx=mx.cpu()))
out.backward()
mx.nd.waitall()
profiler.set_state('stop')
profiler.dump(True)
```

In `chrome://tracing`, click Load and select `test_profiler.json`; you will see something like this:

![dev_guide_profilling_1](/assets/img/dev_guide_profilling_1.png)

To understand what is going on, we need to dive deep into the MXNet runtime.

## Dive deep into the MXNet runtime with the profiler

Let's start with a simple example and explain as we go. The following code creates a 3x3 tensor, computes its diagonal, and then sums along the diagonal (to compute the "trace"). Using the MXNet profiler, we capture internal MXNet behavior, dump it to a string and print it (`dumps()`), and also dump it to a file (`dump()`). We can then import that file into `chrome://tracing` and view it graphically.

```
import mxnet as mx
import numpy as np

from mxnet import profiler

# configure the profiler
profiler.set_config(profile_all=True, aggregate_stats=True, filename='trace_profile.json')
# start the profiler collecting data
profiler.set_state('run')

###########################################################
# 1. create our data
data = np.linspace(1, 9, 9).reshape((3, 3))

# 2. create an MXNet ndarray
a = mx.nd.array(data)

# 3. compute on our data and produce results
b = mx.nd.diag(a)
c = mx.nd.sum(b, -1)

# 4. wait for computation to finish
mx.nd.waitall()
###########################################################

# stop the profiler
profiler.set_state('stop')

# dump the profiling data as a string
print(profiler.dumps())
# dump the profiling data as a json file that can be viewed graphically
profiler.dump()
```

When running this code, the `dumps` function dumps the profiling data to a string and returns it (which we promptly print). This statistical info is shown below.

```
Profile Statistics:
	Note the difference in units for different entries.
Device Storage
=================
Name                       Total Count   Min Use (kB)   Max Use (kB)   Avg Use (kB)
----                       -----------   ------------   ------------   ------------
Memory: cpu/0                        3        96.0600        96.0760         0.0080

MXNET_C_API
=================
Name                       Total Count   Time (ms)   Min Time (ms)   Max Time (ms)   Avg Time (ms)
----                       -----------   ---------   -------------   -------------   -------------
MXImperativeInvokeEx                 2      0.3360          0.0990          0.2370          0.1680
MXNet C API Calls                   17      0.2320          0.2160          0.2320          0.0080
MXNDArraySyncCopyFromCPU             1      0.1750          0.1750          0.1750          0.1750
MXNDArrayCreateEx                    1      0.1050          0.1050          0.1050          0.1050
MXNDArrayGetShapeEx                 11      0.0210          0.0000          0.0160          0.0019
MXNDArrayWaitAll                     1      0.0200          0.0200          0.0200          0.0200
MXNDArrayGetDType                    1      0.0010          0.0010          0.0010          0.0010
MXNet C API Concurrency             34      0.0000          0.0000          0.0010          0.0000

operator
=================
Name                       Total Count   Time (ms)   Min Time (ms)   Max Time (ms)   Avg Time (ms)
----                       -----------   ---------   -------------   -------------   -------------
sum                                  1      0.0520          0.0520          0.0520          0.0520
diag                                 1      0.0410          0.0410          0.0410          0.0410
WaitForVar                           1      0.0220          0.0220          0.0220          0.0220
```

The `dump` function writes out the same data in a format that can be opened in `chrome://tracing` and displayed visually. This can be seen in the diagram below.

![dev_guide_profilling_2.png](/assets/img/dev_guide_profilling_2.png)

The profiling data captures information about the interesting functions that executed while your program was running. Here are some explanations of what each one does.

### **The functions in the C_API are:**

| **Function Name** | **Description** |
|--- |--- |
| **MXImperativeInvokeEx** | invokes an operator to perform the computation |
| **MXNDArrayCreateEx** | creates an ndarray |
| **MXNDArrayGetDType** | returns the data type of the ndarray |
| **MXNDArrayGetShape** | returns the shape of the ndarray (as a tuple where each element is the size of a dimension) |
| **MXNDArraySyncCopyFromCPU** | called when data initially resides outside of an MXNet data structure (i.e. numpy.ndarray rather than mxnet.numpy.ndarray); the data is copied into the MXNet data structure |
| **MXNDArrayWaitAll** | waits for all asynchronous operations to finish in MXNet. This function is only used here for benchmarking, to wait for the work to happen. In a real program there is no waiting; data dependencies are evaluated and computation is executed as needed, in an As Late As Possible (ALAP) way |

### **The function in the Engine API is:**

| **Function Name** | **Description** |
|--- |--- |
| **WaitForVar** | takes a variable reference as input and waits until that variable has been computed before returning |

### **Other API functions:**

| **Function Name** | **Description** |
|--- |--- |
| **ResourceParallelRandomSetSeed** | sets the random number generator seed |

### **Operators we intended to call in the code:**

| **Operator Name** | **Description** |
|--- |--- |
| **sum** | sums a tensor along a particular axis |
| **diag** | computes the diagonal of the tensor |

## Closer look

From the code, we can identify the major events in our test application:

1. Initialize our input data
2. Create a new MXNet ndarray using our existing data values
3. Compute on our data:
    1. produce the diagonal of the input data
    2. sum along the diagonal to compute the "trace" of the matrix
4. Wait for computation to finish (only needed when profiling)

Step #1 uses regular numpy functions to initialize the data, so MXNet is not involved in this step.
In step #2, we create an MXNet ndarray, and quite a few things happen under the hood. The screenshot below shows a zoomed-in portion of the timeline.

![dev_guide_profilling_3.png](/assets/img/dev_guide_profilling_3.png)

Here, the four red arrows show the important events in this sequence:

1. First, `MXNDArrayCreateEx` is called to physically allocate space to store the data and the other necessary attributes in the `ndarray` class.
2. Then some support functions are called (`MXNDArrayGetShape`, `MXNDArrayGetDType`) while initializing the data structure.
3. Finally, the data is copied from the non-MXNet ndarray into the newly prepared MXNet ndarray by the `MXNDArraySyncCopyFromCPU` function.

Next, step #3 (in our code example) begins the computing process that produces our output data. The screenshot below shows this behavior.

![dev_guide_profilling_4.png](/assets/img/dev_guide_profilling_4.png)

Here you can see the following sequence of events:

1. `MXImperativeInvokeEx` is called for the first time to launch the **`diag`** operator from step #3 (in our code example).
2. Soon after that, the actual **`diag`** operator begins executing in another thread.
3. While that is happening, our main thread moves on and calls `MXImperativeInvokeEx` again to launch the **`sum`** operator. Just like before, this returns without actually executing the operator, and the main thread continues.
4. Lastly, `MXNDArrayWaitAll` is called as the main thread reaches step #4 in our app. It waits here while all the computation finishes.

Next, let's look at a view of the timeline zoomed in to the actual operator execution.

![dev_guide_profilling_5.png](/assets/img/dev_guide_profilling_5.png)

Here there are 3 main events happening:

1. The **`diag`** operator executes first.
2. Then `ResourceParallelRandomSetSeed` runs.
3. And finally the **`sum`** operator executes (for a very short time, as shown by the big red arrow).

The `diag` operator running makes sense (although it seems to take a little longer than we'd like). At the end, the `sum` operator runs (very quickly!). But the weird part in the middle is **`ResourceParallelRandomSetSeed`** running. This is part of the MXNet resource manager, which handles the temporary space and random number generators needed by operators. The **`sum`** operator requests temporary space in order to compute the sum, and therefore launches the resource manager (for the first time) here. As part of its startup sequence, the random number generator is initialized by setting the seed. So this is one-time initialization overhead. Let's run the app again, running the computation twice, and look at the second run to remove this initialization from our profiling.

Here is the modified code:

```
import mxnet as mx
import numpy as np

from mxnet import profiler

profiler.set_config(profile_all=True, aggregate_stats=True, filename='trace_profile.json')
profiler.set_state('run')

################
# first run
sdata = np.linspace(1, 9, 9).reshape((3, 3))

sa = mx.nd.array(sdata)
sb = mx.nd.diag(sa)
sc = mx.nd.sum(sb, -1)

mx.nd.waitall()
################

################
# second run
data = np.linspace(1, 9, 9).reshape((3, 3))

a = mx.nd.array(data)
b = mx.nd.diag(a)
c = mx.nd.sum(b, -1)

mx.nd.waitall()
################

profiler.set_state('stop')

print(profiler.dumps())
profiler.dump()
```

Notice that we renamed the variables and repeated the computation after the first `waitall` call.
This is so that MXNet doesn't have to worry about reusing variables, and so that the second half is cleanly separated from the first-time initialization.

Here is an overview of the *new* timeline:

![dev_guide_profilling_6.png](/assets/img/dev_guide_profilling_6.png)

The first red box is the first run, and the second, smaller one is the second run. First off, we can see how much smaller the second run is now, without any of the initialization routines. Here is a zoomed-in view of just the second run.

![dev_guide_profilling_7.png](/assets/img/dev_guide_profilling_7.png)

We still have the same sequence of events at the beginning to initialize the MXNet ndarray (`MXNDArrayCreateEx`, `MXNDArrayGetShape`, `MXNDArrayGetDType`, `MXNDArraySyncCopyFromCPU`). Then the **`diag`** operator runs, followed by the **`sum`** operator, and finally the `waitall`. When you look at this, be careful about the assumptions you make. In this version of the timeline, it appears that each operator executes right after its `MXImperativeInvokeEx` call, which seems to imply an inherent ordering. But realize that there is no dependency between the **`diag`** operator finishing and the next **`MXImperativeInvokeEx`** launching the **`sum`** operator. In this case, it just so happens that the **`diag`** operator finishes so quickly that it appears that way. In reality, the main thread launches the operators and does not wait for them to finish. Lastly, keep in mind that in this case, by the time we hit **`MXNDArrayWaitAll`** everything is already done and we return immediately, but in other circumstances the program may sit there waiting for everything to finish (as we saw earlier in the first run).
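To observe this asynchronous behavior in your own code, here is a small sketch (the matrix size is arbitrary and the exact timings will differ from machine to machine) that times the operator launch separately from the completed computation:

```
import time
import mxnet as mx

a = mx.nd.random.uniform(shape=(4096, 4096))
a.wait_to_read()              # make sure the input itself is ready before timing

start = time.time()
b = mx.nd.dot(a, a)           # launches the operator; returns before the work is done
launch_time = time.time() - start

mx.nd.waitall()               # blocks until all pending computation has finished
total_time = time.time() - start

# launch_time is typically much smaller than total_time, because the main thread
# only enqueues work on the engine and moves on.
print("launch: {:.4f}s, launch + compute: {:.4f}s".format(launch_time, total_time))
```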