This is a tool to estimate theoretical computational cost of a chainer-based neural network forward pass.
You can analyze
- Theoretical amount of floating point arithmetics (FLOPs)
- Theoretical amount of memory read and write (mread/mwrite)
For each layer (we call them computational costs).
Also, summary of these computational costs for each layer-type, and total cost can be calculated.
The computational costs this tool estimates are all theoretical number, by assuming a most straightforward naive implementation, for each layer. Therefore, for example, the following factors are NOT considered.
- Layer fusion
- It is general to fuse Conv-BN-ReLU stack into one layer, that reduces memory transfer significantly.
- Techniques for speeding-up convolution
- Lowering (im2col)
- Winograd
In addition, for now, they are not in cosideration either.
- Mixed precision network
- For example, keeping conv weights in FP16, saves as FP32
- chainer-computational-cost uses data type of input tensor of a layer
Otherwise, these costs are also exluceded from computational cost.
- Inside-layer working memory read/write
- Assume that every layers read only the input data and parameters from memory and write only the final results back to memory
- Index calculation
- For example,
shift
andtranspose
heavily relies on src/dst index calculation but these are ignored
- For example,
- Broadcasting
However,
- Duplicated memory access are ignored
- In the most naive implementation, for example ResizeImages and Conv will read memory from the same region repeatedly. chainer-computational-cost treats such cases as just reading the entire input only once
- Python 2.7, 3.5, 3.6 and 3.7
- Chainer 4.x, 5.x and 6.0 (including ChainerX)
- six==1.12.0
- (optional) texttable >= 1.4.0
- (optional) pytest >= 4.2.1
% pip install chainer_computational_cost
Manual installation by
% git clone git@github.com:belltailjp/chainer_computational_cost.git
% cd chainer_computational_cost
% python setup.py
or
% pip install git+https://github.com/belltailjp/chainer_computational_cost.git
import chainer
import chainer.links as L
import numpy as np
from chainer_computational_cost import ComputationalCostHook
net = L.VGG16Layers()
x = np.random.random((1, 3, 224, 224)).astype(np.float32)
with chainer.no_backprop_mode(), chainer.using_config('train', False):
with ComputationalCostHook(fma_1flop=True) as cch:
y = net(x)
cch.show_report(unit='G', mode='md')
It will show the following table to stdout in markdown table format.
Layer name | GFLOPs | MemRead GiB | MemWrite GiB | MemR+W GiB |
---|---|---|---|---|
Convolution2DFunction-1 | 0.087 | 0.001 | 0.012 | 0.013 |
ReLU-1 | 0.003 | 0.012 | 0.012 | 0.024 |
Convolution2DFunction-2 | 1.85 | 0.012 | 0.012 | 0.024 |
ReLU-2 | 0.003 | 0.012 | 0.012 | 0.024 |
MaxPooling2D-1 | 0.002 | 0.012 | 0.003 | 0.015 |
Convolution2DFunction-3 | 0.925 | 0.003 | 0.006 | 0.009 |
ReLU-3 | 0.002 | 0.006 | 0.006 | 0.012 |
Convolution2DFunction-4 | 1.85 | 0.007 | 0.006 | 0.013 |
ReLU-4 | 0.002 | 0.006 | 0.006 | 0.012 |
MaxPooling2D-2 | 0.001 | 0.006 | 0.001 | 0.007 |
Convolution2DFunction-5 | 0.925 | 0.003 | 0.003 | 0.006 |
ReLU-5 | 0.001 | 0.003 | 0.003 | 0.006 |
Convolution2DFunction-6 | 1.85 | 0.005 | 0.003 | 0.008 |
ReLU-6 | 0.001 | 0.003 | 0.003 | 0.006 |
Convolution2DFunction-7 | 1.85 | 0.005 | 0.003 | 0.008 |
ReLU-7 | 0.001 | 0.003 | 0.003 | 0.006 |
MaxPooling2D-3 | 0.001 | 0.003 | 0.001 | 0.004 |
Convolution2DFunction-8 | 0.925 | 0.005 | 0.001 | 0.007 |
ReLU-8 | 0.0 | 0.001 | 0.001 | 0.003 |
Convolution2DFunction-9 | 1.85 | 0.01 | 0.001 | 0.012 |
ReLU-9 | 0.0 | 0.001 | 0.001 | 0.003 |
Convolution2DFunction-10 | 1.85 | 0.01 | 0.001 | 0.012 |
ReLU-10 | 0.0 | 0.001 | 0.001 | 0.003 |
MaxPooling2D-4 | 0.0 | 0.001 | 0.0 | 0.002 |
Convolution2DFunction-11 | 0.462 | 0.009 | 0.0 | 0.01 |
ReLU-11 | 0.0 | 0.0 | 0.0 | 0.001 |
Convolution2DFunction-12 | 0.462 | 0.009 | 0.0 | 0.01 |
ReLU-12 | 0.0 | 0.0 | 0.0 | 0.001 |
Convolution2DFunction-13 | 0.462 | 0.009 | 0.0 | 0.01 |
ReLU-13 | 0.0 | 0.0 | 0.0 | 0.001 |
MaxPooling2D-5 | 0.0 | 0.0 | 0.0 | 0.0 |
Reshape-1 | 0.0 | 0.0 | 0.0 | 0.0 |
LinearFunction-1 | 0.103 | 0.383 | 0.0 | 0.383 |
ReLU-14 | 0.0 | 0.0 | 0.0 | 0.0 |
LinearFunction-2 | 0.017 | 0.063 | 0.0 | 0.063 |
ReLU-15 | 0.0 | 0.0 | 0.0 | 0.0 |
LinearFunction-3 | 0.004 | 0.015 | 0.0 | 0.015 |
Softmax-1 | 0.0 | 0.0 | 0.0 | 0.0 |
total | 15.488 | 0.623 | 0.107 | 0.729 |
If you call show_summary_report
method,
it will show summary for each type of layer.
Layer type | # Layers | GFLOPs | MemRead GiB | MemWrite GiB | MemR+W GiB |
---|---|---|---|---|---|
Convolution2DFunction | 13 | 15.347 | 0.089 | 0.05 | 0.139 |
ReLU | 15 | 0.014 | 0.05 | 0.05 | 0.101 |
MaxPooling2D | 5 | 0.005 | 0.023 | 0.006 | 0.029 |
Reshape | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
LinearFunction | 3 | 0.124 | 0.461 | 0.0 | 0.461 |
Softmax | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
total | 38 | 15.488 | 0.623 | 0.107 | 0.729 |
Estimation values can be accessed through instance method of
ComputationalCostHook
.
layer_report
- Layer-wise computational-cost estimations
summary_report
- Computational costs summarized for each layer types
ignored_layers
- List of layers that are not yet supported by chainer-computational-cost
total_report
- Total computational costs of the entire NN (layers caught during the lifetime of the hook object)
As for basic usage, please refer to the avobe quickstart.
When fma_1flop
is set to True
, chainer_computational_cost considers
FMA (fused multiply and add, ax + b
) as one operation.
Otherwise, it counts as 2 operations.
This affects to convolution and linear layers' estimation.
Estimated computational cost table is reported by calling show_report
and show_summary_report
method.
These have several options as explained below.
Currently it supports the following modes.
- CSV mode (
mode='csv'
) - default - Markdown table (
mode='md'
) - Prettified table (
mode='table'
)
>>> cch.show_summary_report(unit='G', mode='table')
+-----------------------+----------+--------+---------+----------+--------+
| Layer type | # Layers | GFLOPs | MemRead | MemWrite | MemR+W |
| | | | GiB | GiB | GiB |
+=======================+==========+========+=========+==========+========+
| Convolution2DFunction | 13 | 15.347 | 0.089 | 0.050 | 0.139 |
+-----------------------+----------+--------+---------+----------+--------+
| ReLU | 15 | 0.014 | 0.050 | 0.050 | 0.101 |
+-----------------------+----------+--------+---------+----------+--------+
| MaxPooling2D | 5 | 0.005 | 0.023 | 0.006 | 0.029 |
+-----------------------+----------+--------+---------+----------+--------+
| Reshape | 1 | 0 | 0 | 0 | 0 |
+-----------------------+----------+--------+---------+----------+--------+
| LinearFunction | 3 | 0.124 | 0.461 | 0 | 0.461 |
+-----------------------+----------+--------+---------+----------+--------+
| Softmax | 1 | 0 | 0 | 0 | 0 |
+-----------------------+----------+--------+---------+----------+--------+
| total | 38 | 15.488 | 0.623 | 0.107 | 0.729 |
+-----------------------+----------+--------+---------+----------+--------+
Report is by default written to stdout.
You can specify stream (e.g. file object) to dst
argument of
show_report
and show_summary_report
.
cch.show_report(ost=sys.stderr, unit='G', mode='md')
cch.show_summary_report(ost=sys.stderr, unit='G', mode='md')
The following unit prefixes are supported by unit
argument of
show_report
and show_summary_report
.
None
: if you want to use raw valuesK
: 10^3 (for FLOPs) or 1024^1 (for memory report)M
: 10^6 or 1024^2G
: 10^9 or 1024^3T
: 10^12 or 1024^4
For memory report, the unit will be shown as like KiB
or MiB
instead of
KB
.
You can specify how many digits after the decimal point to show to
n_digits
argument of show_report
and show_summary_report
.
By default it is set to 3.
Possible value is between 0 (round to integer) to 10.
If None
is specified it is treated as 10.
Be noted that you do not need to worry about numerical error in summary report due to the rounding, because summary values are calculated before rounding.
>>> cch.show_summary_report(unit='G', mode='table', n_digits=8)
+-----------------------+----------+-------------+------------+------------+------------+
| Layer type | # Layers | GFLOPs | MemRead | MemWrite | MemR+W |
| | | | GiB | GiB | GiB |
+=======================+==========+=============+============+============+============+
| Convolution2DFunction | 13 | 15.34663066 | 0.08864903 | 0.05046844 | 0.13911748 |
+-----------------------+----------+-------------+------------+------------+------------+
| ReLU | 15 | 0.01355571 | 0.05049896 | 0.05049896 | 0.10099792 |
+-----------------------+----------+-------------+------------+------------+------------+
| ... | ... | ... | ... | ... | ... |
You can specify which column to show as a table to
columns
argument of show_report
and show_summary_report
.
There are two ways to customize columns.
The first way is to make use of predefined columns set.
There are some column definitions in SummaryColumns
for
show_summary_report
, and ReportColumns
for show_report
, respectively.
>>> cch.show_summary_report(unit='G', mode='table', columns=SummaryColumns.ALL)
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| Layer type | # Layers | GFLOPs | MemRead | MemWrite | MemR+W | FLOPs | MemRead | MemWrite | MemR+W |
| | | | GiB | GiB | GiB | (%) | (%) | (%) | (%) |
+=======================+==========+========+=========+==========+========+=========+=========+==========+=========+
| Convolution2DFunction | 13 | 15.347 | 0.089 | 0.05 | 0.139 | 99.085% | 14.237% | 47.297% | 19.073% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| ReLU | 15 | 0.014 | 0.05 | 0.05 | 0.101 | 0.088% | 8.11% | 47.325% | 13.847% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| MaxPooling2D | 5 | 0.005 | 0.023 | 0.006 | 0.029 | 0.03% | 3.662% | 5.343% | 3.908% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| Reshape | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0% | 0.0% | 0.0% | 0.0% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| LinearFunction | 3 | 0.124 | 0.461 | 0.0 | 0.461 | 0.798% | 73.991% | 0.032% | 63.171% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| Softmax | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0% | 0.001% | 0.003% | 0.001% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| total | 38 | 15.488 | 0.623 | 0.107 | 0.729 | 100.0% | 100.0% | 100.0% | 100.0% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
The other way is to manually specify the column list.
>>> cch.show_report(unit='G', mode='table' , columns=[
... 'name', 'flops', 'mread', 'mwrite', 'mrw', 'output_shapes', "params"
... ])
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| Layer name | GFLOPs | MemRead | MemWrite | MemR+W | Output shapes | Function parameters |
| | | GiB | GiB | GiB | | |
+==========================+========+=========+==========+========+======================+============================================+
| Convolution2DFunction-1 | 0.087 | 0.001 | 0.012 | 0.013 | [(1, 64, 224, 224)] | k=3, s=1, p=1, d=1, groups=1, nobias=False |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| ReLU-1 | 0.003 | 0.012 | 0.012 | 0.024 | [(1, 64, 224, 224)] | |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| Convolution2DFunction-2 | 1.850 | 0.012 | 0.012 | 0.024 | [(1, 64, 224, 224)] | k=3, s=1, p=1, d=1, groups=1, nobias=False |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| ... | ... | ... | ... | ... | ... | ... |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| total | 15.488 | 0.623 | 0.107 | 0.729 | | |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
Once you let cch
gather the computational costs as explained above,
you can access to the report information directly.
>>> cch.layer_report
It is a huge dictionary whose structure is:
{
"Convolution2DFunction-1": {
"name": "Convolution2DFunction-1",
"type": "Convolution2DFunction",
"flops": 86704128,
"mread": 609280,
"mwrite": 12845056,
"mrw": 13454336,
"traceback": "...",
"input_shapes": [[1, 3, 224, 224], [64, 3, 3, 3], [ 4]],
"output_shapes": [[1, 64, 224, 224]],
"params": {"k": 3, "s": 1, "p": 1, "d": 1, "groups": 1, "nobias": false},
"flops%": 0.5597995752663483,
"mread%": 0.09112725854650683,
"mwrite%": 11.21102960110868,
"mrw%": 1.7179140987382209
},
"ReLU-1": {
"name": "ReLU-1",
"type": "ReLU",
"flops": 3211264,
"mread": 12845056,
"mwrite": 12845056,
"mrw": 25690112,
"traceback": ...,
"input_shapes": [[1, 64, 224, 224]],
"output_shapes": [[1, 64, 224, 224]],
"params": {},
"flops%": 0.020733317602457342,
"mread%": 1.9211770272392967,
"mwrite%": 11.21102960110868,
"mrw%": 3.2802366168768162
},
...
}
Also, summary report can be found. This contains total costs for each type of layers.
>>> cch.summary_report
{
"Convolution2DFunction": {
"type": "Convolution2DFunction",
"name": "Convolution2DFunction",
"flops": 15346630656,
"n_layers": 13,
"mread": 95186176,
"mwrite": 54190080,
"mrw": 149376256,
"flops%": 99.08452482214363,
"mread%": 14.236566554630553,
"mwrite%": 47.29653112967724,
"mrw%": 19.07307623350047
},
"ReLU": {
"type": "ReLU",
"name": "ReLU",
"flops": 13555712,
"n_layers": 15,
"mread": 54222848,
"mwrite": 54222848,
"mrw": 108445696,
"flops%": 0.08752157475169971,
"mread%": 8.109866545469965,
"mwrite%": 47.32513069498619,
"mrw%": 13.846866178002324
},
...
"total": {
"name": "total",
"type": "total",
"flops": 15488423327,
"n_layers": 38,
"mread": 668603456,
"mwrite": 114575168,
"mrw": 783178624,
"flops%": 100.0,
"mread%": 100.0,
"mwrite%": 100.0,
"mrw%": 100.0
}
}
Please see DETAILS.md.
Layer types supported are listed in the next section.
In case you need an unsupported layer or you have your custom layer, you can insert a cost calculator.
def custom_calculator(func, in_data, **kwargs)
...
return (0, 0, 0)
with chainer.no_backprop_mode(), chainer.using_config('train', False):
with ComputationalCostHook(fma_1flop=True) as cch:
cch.add_custom_cost_calculator(F.math.basic_math.Add, custom_calculator)
y = x + x # Call Add
cch.report['Add-0'] # you can find custom estimation result
A custom cost calculator has to have the following signature.
- First positional argument:
- A
Function
orFunctionNode
object will be passed.
- A
- Second positional argument
- List of data (could be
numpy.array
,cupy.array
or a scalar) will be fed
- List of data (could be
- Third keyword argument
- Some flags will be fed
fma_1flop: bool
, for example
- Some flags will be fed
Also, a calculator has to return a tuple with the following 4 elements:
- Number of FLOPs in
int
- Focus only on principle floating point operations
- If
fma_1flop=True
is specified inkwargs
, please treat operations that can be fused to an FMA operation as 1 FLOP.
- Memory read (number of elements) in
int
- Memory write (number of elements) in
int
- Not in "bytes". Conversion to bytes is automatically done by
ComputationalCostHook
.
- Not in "bytes". Conversion to bytes is automatically done by
- Parameters to report in
dict
- Function-specific parameters, e.g.
k
ofConvolution2DFunction
.
- Function-specific parameters, e.g.
For more details about how to implement custom cost calculator,
please refer existing implementations located in
chainer_computational_cost/cost_calculators/*.py
.
You can overwrite your custom calculator to existing one.
This is useful when for example the device or environment you're considering
has some special specifications that are different from normal behavior.
e.g. there is an inference engine that doesn't support inplace Reshape
,
whose mread
and mwrite
won't be 0.
If a layer not supported by chainer_computational_cost is used,
it shows a message like
Warning: XXXFunction is not yet supported by ComputationalCostHook, ignored
.
Also, you can access to which layers are ignored.
with ComputationalCostHook() as cch:
...
print(cch.ignored_layers)
It has the following structure.
{
'XXXFunction':
{
'type': 'XXXFunction',
'traceback': '(traceback)'
},
...
}
Adding layer type support is one of the most important contribution.
The specification is almost same as cost calculator for custom layers explaine above.
In addition, functions have to follow the following rules.
- Name should be
calc_xxxx
- Source code name has to correspond to chainer's file arrangement
- For example, BN is in chainer/functions/normalization,
so the cost calculator is implemented in
chainer_computational_cost/cost_calculators/normalization.py
.
- For example, BN is in chainer/functions/normalization,
so the cost calculator is implemented in
Once you properly implemented your cost calculation function,
chainer-computational-cost automatically find and configure.
This auto-discovery mechanism is implemented in
chainer_computational_cost/cost_calculators/__init__.py
.
Also, please write a docstring for each cost calculator in Markdown format. It will be used to generate DETAILS.md. Please refer Documentation section below for more details.
Following the Chainer contribution guide, chainer-computational-cost also requires flake8 and hacking.
% pip install hacking flake8
% flake8 .
We use pytest for unit-testing. Please write sufficient test cases when you support new functions or implement a new feataure.
% pip install pytest
% python -m pytest
In order to check coverage locally, please confirm by following command.
% pip install pytest-cov
% python -m pytest --cov chainer_computational_cost --cov-report html:cov_html
% open cov_html/index.html
Every PRs will be automatically tested by Travis CI and code coverage of the test is monitored by coveralls. Please make sure your PR becomes all green.
(TODO: consider better way)
DETAILS.md is automatically generated by make_details_md.py script.
This script collects docstring of each cost calculation functions. If a cost calculator doesn't have a docstring, it won't appear in DETAILS.md. So please always write docstring for every cost calculators in the following format.
def calc_xxxx(func: XXXX, inputs, **kwargs):
"""XXXX
XXXX is defined as: $y=f(x)$
...
|Item|Value|
|:---|:---|
|FLOPs| $$ 4 \| x \| $$ |
|mread| $$ \| x \| $$ |
|mwrite| $$ \| x \| $$ |
"""
- It supports markdown format
- First line should be function name (No newline between
"""
andXXX
) - 1-line blank
- Detailed explanation
- It is suggested to summarize as a table.
Also, mathematical formula is supported. This is an example of inline equation.
function $y=f(x)$ is ...
And non-inline equation.
function is defined a follows:
$$y=f(x)$$
Both inline and non-inline formula, please write in one line.
# NG pattern
$$
y=f(x)
$$
After writing docstring, DETAILS.md can be generated by:
% python make_details_md.py
Then please don't forget to commit new DETAILS.md.
There is not assurance of cost estimation formula, and it might change in the future. Please verify it by yourself if you will use this for critical purposes.
The key concept of chainer-computational-cost is originally developed by t-abe.