Conversation
Just curious: as far as I know, TVM op kernels are pre-compiled and then linked into MXNet. How can they be configured according to the runtime input shapes?
Yes, kernels are pre-compiled. At compile time, several different schedules (kernels) for a single op are defined and compiled. Then at runtime, the most suitable kernel is chosen based on the runtime input shape. So although each kernel is pre-compiled, there are multiple kernels available for a single op, and we can choose the most efficient one.
@hzfan Thanks for the explanation. My next question is: how do I know which schedule is the best one for a certain input shape? Is it defined by static rules or tuned at runtime?
That's a good question. Actually we have considered both options, and for now we use simple static rules. To be more specific, for now I require the size of a for-loop to be a multiple of its splitting factor (if the for-loop is split). This eliminates an if-condition and thus makes the kernel faster. Runtime tuning has also been considered, but is not implemented in this version. The idea is to try all the available schedules for every runtime shape, measure their performance, and cache the best choice. This is quite similar to autotvm.
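To illustrate the static rule, here is a minimal hypothetical sketch (the kernel registry and names are illustrative, not code from this PR): a pre-compiled kernel is eligible when its split factor evenly divides the runtime loop size.

# Hypothetical sketch: pick the pre-compiled kernel whose split factor
# evenly divides the runtime loop size, so the split loop needs no tail
# if-condition. The registry and kernel names are illustrative only.
PRECOMPILED_KERNELS = {64: "dot_bn64", 32: "dot_bn32"}  # split factor -> kernel

def select_kernel(loop_size):
    for factor in sorted(PRECOMPILED_KERNELS, reverse=True):
        if loop_size % factor == 0:
            return PRECOMPILED_KERNELS[factor]
    return "dot_fallback"  # nothing divides evenly: use the generic kernel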
Also cc @icemelon9 @kevinthesun |
return diff / repeat
def test_tvm_dot():
Who uses this? For testing only?
Yes, it is for reproducing the benchmark results. The other code in benchmark/ serves this purpose, too.
conf_path = [p for p in candidates_path if os.path.exists(p) and os.path.isfile(p)]
if len(conf_path) == 0:
    raise RuntimeError('Cannot find the TVM op config.\n' +
                       'List of candidates:\n' + str('\n'.join(candidates_path)))
Can we fall back to the default behavior if the config file is missing?
Yes, that just takes a little more code, I think. But in which case would the config file be missing? It is generated at compile time (even if no tunable parameters are needed, a nearly empty config is still generated).
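For illustration, a minimal sketch of the suggested fallback (the helper name and warning behavior are assumptions, not code from this PR):

import logging
import os

def load_tvm_op_config(candidates_path):
    # Return the first existing config path; None means "use the
    # default schedules" instead of raising RuntimeError.
    conf_path = [p for p in candidates_path if os.path.isfile(p)]
    if not conf_path:
        logging.warning("Cannot find the TVM op config; falling back to "
                        "default schedules. Candidates: %s",
                        ", ".join(candidates_path))
        return None
    return conf_path[0]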
def dot(dtype, fallback):
    cfg = autotvm.get_config()
    cfg.define_knob("bn", [64] if fallback else [64, 32])
    cfg.define_knob("factor", [4] if fallback else [4])
Seems it's always [4] no matter what fallback is?
Yes. The difference is that when fallback is false, the shape comes with a hint indicating that it is a multiple of 4. This factor means that in any case I want to split the loop by a factor of 4. When falling back, there is no guarantee that the loop size is a multiple of 4, while when not falling back, there is.
I understand what you are trying to get at here. My point is that this line is equivalent to the following, correct?
cfg.define_knob("factor", [4])
Yes.
from collections import OrderedDict
import numpy as _np
class OtherOptionSpace(object):
Can we call this GeneralOptionSpace? Same for other places: other -> general.
Actually OtherOptionSpace comes from tvm/python/tvm/autotvm/task/space.py. Besides OtherOptionSpace, there are SplitSpace, ReorderSpace and AnnotateSpace. Maybe the three other spaces will be needed in the future, so I keep the name consistent with tvm.
Fair enough.
contrib/tvmop/compile.py
if op.dispatch is True:
    config_space = autotvm.ConfigSpace()
    with autotvm.task.ApplyConfig(config_space):
        sch, args = op.func(fallback=False, **each_kwargs)
This requires fallback as a mandatory parameter in op.func, which is not ideal in terms of usability in my opinion. We should support compiling whatever users define and treat the fallback knob as an advanced feature for performance tuning.
A way to achieve this is to inspect the signature of op.func for the keyword fallback. If the keyword does not exist, we just compile the op using the default schedule, e.g.
if 'fallback' in str(inspect.signature(op.func)):
    sch, args = op.func(fallback=False, **each_kwargs)
else:
    sch, args = op.func(**each_kwargs)
@yzhliu What do you think?
Done. I set self.dispatchable = 'fallback' in inspect.signature(self.func).parameters in opdef.py.
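A minimal sketch of that check (the helper name is assumed; opdef.py may structure it differently). Inspecting the parsed parameter list avoids false matches that comparing against the string form of the whole signature could produce:

import inspect

def is_dispatchable(func):
    # True when the op definition declares a 'fallback' parameter,
    # i.e. it provides tunable schedules for dispatch.
    return 'fallback' in inspect.signature(func).parameters

def dot(dtype, fallback):
    pass

assert is_dispatchable(dot)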
LGTM
* infra for dispatch tvm op
* fix ci and sanity error
* disable shape with hint and fix coding style
* rename to avoid conflict with original dot
* update tvm and use soft link
* config file moves to lib/ when using Makefile
* add tvmop.conf to ci
* fix rebase
* fix rebase
* use inspect to detect dispatchable func
Do we have a developer guide for using tvm op?
Seems @yzhliu is working on it.
Description
This PR implements infrastructure that lets users dispatch the execution of a TVM operator to different schedules according to the runtime input shapes, which helps with acceleration.
A gemm example can be found in
The following are some experimental results for matrix multiplication between two n × n matrices. Note that the benchmark results cannot be reproduced until this gets merged.
The example schedule is roughly equivalent to the Blocking optimization. More optimizations (such as vectorization, loop permutation, array packing, write caching for blocks, and parallelization) can be applied for further acceleration.
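For reference, a minimal TVM sketch of a blocked gemm schedule in that spirit (the tile sizes, split factor, and function name are illustrative assumptions, not the exact schedule from this PR):

import tvm
from tvm import te

def gemm_blocked(n, bn=32, factor=4):
    # C = A @ B with simple blocking: tile the spatial loops into
    # bn x bn blocks and split the reduction loop for cache locality.
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n),
                   lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
                   name="C")
    s = te.create_schedule(C.op)
    io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    ko, ki = s[C].split(k, factor=factor)
    s[C].reorder(io, jo, ko, ki, ii, ji)
    return s, [A, B, C]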