*** Collage v2 sketch ***
- Enable cudnn, get rid of support for op-predicate based BYOC integrations
- Enable cublas
- And yet another go at pruning unnecessary candidates.
- Another go at pruning unnecessary candidates
- Fix CompositePartitionRule use
- Fix a few bugs with new TensorRT pattern-based integration
- Rework RemoveSubCandidatesCombinerRule for soundness
- Better logging
- Bug fixes
- Implement critical nodes idea for avoiding obviously unnecessary candidates
- Promote DataflowGraph from alias to class so can cache downstream index set
- Quick check to avoid unioning candidates which would create a cycle
- Hoist out CandidatePartitionIndex and add rules to avoid small candidates subsumed by containing candidates
- GetFunction can legitimately return nullptr
- rename tuning log
- Support for int64 literals
- Switch GPT2 to plain model
- Fix library clobbering issue for cutlass
- actually checkin 'built in' tuning log (covers mnist & gpt2 only)
- trying to debug gpt2
- Update TargetKind attribute name
- working through gpt2 issues
- checkin tuning records for MNIST (with hack to not retry failed winograd)
- Autotvm tuning disabled if log file empty (default)
- Autotvm tuning during search working
- tune during search
  (but does not load tuned records after search!)
- About to add tuning to estimate_seconds
- Split out the combiner rules & make them FFI friendly
- Rework comments
- Estimate IRModule instead of Function (closer to meta_schedule iface)
- Add 'host' as first-class partitioning spec
  (Avoids special casing for the 'leave behind for the VM' case)
- Move CollagePartitioner to very start of VM compiler flow (not changing legacy)
- Fix bugs etc with new SubGraph::Rewrite approach
  Ready for updating RFC to focus on partitioning instead of fusion.
- Working again after partition<->fusion split.
- Add PrimitivePartitionRule
- Refactor SubGraph Extract/Rewrite
  *** CAUTION: Almost certainly broken ***
- Rename kernel->partition, fusion->partition
- Next: make nesting in "Primitive" an explicit transform
- respect existing target constraints from device planner
- make 'compiler' and 'fusion_rule' attributes avail on all target kinds
- moved design to tvm-rfcs, apache/tvm-rfcs#62
- incorporate comments
- avoid repeated fusion
- fix trt type checking
- better logs
- pretty print primitive rules
- fix tensorrt
- multiple targets per spec
- don't extract candidate function until need cost
  Need to bring CombineByPrimitives back under control since lost depth limit.
- cleaned up fusion rule names
- added 'fuse anything touching' for BYOC
- Finish dd example
- Add notion of 'MustLower'; even if a candidate fires we may still need to consider
  leaving the node behind for the VM (especially for constants).
- starting example
- finished all the dd sections
- documentation checkpoint
- docs checkpoint
- more design
- starting on dd
- runs MNIST with TVM+CUTLASS+TRT
- cutlass function-at-a-time build
- need to account for build_cutlass_kernels_vm
- move cutlass tuning into relay.ext.cutlass path to avoid special case
- add utils
- don't fuse non-scalar constants for tvm target.
- stuck on cuda mem failure on conv2d, suspect bug in main
- where do the cutlass attrs come from?
- running, roughly
- pretty printing, signs of life
- wire things up again
- Switch SubGraph and CandidateKernel to TVM objects
- naive CombineByKindFusionRule, just to see what we're up against
  Will switch to Object/ObjectRef for SubGraph and CandidateKernel to avoid excess copying.
- preparing to mimic FuseOps
- rework SubGraph to use IndexSet
- rough cut at MaximalFusion
- split SubGraph and IndexSet in preparation for caching input/output/entry/exit sets in SubGraph.
- top-down iterative handling of sub-sub-graphs
- about to give up on one-pass extraction with 'sub-sub-graphs'
- Add notion of 'labels' to sub-graphs
- Rework FusionRules to be more compositional
- partway through reworking fusion rules, broken
- SubGraph::IsValid, but still need to add no_taps check
- dataflow rework, preparing for SubGraph::IsValid
- explode into subdir
- mnist with one fusion rule (which fires twice) working
- switch to CandidateKernelIndex
- Confirm can measure 'pre-annotated' primitive functions
- checkpoint
- stuff
- more sketching
- dominator logging
mbs-octoml committed Apr 20, 2022
1 parent 68beae9 commit d03f187
Showing 98 changed files with 9,962 additions and 886 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -291,6 +291,7 @@ tvm_file_glob(GLOB_RECURSE RELAY_OP_SRCS
)
tvm_file_glob(GLOB_RECURSE RELAY_PASS_SRCS
src/relay/analysis/*.cc
src/relay/collage/*.cc
src/relay/transforms/*.cc
src/relay/quantize/*.cc
)
15 changes: 15 additions & 0 deletions collage_autotvm.tuninglog
@@ -0,0 +1,15 @@
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 1, 32, 32], "float32"], ["TENSOR", [8, 1, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 968, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 4, 1]], ["tile_x", "sp", [-1, 1, 7, 7]], ["tile_rc", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 1]]}, "result": [[1000000000.0], 6, 10, 1648166365.035291], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw.cuda", [["TENSOR", [1, 1, 32, 32], "float32"], ["TENSOR", [8, 1, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 748547, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 4, 1]], ["tile_y", "sp", [-1, 1, 1, 4]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 1]], ["tile_ry", "sp", [-1, 5]], ["tile_rx", "sp", [-1, 5]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[2.1807114592422733e-06, 2.182203281316585e-06, 2.183491385782991e-06], 0, 1.8035461902618408, 1648233194.5253587], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 8, 18, 18], "float32"], ["TENSOR", [16, 8, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 7905, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 4, 4]], ["tile_x", "sp", [-1, 1, 49, 1]], ["tile_rc", "sp", [-1, 4]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[1.4285206158127155e-05, 1.4285846107313532e-05, 1.4331592281168714e-05], 0, 7.421089172363281, 1648237434.129], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw.cuda", [["TENSOR", [1, 8, 18, 18], "float32"], ["TENSOR", [16, 8, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 714012, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 8, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 2]], ["tile_rc", "sp", [-1, 8]], ["tile_ry", "sp", [-1, 5]], ["tile_rx", "sp", [-1, 5]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[2.5586838960333487e-06, 2.5701070606157226e-06, 2.572374535019662e-06], 0, 3.1794843673706055, 1648239614.7956486], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1, 256], "float32"], ["TENSOR", [10, 256], "float32"], null, "float32"], {}], "config": {"index": 4, "code_hash": null, "entity": [["tile_k", "sp", [-1, 16]]]}, "result": [[2.158152404676017e-06, 2.1645748896629425e-06, 2.1784918293729133e-06], 0, 1.6369056701660156, 1648241555.184448], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [2304, 768], "float32"], null, "float32"], {}], "config": {"index": 61851361, "code_hash": null, "entity": [["tile_x", "sp", [-1, 2, 2, 8]], ["tile_y", "sp", [-1, 1, 2, 9]], ["tile_k", "sp", [-1, 2, 4]]]}, "result": [[0.004074227972972973, 0.0040861373243243244, 0.004086151648648648], 0, 3.037601947784424, 1648251189.6885986], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [2304, 768], "float32"], null, "float32"], {}], "config": {"index": 5, "code_hash": null, "entity": [["tile_k", "sp", [-1, 8]]]}, "result": [[0.0268318398, 0.026832641350000002, 0.02683273135], 0, 4.179340600967407, 1648254281.8060668], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "batch_matmul.cuda", [["TENSOR", [600, 32, 64], "float32"], ["TENSOR", [600, 32, 64], "float32"], [600, 32, 32], "float32", 0, 1], {}], "config": {"index": 20386, "code_hash": null, "entity": [["tile_y", "sp", [-1, 2, 8]], ["tile_x", "sp", [-1, 16, 1]], ["tile_k", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 32], ["unroll_explicit", "ot", 1]]}, "result": [[3.258110773592547e-05, 3.258372944511948e-05, 3.261549426218442e-05], 0, 2.397996664047241, 1648255266.3718677], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "batch_matmul.cuda", [["TENSOR", [600, 32, 32], "float32"], ["TENSOR", [600, 64, 32], "float32"], [600, 32, 64], "float32", 0, 1], {}], "config": {"index": 5980, "code_hash": null, "entity": [["tile_y", "sp", [-1, 2, 8]], ["tile_x", "sp", [-1, 16, 1]], ["tile_k", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 16], ["unroll_explicit", "ot", 0]]}, "result": [[3.199404780823732e-05, 3.199749384187525e-05, 3.200219666269368e-05], 0, 2.3573713302612305, 1648257050.9987426], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [768, 768], "float32"], null, "float32"], {}], "config": {"index": 13482935, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 16, 1]], ["tile_y", "sp", [-1, 4, 16, 2]], ["tile_k", "sp", [-1, 12, 2]]]}, "result": [[0.00026185516898148144, 0.00026186912731481486, 0.0002643642638888889], 0, 5.9183220863342285, 1648262140.4419408], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [768, 768], "float32"], null, "float32"], {}], "config": {"index": 9, "code_hash": null, "entity": [["tile_k", "sp", [-1, 32]]]}, "result": [[0.0022258066376811595, 0.0022258676666666666, 0.0022260689855072464], 0, 1.6845574378967285, 1648264221.272429], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [3072, 768], "float32"], null, "float32"], {}], "config": {"index": 75386735, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 16, 1]], ["tile_y", "sp", [-1, 2, 16, 4]], ["tile_k", "sp", [-1, 2, 12]]]}, "result": [[0.0009476383928571428, 0.0009476764880952381, 0.0009480008333333333], 0, 3.346571207046509, 1648271350.9854434], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [3072, 768], "float32"], null, "float32"], {}], "config": {"index": 17, "code_hash": null, "entity": [["tile_k", "sp", [-1, 768]]]}, "result": [[1000000000.0], 4, 4.362995386123657, 1648274146.1389868], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 3072], "float32"], ["TENSOR", [768, 3072], "float32"], null, "float32"], {}], "config": {"index": 15171048, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 4, 20]], ["tile_y", "sp", [-1, 1, 192, 2]], ["tile_k", "sp", [-1, 8, 2]]]}, "result": [[1000000000.0], 1, 1.2985179424285889, 1648274382.1135368], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 3072], "float32"], ["TENSOR", [768, 3072], "float32"], null, "float32"], {}], "config": {"index": 9, "code_hash": null, "entity": [["tile_k", "sp", [-1, 32]]]}, "result": [[1000000000.0], 4, 4.3437583446502686, 1648274480.7225487], "version": 0.2, "tvm_version": "0.9.dev0"}
3 changes: 2 additions & 1 deletion include/tvm/ir/expr.h
@@ -260,9 +260,10 @@ class GlobalVarNode : public RelayExprNode {
*/
class GlobalVar : public RelayExpr {
public:
TVM_DLL explicit GlobalVar(String name_hint, Type type = {});
TVM_DLL explicit GlobalVar(String name_hint, Type type = {}, Span span = {});

TVM_DEFINE_OBJECT_REF_METHODS(GlobalVar, RelayExpr, GlobalVarNode);
TVM_DEFINE_OBJECT_REF_COW_METHOD(GlobalVarNode);
};

// PrimExprs that are useful as runtime containers.
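
A hedged usage sketch of the widened constructor (assuming the usual tvm headers and namespaces; the name and the empty span are illustrative, not taken from this patch):

  // Sketch only: intern a global with an explicit (here empty) source span.
  GlobalVar gv("main", /*type=*/{}, /*span=*/Span());
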
21 changes: 21 additions & 0 deletions include/tvm/relay/expr.h
@@ -39,6 +39,12 @@
#include "./type.h"

namespace tvm {

GlobalVar WithFields(GlobalVar global_var, Optional<String> opt_name_hint = {},
Optional<Type> opt_type = {},
Optional<VirtualDevice> opt_virtual_device = {},
Optional<Span> opt_span = {});

namespace relay {

using Expr = tvm::RelayExpr;
@@ -97,8 +103,23 @@ class Constant : public Expr {
TVM_DLL explicit Constant(runtime::NDArray data, Span span = Span());

TVM_DEFINE_OBJECT_REF_METHODS(Constant, RelayExpr, ConstantNode);
TVM_DEFINE_OBJECT_REF_COW_METHOD(ConstantNode);
};

/*!
* \brief Returns the constant with given properties. A null property denotes 'no change'.
* Returns this if all properties are unchanged. Otherwise, returns a copy with the new fields.
* \param constant The constant to copy
* \param op_data The (optional) data for the copied constant. If none, ret_constant->data =
* constant->data.
* \param opt_virtual_device The (optional) virtual_device for the copied constant. If none,
* ret_constant->virtual_device = constant->virtual_device.
* \param opt_span The (optional) span for the copied constant. If none,
* ret_constant->span = constant->span.
*/
Constant WithFields(Constant constant, Optional<runtime::NDArray> opt_data = {},
Optional<VirtualDevice> opt_virtual_device = {}, Optional<Span> opt_span = {});

/*! \brief Tuple of multiple Exprs */
class Tuple;
/*! \brief Tuple container */
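
A hedged sketch of the copy-on-write semantics documented for WithFields above (the helper name Respan is hypothetical, and the usual tvm headers are assumed):

  // Sketch only: absent properties mean 'no change', so only the span is replaced.
  Constant Respan(const Constant& constant, Span span) {
    return WithFields(constant, /*opt_data=*/{}, /*opt_virtual_device=*/{}, span);
  }
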
2 changes: 2 additions & 0 deletions include/tvm/relay/expr_functor.h
@@ -240,6 +240,8 @@ class MixedModeVisitor : public ::tvm::relay::ExprVisitor {
*/
explicit MixedModeVisitor(int visit_limit = 1);

using ExprVisitor::VisitExpr_;

/*!
* \brief VisitExpr is finalized to preserve call expansion of dataflow regions
*/
2 changes: 1 addition & 1 deletion include/tvm/relay/function.h
@@ -173,7 +173,7 @@ namespace attr {
/*! \brief Mark the function as a primitive function. */
constexpr const char* kPrimitive = "Primitive";
/*!
* \brief Indicate the compiler that should be used for building this function.
* \brief Indicate the BYOC compiler that should be used for building this function.
* When this is unset or set to "default", the default compilation pipeline will be used.
*/
constexpr const char* kCompiler = "Compiler";
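
A hedged sketch of how these attributes end up on a partitioned function (the toolchain name is illustrative and `func` is assumed to be an already-extracted relay::Function):

  // Sketch only: mark the extracted function as primitive and route it to a BYOC toolchain.
  Function partitioned = WithAttr(std::move(func), attr::kPrimitive, Integer(1));
  partitioned = WithAttr(std::move(partitioned), attr::kCompiler, String("tensorrt"));
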
40 changes: 28 additions & 12 deletions include/tvm/relay/op_attr_types.h
@@ -41,24 +41,40 @@ using tir::BijectiveLayoutNode;
using tir::Layout;
using tir::LayoutAxis;

/*! \brief operator pattern used in graph fusion */
/*!
* \brief Operator pattern used to guide fusion.
*/
enum OpPatternKind {
// Elementwise operation
// Elementwise operator, eg relu.
// \code
// out[i, j, k] = op(in[i, j, k])
// \endcode
// The underlying scalar op can always be moved to the point the input tensor was created.
kElemWise = 0,
// Broadcasting operator, can always map output axis to the input in order.
// for example :code:`out[i, ax1, j, ax2] = input[i, j]`.
// Note that the axis need to be in order so transpose is not a bcast operator.
// Broadcasting operator, eg add.
// As for kElemWise, but some output axes may be broadcasted, and the remaining must correspond
// to input axes in order.
// \code
// out[i, j, k] = op(in[i, j])
// \endcode
// (So transpose is not a kBroadcast).
kBroadcast = 1,
// Injective operator, can always injectively map output axis to a single input axis.
// All injective operator can still be safely fused to injective and reduction.
// Injective operator, eg concat.
// Can always injectively map output axis to a single input axis.
// All kInjective operators can be fused to kInjective and kCommReduce operators.
// Eg: concatenate
kInjective = 2,
// Commutative reduction operator.
// Commutative reduction operator, eg sum.
kCommReduce = 3,
// Complex operation, can still fuse elemwise operations into its output.
// but cannot chain another complex op
// Complex operation, eg conv2d. Often called the fused sub-graph's 'anchor node'.
// Can fuse kElemWise operations into its output, but cannot fuse additional kOutEWiseFusable
// operations.
kOutEWiseFusable = 4,
// The pattern for tuple nodes. Can fuse into subsequent injective ops,
// but treated specially
// A tuple.
// Can fuse into subsequent injective ops, but treated specially.
kTuple = 7,
// Opaque operation, cannot fuse anything.
kOpaque = 8
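
These categories reach the fusion passes through the "TOpPattern" operator attribute; a hedged sketch with a hypothetical operator (not part of this patch):

  // Illustrative only: a hypothetical operator declaring itself as a fusion anchor.
  RELAY_REGISTER_OP("my.widget").set_attr<TOpPattern>("TOpPattern", kOutEWiseFusable);
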
5 changes: 5 additions & 0 deletions include/tvm/relay/transform.h
@@ -273,6 +273,11 @@ TVM_DLL Pass InferType();
*/
TVM_DLL Type InferTypeLocal(const Expr& expr);

/*!
* \brief Infer the types of all sub-expressions of expr.
*/
TVM_DLL Expr InferTypeExpr(const Expr& expr);

/*!
* \brief Search and eliminate common subexpression. For example, if there are
* two expressions evaluated to an identical value, a single variable is created
2 changes: 2 additions & 0 deletions include/tvm/target/compilation_config.h
@@ -171,6 +171,8 @@ class CompilationConfig : public ObjectRef {
TVM_DLL CompilationConfig(const transform::PassContext& pass_ctx, TargetMap legacy_target_map_arg,
Target optional_host_target_arg);

TVM_DLL CompilationConfig(const transform::PassContext& pass_ctx, Array<Target> targets);

TVM_DEFINE_OBJECT_REF_METHODS(CompilationConfig, ObjectRef, CompilationConfigNode);
};

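
A hedged sketch of the new overload, which accepts the raw list of candidate targets rather than the legacy target map (target strings illustrative):

  // Sketch only: build a config straight from the targets the partitioner may choose between.
  Array<Target> targets = {Target("cuda"), Target("llvm")};
  CompilationConfig config(transform::PassContext::Current(), targets);
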
10 changes: 10 additions & 0 deletions include/tvm/target/target.h
@@ -177,7 +177,17 @@ class Target : public ObjectRef {
*/
static Target WithHost(const Target& target, const Target& host);

/*!
* \brief Returns true if \p this is a 'refinement of' \p that. I.e. \p this
* and \p that are structurally equivalent except \p this may have 'compiler' and/or 'partition_rule'
* attributes.
*/
bool IsRefinementOf(const Target& that) const;

private:
Target(TargetKind kind, Optional<ObjectRef> host, String tag, Array<String> keys,
Map<String, ObjectRef> attrs);

// enable with syntax.
friend class TargetInternal;
friend class With<Target>;
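
A hedged sketch of the intended check (the 'compiler' value is illustrative and relies on the target_kind.h change later in this commit):

  // Illustrative only: a TensorRT-specialized CUDA target refines the plain CUDA target.
  Target plain("cuda");
  Target trt(Map<String, ObjectRef>{{"kind", String("cuda")}, {"compiler", String("tensorrt")}});
  ICHECK(trt.IsRefinementOf(plain));
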
14 changes: 13 additions & 1 deletion include/tvm/target/target_kind.h
@@ -384,6 +384,16 @@ inline TargetKindRegEntry& TargetKindRegEntry::set_name() {
#define TVM_TARGET_KIND_REGISTER_VAR_DEF \
static DMLC_ATTRIBUTE_UNUSED ::tvm::TargetKindRegEntry& __make_##TargetKind

/* Special attributes on all target kinds:
* "compiler": If set, the BYOC toolchain name this target is specialized to. This name appears:
* - In the BYOC lowering function registered as "relay.ext.<toolchain>".
* - As the "Compiler" attribute on "Primitive" functions.
* - In the operator predicate bound to the operator attribute "target.<toolchain>"
* - In a @register_pattern_table("<toolchain>") annotation.
* "fusion_rule": If set, the FusionRule to use for this target in the CollageFuseOps pass.
* If missing, use built-in rules to derive the required FusionSpec.
*/

/*!
* \def TVM_REGISTER_TARGET_KIND
* \brief Register a new target kind, or set attribute of the corresponding target kind.
@@ -412,7 +422,9 @@ inline TargetKindRegEntry& TargetKindRegEntry::set_name() {
.add_attr_option<String>("model") \
.add_attr_option<Array<String>>("libs") \
.add_attr_option<Target>("host") \
.add_attr_option<Integer>("from_device")
.add_attr_option<Integer>("from_device") \
.add_attr_option<String>("compiler") \
.add_attr_option<ObjectRef /* actually PartitionRule */>("partition_rule")

} // namespace tvm

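
With the two extra options registered above, a pass can read the BYOC toolchain back off a target; a hedged sketch (`target` is assumed, and the lookup convention follows the comment block above):

  // Illustrative only: decide whether a partition should go to a BYOC toolchain.
  Optional<String> compiler = target->GetAttr<String>("compiler");
  if (compiler.defined()) {
    // Hand the partition to the "relay.ext." + compiler.value() lowering function.
  }
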
2 changes: 1 addition & 1 deletion python/tvm/auto_scheduler/dispatcher.py
@@ -332,7 +332,7 @@ class ApplyHistoryBestOrSample(ApplyHistoryBest):
"""

def __init__(
self, records, sample_simple_workloads=False, cost_model_file=None, num_measure=-1
self, records, sample_simple_workloads=False, cost_model_file=None, num_measure=-1
):
self.sample_simple_workloads = sample_simple_workloads
self.num_measure = num_measure
26 changes: 18 additions & 8 deletions python/tvm/autotvm/task/dispatcher.py
@@ -55,6 +55,9 @@ class DispatchContext(object):
def __init__(self):
self._old_ctx = DispatchContext.current

def contains(self, target, workload):
raise NotImplementedError()

def query(self, target, workload):
"""
Query the context to get the specific config for a template.
@@ -227,9 +230,11 @@ def load(self, records):

counter = 0
for inp, res in records:
#logger.info(f"inp={inp}, res={res}")
counter += 1
if res.error_no != 0:
continue
#TODO(mbs): Cache error
#if res.error_no != 0:
# continue

# use target keys in tvm target system as key to build best map
for k in inp.target.keys:
@@ -251,7 +256,12 @@ def load(self, records):
if np.mean(other_res.costs) > np.mean(res.costs):
best_by_model[key] = (inp, res)

logger.debug("Finish loading %d records", counter)
#logger.info("Finished loading %d records", counter)

def contains(self, target, workload):
#logger.info(
# f"look for match with {target} and {workload} with {len(self._best_user_defined)} user-defined, {len(self.best_by_model)} model and {len(self.best_by_targetkey)} target entries")
return self._query_inside(target, workload) is not None

def _query_inside(self, target, workload):
if target is None:
@@ -311,8 +321,8 @@ def _query_inside(self, target, workload):

if not _env.GLOBAL_SCOPE.silent:
msg = (
"Cannot find config for target=%s, workload=%s. A fallback configuration "
"is used, which may bring great performance regression." % (target, workload)
"Cannot find config for target=%s, workload=%s. A fallback configuration "
"is used, which may bring great performance regression." % (target, workload)
)
if msg not in DispatchContext.warning_messages:
DispatchContext.warning_messages.add(msg)
@@ -426,9 +436,9 @@ def _query_inside(self, target, workload):
key = (str(target), workload)
if key not in self._global_cfg_dict:
msg = (
"Config for target=%s, workload=%s is missing in ApplyGraphBest context. "
"A fallback configuration is used, which may bring great performance "
"regression." % (target, workload)
"Config for target=%s, workload=%s is missing in ApplyGraphBest context. "
"A fallback configuration is used, which may bring great performance "
"regression." % (target, workload)
)
logger.warning(msg)
cfg = FallbackConfigEntity()
2 changes: 2 additions & 0 deletions python/tvm/contrib/cc.py
@@ -19,6 +19,7 @@
import sys
import os
import subprocess
import logging

from .._ffi.base import py_str

@@ -238,6 +239,7 @@ def _linux_compile(output, objects, options, compile_cmd, compile_shared=False):
cmd += objects
if options:
cmd += options
logging.info(f"invoking '{cmd}'")
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
(out, _) = proc.communicate()
if proc.returncode != 0: