Skip to content

Commit

Permalink
Squashed commit
Browse files Browse the repository at this point in the history
[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485)

[Meta Schedule][M3c] PostOrderApply (apache#486)

Fix Post Order Apply (apache#490)

[MetaSchedule] Relay Integration (apache#489)

[M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492)

Fix replay trace. (apache#493)

[M3c][Meta Schedule] Implement the Replay Func class. (apache#495)

[PR] Test script for meta-schedule task extraction. Interface to load… (apache#494)

[Meta Schedule Refactor] Get child blocks (apache#500)

Read-at && Write-at (apache#497)

[M3c][Meta Schedule] Measure Callbacks (apache#498)

[Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496)

[MetaSchedule] Sample-Perfect-Tile (apache#501)

[MetaSchedule] TE Workloads (apache#502)

[TensorIR] GetProducer, GetConsumer (apache#506)

[MetaScheduleRefactor] Annotate&Unannotate (apache#505)

[MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503)

[Tests] Add unittests for auto-inline and multi-level-tiling (apache#508)

[Meta Schedule] Minor Fixes (apache#507)

[MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509)

[MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499)

[Meta Schedule] Add Helper Function & Minor Modification (apache#512)

[MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll  (apache#513)

[Meta Schedule] Feature Extractor & Cost Model (apache#510)

Blockize & Tensorize (apache#514)

Layout Rewriting: Suggest-Index-Map (apache#520)

[MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516)

[Meta Schedule] Per-Store-Feature (apache#521)

Add traced schedule for blockize & tensorize (apache#526)

[Meta Schedule] Add XGBoost Model & Random Model (apache#519)

User-Interface: Tune-TIR (apache#525)

User-Interface: Tune-TE (apache#527)

[Minor] More logging on python (apache#528)

Get CUDA tuning working (apache#529)

[MetaSchedule] TensorRT BYOC (apache#518)

[BugFix] LocalBuilder API (apache#531)

[Meta Schedule] Add Cost Model Update Measure Callback (apache#530)

[Bugfix] BuilderInput with default params (apache#532)

[MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534)

[Meta Schedule] Evolutionary Search (apache#522)

[BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535)

[Meta Schedule] Fix some bugs (apache#537)

Initiate Experiments for CPU Performance Alignment with Ansor (apache#538)

[Meta Schedule] Tweak experiment scripts (apache#539)

[Meta Schedule] Initiate experiments on CUDA (apache#540)

[TIR][Schedule] Buffer transform (apache#523)

Auto Tensor Core (apache#524)

Working on Evo Search (apache#542)

[Meta Schedule] Add Replay Tuning Interface (apache#543)

Evolutionary Search on CPU (apache#544)

Misc improvement over the error message (apache#545)

[TIR][Schedule] Software pipelining (apache#533)

[Meta Schedule Refactor] fixing unit tests (apache#547)

[MetaSchedule] Mutator-Compute-Location (apache#548)

Misc Improvement of Evolutionary Search (apache#549)

Hotfix for software pipeline (apache#552)

Misc Improvement (apache#550)

[Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555)

Rule RFactor (apache#551)

[MemHammer] Rewrite Rules (apache#554)

[MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556)

[MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559)

[MetaSchedule] Perf Alignment - NRM on CUDA (apache#560)

[TIR] Reorder the block iters of the blocks generated by RFactor (apache#561)

Removing 2 unit tests for software pipelining (apache#562)

[MemHammer] Lower Pass + Unittests (apache#557)

Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564)

Fix Sketch Generation Unittests (apache#565)

speed up VerifyGpuCode (apache#568)

[Performance Align] fixing codegen problems (apache#569)

[Meta schedule] improve search space (#1)

Hot fix for bound predicate (#3)

[Meta Schedule] Update Tune Relay (#4)

[Performance Align] fixing codegen problems (#5)

[PerfAlign] NRM & SFM on Raspi Aligned (#6)

[BugFix] Apply bound predicate directly to loops when possible (#12)

[BugFix] Fix CrossThreadReduction on CUDA (#13)

[MetaSchedule] Enable BertTuning with MetaScheduler (#11)

[Minor][MemHammer] Minor tweaks in code review (#14)

[Meta Schedule] Add customizable search space to PostOrderApply. (#16)

Fix cooperative fetching (#17)

Fixes for codegen (#18)

[Hotfix] A unittest (#19)

Fix for GRP sketch gen (#21)

Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (#20)

[BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (#22)

[MemHammer][Refactor] Code Review (#15)

[Meta Schedule] Add Winograd Test for Customizable Search Space (#24)

Import & Cache Mechanism (#26)

[BugFix] Fix Winograd Test Script (#25)

Add task extraction & caching (#27)

A few fixes for task extraction (#28)

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
  • Loading branch information
8 people committed Feb 20, 2022
1 parent 73cf51b commit 46a98f1
Show file tree
Hide file tree
Showing 125 changed files with 12,037 additions and 699 deletions.
158 changes: 0 additions & 158 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -1,158 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Github code owners file
# This file is used as a convenient tool to map
# committers' areas of expertise and faciliate the review process.
#
# This may not be the non-comprehensive list and is meant to be
# updated over time.

# Per ASF policy, committer have global write permission.
# We normally recommend committers to shepherd code in their area of expertise.
* @apache/tvm-committers

# Order is important; the last matching pattern takes the most precedence.
# The sub modules should be ordered first by depth.
# Making sure we append new sub-module rules after exisiting modules rules.

##############################
# Top-level Fallbacks
##############################
include/** @tqchen @jroesch @yzhliu @icemelon @junrushao1994 @comaniac @zhiics
src/** @tqchen @jroesch @yzhliu @icemelon @junrushao1994 @comaniac @zhiics
apps/** @tqchen @jroesch @yzhliu @icemelon @junrushao1994 @comaniac @zhiics
python/** @tqchen @jroesch @yzhliu @icemelon @junrushao1994 @comaniac @zhiics

# Thirdparty license audit
3rdparty/** @tqchen @jroesch
licenses/** @tqchen @jroesch

# JVM language
jvm/** @yzhliu

# Golang
golang/** @srkreddy1238

# WASM
web/** @tqchen @jroesch

# Docker
docker/** @areusch @leandron @jroesch

# Conda
conda/** @tqchen @junrushao1994 @comaniac

# CMake
cmake/** @jroesch @tqchen @areusch @junrushao1994 @comaniac

# rust bindings
rust/** @jroesch @nhynes @nhynes

# vta
vta/** @tmoreau89 @vegaluisjose

# docs
docs/** @comaniac @junrushao1994 @tqchen @jroesch @areusch @yzhliu @merrymercy @icemelon
tutorials/** @comaniac @junrushao1994 @tqchen @jroesch @areusch @yzhliu @merrymercy @icemelon

# tests
tests/** @comaniac @junrushao1994 @tqchen @jroesch @areusch @yzhliu @merrymercy @icemelon

##############################
# Specific modules
##############################

# automation related
src/auto_scheduler/** @merrymercy @jcf94 @comaniac @junrushao1994 @vinx13 @Hzfengsy
include/tvm/auto_scheduler/** @merrymercy @jcf94 @comaniac @junrushao1994 @vinx13 @Hzfengsy
python/tvm/auto_scheduler/** @merrymercy @jcf94 @comaniac @junrushao1994 @vinx13 @Hzfengsy

python/tvm/autotvm/** @merrymercy @jcf94 @comaniac @junrushao1994 @vinx13

# node system and reflection
src/node/** @junrushao1994 @vinx13 @tqchen @jroesch @comaniac
include/tvm/node/** @junrushao1994 @vinx13 @tqchen @jroesch @comaniac

# ir: Common IR
src/ir/** @junrushao1994 @vinx13 @tqchen @jroesch @comaniac
include/tvm/ir/** @junrushao1994 @vinx13 @tqchen @jroesch @comaniac
python/tvm/ir/** @junrushao1994 @vinx13 @tqchen @jroesch @comaniac

# tir
src/tir/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were @Hzfengsy
include/tvm/tir/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were @Hzfengsy
python/tvm/tir/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were @Hzfengsy

# te
src/te/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were
include/tvm/te/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were
python/tvm/te/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi @were

# target
src/target/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi
include/tvm/target/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi
python/tvm/target/** @junrushao1994 @vinx13 @tqchen @kparzysz-quic @ZihengJiang @masahi

# arith: Arithmetic module and simplifiers
src/arith/** @tqchen @junrushao1994 @vinx13
include/tvm/arith/** @tqchen @junrushao1994 @vinx13
python/tvm/arith/** @tqchen @junrushao1994 @vinx13

# parser
src/parser/** @jroesch @slyubomirsky

# runtime
src/runtime/** @vinx13 @tqchen @FronzenGene @liangfu @areusch @tmoreau89 @ajtulloch @masahi @kazum @ZihengJiang @junrushao1994
include/tvm/runtime/** @vinx13 @tqchen @FronzenGene @liangfu @areusch @tmoreau89 @ajtulloch @masahi @kazum @ZihengJiang @junrushao1994
python/tvm/runtime/** @vinx13 @tqchen @FronzenGene @liangfu @areusch @tmoreau89 @ajtulloch @masahi @kazum @ZihengJiang @junrushao1994

# runtime/micro
src/runtime/micro/** @areusch @liangfu @tmoreau89 @manupa-arm
src/runtime/crt/** @areusch @liangfu @tmoreau89 @manupa-arm
include/tvm/runtime/crt/** @areusch @liangfu @tmoreau89 @manupa-arm
include/tvm/runtime/micro/** @areusch @liangfu @tmoreau89 @manupa-arm
python/tvm/micro/** @areusch @liangfu @tmoreau89 @manupa-arm

# relay
src/relay/** @jroesch @slyubomirsky @icemelon @MarisaKirisame @ZihengJiang @yzhliu @vinx13 @mbrookhart @jwfromm @zhiics @anijain2305 @wweic @eqy @junrushao1994
include/tvm/relay/** @jroesch @slyubomirsky @icemelon @MarisaKirisame @ZihengJiang @yzhliu @vinx13 @mbrookhart @jwfromm @zhiics @anijain2305 @wweic @eqy @junrushao1994
python/tvm/relay/** @jroesch @slyubomirsky @icemelon @MarisaKirisame @ZihengJiang @yzhliu @vinx13 @mbrookhart @jwfromm @zhiics @anijain2305 @wweic @eqy @junrushao1994


# relay/qnn
src/relay/qnn/** @jwfromm @anijain2305 @ZihengJiang
inlcude/tvm/relay/qnn/** @jwfromm @anijain2305 @ZihengJiang
python/tvm/relay/qnn/** @jwfromm @anijain2305 @ZihengJiang

# relay/backend/contrib: BYOC
src/relay/backend/contrib/** @zhiics @trevor-m @comaniac @mbaret @manupa-arm

# relay/frontends
python/tvm/relay/frontend/** @jwfromm @mbrookhart @srkreddy1238 @siju-samuel @Huyuwei @hlu1 @kazum @PariksheetPinjari909

# topi: Operator definitions
src/topi/** @Laurawly @Huyuwei @kevinthesun @jwfromm @vinx13 @masahi @FronzenGene @yzhliu @mbrookhart @ZihengJiang @jcf94
include/tvm/topi/** @Laurawly @Huyuwei @kevinthesun @jwfromm @vinx13 @masahi @FronzenGene @yzhliu @mbrookhart @ZihengJiang @jcf94
python/tvm/topi/** @Laurawly @Huyuwei @kevinthesun @jwfromm @vinx13 @masahi @FronzenGene @yzhliu @mbrookhart @ZihengJiang @jcf94


# tvm/driver/
python/tvm/driver/** @leandron @jwfromm @tqchen @jroesch

# tvm/driver/tvmc
python/tvm/driver/tvmc/** @leandron @jwfromm
17 changes: 17 additions & 0 deletions include/tvm/arith/int_set.h
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,14 @@ Map<Var, IntSet> ConvertDomMap(const std::unordered_map<const VarNode*, IntSet>&
* \return An integer set that can cover all the possible values of e.
*/
IntSet EvalSet(PrimExpr e, const Map<IterVar, IntSet>& dom_map);
/*!
* \brief Same as EvalSet, but takes Map<Var, IntSet>
*
* \param e The expression to be evaluated.
* \param dom_map The domain of each variable.
* \return An integer set that can cover all the possible values of e.
*/
IntSet EvalSet(PrimExpr e, const Map<Var, IntSet>& dom_map);
/*!
* \brief Same as EvalSet, but takes unordered_map
*
Expand All @@ -177,6 +185,15 @@ IntSet EvalSet(PrimExpr e, const Map<IterVar, IntSet>& dom_map);
* \return An integer set that can cover all the possible values of e.
*/
IntSet EvalSet(PrimExpr e, const std::unordered_map<const tir::VarNode*, IntSet>& dom_map);
/*!
* \brief Same as EvalSet, but takes Array<PrimExpr>
*
* \param exprs The expressions to be evaluated.
* \param dom_map The domain of each variable.
* \return An array of integer sets that can cover all the possible values.
*/
Array<IntSet> EvalSet(const Array<PrimExpr>& exprs, const Map<Var, IntSet>& dom_map);

/*!
* \brief Find an symbolic integer set that contains is union over
* all the possible conditional values in dom_map.
Expand Down
4 changes: 2 additions & 2 deletions include/tvm/meta_schedule/builder.h
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ class BuilderInputNode : public runtime::Object {
IRModule mod;
/*! \brief The target to be built for. */
Target target;
/*! \brief The optional parameters used for build */
/*! \brief Parameters for Relay build module. */
Optional<Map<String, runtime::NDArray>> params;

void VisitAttrs(tvm::AttrVisitor* v) {
Expand All @@ -55,7 +55,7 @@ class BuilderInput : public runtime::ObjectRef {
* \brief Constructor of BuilderInput.
* \param mod The IRModule to be built.
* \param target The target to be built for.
* \param params The optional parameters used for build
* \param params Parameters for Relay build module.
*/
TVM_DLL explicit BuilderInput(IRModule mod, Target target,
Optional<Map<String, runtime::NDArray>> params = NullOpt);
Expand Down
2 changes: 2 additions & 0 deletions include/tvm/meta_schedule/schedule_rule.h
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,7 @@ class ScheduleRule : public runtime::ObjectRef {
* \param tile_binds For each level of tiles, which thread axis it is bound to. Recommended:
* - NullOpt on CPU
* - [blockIdx.x, vthread.x, threadIdx.x] on GPU
* \param use_tensor_core Whether to apply tensor core wmma intrinsic for the computation
* \param max_innermost_factor The maximum size of the innermost factor. NullOpt means no limit
* \param vector_load_lens The length of vector lane in vectorized cooperative fetching.
* NullOpt means disable vectorization
Expand All @@ -146,6 +147,7 @@ class ScheduleRule : public runtime::ObjectRef {
*/
TVM_DLL static ScheduleRule MultiLevelTiling(String structure, //
Optional<Array<String>> tile_binds, //
bool use_tensor_core, //
Optional<Integer> max_innermost_factor, //
Optional<Array<Integer>> vector_load_lens, //
Optional<Map<String, ObjectRef>> reuse_read, //
Expand Down
2 changes: 2 additions & 0 deletions include/tvm/meta_schedule/tune_context.h
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,8 @@ class TuneContextNode : public runtime::Object {
v->Visit("rand_state", &rand_state);
v->Visit("num_threads", &num_threads);
v->Visit("is_stopped", &is_stopped);
v->Visit("builder_results", &builder_results);
v->Visit("runner_futures", &runner_futures);
v->Visit("measure_candidates", &measure_candidates);
}

Expand Down
59 changes: 59 additions & 0 deletions include/tvm/tir/function.h
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,65 @@ class LinkedParam : public ObjectRef {
TVM_DEFINE_OBJECT_REF_COW_METHOD(LinkedParamNode);
};

/*! \brief A mapping from multi-dimensional indices to another set of multi-dimensional indices */
class IndexMapNode : public Object {
public:
/*! \brief The source indices */
Array<Var> src_iters;
/*! \brief The target indices */
Array<PrimExpr> tgt_iters;

void VisitAttrs(tvm::AttrVisitor* v) {
v->Visit("src_iters", &src_iters);
v->Visit("tgt_iters", &tgt_iters);
}

/*!
* \brief Take `inputs` as the source indices and return the corresponding target indices.
* \param inputs The source indices.
* \return The target indices.
*/
Array<PrimExpr> Apply(const Array<PrimExpr>& inputs) const;

/*!
* \brief Map a shape to the output space
* \param shape The shape in the source space
* \return The shape in the target space
*/
Array<PrimExpr> MapShape(const Array<PrimExpr>& shape) const;

/*!
* \brief Convert to string representation in Python.
* \return The stringified lambda expression in Python.
*/
String ToPythonString() const;

static constexpr const char* _type_key = "tir.IndexMap";
TVM_DECLARE_FINAL_OBJECT_INFO(IndexMapNode, Object);
};

/*!
* \brief Managed reference to IndexMapNode.
* \sa IndexMapNode
*/
class IndexMap : public ObjectRef {
public:
/*!
* \brief Constructor.
* \param src_iters The source indices.
* \param tgt_iters The target indices.
*/
explicit IndexMap(Array<Var> src_iters, Array<PrimExpr> tgt_iters);
/*!
* \brief Create an index map from a packed function
* \param ndim The number of dimensions
* \param func The function to be applied
* \return The created index map
*/
static IndexMap FromFunc(int ndim, runtime::TypedPackedFunc<Array<PrimExpr>(Array<Var>)> func);
TVM_DEFINE_OBJECT_REF_METHODS(IndexMap, ObjectRef, IndexMapNode);
};

/*!
* \brief Tensor intrinsics for tensorization
*/
Expand Down
20 changes: 20 additions & 0 deletions include/tvm/tir/schedule/schedule.h
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,11 @@ class ScheduleNode : public runtime::Object {
*/
virtual BlockRV CacheWrite(const BlockRV& block_rv, int write_buffer_index,
const String& storage_scope) = 0;
/******** Schedule: Data movement ********/
virtual BlockRV ReadAt(const LoopRV& loop_rv, const BlockRV& block_rv, int read_buffer_index,
const String& storage_scope) = 0;
virtual BlockRV WriteAt(const LoopRV& loop_rv, const BlockRV& block_rv, int write_buffer_index,
const String& storage_scope) = 0;
/******** Schedule: Compute location ********/
/*!
* \brief Move a producer block under the specific loop, and regenerate the
Expand Down Expand Up @@ -521,6 +526,21 @@ class ScheduleNode : public runtime::Object {
*/
virtual void Unannotate(const BlockRV& block_rv, const String& ann_key) = 0;

/******** Schedule: Layout transformation ********/
/*!
* \brief Apply a transformation represented by IndexMap to buffer
* \details The indices and the access region to the target buffer is transformed by the given
* index_map. The index_map is used to infer the new shape of the buffer. Buffer must be either
* a function parameter, or allocated in a block (it cannot be a buffer subregion created via
* 'match_buffer').
* \param block_rv The block that accesses the target buffer.
* \param buffer_index The index of the buffer in block's read or write region.
* \param is_write_index Whether the buffer_index is the index of the block's write region.
* \param index_map The transformation to apply.
*/
virtual void TransformLayout(const BlockRV& block_rv, int buffer_index, bool is_write_index,
const IndexMap& index_map) = 0;

/******** Schedule: Misc ********/
/*! \brief A no-op that marks the start of postprocessing phase of scheduling */
virtual void EnterPostproc() = 0;
Expand Down
Loading

0 comments on commit 46a98f1

Please sign in to comment.