forked from apache/tvm
Commit
- Enable cudnn, get rid of support for op-predicate based BYOC integrations
- Enable cublas
- And yet another go at pruning unnecessary candidates
- Another go at pruning unnecessary candidates
- Fix CompositePartitionRule use
- Fix a few bugs with new TensorRT pattern-based integration
- Rework RemoveSubCandidatesCombinerRule for soundness
- Better logging
- Bug fixes
- Implement critical nodes idea for avoiding obviously unnecessary candidates
- Promote DataflowGraph from alias to class so can cache downstream index set
- Quick check to avoid unioning candidates which would create a cycle
- Hoist out CandidatePartitionIndex and add rules to avoid small candidates subsumed by containing candidates
- GetFunction can legitimately return nullptr
- Rename tuning log
- Support for int64 literals
- Switch GPT2 to plain model
- Fix library clobbering issue for cutlass
- Actually check in 'built in' tuning log (covers MNIST & GPT2 only)
- Trying to debug GPT2
- Update TargetKind attribute name
- Working through GPT2 issues
- Check in tuning records for MNIST (with hack to not retry failed winograd)
- Autotvm tuning disabled if log file empty (default)
- Autotvm tuning during search working
- Tune during search (but does not load tuned records after search!)
- About to add tuning to estimate_seconds
- Split out the combiner rules & make them FFI friendly
- Rework comments
- Estimate IRModule instead of Function (closer to meta_schedule interface)
- Add 'host' as first-class partitioning spec (avoids special casing for the 'leave behind for the VM' case)
- Move CollagePartitioner to very start of VM compiler flow (not changing legacy)
- Fix bugs etc. with new SubGraph::Rewrite approach. Ready for updating the RFC to focus on partitioning instead of fusion.
- Working again after partition<->fusion split
- Add PrimitivePartitionRule
- Refactor SubGraph Extract/Rewrite *** CAUTION: Almost certainly broken ***
- Rename kernel->partition, fusion->partition
- Next: make nesting in "Primitive" an explicit transform
- Respect existing target constraints from device planner
- Make 'compiler' and 'fusion_rule' attributes available on all target kinds
- Moved design to tvm-rfcs, apache/tvm-rfcs#62
- Incorporate comments
- Avoid repeated fusion
- Fix TRT type checking
- Better logs
- Pretty print primitive rules
- Fix TensorRT
- Multiple targets per spec
- Don't extract candidate function until cost is needed. Need to bring CombineByPrimitives back under control since the depth limit was lost.
- Cleaned up fusion rule names
- Added 'fuse anything touching' for BYOC
- Finish dd example
- Add notion of 'MustLower': even if a candidate fires, may still need to consider leaving the node behind for the VM (especially for constants)
- Starting example
- Finished all the dd sections
- Documentation checkpoint
- Docs checkpoint
- More design
- Starting on dd
- Runs MNIST with TVM+CUTLASS+TRT
- cutlass function-at-a-time build
- Need to account for build_cutlass_kernels_vm
- Move cutlass tuning into relay.ext.cutlass path to avoid special case
- Add utils
- Don't fuse non-scalar constants for TVM target
- Stuck on CUDA memory failure on conv2d; suspect bug in main
- Where do the cutlass attrs come from?
- Running, roughly
- Pretty printing, signs of life
- Wire things up again
- Switch SubGraph and CandidateKernel to TVM objects
- Naive CombineByKindFusionRule, just to see what we're up against. Will switch to Object/ObjectRef for SubGraph and CandidateKernel to avoid excess copying.
- Preparing to mimic FuseOps
- Rework SubGraph to use IndexSet
- Rough cut at MaximalFusion
- Split SubGraph and IndexSet in preparation for caching input/output/entry/exit sets in SubGraph
- Top-down iterative handling of sub-sub-graphs
- About to give up on one-pass extraction with 'sub-sub-graphs'
- Add notion of 'labels' to sub-graphs
- Rework FusionRules to be more compositional
- Partway through reworking fusion rules, broken
- SubGraph::IsValid, but still need to add no_taps check
- Dataflow rework, preparing for SubGraph::IsValid
- Explode into subdir
- MNIST with one fusion rule (which fires twice) working
- Switch to CandidateKernelIndex
- Confirm can measure 'pre-annotated' primitive functions
- Checkpoint
- Stuff
- More sketching
- Dominator logging
1 parent 68beae9 · commit d03f187
Showing 98 changed files with 9,962 additions and 886 deletions.
@@ -0,0 +1,15 @@
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 1, 32, 32], "float32"], ["TENSOR", [8, 1, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 968, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 4, 1]], ["tile_x", "sp", [-1, 1, 7, 7]], ["tile_rc", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 1]]}, "result": [[1000000000.0], 6, 10, 1648166365.035291], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw.cuda", [["TENSOR", [1, 1, 32, 32], "float32"], ["TENSOR", [8, 1, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 748547, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 4, 1]], ["tile_y", "sp", [-1, 1, 1, 4]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 1]], ["tile_ry", "sp", [-1, 5]], ["tile_rx", "sp", [-1, 5]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[2.1807114592422733e-06, 2.182203281316585e-06, 2.183491385782991e-06], 0, 1.8035461902618408, 1648233194.5253587], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 8, 18, 18], "float32"], ["TENSOR", [16, 8, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 7905, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 4, 4]], ["tile_x", "sp", [-1, 1, 49, 1]], ["tile_rc", "sp", [-1, 4]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[1.4285206158127155e-05, 1.4285846107313532e-05, 1.4331592281168714e-05], 0, 7.421089172363281, 1648237434.129], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "conv2d_nchw.cuda", [["TENSOR", [1, 8, 18, 18], "float32"], ["TENSOR", [16, 8, 5, 5], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "float32"], {}], "config": {"index": 714012, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 8, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 2]], ["tile_rc", "sp", [-1, 8]], ["tile_ry", "sp", [-1, 5]], ["tile_rx", "sp", [-1, 5]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[2.5586838960333487e-06, 2.5701070606157226e-06, 2.572374535019662e-06], 0, 3.1794843673706055, 1648239614.7956486], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1, 256], "float32"], ["TENSOR", [10, 256], "float32"], null, "float32"], {}], "config": {"index": 4, "code_hash": null, "entity": [["tile_k", "sp", [-1, 16]]]}, "result": [[2.158152404676017e-06, 2.1645748896629425e-06, 2.1784918293729133e-06], 0, 1.6369056701660156, 1648241555.184448], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [2304, 768], "float32"], null, "float32"], {}], "config": {"index": 61851361, "code_hash": null, "entity": [["tile_x", "sp", [-1, 2, 2, 8]], ["tile_y", "sp", [-1, 1, 2, 9]], ["tile_k", "sp", [-1, 2, 4]]]}, "result": [[0.004074227972972973, 0.0040861373243243244, 0.004086151648648648], 0, 3.037601947784424, 1648251189.6885986], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [2304, 768], "float32"], null, "float32"], {}], "config": {"index": 5, "code_hash": null, "entity": [["tile_k", "sp", [-1, 8]]]}, "result": [[0.0268318398, 0.026832641350000002, 0.02683273135], 0, 4.179340600967407, 1648254281.8060668], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "batch_matmul.cuda", [["TENSOR", [600, 32, 64], "float32"], ["TENSOR", [600, 32, 64], "float32"], [600, 32, 32], "float32", 0, 1], {}], "config": {"index": 20386, "code_hash": null, "entity": [["tile_y", "sp", [-1, 2, 8]], ["tile_x", "sp", [-1, 16, 1]], ["tile_k", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 32], ["unroll_explicit", "ot", 1]]}, "result": [[3.258110773592547e-05, 3.258372944511948e-05, 3.261549426218442e-05], 0, 2.397996664047241, 1648255266.3718677], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "batch_matmul.cuda", [["TENSOR", [600, 32, 32], "float32"], ["TENSOR", [600, 64, 32], "float32"], [600, 32, 64], "float32", 0, 1], {}], "config": {"index": 5980, "code_hash": null, "entity": [["tile_y", "sp", [-1, 2, 8]], ["tile_x", "sp", [-1, 16, 1]], ["tile_k", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 16], ["unroll_explicit", "ot", 0]]}, "result": [[3.199404780823732e-05, 3.199749384187525e-05, 3.200219666269368e-05], 0, 2.3573713302612305, 1648257050.9987426], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [768, 768], "float32"], null, "float32"], {}], "config": {"index": 13482935, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 16, 1]], ["tile_y", "sp", [-1, 4, 16, 2]], ["tile_k", "sp", [-1, 12, 2]]]}, "result": [[0.00026185516898148144, 0.00026186912731481486, 0.0002643642638888889], 0, 5.9183220863342285, 1648262140.4419408], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [768, 768], "float32"], null, "float32"], {}], "config": {"index": 9, "code_hash": null, "entity": [["tile_k", "sp", [-1, 32]]]}, "result": [[0.0022258066376811595, 0.0022258676666666666, 0.0022260689855072464], 0, 1.6845574378967285, 1648264221.272429], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [3072, 768], "float32"], null, "float32"], {}], "config": {"index": 75386735, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 16, 1]], ["tile_y", "sp", [-1, 2, 16, 4]], ["tile_k", "sp", [-1, 2, 12]]]}, "result": [[0.0009476383928571428, 0.0009476764880952381, 0.0009480008333333333], 0, 3.346571207046509, 1648271350.9854434], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 768], "float32"], ["TENSOR", [3072, 768], "float32"], null, "float32"], {}], "config": {"index": 17, "code_hash": null, "entity": [["tile_k", "sp", [-1, 768]]]}, "result": [[1000000000.0], 4, 4.362995386123657, 1648274146.1389868], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_large_batch.gpu", [["TENSOR", [1600, 3072], "float32"], ["TENSOR", [768, 3072], "float32"], null, "float32"], {}], "config": {"index": 15171048, "code_hash": null, "entity": [["tile_x", "sp", [-1, 5, 4, 20]], ["tile_y", "sp", [-1, 1, 192, 2]], ["tile_k", "sp", [-1, 8, 2]]]}, "result": [[1000000000.0], 1, 1.2985179424285889, 1648274382.1135368], "version": 0.2, "tvm_version": "0.9.dev0"} | ||
{"input": ["cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32", "dense_small_batch.gpu", [["TENSOR", [1600, 3072], "float32"], ["TENSOR", [768, 3072], "float32"], null, "float32"], {}], "config": {"index": 9, "code_hash": null, "entity": [["tile_k", "sp", [-1, 32]]]}, "result": [[1000000000.0], 4, 4.3437583446502686, 1648274480.7225487], "version": 0.2, "tvm_version": "0.9.dev0"} |