forked from apache/tvm
[CI] Include static builds of the runtime as part of CI #3
Open · gigiblender wants to merge 231 commits into `main` from `build-static-runtime`
Conversation
… buffer (apache#13605) * Fix PlanAndUpdateBufferAllocationLocation not visiting constant buffer * add comment
…ache#13414) Enable depthwise conv2d NHWC with HWIO kernel layout. The default kernel layout is HWOI, matched to previous behavior.
…che#13602) * Add support for SequenceAt and SplitToSequence to onnx importer * Formatting * Change keepdims comparison * Only unify non-tuples in If
…#13606) * introduce LowerToPrimFunc to lower Relay func to TIR prim func * add doc * expose to python * adding test * another minor doc update * Verify that the input is a primitive function
…CopyConstants scheduler (apache#13588) In Ethos-U, CopyConstants scheduler currently copies weights for all operators. But in Vela, there are a number of scenarios where the weights are not buffered in SRAM, and FullyConnected case is one of them.
gigiblender force-pushed the `build-static-runtime` branch from `c1c3b12` to `5ba8295` on December 14, 2022 12:33
* fixed test * fix flag for arduino
Pass `std::nullopt` to initialization of `PassBuilder` for `PGOOptions`. LLVM is moving away from its own `Optional` type to `std::optional`.
…13616) default_rng was introduced in numpy 1.19, which is not present even in Ubuntu 20.04 (it comes with 1.17.4).
…abase (apache#13611) [Metaschedule] Align get_top_k logic in MemoryDatabase and JSONDatabase
…ase (apache#13618) * fixed tensor core batch_matmul legalize for transpose_b = False case * add test * clean up
…che#13615) In the Relay Matmul shape relation, we were a little over-enthusiastic about unifying dynamic shapes: if one of the shapes is static, it does not need to be unified. This change only rewrites dynamic shapes to the required static constraints. * Remove overwriting of matmul shapes when they are static * Simplify nesting * Add shape check to dense tests.
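The tightened rule above can be sketched per dimension: rewrite a dimension only when it is dynamic, and merely check two static dimensions for compatibility. This is an illustrative sketch with hypothetical names (`DYNAMIC`, `unify_dim`), not Relay's actual code.

```python
# Hypothetical sketch of per-dimension unification; DYNAMIC stands in for
# relay.Any(), and unify_dim is an illustrative helper, not Relay's API.
DYNAMIC = None

def unify_dim(lhs, rhs):
    if lhs is DYNAMIC:
        return rhs            # dynamic dim adopts the other side's constraint
    if rhs is DYNAMIC:
        return lhs
    if lhs != rhs:
        raise ValueError("static shape mismatch")
    return lhs                # both static: keep as-is, no rewriting
```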
[Frontend] [ONNX] Support sequence_lens of GRU. Support converting the sequence_lens input of GRU.
* [ETHOSN] Add support for experimental compiler option The support library currently supports enabling the experimental cascading compiler option via an environment variable `FORCE_EXPERIMENTAL_COMPILER`. This commit exposes the ability to enable this option through TVMC.
…#13622) * Fix print round-tripable multi thread env binding * add unittest
* [TOPI][Hexagon] Implement global_avg_pool2d for hexagon * Fix name * Fix lint issues * Use get_hexagon_target()
apache#13613) * Add check for non-contiguous memory access when lowering to async dma copies. * lint * lint and nits * lint
* [MetaSchedule] Fix tensorcore winograd task extraction * add test * fixed target
Please join us to welcome @multiverstack-intellif as a new reviewer to TVM. Min contributed key features in TIR scheduling, specifically, cache_read/write that are aware of cache location. These are huge improvements that are technically profound and helpful to the overall TVM stack. Therefore, it would be good to get more opportunities for him to participate more deeply in the community. - [Commits History](https://github.com/apache/tvm/commits?author=multiverstack-intellif) - [Code Review](https://github.com/apache/tvm/pulls?q=reviewed-by:multiverstack-intellif)
This PR addresses fixes for UMA target registration:
* Fixes the doc issue apache#13304
* Continues the stalled PR apache#12731

Changes:
* Incorporates all proposed fixes from the mentioned [PR apache#12731](apache#12731)
* Addresses test case concerns and discussions from [PR apache#12731](apache#12731)
* **NEW:** An already existing target cannot be created again; this now raises an explicit error.
* **NEW:** Attributes with a special/reserved scope cannot be created explicitly.

It also adds proper test cases for all of the above.
…afe (apache#13609) Protect CheckSemaphore with mutex. Ensure that only one thread can add a semaphore if it doesn't already exist.
…rsions above 3.8 too (apache#13635) Currently, `relay.testing.py_converter` is checking for using _exactly_ Python 3.8 in order to use certain updated signatures in the `ast` library. However, those signatures are also correct for versions _above_ 3.8. This PR changes the bounds checks so that the converter will work above 3.8.
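The changed bound check can be sketched as follows (illustrative names, not TVM's actual code): the old check compared against exactly 3.8, while the fix widens it to "3.8 or newer".

```python
# A minimal sketch of the corrected version check. The old code used an
# exact comparison like sys.version_info[:2] == (3, 8); the fix only needs
# the lower bound, since the updated `ast` signatures persist in 3.9+.
import sys

def use_new_ast_signatures():
    # True on Python 3.8 and above
    return sys.version_info >= (3, 8)
```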
…3623) * add set_output and test for set_output_zero_copy in python * clean up * clean up test * test finished * remove set output * remove setoutput from header * use zero copy for params * fix typo * address comments * address comments * add second test for set_input params * add requires_torch * add requires torch * remove pytest * add error handling for c graph executor * better handling
This PR fixes a crash of the parser when the old value of a var is an array but the new value is not. For example:

```python
import numpy as np
from tvm.script import tir as T

def func_wrapper(shape, dtype):
    @T.prim_func
    def test_case():
        a = T.alloc_buffer(shape, dtype=dtype)
    return test_case

if __name__ == "__main__":
    a = np.zeros((10, 10), dtype="int8")
    print(func_wrapper((256, 256), dtype="int8").script())
```

In the code above there are two assignments to the var `a`: in the global scope its value is a numpy array, but inside the prim func it is a Buffer. The parser keeps a table named `name2value` to track the values of vars like `a`. When it wants to update a value, it compares the values of the new and old assignments. Here the problem arises: comparing an array with `==` yields an array too, which cannot be used directly as the condition of an `if` statement. So the code above emits:

```shell
error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 --> /workspace/code_newest/tvm/private_test/test_meta_programming.py:16:9
    |
 16 |         a = T.alloc_buffer(shape, dtype=dtype)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

This PR fixes this by changing `==` to `is`.

Co-authored-by: lightzhan-intellif <zhan.liang@intellif.com>
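The underlying pitfall can be reproduced with plain numpy, independent of TVM: `==` on arrays is element-wise, while `is` is an identity check that always returns a plain bool.

```python
import numpy as np

a = np.zeros((10, 10), dtype="int8")
b = np.zeros((10, 10), dtype="int8")

elementwise = a == b   # an ndarray of bools, not a single bool
identity = a is b      # a plain bool: False, since a and b are distinct objects

# Using the element-wise result directly in an `if` raises ValueError:
# "The truth value of an array with more than one element is ambiguous."
try:
    if elementwise:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True
```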
…pache#13325) * [TRANSFORM] Fix virtual device annotation issue with BYOC subgraphs Heterogeneous modules partitioned by BYOC have function nodes without any VirtualDevice definition (having the FullyUnconstrained device). Ignoring the device here causes expr_virtual_devices_ to be empty when PopVirtualDevice is called, under the assumption that the preceding PushVirtualDevice succeeded. PushVirtualDevice and PopVirtualDevice occur as pairs across the function body, hence it is better to push the VirtualDevice even for the unconstrained case and pop it subsequently. * * Test case Co-authored-by: Siva Rama Krishna Reddy B <sivb@blr-ubuntu-ripper.qualcomm.com>
This PR fixes the following bug:

```python
def test_var_capturing_order():
    b = 2

    @T.prim_func
    def test_case():
        k: T.int32 = b

if __name__ == "__main__":
    b = 1
```

In the prim func `test_case`, the value of `b` should be 2, rather than 1. The parser wrongly used global vars to shadow the values of nonlocal vars; the order should be reversed.

Co-authored-by: lightzhan-intellif <zhan.liang@intellif.com>
Removing some minor code path that is not used any longer.
Temporarily constrain the version of numpy to workaround the deprecated value used in mxnet. See apache#13647. Change-Id: Ib271c223447c76b855fe35cc8a1e77411a3fa441
- New process memory allocator is used to create buffers and networks.
- Support for the 22.08 stack has been kept intact in the sources and tests until the new docker image is built and used.
- Tests were modified to meet limitations imposed on input zero point and kernel size by the NPU software.
- Removed defining ETHON_API_VERSION from the cmake infra.
This commit updates the microTVM code to use Zephyr 3.2 and SDK 0.15.2. As a result of this change, there are a few other changes that are included: - A launch script was added for Zephyr and Arduino template project to run the Project API server from a different Python ENV than TVM Python ENV; - For Zephyr, the launch script uses global python3.8 which is where west is registered. However, for Arduino it uses a separate virtual ENV with python3 version that exists in the host; - tests/micro/project_api/test_arduino_microtvm_api_server.py was removed since these tests were using Arduino microTVM API server by importing it from TVM. We no longer support Arduino/Zephyr dependencies in TVM testing python ENV; - Disables a demo and test due to a CMSIS-NN bug: [Bug] CMSIS-NN BYOC fails with Zephyr 3.2 apache#13856; There will be a follow up work to move Zephyr to a completely separate virtual ENV as it was done in this commit for Arduino in the launch script.
…e#13891) This PR adds Tensor Core intrinsics with `shared.dyn` scope and changes the default rules to use `shared.dyn`. Here is the performance improvement of GEMM 1024x1024x1024 on my device (RTX-3080):

| | Use `shared` | Use `shared.dyn` | Speedup |
| ----------- | ----------------- | ----------------- | ------- |
| fp 16-16-16 | 66399.8766 GFLOPs | 71778.3808 GFLOPs | 8.1% |
| fp 16-16-32 | 44292.5893 GFLOPs | 49070.2514 GFLOPs | 10.8% |

cc @vinx13 @junrushao @masahi
normalize test_crt.py temporary directory usage remove test_crt.py unused code and variables fix test_crt.py to pass pylint and add to pylint script
Dear community: Please join us to welcome Wrongtest (@wrongtest-intellif) as a new PMC member. Wrongtest has been actively improving and contributing to the TIR Schedule and arithmetic analysis implementation. He has been actively involved in several components of TVM, top to bottom, including but not limited to:
- Relay
- TOPI
- MetaSchedule
- TIR
- TVMScript
- Arithmetic analysis in TIR
- Codegen for Cuda and LLVM
- Runtime

He has a deep understanding of one of TVM's complete compilation flows (Relay - TIR - Codegen). He focuses on applying TVM to their NPU backend, contributing features back to the community, and cultivating more active community contributors in his company. In addition, he has been very actively participating in the community, discussing and sharing his ideas in the forum, and has also been actively managing PRs and issues.
* [Commits History](https://github.com/apache/tvm/commits?author=wrongtest-intellif)
* [Code Review](https://github.com/apache/tvm/pulls?q=+reviewed-by%3Awrongtest-intellif+)
* [Community Forum Summary](https://discuss.tvm.apache.org/u/wrongtest/summary)
* feat: combine cutlass and ansor * use sm80 and disable run_benchmark * fix lint error * use tempfile; fix dangerous default value * merge cutlass_ansor test into test_cutlass.py * fix lint --------- Co-authored-by: hanqingchang <hanqingchang@kuaishou.com>
This PR extends the values accepted by the USE_LIBBACKTRACE cmake variable. It adds the ability to search for a pre-compiled libbacktrace in the system path or a user-provided path. The behavior of this variable is:
- OFF: Don't use libbacktrace.
- ON: Find libbacktrace in system paths; fail the build generation if libbacktrace is not found.
- COMPILE: Build and link to libbacktrace from 3rdparty/libbacktrace.
- \<PATH\>: Look for the libbacktrace header and static lib in a user-provided path; fail the build generation if libbacktrace is not found.
- AUTO: Find libbacktrace in system paths; if not found, fall back to COMPILE on Linux or MacOS, and to OFF on Windows or other platforms.
…apache#13849) * Assertion failed during tuning * Cleanup * Do not commit * Do not commit * Undo fix + provide test for multithread random filling * Random fill test with fix enabled * Isolate the effect of this test on the other tests * Correct the typo in the function name * Import threading + lint
…e#13877) * initial commit * update additional use cases * typo * asf header, summary * clean up * lint * move code to src/runtime/file_utils.h * file utils is cool
…3826) [Torch] Fix advanced indexing with NoneType index
* [Hexagon][QNN] Improve performance of qnn.mul This commit improves performance of the qnn.mul operation without QNN canonicalization. * [QNN][Relay][Topi] Add qnn.dense with weight layout This commit adds a new Relay operation "qnn.contrib_dense_pack" that supports different weights layouts (nn.dense and qnn.dense do not support this attribute). This new operation is a full analog of the "nn.contrib_dense_pack" operation, but in QNN space.
…pache#13874) * [TOPHUB] Use keys as a keyword for searching existing statistics. For ARM we might not specify -device, in which case llvm is used; even then we can determine the proper filename with statistics, since the keys have the architecture defined. The same situation applies to x86. * Add test on target not having arm_cpu device * minor fix, add comment * Fix pylint * Fix comment
* [OpenCL] Implement save/load pre-compiled programs Using pre-compiled programs might significantly improve inference time of the first run. - Added methods `SupportPreCompiledPrograms` which reports if the module supports using pre-compiled programs. - Method `GetPreCompiledPrograms` returns string with bytes of pre-compiled programs. - Method `SetPreCompiledPrograms` allows user to pass pre-compiled programs to the module. * Fix lint * Apply comment: PackedFunc is used * Fix build * Fix CI and rename functions * Apply comments
- Fix a bug where `buffer_decl`, combined with certain usage patterns of the resulting buffer, causes a TVM-internal assert failure during TIR compilation.
fix doc for tvm.te.const()
* [CLML][CODEGEN] CLML native codegen utility This util generates native CLML code given a DNN model. It does the import via tvmc, extracts clml_modules, gets the json source, and finally generates clml_models.cc that holds the source for the various sub graphs. The cpp_clml tool has additional infrastructure to compile it as a standalone binary that runs these models. This PR adds the symbol name to the generated json graph. It also extends the const_loader interface to get constant params. * * review comments * * review * * review
…d.py script (apache#13905) As the PR title states, the purpose of this PR is to pass the path parameter given to cmake_build on to the task_build.py script. With this PR we will be able to control compilation in different directories.
* add base class for bitwise operations. BitwiseAnd, BitwiseNot, BitwiseOr and BitwiseXor were implemented * add test for BitwiseAnd, BitwiseNot, BitwiseOr, BitwiseXor operations to ONNX front-end * add test of BitShift for ONNX front-end * fix dtype for test * skip test due to old version of ORT --------- Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
A simple tvmc tune command currently results in a huge wall of warnings about the target_host parameter being deprecated, even when the user hasn't provided a target-host command line argument. We can prevent that by simply not providing the default target-host to tvmc. Also, ensure that when the user does provide target-host, we print the warning once, not 500 times.
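One way to get print-once behavior is to guard the warning with a module-level flag. This is a sketch of the general technique; tvmc's actual implementation may differ, and the function name here is illustrative.

```python
# Emit the deprecation warning only on the first call, no matter how many
# times the deprecated parameter is encountered.
import warnings

_warned_target_host = False

def warn_target_host_deprecated():
    global _warned_target_host
    if not _warned_target_host:
        warnings.warn("target_host parameter is deprecated", DeprecationWarning)
        _warned_target_host = True
```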
This PR updates the order of arithmetic analyzer simplification by adding a stage of canonical simplification at the very beginning, so that every simplification always starts with a canonical round. This is because rewrite simplification may destroy some PrimExpr properties that canonical simplification can make use of; running the canonical pass first therefore maximizes the use of canonical simplification.
Issue apache#13911 reported that newer versions of setuptools crash with the version constraint "<=1.23.*", this commit implements the suggested fix by using "<=1.23".
This PR is the bug fix reported in apache#13892. Initially, we mixed the logic of the `LetStmt` docsifying method with and without concise scoping. For example, in

```python
x = T.var("int32")
with T.let(x, 0):
```

`x` in the `LetStmt` works as a right value, while in

```python
x: T.int32 = 0
```

`x` in the `LetStmt` works as a left value. Our old logic mixed them together and generated wrong code for the first case.

Meanwhile, during the fix, we found another bug in the concise scoping check. For example, given

```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
    with T.let(y, 0):
```

we should not output

```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
    y: int32 = 0
```

because, due to the variable shadowing logic of the parser, this would actually define a new `y_1: int32 = 0`, which is different from the `y` we define and refer to. The concise scoping form `v: ... = ...` should be used if and only if `v` was never defined before; otherwise, we use `with T.let(v, ...):` instead.
…ter (apache#13859) This PR refactors the output of `assert_structural_equal`. Instead of directly printing the mismatching nodes as in the old version, the improved output prints the whole scripts with the mismatching nodes underlined, and also prints the `ObjectPath` to the mismatching nodes for easier debugging. For example, given the following functions

```python
@T.prim_func
def func1(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))

@T.prim_func
def func2(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 256))
```

the log of `assert_structural_equal(func1, func2)` will look like

```python
ValueError: StructuralEqual check failed, caused by lhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
                               ^^^
    T.evaluate(0)
and rhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 256))
                               ^^^
    T.evaluate(0)
```

instead of

```python
ValueError: StructuralEqual check failed, caused by lhs: 128 and rhs: 256
```

which is sometimes not readable.
Rescale parameters have been added for binary elementwise operation in accordance with the Vela API (rescale field in NpuElementWiseOperation https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/tags/3.5.0/ethosu/vela/api.py#381). This PR is preparation for implementation of softmax operation.
This patch undoes the change that was put in place to prevent the build and installation of NNPACK from failing due to a renaming of the default branch to main by the NNPACK external dependency cpuinfo. See apache#13871 The issue has been fixed at the source by PR apache#214 which is now merged in to NNPACK, so the change to `ubuntu_install_nnpack.sh` is no longer required: Maratyszcza/NNPACK#214
Leaving class definitions was not correctly handled when recreating scoping information. The fix correctly pops scope whenever the indentation level becomes less than the current scope.
The rules that rewrite `min(floordiv(x + (A-1), A) * A, max(x, A))` and `min(truncdiv(x + (A-1), A) * A, max(x, A))` into `max(x, A)` did not have sufficiently tight bounds. The `truncdiv` rule required that `x >= 0`, while the `floordiv` rule had no requirement on `x`. In both cases, the simplification was incorrect when `x == 0`, as it would result in a rewrite from `min(0, max(0, A))` into `max(0, A)`. This commit tightens the bound each of these rules requires on `x` so that the `x == 0` case is excluded.
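The unsoundness at `x == 0` can be checked with plain integer arithmetic (a self-contained sketch, using Python's floor division for `floordiv`):

```python
# lhs is the original expression, rhs the (unsound) rewrite target.
# At x == 0 and A > 1, floordiv(0 + (A-1), A) * A == 0, so lhs == 0
# while rhs == A; for x >= 1 the rewrite is valid.
def lhs(x, a):
    return min(((x + a - 1) // a) * a, max(x, a))

def rhs(x, a):
    return max(x, a)
```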
On machines with several OpenCL platforms (e.g. Intel CPU and NVidia GPU) it was previously possible to use OpenCL devices from only one platform, and when Intel came first in the platforms list it was not possible to run a model on the NVidia GPU. In this PR the init function was modified, and it is now possible to use OpenCL devices from different platforms. When there are several GPU accelerators, one of them can be selected via its device id. The code below prints the device names of two OpenCL devices:

```python
import tvm
print("opencl 0: ", tvm.opencl(0).device_name)
print("opencl 1: ", tvm.opencl(1).device_name)
```

If the machine doesn't contain any GPUs, we will try to use the CPU if an OpenCL runtime is available.
Certain tests that make use of `pytest_wrapper.py`, such as those triggered by `task_python_integration.sh` will fail when a "PLATFORM" environment variable is not set within the Docker container. When using `build.sh` to both create a container, and run a command to execute one of the tests that requires a "PLATFORM", an error will occur due to the missing environment variable. This patch is necessary to add support for this environment variable to `build.sh` and prevent such errors.
gigiblender force-pushed the `build-static-runtime` branch from `4f4c7d2` to `2252b32` on February 7, 2023 12:24
No description provided.