
[UMA] UMA v1.0 #12087

Merged
merged 112 commits on Aug 9, 2022

Conversation

MichaelJKlaiber
Contributor

UMA v1:

  • UMA Partitioner
  • UMA Pipeline
  • UMA Lower
  • UMA Codegen
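
For readers skimming the thread, the four components above can be pictured as registration stages hanging off a single backend object. A minimal, self-contained sketch follows; it is a mock, not TVM's actual UMA API, and every name in it is illustrative:

```python
# Illustrative mock only -- these are NOT TVM's actual UMA classes. It
# sketches how the four components listed above can hang off a single
# backend object that collects per-stage registrations.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MockUMABackend:
    """Toy stand-in for a UMA backend description."""

    target_name: str
    # UMA Partitioner: Relay patterns that select offloadable subgraphs.
    patterns: List[str] = field(default_factory=list)
    # UMA Pipeline: custom passes slotted into the lowering pipeline.
    relay_passes: List[Callable] = field(default_factory=list)
    # UMA Lower: operator strategies used during lowering.
    strategies: Dict[str, Callable] = field(default_factory=dict)
    # UMA Codegen: hooks such as an includes generator.
    codegen_hooks: Dict[str, Callable] = field(default_factory=dict)

    def register_pattern(self, name: str) -> None:
        self.patterns.append(name)


backend = MockUMABackend("vanilla_accelerator")
backend.register_pattern("conv2d")
print(backend.patterns)  # ['conv2d']
```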

@MichaelJKlaiber MichaelJKlaiber changed the title Uma v1.0 UMA v1.0 Jul 14, 2022
@MichaelJKlaiber MichaelJKlaiber changed the title UMA v1.0 [UMA] UMA v1.0 Jul 14, 2022
Contributor

@cbalint13 cbalint13 left a comment


@MichaelJKlaiber , @cgerum , @PaulPalomeroBernardo

Congratulations on the wonderful work and the outstanding idea toward modularization!

  • Suggested a small nit for CMake completeness
  • Please rebase against apache:main HEAD; it looks out of sync
  • Please Cc more reviewers (i.e., those from the UMA RFC PR)

CMakeLists.txt (outdated, resolved)
@cbalint13
Contributor

cbalint13 commented Jul 15, 2022

Cc @driazati, who may help with the missing-clang i386 test failure @ test_lower_with_uma[1-224-224-3-5-5-4]

[2022-07-15T13:33:16.200Z] >  raise RuntimeError("cannot find clang, candidates are: " + str(cc_list))

Contributor

@sunggg sunggg left a comment


Thank you for the great contribution toward better accelerator support! 🥳
I enjoyed reading the code and learning the design.
Overall, implementation looks good to me.
Here are a few high-level comments:

  • Could you add more documentation in general (e.g., explain what parameters are)?
  • It would be great to have end-to-end model-level testing (e.g., run BERT with UMA on Arm Ethos-U).

Please see below for more comments. Thanks!

python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)
python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)

def _register_c_codegen(
    self,
    includes: Callable[[], str] = None,
Contributor

Could you add the docstring for parameters?
Also, can includes be Optional[str]?

Contributor Author

Doc-string added.
@cgerum : can you answer the other question?

Contributor Author

To generate dynamic strings more easily, we would be in favor of passing a Callable.
@PaulPalomeroBernardo @cgerum
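
The trade-off discussed here can be illustrated in isolation. Below is a small, self-contained sketch of why a `Callable[[], str]` lets the include block be generated lazily at codegen time; the names `gen_includes` and `register_c_codegen` are stand-ins, not the PR's actual functions:

```python
# Stand-in names, not the PR's actual API: shows why passing a
# Callable (rather than a fixed str) lets the include block be built
# lazily, from state only known at codegen time.
from typing import Callable, Optional


def gen_includes() -> str:
    # Could inspect registered operators, target options, etc.
    headers = ["my_ai_hw.h", "conv2dnchw.h"]
    return "\n".join(f'#include "{h}"' for h in headers)


def register_c_codegen(includes: Optional[Callable[[], str]] = None) -> str:
    # The hook is only invoked when C sources are actually emitted.
    return includes() if includes is not None else ""


print(register_c_codegen(gen_includes))
```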

@cbalint13
Contributor

cbalint13 commented Jul 19, 2022

Thank you for the great contribution toward better accelerator support! 🥳 I enjoyed reading the code and learning the design. Overall, implementation looks good to me. Here are a few high-level comments:

  • Could you add more documentation in general (e.g., explain what parameters are)?
  • It would be great to have a testing for end-to-end model-level testing. (e.g., run BERT with UMA on ARM EthosU)

Please see below for more comments. Thanks!

Personal view over the first part (non code related) of @sunggg requests, of course just IMHO:

  • Could you add more documentation in general (e.g., explain what parameters are)?
  • Current sample (_template) & doc are fine for a contrib module.
  • It greatly enables "out-of-tree" maintenance toward custom HW accelerator design.
  • Example parts Vanilla, Strawberry & Chocolate are nice levels of introduction & guidance.
  • It would be great to have end-to-end model-level testing (e.g., run BERT with UMA on Arm Ethos-U).
  • Sounds beyond the scope; in the same way we could "dream" of a fully verilated BERT running on the verilator-contrib module (!) @tmoreau89 , @vegaluisjose

  • As for Arm Ethos™, my understanding is that it uses a quite different (not so "out-of-tree"-principled) way to connect things inside, but @manupa-arm can comment more on this.

  • It would be possible in the future to decouple Arm Ethos™ (or Vitis-AI) to be more "out-of-tree" by porting it over UMA, but that would be quite another task.

Some personal efforts envisioned similar out-of-tree approaches (e.g. OLIMP / e-verest) before UMA, which has now landed as a super nice "out-of-tree" topping. As a last wish, I hope that we as a community will one day be able to jointly leverage the complex hardware-design space within TVM's unique compiler space.

Beyond the code-related requests below, I halt my review here with an @lgtm claim, and a go for UMA v1.0 contrib!

Please see below for more comments. Thanks!

@sunggg
Contributor

sunggg commented Jul 19, 2022

@cbalint13, I appreciate you sharing your thoughts! I also think this is a great out-of-tree approach and look forward to its actual usage.

Could you add more documentation in general (e.g., explain what parameters are)?

Current sample (_template) & doc are fine for a contrib module.
It greatly enables "out-of-tree" maintenance toward custom HW accelerator design.
Example parts Vanilla, Strawberry & Chocolate are nice levels of introductory & guidance.

I agree that the overall docstring structure (e.g., Vanilla, Strawberry & Chocolate) would be a great introduction/tutorial (Truly appreciate the authors!).
My issue was more like a code-level docstring since some of them are missing and I wasn't sure what each function or parameter exactly means.
https://github.com/apache/tvm/blob/bc211fdc91932ef9b77f7edd2ff65151ee5faaff/python/tvm/relay/backend/contrib/uma/_template/codegen.py

It would be great to have end-to-end model-level testing (e.g., run BERT with UMA on Arm Ethos-U).

Sounds beyond the scope; in the same way we could "dream" of a fully verilated BERT running on the verilator-contrib module (!) @tmoreau89 , @vegaluisjose

As for Arm Ethos™, my understanding is that it uses a quite different (not so "out-of-tree"-principled) way to connect things inside, but @manupa-arm can comment more on this.

It would be possible in the future to decouple Arm Ethos™ (or Vitis-AI) to be more "out-of-tree" by porting it over UMA, but that would be quite another task.

I see. Thank you for the clarification! Maybe my example was not relevant. I just thought it would be nice to have some model-level test cases, since we seem to have all the necessary components (UMA Partitioner, UMA Pipeline, UMA Lower, UMA Codegen) in this PR. I just wanted to see if the current PR also covers a case where we lower some parts with UMA while handling other unsupported parts via the conventional pipeline.

Contributor

@areusch areusch left a comment

thanks @MichaelJKlaiber, I've given this a look over. I think it's generally in the right direction; I've suggested a couple more comments and a few minor changes here and there. overall though, the approach looks great. i'd love it if we could explore integrating this further with tvmc since we are registering a custom Target. happy to discuss further!

cmake/config.cmake (outdated, resolved)
include/tvm/relay/transform.h (outdated, resolved)
python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)
python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)
tests/python/contrib/test_uma/test_partition.py (outdated, resolved)
python/tvm/relay/backend/contrib/uma/tutorial.md (outdated, resolved)
* "Compiler" attribute. Functions that do not match this condition remain
* unaltered.
*/
class OutlineCompilerFunctionsMutator : public MixedModeMutator {
Contributor

i think this already exists in ethosu, can we factor it out instead of duplicating?

Contributor

cc @lhutton1: if it's exactly the same and you feel it is useful in a general sense for accelerator flows, I would like to second @areusch; we could move this pass to be a generic transform pass.

Contributor

this is in fact just a copy of the ethosu code. I think it is a generally useful pass for accelerator flows, so +1 to make this a generic transform pass, if that is an option.

Contributor Author

@areusch, should we move this to a separate PR then?

Contributor

since we're duplicating it here, i'd prefer if we could factor it out now

Contributor

ok, i'm fine to defer this to a follow-on PR if @manupa-arm is ok with that.

Contributor

Maybe with a TODO/issue then ?

src/relay/backend/contrib/uma/relay_to_tir.cc (outdated, resolved)
src/relay/backend/contrib/uma/tir_to_runtime.cc (outdated, resolved)
@cbalint13
Contributor

cbalint13 commented Jul 20, 2022

thanks @MichaelJKlaiber I've given this a look over. I think it's generally in the right direction, suggest a couple more comments and a few minor changes here and there. overall though, the approach looks great. i'd love it if we could explore integrating this further with tvmc since we are registering a custom Target. happy to discuss further!

@areusch ,
IMHO, for consideration:

  • UMA, with its approach of bridging "out-of-tree" implementations into TVM, may have a "looser" tie to (and fewer offerings for) tvmc. I would only add that tvmc's fixed CLI features can in reality only expose what will ever be (fully or partially) implemented by UMA consumers on top of the UMA bridge itself, and that statement holds even if we register a fully legitimate custom TVM target via UMA.
  • I can imagine someone whose only goal via UMA is to generate lowered "c" code blobs, period; code that includes super-optimised (incl. auto-tensorized loops) __asm(".word 0xc0fefe") calls for very special ISA extensions. I won't condition this PR on tvmc as a must, however some really obvious things probably can be exposed via tvmc.

@MichaelJKlaiber
Contributor Author

I just wanted to see if current PR also covers a case where we lower some parts with UMA while handling other unsupported parts via the conventional pipeline.

We have an example test case in place that runs MobileNet (created in Relay) with UMA and the Vanilla Accelerator. Not sure if it should be added to this PR or a later one, as this PR is already a lot to discuss. I'd be in favor of moving it to the next UMA PR.

Contributor

@manupak manupak left a comment

Thanks for the great work!
I've done an initial round of review -- my concerns are mostly around the structure and the UMA CLI. It would be better to separate the latter into another PR, IMO, given the content in this PR.

python/tvm/relay/backend/contrib/uma/uma_cli.py (outdated, resolved)
AOTTestRunner,
generate_ref_data,
compile_and_run,
)
Contributor

This feels like a test source. Should we move this to tests?

Contributor Author

@manupa-arm , this is not supposed to be test code. This is example code for the tutorial. We will replace AOTTestRunner with the microTVM Project API.

python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)
* "Compiler" attribute. Functions that do not match this condition remain
* unaltered.
*/
class OutlineCompilerFunctionsMutator : public MixedModeMutator {
Contributor

cc @lhutton1: if it's exactly the same and you feel it is useful in a general sense for accelerator flows, I would like to second @areusch; we could move this pass to be a generic transform pass.

python/tvm/relay/backend/contrib/uma/_template/backend.py (outdated, resolved)
Contributor

@lhutton1 lhutton1 left a comment

Thanks @MichaelJKlaiber this is an exciting addition! Just wanted to chip in with a few more suggestions / nits.

Since the code in python/tvm/relay/backend/contrib/uma/* is a developer facing API, it might be worth enabling mypy type checking in task_mypy.sh.

python/tvm/relay/backend/contrib/uma/api/partitioner.py (outdated, resolved)
],
Optional[int],
]
] = []
Contributor

Would it make sense to break these type annotations into smaller named variables? e.g. like PatternTable in partitioner.py

Contributor

just noting i think this one isn't done yet

python/tvm/relay/backend/contrib/uma/backend.py (outdated, resolved)
tests/scripts/task_python_uma.sh (outdated, resolved)
@areusch
Contributor

areusch commented Jul 21, 2022

we discussed this in the Community Meeting yesterday. here are notes on the discussion:

  • when the design references "Accelerator A" and "Accelerator B," does this mean we're using both simultaneously?

    • not in this v1, though the architecture supports it. at present they can simply coexist as options.
  • should we integrate this with TVMC?

    • @areusch: it should be fairly easy to integrate the UMA targets with the tvmc run command
    • @manupa-arm : this should be pretty straightforward to add to tvmc. the bigger concern here was around uma_cli.py, which is supposed to generate a starter implementation for new accelerators in uma.
    • @areusch : we should either have tvmc or some other developer-facing entry point to house tools like this. probably not bad to add dev tools to tvmc now--we can always migrate them out if we need to.
    • @MichaelJKlaiber : intention of uma_cli is just to make the tutorial easier to replicate on your own, so there are two steps there--create the accelerator flow and then run inference.
    • @manupa-arm : do we expect the CLI to work when we're in an environment where only the tvm wheel is present? e.g what about the C sources included with accelerator? should those go in the wheel?
    • @MichaelJKlaiber: those sources are copied into the generated dir by uma_cli.
    • @areusch : what's the include path folks are expected to set on their downstream C compiler? seems like the C files included with the accelerator template should really make it into the Model Library Format. Could produce another CSourceModule, which would create another e.g. default_lib3.cc in the MLF. Could also use the import_c pragma, similar to how we do for microTVM.
  • where should the template live?

    • @areusch : could go either way or both. how do we expect people to package their accelerator flow? if merging into mainline, perhaps we want it in the python import path. if keeping the accelerator flow private, perhaps apps is closer to carrying that code alongside the tvm wheel.
    • @manupa-arm : deciding intended location based on whether a flow will get upstreamed makes sense. _template is an example rather than a target, so maybe apps could make more sense for it.
  • @manupa-arm : also suggest to break the CLI changes into another PR.

  • @MichaelJKlaiber : only the vanilla accelerator was impl'd; do folks have suggestions for chocolate and strawberry? feel free to post in the discuss thread or get in contact.

    • @areusch : would be cool to see something that leverages usmp to model physical accelerator memories. could also be cool to see an example where buffers were marked to live on-device.
  • Slava: are the optimizations provided in the default TVM pipeline also part of the UMA pipeline?

    • @areusch : you can classify the optimizations in terms of Relay passes, scheduling, and post-scheduling passes. TVM tries to operate on an IRModule-to-IRModule principle, where each optimization or step takes an IRModule and returns an IRModule. when you mark a subgraph as offloaded to a UMA pipeline, some optimizations aren't enabled--for example, Relay-level operator fusion. Others, e.g. those which operate post-scheduling (usmp, for example), will run on UMA operators.
    • Slava: if I have a conv2d followed by batch norm, and only the conv2d is offloaded, then the batch norm is not fused by default?
    • @areusch: the right way to do that would be to mark both as offloaded and do the fusion yourself. there are also some efforts to enable post-scheduling fusion via Relax, but those haven't landed yet.
  • Slava: what's the best way to leverage UMA if e.g. we have 2 different implementations of conv2d depending on kernel size?

    • @areusch : you'd need to give your pattern matcher enough fidelity to differentiate those two workloads. you can also inspect the matched subgraph after using a looser pattern.
  • slava: what's the rough timeline?

  • @MichaelJKlaiber : can also discuss more questions in high-bandwidth with folks.

    • suggest folks post up on the discuss forum. we can also use this meeting for further discussion.
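
The kernel-size question in the notes above boils down to a predicate over matched attributes. Here is a pure-Python stand-in for the dispatch logic @areusch describes; it does not use TVM's pattern API, and the function and implementation names are made up for illustration:

```python
# Pure-Python stand-in for the idea: match conv2d loosely, then let a
# predicate over the matched attributes pick the implementation. The
# function and implementation names are invented for illustration.
from typing import Tuple


def conv2d_impl_for(kernel_size: Tuple[int, int]) -> str:
    # A looser "pattern" would match any conv2d; this predicate then
    # inspects the kernel size to choose between two implementations.
    if kernel_size == (1, 1):
        return "conv2d_pointwise_impl"
    return "conv2d_generic_impl"


print(conv2d_impl_for((1, 1)))  # conv2d_pointwise_impl
print(conv2d_impl_for((5, 5)))  # conv2d_generic_impl
```

With TVM's actual dataflow-pattern infrastructure, the same check would live in a pattern predicate or in post-match inspection of the subgraph, as the notes suggest.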

MichaelJKlaiber added a commit to MichaelJKlaiber/tvm that referenced this pull request Jul 27, 2022
UMAv1.0 updates according to comments and suggestions from  PR apache#12087
Contributor

@areusch areusch left a comment

thanks @MichaelJKlaiber! few last things

* "Compiler" attribute. Functions that do not match this condition remain
* unaltered.
*/
class OutlineCompilerFunctionsMutator : public MixedModeMutator {
Contributor

since we're duplicating it here, i'd prefer if we could factor it out now

tests/python/contrib/test_uma/test_partition.py (outdated, resolved)
src/relay/backend/contrib/uma/relay_to_tir.cc (outdated, resolved)
#######################################################################
# Relay to TIR function registration
#######################################################################
self._register_tir_pass(PassPhase.TIR_PHASE_0, MyAiHwConv2dPass())
Contributor

cc @mbaret one thing we've been discussing is that it might make sense for these enums to roughly correspond to "guarantees about the form of the IRModule" (e.g. "all the primitive Functions are lowered" or "Buffers are defined"). To that end I think it would be great to avoid referencing e.g. TIR_PHASE_n in the enum unless those are distinct guarantees.
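
One way to read this suggestion is to name phases after IRModule guarantees rather than TIR phase numbers. A hedged sketch; the enum and member names below are invented for illustration and are not UMA's actual PassPhase:

```python
# Invented names, for illustration only: pass phases expressed as
# "guarantees about the form of the IRModule" instead of TIR_PHASE_n.
import enum


class GuaranteePhase(enum.Enum):
    PRE_PARTITIONING = enum.auto()
    POST_PARTITIONING = enum.auto()
    PRIMFUNCS_LOWERED = enum.auto()  # all primitive Functions are lowered
    BUFFERS_DEFINED = enum.auto()    # Buffers are defined


print([p.name for p in GuaranteePhase])
```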

python/tvm/relay/backend/contrib/uma/_template/passes.py (outdated, resolved)
@MichaelJKlaiber
Contributor Author

The CI pipeline still fails for the target i386, see https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-12087/runs/12/nodes/360/steps/1207/log/?start=0
As this does not seem to be an issue introduced by UMA code, I will disable UMA for the i386 build, i.e. in

tests/scripts/task_config_build_i386.sh
echo "set(USE_UMA OFF)" >> config.cmake

What are your thoughts?
@cbalint13 @areusch @areusch @cgerum @manupa-arm @PaulPalomeroBernardo

Does any of you understand what is different between i386 and build_cpu?

Does it have to do with:
echo "set(USE_LLVM llvm-config-4.0)" >> config.cmake
vs.
echo "set(USE_LLVM llvm-config-11)" >> config.cmake
?

@MichaelJKlaiber
Contributor Author

MichaelJKlaiber commented Aug 3, 2022

Thanks to all the reviewers for the thorough reviews and the great suggestions and comments that improved UMA!
@manupa-arm @areusch @sunggg @cbalint13 @kslavka @lhutton1

We integrated the vast majority of the requests from this review into the PR.
There were many great suggestions that we hadn't considered before but that improved the overall quality of the UMA contribution, e.g.

  • Tutorial as RST for gallery
  • Adding a test case where the TVM default pipeline and the UMA pipeline are both used
  • ...

This PR has been open for quite a long time now; it has improved steadily, and we feel that it is ready now. Please let us know your thoughts.

There are a couple of (small) things we left out of this PR, and we would ask for your opinion on moving them to (very near future) PRs:

  • move OutlineCompilerFunctionsMutator out of UMA @areusch @manupa-arm
  • replace AOTTestRunner by project API @manupa-arm
  • annotations of UMALower._operator_strategies @lhutton1
    (these steps will be moved to the tracking issue, in case the PR is merged)

Further steps see current tracking issue #11260

It is also great that we as a community identified items where further discussion is required, e.g.:

  • UMA CLI / tvmc integration
  • Demo use-cases, e.g. running BERT on Ethos-U with UMA
  • ...
    I'd move them to an updated version of the UMA RFC or the discussion forum

Thanks to everyone contributing to UMA, esp @cgerum @PaulPalomeroBernardo

CC: @SebastianBoblest @UlrikHjort

Contributor

@areusch areusch left a comment

thanks @MichaelJKlaiber ! I found a few small, mostly mechanical fixes that would make this PR perfect, but overall it looks good to me. thanks for your (and everyone else's) hard work on this!

apps/uma/_template/__init__.py (outdated, resolved)
apps/uma/_template/conv2dnchw.cc (resolved)
apps/uma/_template/passes.py (outdated, resolved)
apps/uma/_template/passes.py (outdated, resolved)
],
Optional[int],
]
] = []
Contributor

just noting i think this one isn't done yet

* "Compiler" attribute. Functions that do not match this condition remain
* unaltered.
*/
class OutlineCompilerFunctionsMutator : public MixedModeMutator {
Contributor

ok, i'm fine to defer this to a follow-on PR if @manupa-arm is ok with that.

@areusch areusch requested a review from manupak August 4, 2022 15:47
Contributor

@sunggg sunggg left a comment

Thank you for the exciting contribution and addressing my comments!
Just a super minor nit :)

Contributor

@manupak manupak left a comment

Thanks for all the hard work here and for all the changes.
I had a deeper look into the PR. A few nits and a concern with the "includes" function for UMABackend.

*
*/
int
my_ai_hw_conv2dnchw(float* ifmap, float* weights, float* result, int oc, int iw, int ih, int ic,
Contributor

This source looks like it was not linted. Maybe we don't lint apps...
Anyhow, would you be able to run clang-format on this file?

Contributor Author

I agree that it doesn't look linted. It is what
./docker/lint.sh -i clang_format
gives me.

Contributor

@leandron @areusch @driazati do you know any reason why this file would not be linted?

include/tvm/relay/transform.h (outdated, resolved)
python/tvm/relay/backend/contrib/uma/api/lower.py (outdated, resolved)
python/tvm/relay/backend/contrib/uma/api/lower.py (outdated, resolved)
* "Compiler" attribute. Functions that do not match this condition remain
* unaltered.
*/
class OutlineCompilerFunctionsMutator : public MixedModeMutator {
Contributor

Maybe with a TODO/issue then ?

apps/uma/_template/strategies.py (resolved)
src/relay/backend/contrib/uma/tir_to_runtime.cc (outdated, resolved)
src/relay/backend/contrib/uma/targets.cc (outdated, resolved)
@manupak
Contributor

manupak commented Aug 5, 2022

replace AOTTestRunner by project API @manupa-arm

I am happy with AOTTestRunner, as we envision that example/test to be useful for an embedded dev who wants to integrate a few models into their own embedded project. Also IIRC, this was @areusch's suggestion :)

At the same time, it might also be valuable to add a Project API flow for users that want to quickly spin up a project to test the inference.
So they both have value and their place.

Therefore, I would say not "replace" but maybe just extend/add? -- Also happy to push it to another PR.

@areusch
Contributor

areusch commented Aug 9, 2022

thanks @MichaelJKlaiber for all the hard work here!

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* Add minimal working structure for generic interface

* Separate target definition from codegen

* Update file structure to support multiple NPU targets

* Add scheduling and pass support to codegen

* Update schedule function and pass registration

* Add generic partitioner for relay graph partitioning

* Add pattern-based relay graph partitioning and AOT codegen

* Update API

* Add UltraTrail relay passes and schedule function

* Update UltraTrail relay passes

* Add tir_to_runtime hook for UltraTrail

* Add operator strategy registration to lowering

* Add option to pass constants as attributes

* Refactor naming: Generic to UMA

* Change API to single user-facing backend class UMABackend

* Add initial codegen API

* [UMA] add a generic packed function to register targets

* Restructure files and add initial codegen

* Minor code cleanup

* Add UMA config and MergeCompilerRegion example

* Move UMA configuration to init parameters

* Add python hooks for C-codegen. Still has known restrictons

* Fix relay_to_tir hook to keep virtual device in main function

* Remove register schedules, scheduling is moved to passes for now

* Remove extract constants since non-scalar constants are now supported by TVM

* API documentation and some code fixes and cleanup

* Fix typo

* Fix UMA lowering

* Prototype for UMA-based target attribute registration

* Add default option and type deduction to register_target_attr

* Change pass phases to enum

* [Relay] Plumb external codegen target via Target.current() for all external codegen paths

(See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md).

We want both old-style (via relay.ext.$toolchain) and new-style (via "RelayToTIR" Pass
attribute on target kind) external codegen to be able to access the current 'external codegen'
Target instance via Target.current().

 - For old-style, plumb the true Target through TEComplier and push it on the context
   stack before calling relay.ext.$toolchain.

 - For new-style, pass the CompilationConfig to the RelayToTIRTargetHook pass, make the jump from
   "Compiler" attribute value to Target via the new CompilationConfig::FindPrimitiveTargetForKind
   method, and push on the stack before invoking the custom "RelayToTIR" pass.

While working on this, we discovered that RelayToTIRTargetHook was incompatible with the VM's
compilation flow, since RelayToTIRTargetHook assumes all "Compiler"-attributed functions are
inlined. Generalize it to support both inline and global function styles.

Extend Target::IsExternalCodegen to recognize target kinds with "RelayToTIR" attributes as
external.

Update target hooks unit test to exercise new support for outline-style, picking up the current target,
and compiling via the VM.

* Use current target in lowering

* Use attr:kRelayToTIR

* Remove erroneously committed quick fix

* Towards test cases for uma

* Add test_uma

* Initial UMA structure for version 1

* [UMA]: conv2d unit test

* [UMA] update of tutorial

* [UMA] update of pass format, still issue with conv2d c code

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] Adding backend, codegen, patterns, strategies and run file for MyAiHw

* [UMA] update towards my_ai_hw usecase

* [UMA] working testcase for conv2d with uma

* [UMA] testcase

* [UMA] uma lower.py: replaced the outdated function create_prim_func_from_outputs to be compatible with the latest state of "main"

* UMA: Move torch import to top to avoid free(): invalid pointer error

* Add stub files for targets

* Add tests for ultratrail codegen

* Adopt my_ai_hw accelerator for new target definition

* Add unit test for target attributes

* Test string arguments

* Extend target test

* [UMA] tutorial first version

* [UMA] moved unit tests to contrib

* [UMA] renaming interfaces

* Fix umalower_tests in ci

* make uma a python module

* [UMA] Update of UMAv1 API + added testcases + tutorialV1

* [UMA] UMAv1

* [UMA] cmake file updated

* AOT test infrastructure adapted

* UMA: add __init__.py for uma.api

* Finish uma tests

* Use upstream version of dmlc-core

* [UMA] tir_to_runtime documentation update

* [UMA] cleanup

* [UMA] fix for test_partition

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] fix of build scripts for arm and i386

* Fix remaining linter errors

* [UMA] CMakeLists.txt added UMA tvm_option

* [UMA] added UMA tvm_option

* [UMA] guard against multiple registrations

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] skip uma tests if uma is not available

* [UMA] added UMA rst

* [UMA] Moved tutorial to RST file in gallery

* [UMA] moved uma cli to apps

* [UMA] change requests according to PR-12087

* [UMA] update and sync of uma_cli and tutorial

* [UMA] update of template pass: remove Pad block of Conv2D

* [UMA] lint updates

* [UMA] Test updates

* [UMA] fixes according to comments from PR 12087 discussion

* [UMA] lint updates

* [UMA] moved UMA _template file to apps

* [UMA] lint

* [UMA] Remove exceptions when dispatching over targets

* [UMA] vanilla pattern update

* [UMA] added mobilenet integration test

* [UMA] clang lint

* Remove tir to runtime

* [UMA] Use sequential for UMA relay passes

* Use comparison against BYOC flow in test_partition

* [UMA] tutorial update: moved code blocks to RST

* [UMA] tutorial update and lint fixes

* [UMA] removing UMA from i386 build, as there is a failure in the CI pipeline due to missing CLANG for i386

* [BYOC-DNNL] covered case for sum node without attr

* [UMA] pylint

* [UMA] pylint

* [UMA] aot fix

* [UMA] Changes PR review

* [UMA] cc lint

* [UMA] cc lint

* Use better function name for te_lowering and annotate current target at TE functions

Co-authored-by: Paul Palomero Bernardo <paulpb@outlook.com>
Co-authored-by: Christoph Gerum <christoph.gerum@uni-tuebingen.de>
Co-authored-by: mbs-octoml <mbs@octoml.ai>
Co-authored-by: Christoph Gerum <gerum@informatik.uni-tuebingen.de>
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023
* Add minimal working structure for generic interface

* Separate target definition from codegen

* Update file structure to support multiple NPU targets

* Add scheduling and pass support to codegen

* Update schedule function and pass registration

* Add generic partitioner for relay graph partitioning

* Add pattern-based relay graph partitioning and AOT codegen

* Update API
