Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]TECompiler: Staged refactor and removal of compile engine #7518

Merged
merged 1 commit into from
Jul 8, 2021

Conversation

csullivan
Copy link
Contributor

@csullivan csullivan commented Feb 24, 2021

This is a staged removal of the use of compile engine. The motivation is to bring TIR compilation into the main flow of the compiler rather than producing and compiling it via a callback into the compile engine. By replacing Relay primitive function calls with TIR prim function calls that contain the lowered TIR we enable,

  • An intermediate stage in the lowering process where Relay and TIR coexist.
  • The ability to add passes at this intermediate stage,
    • For example memory planning which can infer user provided information from TE and the resulting TIR.

We are starting with a proof of concept by refactoring the GraphRuntimeCodegen to use an introduced TIR/TE compiler instead of the compile engine directly. In the new flow,

  • The TE/TIR compiler lowers TE in the LowerTensorExpr pass
  • Replaces relay.Function(attr:primitive) with a PrimFnCall that contains the lowered TIR
  • Runs GraphPlanMemory
  • Finally runs GraphRuntimeCodegen::VisitExpr to lower to graph JSON

[RFC][Relay] TECompiler: Rewrite existing compile engine to match updated compiler flow

@jroesch jroesch changed the title [WIP] Staged refactor and removal of compile engine [RFC][WIP] TECompiler: Staged refactor and removal of compile engine Feb 24, 2021
@jroesch jroesch mentioned this pull request Feb 24, 2021
7 tasks
@jroesch jroesch force-pushed the remove-compile-engine branch from 077cc4e to 00178dc Compare March 19, 2021 00:39
@jroesch jroesch changed the title [RFC][WIP] TECompiler: Staged refactor and removal of compile engine [RFC]TECompiler: Staged refactor and removal of compile engine Mar 24, 2021
@jroesch
Copy link
Member

jroesch commented Mar 24, 2021

Need to port fix from #7703 but otherwise ready for review.

@jroesch
Copy link
Member

jroesch commented Mar 24, 2021

Modulo some left over polish work and documentation I think this is ready for review @icemelon9 @comaniac @csullivan @tkonolige @rkimball @junrushao1994 @areusch @mehrdadh

@jroesch
Copy link
Member

jroesch commented Mar 24, 2021

cc @mbaret

@csullivan csullivan marked this pull request as ready for review March 24, 2021 16:03
src/relay/backend/graph_runtime_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_runtime_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_runtime_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_runtime_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_runtime_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler_cache.cc Outdated Show resolved Hide resolved
src/relay/ir/function.cc Show resolved Hide resolved
src/runtime/graph/graph_runtime.cc Outdated Show resolved Hide resolved
tests/python/relay/test_backend_graph_runtime.py Outdated Show resolved Hide resolved
src/driver/driver_api.cc Show resolved Hide resolved
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did an initial pass

src/driver/driver_api.cc Show resolved Hide resolved
src/relay/backend/graph_plan_memory.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_plan_memory.cc Outdated Show resolved Hide resolved
targets_ = targets;
}

LoweredOutput Codegen(relay::Function func) {
auto pf = GetPackedFunc("relay.backend.GraphPlanMemory");
storage_device_map_ = (*pf)(func);
// Jared: why do we do this? just call C++ API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point we should support overriding this. it seems like you'd need to build a C++ impl right now to do that, so I don't know that we need this hook now. but, we should plan to put some customization back in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ask what do we allow customization of what and where ? I think allowing people to wholesale replace the lowering/memory/planning/etc in an ad-hoc way "seems bad", which effectively is all of these hooks (that I'm removing) do right now. They allow you to just arbitrarily overload parts of the compiler with custom code which is pretty non-compositional and hard to reason about since in certain cases, i.e Python loaded or not the behavior of the compiler is completely different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to allow people to overload GraphPlanMemory with its current implementation. If GraphPlanMemory were to support memory pools and tensor pinning, then I think there will be cases where people want to implement custom algorithms to place tensors in different memory pools.

I also think the reason this "seems bad" is that there is no good interface/data structure defined here. There's no reason we can't define one--in fact, I think we are hurting compiler readability by not doing so. Here we are keeping the interface to a sort of Pythonic "informal set of lists" in place of a proper data structure. Here and downstream of here, there are many instances where both TVM code and people trying to integrate with TVM code would benefit from effectively exporting StorageToken as a user-facing data structure and clearly defining its meaning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we need to move up the stack to address this, because everyone is only thinking about their own memory planning needs and not really looking at the large picture.

src/relay/backend/te_compiler_cache.cc Outdated Show resolved Hide resolved
tests/python/relay/test_backend_graph_runtime.py Outdated Show resolved Hide resolved
tests/python/relay/test_backend_graph_runtime.py Outdated Show resolved Hide resolved
tests/python/relay/test_backend_graph_runtime.py Outdated Show resolved Hide resolved
@jroesch jroesch force-pushed the remove-compile-engine branch from 41a090e to 8141aa9 Compare April 7, 2021 17:12
src/relay/backend/graph_executor_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_executor_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_executor_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_executor_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/runtime/graph_executor/graph_executor.cc Outdated Show resolved Hide resolved
src/runtime/graph_executor/graph_executor.cc Outdated Show resolved Hide resolved
src/target/llvm/llvm_module.cc Show resolved Hide resolved
tests/python/relay/test_backend_graph_executor.py Outdated Show resolved Hide resolved
tests/python/relay/test_backend_graph_executor.py Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.h Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.h Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.h Show resolved Hide resolved
src/relay/backend/te_compiler.h Outdated Show resolved Hide resolved
src/relay/backend/te_compiler_cache.h Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.h Outdated Show resolved Hide resolved
src/relay/backend/te_compiler_cache.cc Show resolved Hide resolved
@jroesch jroesch force-pushed the remove-compile-engine branch from 6dab6be to 6fb66c1 Compare April 14, 2021 17:00
@areusch areusch added the status: need update need update based on feedbacks label Apr 30, 2021
@jroesch jroesch force-pushed the remove-compile-engine branch from 05409d0 to a561718 Compare May 3, 2021 20:36
@jroesch jroesch force-pushed the remove-compile-engine branch from a561718 to 9192543 Compare June 15, 2021 18:10
@csullivan csullivan force-pushed the remove-compile-engine branch from 9192543 to cdff959 Compare June 15, 2021 20:51
@jroesch jroesch force-pushed the remove-compile-engine branch 2 times, most recently from 6452ca4 to f82e60d Compare June 29, 2021 21:07
@jroesch jroesch force-pushed the remove-compile-engine branch from 09a0b14 to a86752c Compare June 29, 2021 23:37
@jroesch jroesch force-pushed the remove-compile-engine branch from e31ca8a to fe91baf Compare July 6, 2021 17:19
Duplicate the CompileEngine interface.

Refactor the graph_runtime_codegen to invoke the new LowerTE pass

More changes

Things appear to be working

Some tracing to get Relay code to flow through too.

Disable some assertions as exp.

Tweak printing for now

Fix a few bugs: (#13)

1. Don't add relay main function to list of lowered TIR functions
2. Don't skip visiting call to relay function in graph runtime codegen

Remove debug prints.

Start refactoring

Split out shared data structures

Fix implicit duplicate decl of IsDynamic

Clean up handling of name + global prim fn

Clean up the code and debug issue introduced by previous hack

Clean up the debugging

Do C++ lint clean up

Update src/relay/backend/graph_executor_codegen.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Clean up handling of external functions

Add more error messages

More clean up

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Fix

CR

More CR

Format

Fix lowering path for C++

Fix tests

Remove uncessary change

Clean up a few more things

CI fix

Fix the default context

Fix

Fix broken test cases

Update

Fix

WIP

Clean up storage data structures

WIP

WIP

Fix build errors

Remove TVMLower

Fix lint

Lint again

fix black

Move UpdateMainWorkspaceSize into te_compiler.cc

Fix link errors

Formatting

Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>

Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)

Testing how functions should be named

Lint

Change how function metadata is updated

Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map

Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time

Fix return in UpdateMainWorkspaceSize

Lint

Try to fix UpdateMainWorkspaceSize

Fix construction of static memory plan

Clean up code while debugging

Adding UpdateWorkspaceSize back

Add closure + call to UpdateFunctionMetadata (WIP)

UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else

Add some debugging of UpdateMainWorkspaceSize

Starting to move UpdateFunctionMetadata call to use process_fn infra

UWhat target should be passed to UpdateFunctionMetadata?

UpdateFunctionMetadata is not workinggg

Added some comments about UpdateFunctionMetadata for Jared

Fix the creation of function metadata

Try another stab at cleaning up the information

Fix

Port StorageInfo and StaticMemoryPlan data structure (apache#8297)

Restoring reshape opt

Fix tests

Caught a nasty typo from Lily, Map::Set does not mutate

Format

Disable stupid Google style warning

Rebase cleanup

Formatting

Add docstring for storage info

Black

Post rebase fix

Remove prints

Disable assert that doesn't make sense for now

Fix lint

Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars

Work with Lily to fix graph attrs

Try to figure out where extra arguments are coming from; fix merge

passes the profiling test

Clean up

Fix profile test

Remove debugging

Add attributes for BYOC uTVM case

Format

Dumb typo

Another fix for byoc

Format

Fix last 3 failing tests

Format

Fix final two test cases

Format

Fix lint

Fix again

Fix

Fix auto scheduler code

Fix issue

Address CR comment

Format
@electriclilies electriclilies force-pushed the remove-compile-engine branch from e225ff7 to 562e93a Compare July 8, 2021 00:01
@jroesch jroesch merged commit 9c66587 into apache:main Jul 8, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
Duplicate the CompileEngine interface.

Refactor the graph_runtime_codegen to invoke the new LowerTE pass

More changes

Things appear to be working

Some tracing to get Relay code to flow through too.

Disable some assertions as exp.

Tweak printing for now

Fix a few bugs: (apache#13)

1. Don't add relay main function to list of lowered TIR functions
2. Don't skip visiting call to relay function in graph runtime codegen

Remove debug prints.

Start refactoring

Split out shared data structures

Fix implicit duplicate decl of IsDynamic

Clean up handling of name + global prim fn

Clean up the code and debug issue introduced by previous hack

Clean up the debugging

Do C++ lint clean up

Update src/relay/backend/graph_executor_codegen.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Clean up handling of external functions

Add more error messages

More clean up

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Fix

CR

More CR

Format

Fix lowering path for C++

Fix tests

Remove uncessary change

Clean up a few more things

CI fix

Fix the default context

Fix

Fix broken test cases

Update

Fix

WIP

Clean up storage data structures

WIP

WIP

Fix build errors

Remove TVMLower

Fix lint

Lint again

fix black

Move UpdateMainWorkspaceSize into te_compiler.cc

Fix link errors

Formatting

Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>

Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)

Testing how functions should be named

Lint

Change how function metadata is updated

Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map

Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time

Fix return in UpdateMainWorkspaceSize

Lint

Try to fix UpdateMainWorkspaceSize

Fix construction of static memory plan

Clean up code while debugging

Adding UpdateWorkspaceSize back

Add closure + call to UpdateFunctionMetadata (WIP)

UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else

Add some debugging of UpdateMainWorkspaceSize

Starting to move UpdateFunctionMetadata call to use process_fn infra

UWhat target should be passed to UpdateFunctionMetadata?

UpdateFunctionMetadata is not workinggg

Added some comments about UpdateFunctionMetadata for Jared

Fix the creation of function metadata

Try another stab at cleaning up the information

Fix

Port StorageInfo and StaticMemoryPlan data structure (apache#8297)

Restoring reshape opt

Fix tests

Caught a nasty typo from Lily, Map::Set does not mutate

Format

Disable stupid Google style warning

Rebase cleanup

Formatting

Add docstring for storage info

Black

Post rebase fix

Remove prints

Disable assert that doesn't make sense for now

Fix lint

Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars

Work with Lily to fix graph attrs

Try to figure out where extra arguments are coming from; fix merge

passes the profiling test

Clean up

Fix profile test

Remove debugging

Add attributes for BYOC uTVM case

Format

Dumb typo

Another fix for byoc

Format

Fix last 3 failing tests

Format

Fix final two test cases

Format

Fix lint

Fix again

Fix

Fix auto scheduler code

Fix issue

Address CR comment

Format

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022
Duplicate the CompileEngine interface.

Refactor the graph_runtime_codegen to invoke the new LowerTE pass

More changes

Things appear to be working

Some tracing to get Relay code to flow through too.

Disable some assertions as exp.

Tweak printing for now

Fix a few bugs: (apache#13)

1. Don't add relay main function to list of lowered TIR functions
2. Don't skip visiting call to relay function in graph runtime codegen

Remove debug prints.

Start refactoring

Split out shared data structures

Fix implicit duplicate decl of IsDynamic

Clean up handling of name + global prim fn

Clean up the code and debug issue introduced by previous hack

Clean up the debugging

Do C++ lint clean up

Update src/relay/backend/graph_executor_codegen.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Clean up handling of external functions

Add more error messages

More clean up

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/runtime/graph_executor/graph_executor.cc

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Update src/relay/backend/te_compiler.h

Co-authored-by: Haichen Shen <shenhaichen@gmail.com>

Fix

CR

More CR

Format

Fix lowering path for C++

Fix tests

Remove uncessary change

Clean up a few more things

CI fix

Fix the default context

Fix

Fix broken test cases

Update

Fix

WIP

Clean up storage data structures

WIP

WIP

Fix build errors

Remove TVMLower

Fix lint

Lint again

fix black

Move UpdateMainWorkspaceSize into te_compiler.cc

Fix link errors

Formatting

Change UpdateMainWorkspaceSize to return Map<String, FunctionInfo>

Workaround for GCC 5 error caused by enums in maps (GCC 5 is on i386 CI)

Testing how functions should be named

Lint

Change how function metadata is updated

Attempt to update aot_executor_codegen to use new StaticMemoryPlan instead of storage_device_map

Pass memory plan through LowerTE into UpdateMainWorkspaceSize so that we don't need to run GraphPlanMemory an extra time

Fix return in UpdateMainWorkspaceSize

Lint

Try to fix UpdateMainWorkspaceSize

Fix construction of static memory plan

Clean up code while debugging

Adding UpdateWorkspaceSize back

Add closure + call to UpdateFunctionMetadata (WIP)

UpdateFunctionMetadata builds; weird error with device ctx map though. Not sure if it came from this change or something else

Add some debugging of UpdateMainWorkspaceSize

Starting to move UpdateFunctionMetadata call to use process_fn infra

UWhat target should be passed to UpdateFunctionMetadata?

UpdateFunctionMetadata is not workinggg

Added some comments about UpdateFunctionMetadata for Jared

Fix the creation of function metadata

Try another stab at cleaning up the information

Fix

Port StorageInfo and StaticMemoryPlan data structure (apache#8297)

Restoring reshape opt

Fix tests

Caught a nasty typo from Lily, Map::Set does not mutate

Format

Disable stupid Google style warning

Rebase cleanup

Formatting

Add docstring for storage info

Black

Post rebase fix

Remove prints

Disable assert that doesn't make sense for now

Fix lint

Add copying attrs from relay node to graph node; still need to figure out how to do this in the case of global vars

Work with Lily to fix graph attrs

Try to figure out where extra arguments are coming from; fix merge

passes the profiling test

Clean up

Fix profile test

Remove debugging

Add attributes for BYOC uTVM case

Format

Dumb typo

Another fix for byoc

Format

Fix last 3 failing tests

Format

Fix final two test cases

Format

Fix lint

Fix again

Fix

Fix auto scheduler code

Fix issue

Address CR comment

Format

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need update need update based on feedbacks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants