[Collage] CollagePartition pass #12086

Merged: 4 commits, Jul 14, 2022

@mbs-octoml (Contributor) commented Jul 13, 2022

See https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md.

This adds the main CollagePartition pass, which:

  1. Inspects all the targets in the CompilationConfig and builds
    PartitionSpecs describing how to generate speculative CandidatePartitions
    for them.
  2. Runs the rules from those specs on the model to collect all the candidates.
  3. Eliminates candidates whose target contradicts any constraints already
    imposed by, e.g., device planning.
  4. Eagerly estimates the cost of each candidate.
  5. Performs a shortest-path search to choose an 'optimal' set of candidate
    partitions that minimizes estimated model latency, such that every sub-expression
    node is contained in exactly one candidate partition.
  6. Coalesces adjacent optimal candidates which ended up on the same target.
  7. Rewrites the model according to the chosen optimal partitioning.
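
To make steps 5 and 6 concrete, here is a toy sketch in plain Python. It is not the code in this PR: `Candidate`, `choose_partitions` and `coalesce` are hypothetical names, nodes are bare integers, costs are made-up numbers, and dataflow validity of candidates is ignored. It also assumes that a merged partition costs the sum of its parts, which the real cost estimator need not agree with.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Candidate:
    nodes: frozenset  # indices of the sub-expression nodes covered
    target: str       # target the candidate would be compiled for
    cost: float       # estimated latency (step 4), made up here


def choose_partitions(num_nodes, candidates):
    """Dijkstra over 'set of covered nodes' states (step 5).

    Returns (total_cost, chosen) where chosen covers every node exactly
    once, or None if no exact cover exists.
    """
    everything = frozenset(range(num_nodes))
    tie = count()  # tie-breaker so the heap never compares sets or lists
    heap = [(0.0, next(tie), frozenset(), [])]
    settled = set()
    while heap:
        cost, _, covered, chosen = heapq.heappop(heap)
        if covered == everything:
            return cost, chosen
        if covered in settled:
            continue
        settled.add(covered)
        # Always extend with a candidate containing the first uncovered
        # node, and never overlap what is already covered.
        first_uncovered = min(everything - covered)
        for cand in candidates:
            if first_uncovered in cand.nodes and not (cand.nodes & covered):
                heapq.heappush(
                    heap,
                    (cost + cand.cost, next(tie), covered | cand.nodes, chosen + [cand]),
                )
    return None


def coalesce(chosen):
    """Merge consecutive chosen candidates on the same target (step 6)."""
    merged = []
    for cand in chosen:
        if merged and merged[-1].target == cand.target:
            prev = merged.pop()
            # Assumption: the merged cost is the sum of the parts.
            cand = Candidate(prev.nodes | cand.nodes, cand.target, prev.cost + cand.cost)
        merged.append(cand)
    return merged


# Four nodes, two targets; "llvm" and "byoc" are just labels here.
candidates = [
    Candidate(frozenset({0}), "llvm", 1.0),
    Candidate(frozenset({1}), "llvm", 1.0),
    Candidate(frozenset({2}), "llvm", 2.0),
    Candidate(frozenset({3}), "llvm", 2.0),
    Candidate(frozenset({2, 3}), "byoc", 2.5),
    Candidate(frozenset({0, 1}), "byoc", 3.0),
    Candidate(frozenset({1, 2}), "byoc", 3.0),
]
total_cost, chosen = choose_partitions(4, candidates)
partitions = coalesce(chosen)
```

In this toy run the search picks the two single-node "llvm" candidates plus the two-node "byoc" candidate, and coalescing then fuses the two "llvm" candidates into one partition. Expanding only candidates that contain the first uncovered node is enough to reach every exact cover, since any cover can be ordered by the smallest node of each member.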

As with the existing partition_for_<external codegen name> methods, the result of
CollagePartition can then be built using regular TVM.

Very special thanks to @mbaret for authoring test_pass_collage_partition.py.

Logic to prune the candidates after step 3 will be in a follow-up PR since it
deserves its own testing. A demonstration driver will also come as a follow-up.

@mbs-octoml (Contributor Author)

@mbaret here's the big one!

@mbs-octoml (Contributor Author)

Rebased onto main.

@mbaret (Contributor) left a comment

lgtm

@mbaret mbaret merged commit 7661ba8 into apache:main Jul 14, 2022
@mbs-octoml mbs-octoml deleted the mbs-collage-partitioner branch July 14, 2022 21:56
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [Collage] CollagePartition pass

* - lints

* - more lints

* - use the _ffi_api properly
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023
* [Collage] CollagePartition pass

* - lints

* - more lints

* - use the _ffi_api properly