Make Flux Transformer RoPE use a custom IREE kernel #871
base: main
Conversation
To match the required dimension order, an axis permutation is performed. This change introduces a significant (10x) performance regression.
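As a rough illustration of where such a cost comes from, here is a minimal sketch of an axis permutation in PyTorch; the shapes and axis order are hypothetical, not taken from this PR:

```python
import torch

# Hypothetical shapes: the surrounding code produces (batch, heads, seq, dim)
# while the kernel expects (batch, seq, heads, dim).
x = torch.randn(2, 8, 1024, 64)

# permute() returns a non-contiguous view; materializing it with
# .contiguous() forces a copy, which is where the overhead comes from.
x_perm = x.permute(0, 2, 1, 3).contiguous()
```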
Force-pushed from 4ff5c3e to 3b751f4.
The large performance problem has been addressed by iree-org/iree#19822.
def compute_rotary_embedding_table(
    positions: torch.Tensor,
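For context, a minimal sketch of a standard RoPE table computation; the dim/base parameters and the final cos/sin step are assumptions about typical RoPE code, not necessarily what this PR's function does:

```python
import torch

def compute_rotary_embedding_table_sketch(
    positions: torch.Tensor,     # 1-D tensor of token positions
    dim: int = 64,               # hypothetical head dimension
    freq_base: float = 10000.0,  # hypothetical frequency base
) -> torch.Tensor:
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
    inv_freq = 1.0 / (
        freq_base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    )
    # Outer product of positions and frequencies gives the rotation angles;
    # callers typically take cos/sin of this table.
    return positions.float().unsqueeze(-1) * inv_freq.unsqueeze(0)
```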
Nit: would it make more sense to just rename _compute_rotary_embedding_table?
I want to use this function outside of the class.
Yes, I mean, instead of:
- copying compute_rotary_embedding_table so we can use it outside the class
- making the old _compute_rotary_embedding_table redirect to compute_rotary_embedding_table

just:
- renaming _compute_rotary_embedding_table to compute_rotary_embedding_table and using it outside the class
- changing all references to _compute_rotary_embedding_table to use compute_rotary_embedding_table instead

The latter requires an IDE and is slightly more work, but does not leave a stub/redirect function behind (see the sketch below).
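To make the trade-off concrete, here is a minimal sketch of the stub/redirect pattern the comment argues against; the class name RotaryEmbeddingLayer is hypothetical, for illustration only:

```python
import torch

def compute_rotary_embedding_table(positions: torch.Tensor) -> torch.Tensor:
    ...  # shared implementation, now usable outside the class

class RotaryEmbeddingLayer:  # hypothetical class name
    # The redirect stub that the rename-and-update-call-sites approach avoids:
    def _compute_rotary_embedding_table(self, positions: torch.Tensor) -> torch.Tensor:
        return compute_rotary_embedding_table(positions)
```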
We assume that the custom kernel will yield better performance than the equivalent PyTorch ops.
Force-pushed from 3b751f4 to 48e1b04.
This PR is waiting on iree-org/iree#19829.