AutoTVM optimization? #2244
Comments
We are already applying auto-tuning implicitly in many cases. The current mechanism, dlight, is already somewhat auto-tuned and then coded into the rules, but it can indeed be tweaked further; see examples like apache/tvm#16932. Our general philosophy here is to decouple auto-tuning from the build: it is still possible to autotune, and the results are then applied with the configuration found. Right now some of these configs are coded directly into the templates. Starting from the dlight space is likely better for LLM-specific use cases.
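A minimal sketch of this decoupled tune-then-apply flow using TVM's MetaSchedule, assuming a standalone TIR matmul kernel (the shapes, `work_dir`, and trial count are illustrative, not a recommendation):

```python
# Sketch: tune a TIR kernel offline, persist the results, then apply the
# best configuration found at build time (tuning decoupled from build).
import tvm
from tvm import meta_schedule as ms
from tvm.script import tir as T

@T.prim_func
def matmul(a: T.handle, b: T.handle, c: T.handle):
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.match_buffer(b, (128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]

target = tvm.target.Target("llvm -num-cores 4")

# Iterative search: measured results are persisted in a database
# under work_dir, independent of any later build.
database = ms.tune_tir(
    mod=matmul,
    target=target,
    work_dir="./tune_logs",
    max_trials_global=64,
)

# Later (possibly a separate build step): look up the best schedule
# found for this kernel and compile with it.
sch = ms.tir_integration.compile_tir(database, matmul, target)
assert sch is not None, "no tuning record found in the database"
lib = tvm.build(sch.mod, target=target)
```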
I'm sorry, I don't know what dlight is. I tried AutoTVM with a vision model, and the iterative search for optimization took very long, so I'm assuming a full optimization of an LLM with iterative search could take hours to reach the most performant code? In contrast, MLC compilation on my Orange Pi takes seconds, even when maxing out optimizations. Is there a way to optimize an MLC model like we do in Relay? Please point me in the right direction, thanks!
Relax is actually a better iteration of Relay that addresses some of the long-compilation-time issues. The prebuilt schedules already start from a better space, so they are better optimized. You can think of Relax as something with a better starting point than AutoTVM. The dlight code is here.
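For reference, the dlight rules are applied as a pass over the model's IRModule; a minimal sketch, where `mod` is assumed to be an IRModule already imported from the model, and the OpenCL target is just one plausible choice (e.g. for a Mali GPU):

```python
# Sketch: apply dlight's rule-based GPU schedules to an IRModule.
# Each rule pattern-matches kernels and schedules them without any
# on-device search.
import tvm
from tvm import dlight as dl

target = tvm.target.Target("opencl", host="llvm")  # e.g. Mali via OpenCL

with target:
    mod = dl.ApplyDefaultSchedule(   # rewrites each TIR function in mod
        dl.gpu.Matmul(),      # matmul-specific rule
        dl.gpu.GEMV(),        # decode-time GEMV rule
        dl.gpu.Reduction(),   # reductions (e.g. norms, softmax)
        dl.gpu.Fallback(),    # generic fallback for everything else
    )(mod)
```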
I recently went through this tutorial: https://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html
Model execution performance on the Orange Pi's Mali GPU improved quite a lot during the optimization process; crucially, the optimization is not a fixed set of passes but an iterative search that improves inference performance on your specific hardware. A condensed sketch of that flow is below.
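The iterative search in that tutorial looks roughly like this (model loading omitted; `mod`, `params`, and `target` are assumed to come from a Relay frontend, and the trial count is illustrative):

```python
# Sketch: AutoTVM's iterative search over a Relay model. Each task is
# tuned by repeatedly proposing a config, measuring it on the actual
# hardware, and learning from the result.
import tvm
from tvm import autotvm, relay

# Extract tunable tasks (e.g. conv2d, dense) from the Relay module.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params
)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)  # cost-model-guided search
    tuner.tune(
        n_trial=min(200, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.json")],
    )

# At build time, apply the best records found during the search.
with autotvm.apply_history_best("tuning.json"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```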
In contrast, it looks like MLC compilation using Relax, even at the maximum optimization settings, applies a fixed set of optimizations, with no equivalent iterative search.
I wonder if an iterative search like AutoTVM's could yield remarkable improvements in inference speed for LLMs on MLC on certain hardware.
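One way to check whether such a search pays off on a given device, regardless of how a kernel was scheduled, is to time the built function directly; a minimal sketch using TVM's `time_evaluator`, where `lib` and the function name "matmul" are assumed from a prior `tvm.build` step and the buffer shapes are placeholders:

```python
# Sketch: measure a built kernel's on-device latency, so a fixed-schedule
# build can be compared against an auto-tuned one.
import numpy as np
import tvm

dev = tvm.cpu(0)  # or e.g. tvm.opencl(0) for a Mali GPU
a = tvm.nd.array(np.random.rand(128, 128).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(128, 128).astype("float32"), dev)
c = tvm.nd.array(np.zeros((128, 128), dtype="float32"), dev)

# "matmul" is assumed to match the symbol given at build time.
timer = lib.time_evaluator("matmul", dev, number=10, repeat=3)
print(f"mean latency: {timer(a, b, c).mean * 1e3:.3f} ms")
```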
Thoughts?