Add LLaMA 2 example for DirectML #701
Conversation
@@ -759,6 +759,43 @@ def to_json(self, check_object: bool = False):
        return serialize_to_json(config, check_object)


class CompositePyTorchModel(PyTorchModel):
What's the difference between this one and OptimumModel? Could we leverage OptimumModel or unify them? The two classes seem similar.
For reference: with nightly ort/optimum, Olive also optimizes llama2 as follows:
I decoupled OptimumModel and CompositePyTorchModel. They look similar at first glance, but they don't have much in common aside from the
That's what I thought too when looking at the code.
This adds the LLaMA 2 optimizations for DirectML with examples, and a sample ChatApp that was inspired by this but stripped down to an MVP and customized for DirectML.
It also adds a "CompositePyTorchModel", which follows the same principle as the composite Optimum model but for raw PyTorch models instead.
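A minimal sketch of the composite-model idea described above: a parent model class that groups several named PyTorch component models (for example, a decoder variant with and without a KV cache) into a single unit, so each component can be handled or optimized separately. The class and method names below are illustrative assumptions for this sketch, not Olive's actual API.

```python
class PyTorchModel:
    """Simplified stand-in for a single PyTorch model entry (illustrative)."""

    def __init__(self, name, model_path=None):
        self.name = name
        self.model_path = model_path


class CompositePyTorchModel(PyTorchModel):
    """Groups multiple raw PyTorch component models under one parent model.

    Mirrors the composite Optimum model principle: the composite itself has
    no weights; it just routes to its named components.
    """

    def __init__(self, name, components):
        super().__init__(name)
        # Index components by name for direct lookup.
        self.components = {model.name: model for model in components}

    def get_component(self, name):
        return self.components[name]

    def component_names(self):
        return list(self.components)


# Usage: a hypothetical llama2 composite with two decoder variants.
composite = CompositePyTorchModel(
    "llama2",
    [PyTorchModel("decoder"), PyTorchModel("decoder_with_past")],
)
```

The design choice here is that the composite delegates rather than merges: each component keeps its own identity and path, which is why it shares little with OptimumModel beyond the grouping concept.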