aqt peak memory performance worse than subclass apis #624

Open
HDCharles opened this issue Aug 7, 2024 · 0 comments
See #623.

The new AQT (AffineQuantizedTensor) APIs have similar runtime performance to the old subclass APIs but worse peak memory, for both int8 dynamic and int4 weight-only (int4wo) quantization.

For int8 dynamic, this is probably due to needing to transpose the weight tensor. The cause for int4wo is unclear.
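
To illustrate why a transpose could raise peak memory without changing runtime much, here is a minimal, framework-free sketch. It is not torchao code: the shape, the `bytearray` stand-in for an int8 weight, and the helper names are all hypothetical, and whether the AQT path really materializes a transposed copy is only the hypothesis above. It compares reading a row-major buffer in transposed order through strided slices against first building a full transposed copy, using the standard-library `tracemalloc` to record the peak.

```python
import tracemalloc

ROWS, COLS = 256, 512  # hypothetical (out_features, in_features) of an int8 weight
# Row-major int8 stand-in for the weight, allocated before tracing starts.
weight = bytearray(i % 251 for i in range(ROWS * COLS))


def peak_bytes(fn):
    """Run fn and return the peak number of bytes tracemalloc saw allocated."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_columns_strided():
    # Walk the weight in transposed order without keeping a transposed buffer:
    # only one column (ROWS bytes) is alive at a time.
    total = 0
    for c in range(COLS):
        total += sum(weight[c::COLS])
    return total


def read_columns_from_copy():
    # Materialize the full transposed weight first, then read it.
    transposed = bytearray(ROWS * COLS)
    for c in range(COLS):
        transposed[c * ROWS:(c + 1) * ROWS] = weight[c::COLS]
    return sum(transposed)


strided_peak = peak_bytes(read_columns_strided)
copy_peak = peak_bytes(read_columns_from_copy)
print(f"strided peak: {strided_peak} B, copy peak: {copy_peak} B")
```

Both functions compute the same result, but the second one holds an extra buffer the size of the whole weight while it runs. If the int8 dynamic path does something similar, a comparable increase in peak memory on top of the original weight would be expected.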

yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024
* remove debug print statements and run linter

* use unified quantizer architecture

* use unified quantizer architecture

* use unified quantizer architecture

* typos & lint

* typos & lint