aqt peak memory performance worse than subclass apis #624

HDCharles · 2024-08-07T00:35:16Z

see

The new AQT APIs have similar runtime performance to the old subclass APIs, but worse peak memory for int8 dynamic and int4 weight-only (int4wo) quantization.

For int8 dynamic, this is probably due to needing to transpose the weight tensor. The cause for int4wo is unclear.
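To illustrate why a transpose could raise peak memory, here is a minimal standard-library sketch. It is not torchao code and does not confirm the cause: it only shows that materializing a transposed copy of a weight buffer keeps both the original and the copy alive at once, so peak usage is roughly double the weight size. The helpers `transpose_copy` and `peak_bytes` and the matrix shape are hypothetical, chosen for illustration.

```python
import tracemalloc

ROWS, COLS = 512, 256   # hypothetical weight shape, one byte per int8 element
N = ROWS * COLS


def transpose_copy(buf: bytearray, rows: int, cols: int) -> bytearray:
    """Return a new row-major buffer holding the transpose of a rows x cols matrix."""
    out = bytearray(rows * cols)
    for r in range(rows):
        # Row r of the input becomes column r of the (cols x rows) output.
        out[r::rows] = buf[r * cols:(r + 1) * cols]
    return out


def peak_bytes(fn) -> int:
    """Run fn under tracemalloc and return the peak traced allocation in bytes."""
    tracemalloc.start()
    result = fn()  # keep the result alive until the peak has been read
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


def load_weight() -> bytearray:
    # Stand-in for using the weight in its stored layout.
    return bytearray(N)


def load_and_transpose() -> bytearray:
    # Stand-in for a path that must materialize a transposed copy.
    w = bytearray(N)
    return transpose_copy(w, ROWS, COLS)


peak_plain = peak_bytes(load_weight)
peak_transposed = peak_bytes(load_and_transpose)
print(f"weight size:        {N} bytes")
print(f"peak, no transpose: {peak_plain} bytes")
print(f"peak, transposed:   {peak_transposed} bytes")
```

On GPU the equivalent measurement would normally use the framework's own peak-memory counters rather than `tracemalloc`. The point here is only the shape of the comparison: the same workload run through both code paths, with the peak recorded for each.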

gau-nernst mentioned this issue Aug 14, 2024

Add experimental INT8 quantized training #644

Merged
