Flexible Pack DType #1158

Qubitium · 2025-01-24T07:55:34Z

Allow quantized weights to be packed in:

Some kernels/GPUs may benefit from the different alignments for concurrent dequantization.
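To illustrate what a configurable pack dtype changes, here is a minimal NumPy sketch, not the code from this PR. It packs low-bit unsigned values along the row axis into storage words of a chosen width and unpacks them again. The names `pack_rows` and `unpack_rows` are hypothetical, and unsigned NumPy types are used only to keep the bit shifts simple; the actual kernels and their supported dtypes may differ.

```python
import numpy as np


def pack_rows(q, bits, pack_dtype=np.uint32):
    """Pack unsigned `bits`-wide values along axis 0 into `pack_dtype` words.

    Hypothetical helper for illustration only (not the project's API).
    """
    dt = np.dtype(pack_dtype)
    word_bits = dt.itemsize * 8
    if word_bits % bits != 0:
        raise ValueError(f"{bits}-bit values do not tile a {word_bits}-bit word")
    per_word = word_bits // bits  # how many quantized values fit in one word
    if q.shape[0] % per_word != 0:
        raise ValueError("row count must be a multiple of values per word")

    # Group consecutive rows: (groups, per_word, cols)
    grouped = q.astype(dt).reshape(-1, per_word, q.shape[1])
    packed = np.zeros((grouped.shape[0], grouped.shape[2]), dtype=dt)
    for i in range(per_word):
        # Value i of each group lands in bit range [i*bits, (i+1)*bits)
        packed |= grouped[:, i, :] << dt.type(i * bits)
    return packed


def unpack_rows(packed, bits):
    """Inverse of pack_rows: recover the original low-bit values."""
    dt = packed.dtype
    per_word = (dt.itemsize * 8) // bits
    mask = dt.type((1 << bits) - 1)
    parts = [(packed >> dt.type(i * bits)) & mask for i in range(per_word)]
    # (groups, per_word, cols) -> (groups * per_word, cols)
    return np.stack(parts, axis=1).reshape(-1, packed.shape[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.integers(0, 16, size=(32, 4), dtype=np.uint8)  # 4-bit values
    for dtype in (np.uint8, np.uint16, np.uint32):
        packed = pack_rows(q, bits=4, pack_dtype=dtype)
        assert np.array_equal(unpack_rows(packed, bits=4), q)
        print(np.dtype(dtype).name, packed.shape)
```

The packed tensor holds the same bits in every case; only the word width, and therefore the number of values fetched per load and the shape of the packed tensor, changes. That is the alignment trade-off the description refers to.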

Completed Kernels:

Qubitium and others added 14 commits January 24, 2025 15:55

init flex dtype

dc28d12

bits

6eca433

fix tensors_per_storage_dtype

406653e

add storage_np_dtype

c29dcbb

wrong np dtype

c1b5acb

dtype err

ebfe746

cleanup

44d239c

add int64

9ca4124

simplify

9a39ff8

simplify

fdda28e

move control to cfg.pack_dtype

53dd6a2

disable int64 (ppl failed)

b30f7c6

add flex pack dtype to triton

9ea27f9

wrong qzero buffer dtype

e820660

Qubitium marked this pull request as ready for review January 26, 2025 02:46

Qubitium changed the title ~~Flex Storage DType~~ Flexible Pack DType Jan 26, 2025

marlin add prelim int32 refractor

1a6e9aa

Qubitium merged commit d6b9d9d into main Jan 26, 2025
2 of 4 checks passed

Qubitium deleted the flex-storage-dtype branch January 26, 2025 07:38
