What's Changed
⚡ Offload tokenizer fixes to Toke(n)icer pkg.
⚡ Optimized `lm_head` quant time and VRAM usage.
⚡ Optimized DeepSeek v3/R1 model quant VRAM usage.
⚡ 3x speed-up for the Torch kernel when using PyTorch >= 2.5.0 with model.compile().
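Since the faster kernel path requires PyTorch >= 2.5.0, callers may want to gate the `model.compile()` call on the installed version. Below is a minimal pure-Python version gate; the commented-out usage is a sketch only, assuming `model` was already loaded via the library:

```python
def at_least(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, ignoring local
    suffixes such as "+cu121" (so "2.10.0" correctly sorts above "2.5.0")."""
    parse = lambda v: tuple(int(p) for p in v.split("+")[0].split(".")[:3])
    return parse(version) >= parse(minimum)

# Hypothetical usage (sketch -- assumes `model` is an already-loaded model):
# import torch
# if at_least(torch.__version__, "2.5.0"):
#     model.compile()  # enables the ~3x faster Torch kernel path
```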
⚡ New `calibration_dataset_concat_size` option to enable calibration data concat mode, which mimics the original GPTQ data packing strategy and may improve quant speed and accuracy for datasets like wikitext2.
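Concat mode packs all calibration samples into one continuous token stream and slices it into fixed-size blocks, instead of treating each sample as its own padded sequence. A rough pure-Python illustration of the packing idea (token ids as plain lists; dropping the trailing partial block is an assumption of this sketch, and the real option is simply passed at quantization time):

```python
def concat_pack(samples, block_size):
    """Concatenate tokenized samples into one stream, then split the
    stream into fixed-size blocks, mimicking the original GPTQ data
    packing strategy. Trailing tokens that do not fill a whole block
    are dropped in this sketch."""
    stream = [tok for sample in samples for tok in sample]
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]

# Example: three short "samples" packed into blocks of 4 tokens
blocks = concat_pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4)
# -> [[1, 2, 3, 4], [5, 6, 7, 8]]
```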
🐛 Fixed Optimum compat and XPU/IPEX auto kernel selection regression in v1.8.1.
- Fix init arg order and `optimum` compat by @CSY-ModelCloud in #1240
- [FIX][Optimize] lm_head quantize by @ZX-ModelCloud in #1239
- [Model] [DeepSeek] un-merge `gate_proj` and `up_proj` by @LRL-ModelCloud in #1241
- Use Toke(n)icer by @CL-ModelCloud in #1242
- #1244
- Add Tokenicer Test by @CL-ModelCloud in #1245
- prepare for 1.8.2 release by @Qubitium in #1243
- simplify calls to tokenicer by @CL-ModelCloud in #1246
- Update requirements.txt by @Qubitium in #1248
- fix trust_remote was lost by @CSY-ModelCloud in #1249
- fix trust_remote was lost by @CSY-ModelCloud in #1250
- prepare for 1.8.5 release by @Qubitium in #1251
- fix unit tests & tweak logic for selecting backends by @CSY-ModelCloud in #1253
- install tokenicer from git & do ruff by @CSY-ModelCloud in #1254
- fix k,v is not a dict by @CSY-ModelCloud in #1255
- fix not enough values to unpack (expected 2, got 1) by @CSY-ModelCloud in #1256
- fix sglang test requires numpy<2.0 by @CSY-ModelCloud in #1258
- fix ipex backend by @jiqing-feng in #1259
- ipex should be packable, reverted pr #1259 importer.py changes by @CSY-ModelCloud in #1260
- remove sentencepiece by @CSY-ModelCloud in #1261
- speed up torch dequantize by @Qubitium in #1262
- Add `calibration_dataset_concat_size` option/mode by @LRL-ModelCloud in #1257
- add transformers test by @CSY-ModelCloud in #1264
- Add kernel torch.compile hook by @Qubitium in #1265
- [FIX]fix vl model prepare_dataset by @LRL-ModelCloud in #1266
Full Changelog: v1.8.1...v1.9.0