[CPU] Decompression support for hybrid models #20973

v-Golubev · 2023-11-08T19:29:19Z

This PR adds decompression support for hybrid models, i.e. models that contain both compressed and quantized layers.

Details:

- Zero point with Convert is now supported in the CPU-specific transformations and optimizations
- Decompression-related passes in the common transformations part are moved to a separate manager
- Added decompression-related callbacks for Cleanup LPT to avoid folding the decompression subgraph at the LPT stage
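
To illustrate the arithmetic behind a decompression subgraph with a converted zero point, here is a minimal Python sketch. It assumes the common pattern of a Convert on the integer weights, a Subtract of a zero point that has its own Convert, and a Multiply by a per-channel scale. All names and values are hypothetical; this is not code from the PR and does not use the OpenVINO API.

```python
from array import array


def convert_to_f32(values):
    """Stand-in for a Convert node: integer storage -> float values."""
    return [float(v) for v in values]


def decompress(weights_u8, zero_point_u8, scale):
    """Compute (Convert(w) - Convert(zp)) * scale for each output channel (row)."""
    zero_point_f32 = convert_to_f32(zero_point_u8)  # zero point has its own Convert
    rows = []
    for row, zp, s in zip(weights_u8, zero_point_f32, scale):
        rows.append([(w - zp) * s for w in convert_to_f32(row)])
    return rows


# Hypothetical 2x3 compressed weight matrix with one zero point and scale per row.
weights_u8 = [array("B", [0, 128, 255]), array("B", [10, 20, 30])]
zero_point_u8 = array("B", [128, 20])
scale = [0.5, 0.25]

decompressed = decompress(weights_u8, zero_point_u8, scale)

# Storage cost: compressed weights take 1 byte per element, while a constant-folded
# float copy would take array("f").itemsize bytes per element.
compressed_bytes = sum(len(row) * row.itemsize for row in weights_u8)
folded_bytes = sum(len(row) for row in decompressed) * array("f").itemsize

print(decompressed)
print(compressed_bytes, folded_bytes)
```

If such a subgraph were constant-folded, the 1-byte weights would be replaced by a wider floating-point constant. That is presumably one reason the Cleanup LPT callbacks keep the subgraph intact so that it can still be fused later.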

Tickets:

CVS-124535

…t#20973)

### Details:
- set LPT callbacks to handle compression and avoid constant folding for it (taken from #20973)
- Allow u8/i8 output data type for compressed onednn FC
- Disable Dequantize propagation through Transpose if it's a dependency of SDPA to keep Transpose+SDPA fusion

github-actions bot added the `category: CPU` (OpenVINO CPU plugin) and `category: LP transformations` (OpenVINO Low Precision transformations) labels Nov 8, 2023

v-Golubev changed the title ~~Vg/cpu/hybrid decompression quantization support~~ [CPU] Decompression support for hybrid models Nov 8, 2023

v-Golubev marked this pull request as ready for review November 9, 2023 10:09

v-Golubev requested review from a team as code owners November 9, 2023 10:09

v-Golubev force-pushed the vg/cpu/hybrid_decompression_quantization_support branch from 07d5d01 to ac7b93b November 9, 2023 10:09

dmitry-gorokhov added this to the 2023.3 milestone Nov 9, 2023

v-Golubev force-pushed the vg/cpu/hybrid_decompression_quantization_support branch 2 times, most recently from ee62151 to c1b8244 November 9, 2023 20:24

dmitry-gorokhov reviewed Nov 10, 2023

src/plugins/intel_cpu/src/graph_optimizer.cpp

src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp

v-Golubev requested a review from dmitry-gorokhov November 10, 2023 09:11

v-Golubev assigned dmitry-gorokhov Nov 10, 2023

v-Golubev force-pushed the vg/cpu/hybrid_decompression_quantization_support branch from b891c27 to 9642a35 November 13, 2023 16:05

v-Golubev added 6 commits November 13, 2023 17:06

MoveFCReshapeToWeights: zero point with convert support

dc6412a

FuseFCAndWeightsDecompression: zero point with convert support

2959a76

Decompression related passes are moved to a separate manager

9d04284

MatMulCompressedWeights tests: added FQ postops

1df5a43

Added decompression related callbacks for Cleanup LPT

0409f64

Review comments applied

9642a35

dmitry-gorokhov approved these changes Nov 14, 2023

dmitry-gorokhov merged commit b43b9f9 into openvinotoolkit:master Nov 14, 2023
66 checks passed

byungilm pushed a commit to byungilm/openvino that referenced this pull request Nov 17, 2023

[CPU] Weights decompression support for hybrid models (openvinotoolki…

c9fd6af

…t#20973)

allnes pushed a commit to allnes/openvino that referenced this pull request Nov 23, 2023

[CPU] Weights decompression support for hybrid models (openvinotoolki…

4bfb309

…t#20973)

vladimir-paramuzov mentioned this pull request Oct 18, 2024

[GPU] Fixes for hybrid quantization #27127

Merged
