[WC] Align compression subgraphs for both weight input data types #2537

nikita-savelyevv · 2024-02-29T14:35:31Z

Changes

Precision configuration of input OV model for weight compression can be one of the following three:

Weights and activations are in FP32 precision (model is saved with compress_to_fp16=False)
Weights are in FP16 and activations are in FP32 (model is saved with compress_to_fp16=True)
Weight and activations are in FP16 (e.g., PT model is first halfed and then converted to OV)

This PR make compression subgraphs equal for all these three cases. Compression activations are always executed in FP16. So for the first case an additional f16 -> f32 Convert node is added.

codecov · 2024-02-29T14:37:50Z

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 77.93%. Comparing base (573b0c3) to head (4ce510a).
Report is 1 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff              @@
##           develop    #2537       +/-   ##
============================================
- Coverage    90.87%   77.93%   -12.94%     
============================================
  Files          494      494               
  Lines        45612    45416      -196     
============================================
- Hits         41449    35397     -6052     
- Misses        4163    10019     +5856

Files	Coverage Δ
.../algorithms/weight_compression/openvino_backend.py	`0.00% <0.00%> (-98.34%)`	⬇️

... and 107 files with indirect coverage changes

Flag	Coverage Δ
COMMON	`?`
ONNX	`?`
OPENVINO	`?`
TENSORFLOW	`30.10% <0.00%> (ø)`
TORCH	`65.96% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
common	`88.28% <ø> (-5.47%)`	⬇️
torch	`93.49% <ø> (-0.01%)`	⬇️
tensorflow	`93.74% <ø> (+1.00%)`	⬆️
onnx	`0.00% <ø> (-93.09%)`	⬇️
openvino	`25.70% <0.00%> (-68.47%)`	⬇️
ptq	`53.06% <0.00%> (-37.03%)`	⬇️

alexsu52

Testing models with different weight/inference precision are necessary.

nncf/quantization/algorithms/weight_compression/openvino_backend.py

nikita-savelyevv · 2024-03-19T15:47:55Z

WC manual test fails until #2569 is not merged.

alexsu52

Please check that you get the same graphs for PyTorch backend.

nncf/quantization/algorithms/weight_compression/openvino_backend.py

### Changes - Store compression scale if FP16 - Add type conversion to original data type after decompression Below are the compression subgraphs for the first conv2d in mobilenet_v2 after conversion to OV, this is similar to the table presented in #2537 . ![image](https://github.com/openvinotoolkit/nncf/assets/23343961/740953d6-2615-4c8f-bbd3-6cfae5585dfd) Compared to OV case, there is an additional Multiply node after the scale Multiply node. It seems to come from Batch Norm applied to the convolution. In case of PT weight compression it does not get merged into the weight as it does in OV case. ### Reason for changes Weight compression for PT backend fails when applied to model in half precision. The reason is that the scale is always in FP32, and hence decompression result is also in FP32, which conflicts with input type of FP16. ### Related tickets 134063 ### Tests Added test for half/full precision cases. Also added cases for different devices as it was thought that it may influence tracing in half precision.

nikita-savelyevv · 2024-03-27T10:45:21Z

post training weight compression test build 34 is green

alexsu52

LGTM

ljaljushkin · 2024-03-27T20:17:25Z

@alexsu52 @nikita-savelyevv
I've measured time for compression and total time for different weight compression cases:
develop
q
current PR

Seems like model inference takes almost twice longer on the validation dataset.
Does it mean that compressed model should be saved differently in tests and on customer side?
https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/lm_weight_compression.py#L174

nikita-savelyevv · 2024-04-04T19:16:16Z

@ljaljushkin Thanks for highlighting this!

The reason behind this is that during compression with group size, there is an additional Reshape node. In this PR, a Convert f16>f32 node is added after scale Multiply node. If Convert is added before Reshape node, then the performance drops. To fix this, I moved Convert node after Reshape node.

Before	After

With this, performance is maintained after changes in the PR:

Test case	Total time develop branch	Total time PR branch
tinyllama_data_free	04:18	04:21
tinyllama_data_aware	04:06	04:07
tinyllama_data_aware_awq	03:33	03:39
tinyllama_data_aware_awq_stateful	03:03	03:03

nikita-savelyevv · 2024-04-05T08:43:06Z

post_training_weight_compression test build 42 is green. Waiting for results of OV validation across different hardware.

nikita-savelyevv requested a review from a team as a code owner February 29, 2024 14:35

github-actions bot added NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Feb 29, 2024

Align compression subgraphs for both weight input data types

48682f1

Remove conditions added in the previous PR

4a78bf9

nikita-savelyevv changed the title ~~Align compression subgraphs for both weight input data types~~ [WC] Align compression subgraphs for both weight input data types Feb 29, 2024

nikita-savelyevv added 3 commits February 29, 2024 17:08

Transition to FP32 case subgraph

22d059b

Black

22f59fc

Tweak comment

16a7fe0

nikita-savelyevv requested review from alexsu52 and ljaljushkin March 1, 2024 07:25

alexsu52 requested changes Mar 1, 2024

View reviewed changes

nncf/quantization/algorithms/weight_compression/openvino_backend.py Outdated Show resolved Hide resolved

Rename test

b4c78e9

nikita-savelyevv marked this pull request as draft March 1, 2024 10:23

nikita-savelyevv added 3 commits March 19, 2024 14:35

Consider the case of f16 activations

1f1d882

Black

805f992

Tweak comment

633b443

nikita-savelyevv marked this pull request as ready for review March 19, 2024 14:11

nikita-savelyevv added 2 commits March 20, 2024 10:39

Reverted back to FP16 case

ba3f3b2

Black

b162d73

alexsu52 requested changes Mar 22, 2024

View reviewed changes

nncf/quantization/algorithms/weight_compression/openvino_backend.py Outdated Show resolved Hide resolved

Address suggested changes

ec887be

nikita-savelyevv mentioned this pull request Mar 22, 2024

[WC, PT] Store compression scale in f16 #2596

Merged

alexsu52 approved these changes Mar 27, 2024

View reviewed changes

Merge branch 'develop' into align-compression-subgraphs

e3c16f1

Move Convert after Reshape in case of grouped compression

7855ff6

ljaljushkin approved these changes Apr 5, 2024

View reviewed changes

nikita-savelyevv added question Further information is requested do not merge Should not be merged yet and removed question Further information is requested labels Apr 5, 2024

alexsu52 assigned ljaljushkin Apr 15, 2024

Merge branch 'develop' into align-compression-subgraphs

4ce510a

alexsu52 merged commit df81f44 into openvinotoolkit:develop Apr 19, 2024
11 checks passed

ljaljushkin mentioned this pull request May 21, 2024

Represent symmetrically quantized weights in signed data type #2434

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WC] Align compression subgraphs for both weight input data types #2537

[WC] Align compression subgraphs for both weight input data types #2537

nikita-savelyevv commented Feb 29, 2024 •

edited

Loading

codecov bot commented Feb 29, 2024 •

edited

Loading

alexsu52 left a comment •

edited

Loading

nikita-savelyevv commented Mar 19, 2024

alexsu52 left a comment

nikita-savelyevv commented Mar 27, 2024

alexsu52 left a comment

ljaljushkin commented Mar 27, 2024 •

edited

Loading

nikita-savelyevv commented Apr 4, 2024

nikita-savelyevv commented Apr 5, 2024

[WC] Align compression subgraphs for both weight input data types #2537

[WC] Align compression subgraphs for both weight input data types #2537

Conversation

nikita-savelyevv commented Feb 29, 2024 • edited Loading

Changes

codecov bot commented Feb 29, 2024 • edited Loading

Codecov Report

alexsu52 left a comment • edited Loading

Choose a reason for hiding this comment

nikita-savelyevv commented Mar 19, 2024

alexsu52 left a comment

Choose a reason for hiding this comment

nikita-savelyevv commented Mar 27, 2024

alexsu52 left a comment

Choose a reason for hiding this comment

ljaljushkin commented Mar 27, 2024 • edited Loading

nikita-savelyevv commented Apr 4, 2024

nikita-savelyevv commented Apr 5, 2024

nikita-savelyevv commented Feb 29, 2024 •

edited

Loading

codecov bot commented Feb 29, 2024 •

edited

Loading

alexsu52 left a comment •

edited

Loading

ljaljushkin commented Mar 27, 2024 •

edited

Loading