
[WC] Align compression subgraphs for both weight input data types #2537

Merged

Conversation

nikita-savelyevv
Collaborator

@nikita-savelyevv commented Feb 29, 2024

Changes

The precision configuration of an input OV model for weight compression can be one of the following three:

  1. Weights and activations are in FP32 precision (model is saved with compress_to_fp16=False)
  2. Weights are in FP16 and activations are in FP32 (model is saved with compress_to_fp16=True)
  3. Weights and activations are in FP16 (e.g., a PT model is first cast to half precision and then converted to OV)

This PR makes the compression subgraphs identical in all three cases: the compression subgraph is always executed in FP16, so for the first case an additional f16 -> f32 Convert node is added.
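The aligned decompression arithmetic can be illustrated with a minimal numpy sketch. This is a hypothetical illustration, not NNCF's actual implementation: the `decompress` helper and its signature are made up for this example.

```python
# Hypothetical sketch of the aligned decompression subgraph: the
# subtract/multiply arithmetic always runs in FP16, and a final
# f16 -> f32 Convert is added only when activations are FP32 (case 1).
import numpy as np

def decompress(weight_i8, zero_point, scale_f16, activations_dtype):
    # Subtract the zero point and apply the scale, both in FP16.
    w = (weight_i8.astype(np.float16) - zero_point.astype(np.float16)) * scale_f16
    # Case 1 (FP32 activations): the extra f16 -> f32 Convert node.
    if activations_dtype == np.float32:
        w = w.astype(np.float32)
    return w

w_i8 = np.array([[-128, 0, 127]], dtype=np.int8)
zp = np.array([[0]], dtype=np.int8)
scale = np.array([[0.01]], dtype=np.float16)

print(decompress(w_i8, zp, scale, np.float32).dtype)  # float32
print(decompress(w_i8, zp, scale, np.float16).dtype)  # float16
```

With this shape, the only difference between the three precision configurations is whether the trailing Convert is present.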


@nikita-savelyevv requested a review from a team as a code owner February 29, 2024 14:35
@github-actions bot added NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Feb 29, 2024

codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 0%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 77.93%. Comparing base (573b0c3) to head (4ce510a).
Report is 1 commit behind head on develop.

Additional details and impacted files


@@             Coverage Diff              @@
##           develop    #2537       +/-   ##
============================================
- Coverage    90.87%   77.93%   -12.94%     
============================================
  Files          494      494               
  Lines        45612    45416      -196     
============================================
- Hits         41449    35397     -6052     
- Misses        4163    10019     +5856     
| File | Coverage Δ |
|---|---|
| .../algorithms/weight_compression/openvino_backend.py | 0.00% <0.00%> (-98.34%) ⬇️ |

... and 107 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| COMMON | ? |
| ONNX | ? |
| OPENVINO | ? |
| TENSORFLOW | 30.10% <0.00%> (ø) |
| TORCH | 65.96% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Component | Coverage Δ |
|---|---|
| common | 88.28% <ø> (-5.47%) ⬇️ |
| torch | 93.49% <ø> (-0.01%) ⬇️ |
| tensorflow | 93.74% <ø> (+1.00%) ⬆️ |
| onnx | 0.00% <ø> (-93.09%) ⬇️ |
| openvino | 25.70% <0.00%> (-68.47%) ⬇️ |
| ptq | 53.06% <0.00%> (-37.03%) ⬇️ |

@nikita-savelyevv changed the title Align compression subgraphs for both weight input data types [WC] Align compression subgraphs for both weight input data types Feb 29, 2024
Contributor

@alexsu52 left a comment


Testing models with different weight/inference precisions is necessary.

@nikita-savelyevv marked this pull request as draft March 1, 2024 10:23
@nikita-savelyevv marked this pull request as ready for review March 19, 2024 14:11
@nikita-savelyevv
Collaborator Author

The WC manual test fails until #2569 is merged.

Contributor

@alexsu52 left a comment


Please check that you get the same graphs for the PyTorch backend.

alexsu52 pushed a commit that referenced this pull request Mar 26, 2024
### Changes

- Store compression scale if FP16
- Add type conversion to original data type after decompression

Below are the compression subgraphs for the first conv2d in mobilenet_v2 after conversion to OV; this is similar to the table presented in #2537.

![image](https://github.com/openvinotoolkit/nncf/assets/23343961/740953d6-2615-4c8f-bbd3-6cfae5585dfd)
Compared to the OV case, there is an additional Multiply node after the scale Multiply node. It appears to come from the Batch Norm applied to the convolution. In the PT weight compression case it does not get merged into the weight as it does in the OV case.


### Reason for changes

Weight compression for the PT backend fails when applied to a model in half precision. The reason is that the scale is always in FP32, and hence the decompression result is also in FP32, which conflicts with the FP16 input type.
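The dtype conflict described above can be demonstrated with a small numpy sketch. This is a hypothetical illustration of the failure mode, not the backend's actual code: variable names here are made up for the example.

```python
# Hypothetical sketch: an FP32 scale forces an FP32 decompression result,
# which does not match an FP16 model input. Storing the scale in FP16 and
# converting back to the original dtype avoids the mismatch.
import numpy as np

w_i8 = np.array([10, -3], dtype=np.int8)
scale_f32 = np.array(0.02, dtype=np.float32)

# Before: the FP32 scale promotes the result to FP32 even for an FP16 model.
bad = w_i8.astype(np.float32) * scale_f32
assert bad.dtype == np.float32  # conflicts with FP16 inputs

# After: keep the scale in FP16 and convert to the model's original dtype.
original_dtype = np.float16
good = (w_i8.astype(np.float16) * scale_f32.astype(np.float16)).astype(original_dtype)
assert good.dtype == np.float16
```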

### Related tickets

134063

### Tests

Added tests for the half/full precision cases. Also added cases for different devices, as it was thought that the device may influence tracing in half precision.
@nikita-savelyevv
Collaborator Author

post training weight compression test build 34 is green

Contributor

@alexsu52 left a comment


LGTM

@ljaljushkin
Contributor

ljaljushkin commented Mar 27, 2024

@alexsu52 @nikita-savelyevv
I've measured the compression time and the total time for different weight compression cases on the develop branch and the current PR:

[timing table screenshots for develop and the current PR]

It seems that model inference takes almost twice as long on the validation dataset.
Does this mean that the compressed model should be saved differently in tests and on the customer side?
https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/lm_weight_compression.py#L174

@nikita-savelyevv
Collaborator Author

@ljaljushkin Thanks for highlighting this!

The reason is that during compression with a group size, there is an additional Reshape node. In this PR, an f16 -> f32 Convert node is added after the scale Multiply node. If the Convert is added before the Reshape node, performance drops. To fix this, I moved the Convert node after the Reshape node.
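The node ordering described above can be sketched in numpy. This is a hypothetical illustration of group-wise decompression, not NNCF's actual code: the group size, shapes, and variable names are made up for the example.

```python
# Hypothetical sketch of group-wise decompression: the per-group scale
# Multiply runs on the reshaped [n_groups, group_size] view in FP16, and
# the f16 -> f32 Convert is placed only after reshaping back -- the
# ordering that preserved performance here.
import numpy as np

group_size = 4
w_i8 = np.arange(8, dtype=np.int8)                  # flat weight
scale = np.array([[0.1], [0.2]], dtype=np.float16)  # one scale per group

grouped = w_i8.reshape(-1, group_size).astype(np.float16)
decompressed = grouped * scale                      # scale Multiply (FP16)
restored = decompressed.reshape(w_i8.shape)         # Reshape node
result = restored.astype(np.float32)                # Convert placed after Reshape

print(result.dtype, result.shape)
```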

[before/after subgraph screenshots]

With this, performance is maintained after changes in the PR:

| Test case | Total time (develop branch) | Total time (PR branch) |
|---|---|---|
| tinyllama_data_free | 04:18 | 04:21 |
| tinyllama_data_aware | 04:06 | 04:07 |
| tinyllama_data_aware_awq | 03:33 | 03:39 |
| tinyllama_data_aware_awq_stateful | 03:03 | 03:03 |

@nikita-savelyevv added question Further information is requested do not merge Should not be merged yet and removed question Further information is requested labels Apr 5, 2024
@nikita-savelyevv
Collaborator Author

post_training_weight_compression test build 42 is green. Waiting for results of OV validation across different hardware.

@alexsu52 merged commit df81f44 into openvinotoolkit:develop Apr 19, 2024
11 checks passed