[WC] Align compression subgraphs for both weight input data types #2537
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

Coverage Diff:

|          | develop | #2537  | +/-     |
|----------|---------|--------|---------|
| Coverage | 90.87%  | 77.93% | -12.94% |
| Files    | 494     | 494    |         |
| Lines    | 45612   | 45416  | -196    |
| Hits     | 41449   | 35397  | -6052   |
| Misses   | 4163    | 10019  | +5856   |

... and 107 files with indirect coverage changes
Testing models with different weight/inference precisions is necessary.
nncf/quantization/algorithms/weight_compression/openvino_backend.py
WC manual test fails until #2569 is merged.
Please check that you get the same graphs for the PyTorch backend.
nncf/quantization/algorithms/weight_compression/openvino_backend.py
### Changes

- Store the compression scale in FP16
- Add a type conversion to the original data type after decompression

Below are the compression subgraphs for the first conv2d in mobilenet_v2 after conversion to OV; this is similar to the table presented in #2537.

![image](https://github.com/openvinotoolkit/nncf/assets/23343961/740953d6-2615-4c8f-bbd3-6cfae5585dfd)

Compared to the OV case, there is an additional Multiply node after the scale Multiply node. It seems to come from the Batch Norm applied to the convolution: in the PT weight compression case it does not get merged into the weight as it does in the OV case.

### Reason for changes

Weight compression for the PT backend fails when applied to a model in half precision. The reason is that the scale is always in FP32, hence the decompression result is also in FP32, which conflicts with the FP16 input type.

### Related tickets

134063

### Tests

Added tests for half/full precision cases. Also added cases for different devices, as it was thought that the device may influence tracing in half precision.
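For illustration, here is a minimal PyTorch sketch of the decompression flow described above (FP16 scale plus a final cast back to the original weight dtype). The class, buffer names, and quantization scheme are hypothetical and do not correspond to NNCF's actual implementation.

```python
# Minimal sketch, not the actual NNCF code: names and structure are illustrative.
import torch


class Int8AsymDecompressor(torch.nn.Module):
    def __init__(self, scale: torch.Tensor, zero_point: torch.Tensor, result_dtype: torch.dtype):
        super().__init__()
        # The scale is stored in FP16 regardless of the original model precision.
        self.register_buffer("scale", scale.to(torch.float16))
        self.register_buffer("zero_point", zero_point.to(torch.float16))
        self.result_dtype = result_dtype  # original weight dtype (FP16 or FP32)

    def forward(self, weight_u8: torch.Tensor) -> torch.Tensor:
        # Decompression math happens in FP16 ...
        w = (weight_u8.to(torch.float16) - self.zero_point) * self.scale
        # ... and the result is cast back to the original data type so it
        # matches the precision of the rest of the model.
        return w.to(self.result_dtype)
```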
post training weight compression test build 34 is green
LGTM
@alexsu52 @nikita-savelyevv Seems like model inference takes almost twice as long on the validation dataset.
@ljaljushkin Thanks for highlighting this! The reason behind this is that during compression with group size, there is an additional Reshape node. In this PR, a Convert f16->f32 node is added after the scale Multiply node. If the Convert is added before the Reshape node, the performance drops. To fix this, I moved the Convert node after the Reshape node.
With this, performance is maintained after the changes in the PR.
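A rough sketch of the ordering being discussed, with an assumed group-wise weight layout and a hypothetical helper name (not the actual graph-building code):

```python
# Illustrative only: assumes the compressed weight is stored as
# [rows, cols // group_size, group_size] with per-group FP16 scale/zero point.
import torch


def decompress_grouped(weight_u8, scale_f16, zero_point_f16, out_shape):
    w = (weight_u8.to(torch.float16) - zero_point_f16) * scale_f16  # FP16 math
    w = w.reshape(out_shape)       # Reshape first: fold the group dimension back
    return w.to(torch.float32)     # Convert f16 -> f32 only after the Reshape
```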
post_training_weight_compression test build 42 is green. Waiting for results of OV validation across different hardware.
### Changes

The precision configuration of the input OV model for weight compression can be one of the following three:

- `compress_to_fp16=False`
- `compress_to_fp16=True`

This PR makes the compression subgraphs equal for all these cases. Compression activations are always executed in FP16, so for the first case an additional f16 -> f32 Convert node is added.
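As a sketch of how the listed configurations can be produced and then compressed (file paths and the source model are placeholders; `ov.save_model` and `nncf.compress_weights` are the public APIs):

```python
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder source model

# Weights kept in FP32 on save vs. compressed to FP16 on save.
ov.save_model(model, "model_fp32.xml", compress_to_fp16=False)
ov.save_model(model, "model_fp16.xml", compress_to_fp16=True)

# With this PR, weight compression produces the same subgraph for both inputs;
# for the FP32 model an extra f16 -> f32 Convert node follows the decompression.
compressed_fp32 = nncf.compress_weights(core.read_model("model_fp32.xml"))
compressed_fp16 = nncf.compress_weights(core.read_model("model_fp16.xml"))
```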