fuse conv and batch_norm #3769
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3769
✅ No failures as of commit f84a2f5 with merge base 56a6855. Note: links to docs will display an error until the docs builds have completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D57895439 |
Summary: When `batch_norm` is applied after `conv` in a model, we can fuse the `batch_norm` weight and bias into `conv` and then remove the `batch_norm` node. We implement this fusion as a graph transform and apply it in `vulkan_preprocess.py`. This change reduces both latency and memory. We illustrate the improvement with Mobilenet_v2:

- The model has 52 conv+batch_norm instances. After fusing, when we export the model as in D57475757, `_native_batch_norm_legit_no_training` no longer shows up.
- Performance improves as shown below; in particular, inference latency drops from 161 ms to 148 ms.

| fuse | Loading (ms) | vmRss (KB) | vmaBlock (KB) | Inference (ms) | vmRss (KB) | vmaBlock (KB) |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Yes | 380 | 22928 | 65536 | 148 | 24296 | 65536 |
| No | 473 | 26036 | 65536 | 161 | 27416 | 65536 |

Differential Revision: D57895439
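The arithmetic behind the fusion can be sketched in plain PyTorch. In eval mode, batch norm computes `(y - mean) / sqrt(var + eps) * gamma + beta`, so the per-channel scale `gamma / sqrt(var + eps)` can be folded into the conv weight and the rest into the conv bias. This is a minimal illustration, not the pass in `vulkan_preprocess.py`; the helper name `fuse_conv_bn_weights` is chosen here for clarity.

```python
import torch

def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
    """Fold BatchNorm running stats and affine params into conv weight/bias."""
    # One scale factor per output channel: gamma / sqrt(running_var + eps)
    scale = bn_w / torch.sqrt(bn_rv + bn_eps)
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b

# Check against running Conv2d -> BatchNorm2d in eval mode
torch.manual_seed(0)
conv = torch.nn.Conv2d(3, 8, 3).eval()
bn = torch.nn.BatchNorm2d(8).eval()
bn.running_mean.normal_()           # give the stats non-trivial values
bn.running_var.uniform_(0.5, 2.0)

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    ref = bn(conv(x))
    fw, fb = fuse_conv_bn_weights(
        conv.weight, conv.bias, bn.running_mean, bn.running_var,
        bn.eps, bn.weight, bn.bias)
    out = torch.nn.functional.conv2d(x, fw, fb)

assert torch.allclose(ref, out, atol=1e-5)
```

Because the fused conv is numerically equivalent to the conv+bn pair, the `batch_norm` node can be dropped from the graph without affecting outputs.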
Force-pushed from e9ad21d to f84a2f5.
This pull request has been merged in 4f2f7e0. |
Summary: We implemented [operator fusion](pytorch#3769) (`conv+bn`), which fuses `conv`'s and `bn`'s weights and biases, but the old parameters were not deleted. As a result, the VK model was nearly twice as large as the CPU model: for mobilenet_v2, CPU vs VK was 14M vs 22M before this diff. After this diff, both are 14M. Differential Revision: D60257047
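The size regression fixed here can be illustrated with `torch.fx`: a fusion pass rewrites the graph so the old parameters have no remaining `get_attr` users, but the tensors still live on the module and get serialized. A cleanup step has to delete them explicitly. This is a hedged sketch of the idea, not the actual ExecuTorch/Vulkan pass; the module `M` and helper `prune_unused_params` are hypothetical names, and the helper only handles flat top-level parameter names.

```python
import operator
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(4))
        self.b = torch.nn.Parameter(torch.randn(4))  # will become stale
    def forward(self, x):
        return x * self.w + self.b

def prune_unused_params(gm: fx.GraphModule) -> fx.GraphModule:
    # Any parameter no get_attr node references is dead weight in the export
    used = {n.target for n in gm.graph.nodes if n.op == "get_attr"}
    for name, _ in list(gm.named_parameters()):
        if name not in used:
            delattr(gm, name)  # drop the stale tensor from the module's state
    return gm

gm = fx.symbolic_trace(M())
# Simulate a fusion-style rewrite that makes `b` unused: drop the add
for node in list(gm.graph.nodes):
    if node.op == "call_function" and node.target is operator.add:
        node.replace_all_uses_with(node.args[0])
        gm.graph.erase_node(node)
gm.graph.eliminate_dead_code()  # removes the now-unused get_attr for `b`
gm.recompile()

# Without cleanup, `b` would still be serialized with the model
prune_unused_params(gm)
assert "b" not in dict(gm.named_parameters())
```

Deleting the orphaned parameters is what brings the serialized VK model back down to the CPU model's size.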