

fuse conv and batch_norm #3769

Closed

Conversation

copyrightly (Contributor)

Summary:

  • In the printed ops, `batch_norm` no longer appears.
  • All 52 conv+batch_norm instances have been fused.

| fuse | Loading (ms) | vmRss (KB) | vmaBlock (KB) | Inference (ms) | vmRss (KB) | vmaBlock (KB) |
| --- | --- | --- | --- | --- | --- | --- |
| Yes | 380 | 22928 | 65536 | 148 | 24296 | 65536 |
| No | 473 | 26036 | 65536 | 161 | 27416 | 65536 |

Differential Revision: D57895439


pytorch-bot bot commented May 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3769

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f84a2f5 with merge base 56a6855:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 29, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D57895439

Summary:

When `batchnorm` is applied after `conv` in a model, we can fuse the weight and bias of `batchnorm` into `conv` and thereafter remove the `batchnorm` node. We implement this fusion through graph transforms and apply it in `vulkan_preprocess.py`.

This change reduces both latency and memory usage. We illustrate the improvement with Mobilenet_v2.

- The model has 52 conv+batch_norm instances. After fusing, when we export the model as in D57475757, `_native_batch_norm_legit_no_training` no longer appears.
- The performance has been improved as below. In particular, inference latency has been reduced from 161 ms to 148 ms.

| fuse | Loading (ms) | vmRss (KB) | vmaBlock (KB) | Inference (ms) | vmRss (KB) | vmaBlock (KB) |
| --- | --- | --- | --- | --- | --- | --- |
| Yes | 380 | 22928 | 65536 | 148 | 24296 | 65536 |
| No | 473 | 26036 | 65536 | 161 | 27416 | 65536 |

Differential Revision: D57895439
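The fusion described above folds the batch-norm affine transform into the convolution's weights and bias, per output channel: `W' = W * gamma / sqrt(var + eps)` and `b' = (b - mean) * gamma / sqrt(var + eps) + beta`. A minimal pure-Python sketch of that arithmetic (the function name and list-based layout are illustrative, not the actual ExecuTorch graph pass):

```python
import math

def fuse_conv_bn(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm statistics into conv weight/bias, per output channel.

    conv_w: list of per-output-channel weight lists (flattened kernels)
    conv_b: list of per-output-channel biases
    gamma, beta: batch-norm affine parameters, one per channel
    mean, var: batch-norm running statistics, one per channel
    """
    fused_w, fused_b = [], []
    for oc, w_oc in enumerate(conv_w):
        # Each output channel is rescaled by the same BN factor.
        scale = gamma[oc] / math.sqrt(var[oc] + eps)
        fused_w.append([w * scale for w in w_oc])
        fused_b.append((conv_b[oc] - mean[oc]) * scale + beta[oc])
    return fused_w, fused_b
```

With the fused parameters, `conv'(x)` equals `bn(conv(x))` for any input, so the `batch_norm` node can be erased from the graph.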

@facebook-github-bot (Contributor)

This pull request has been merged in 4f2f7e0.

copyrightly added a commit to copyrightly/executorch that referenced this pull request Jul 25, 2024
Summary:
We implemented [operator fusion](pytorch#3769) (`conv+bn`), which fuses the `conv` and `bn` weights and biases, but the old parameters were not deleted. Hence the VK model was nearly twice as large as the CPU model.

For mobilenet_v2, before this diff the CPU vs VK sizes were 14M vs 22M; after this diff, both are 14M.

Differential Revision: D60257047
facebook-github-bot pushed a commit that referenced this pull request Jul 26, 2024
Summary:
Pull Request resolved: #4427

We implemented [operator fusion](#3769) (`conv+bn`), which fuses the `conv` and `bn` weights and biases, but the old parameters were not deleted. Hence the VK model was nearly twice as large as the CPU model.

For mobilenet_v2, before this diff the CPU vs VK sizes were 14M vs 22M; after this diff, both are 14M.

Reviewed By: SS-JIA

Differential Revision: D60257047

fbshipit-source-id: ca9e0f38d53187edff9dba45fdeffa619fde51a7
Labels: CLA Signed · fb-exported · Merged
3 participants