From ce447dcef4407532c4c78ae688152f383958fd0c Mon Sep 17 00:00:00 2001
From: Kimish Patel
Date: Mon, 8 Apr 2024 16:43:35 -0700
Subject: [PATCH] Update iphone 15 pro benchmarking numbers (#2927)

Summary:
Pull Request resolved: https://github.com/pytorch/executorch/pull/2927

ATT

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: mergennachin

Differential Revision: D55895703

fbshipit-source-id: 5466b44224b8ebf7b88d846354683da0c1f6a801
---
 examples/models/llama2/README.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/examples/models/llama2/README.md b/examples/models/llama2/README.md
index 17abc9f5bc..af49994cfd 100644
--- a/examples/models/llama2/README.md
+++ b/examples/models/llama2/README.md
@@ -32,15 +32,19 @@ Note that groupsize less than 128 was not enabled, since such model were still t
 
 ## Performance
 
-Performance was measured on Samsung Galaxy S22, S23, S24 and One Plus 12. Measurement performance is in terms of tokens/second.
+Performance was measured on Samsung Galaxy S22, S24, One Plus 12 and iPhone 15 Pro. Performance is reported in tokens/second.
 
 |Device | Groupwise 4-bit (128) | Groupwise 4-bit (256)
 |--------| ---------------------- | ---------------
-|Galaxy S22 | 8.15 tokens/second | 8.3 tokens/second |
-|Galaxy S24 | 10.66 tokens/second | 11.26 tokens/second |
-|One plus 12 | 11.55 tokens/second | 11.6 tokens/second |
-|iPhone 15 pro | x | x |
+|Galaxy S22* | 8.15 tokens/second | 8.3 tokens/second |
+|Galaxy S24* | 10.66 tokens/second | 11.26 tokens/second |
+|One plus 12* | 11.55 tokens/second | 11.6 tokens/second |
+|Galaxy S22** | 5.5 tokens/second | 5.9 tokens/second |
+|iPhone 15 pro** | ~6 tokens/second | ~6 tokens/second |
 
+*: Measured via adb binary based [workflow](#step-5-run-benchmark-on)
+
+**: Measured via app based [workflow](#step-6-build-mobile-apps)
 
 # Instructions
 
@@ -241,7 +245,6 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 - Enabling LLama2 7b and other architectures via Vulkan
 - Enabling performant execution of widely used quantization schemes.
 
-TODO
 
 # Notes
 This example tries to reuse the Python code, with minimal modifications to make it compatible with current ExecuTorch: