ResNet models perform image classification: they take an image as input and classify the main object in it into one of a set of pre-defined classes. ResNet models provide high accuracy at moderate model sizes, which makes them a good fit when classification accuracy is the priority.
ResNet models are built from residual blocks, which were introduced to counter the loss of accuracy observed as networks get deeper, where the initial layers fail to learn effectively.
ResNet v1 uses post-activation for the residual blocks. The models below have 8 and 32 layers with ResNet v1 architecture.
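As a rough illustration of what "post-activation" means, the sketch below applies the ReLU after the skip connection has been added to the block output. It is a minimal pure-Python sketch on flat lists, not the actual Keras implementation; `toy_transform` is a hypothetical stand-in for the convolution and batch-normalization stack inside a real block.

```python
def relu(values):
    """Element-wise ReLU on a flat list of floats."""
    return [max(0.0, v) for v in values]


def residual_block_v1(x, transform):
    """Post-activation residual block: ReLU is applied after the skip addition."""
    fx = transform(x)
    summed = [a + b for a, b in zip(fx, x)]  # skip connection: F(x) + x
    return relu(summed)


def toy_transform(values):
    """Hypothetical stand-in for the conv / batch-norm layers of a real block."""
    return [0.5 * v - 1.0 for v in values]


out = residual_block_v1([2.0, -4.0, 0.0], toy_transform)
print(out)
```

If the transform outputs zeros, the block simply passes `relu(x)` through, which is why stacking such blocks does not by itself make the signal from early layers harder to learn.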
(source: https://keras.io/api/applications/resnet/)
The model is quantized to int8 using the TensorFlow Lite converter.
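To make the effect of int8 quantization concrete, here is a minimal sketch of the affine mapping between float values and 8-bit integers. It is not the converter's code: the `scale` and `zero_point` values are assumptions chosen for illustration, whereas a real conversion derives them from calibration data.

```python
def quantize_int8(values, scale, zero_point):
    """Map floats to int8 levels: q = round(v / scale) + zero_point, clamped."""
    quantized = []
    for v in values:
        level = round(v / scale) + zero_point
        quantized.append(max(-128, min(127, level)))
    return quantized


def dequantize_int8(q_values, scale, zero_point):
    """Recover approximate floats: v = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Assumed quantization parameters, for illustration only
scale, zero_point = 0.015625, 0  # scale = 2**-6

weights = [-1.0, -0.25, 0.0, 0.5, 3.0]
q = quantize_int8(weights, scale, zero_point)
restored = dequantize_int8(q, scale, zero_point)
print(q)
print(restored)
```

The last value falls outside the representable range and is clamped to 127, so it does not survive the round trip; the other values are recovered exactly with this particular scale.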
In addition, we introduce a new model family inspired by ResNet v1 that takes advantage of hybrid quantization.
These models are referred to below as ST ResNet 8 Hybrid v1 and ST ResNet 8 Hybrid v2.
By hybrid quantization, we mean that, wherever possible, the weights and/or activations of some network layers are quantized to fewer than 8 bits.
We used the Larq library to define and train these models. In particular, in our topology some layers/activations are kept in 8 bits while others are binary.
Please note that since this quantization is performed during training (Quantization Aware Training), these networks no longer need to be converted with TensorFlow Lite.
STM32Cube.AI is able to import them directly in .h5 format and to generate the corresponding optimized firmware code.
Even though many layers are binary, these models provide accuracy comparable to the full 8-bit ResNet v1 8, with a significantly lower inference time.
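The idea of mixing binary and 8-bit layers can be sketched as follows. This is only an illustration under stated assumptions: the sign convention (zero mapped to +1) is one common choice, the layer sizes are hypothetical and are not those of the ST ResNet 8 Hybrid models, and the estimate covers weight storage only.

```python
def binarize(values):
    """Binary quantization: map each value to +1 or -1 (zero maps to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def weight_storage_bytes(layers):
    """Estimate weight storage for a list of (num_weights, bits_per_weight)."""
    total_bits = sum(count * bits for count, bits in layers)
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical layer sizes, for illustration only
full_8bit = [(432, 8), (2304, 8), (640, 8)]
hybrid = [(432, 8), (2304, 1), (640, 8)]  # middle layer stored in binary

print(binarize([0.3, -0.7, 0.0]))
print(weight_storage_bytes(full_8bit), weight_storage_bytes(hybrid))
```

In this toy setup, storing the largest layer in binary reduces the weight storage from 3376 to 1360 bytes.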
The models are quantized using the TensorFlow Lite converter.
Network inputs / outputs
For an image resolution of NxM and P classes:

| Input Shape | Description |
| ----- | ----- |
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description |
| ----- | ----- |
| (1, P) | Per-class confidence for P classes in FLOAT32 |
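The input and output conventions above can be sketched in a few lines. This is a minimal pure-Python illustration using nested lists in place of tensors; the helper names `has_input_shape` and `top_class`, the 2x2 image, and the 3-class output are all hypothetical.

```python
def has_input_shape(batch, n, m):
    """Return True if batch is a (1, N, M, 3) nested list of values in 0..255."""
    if len(batch) != 1 or len(batch[0]) != n:
        return False
    for row in batch[0]:
        if len(row) != m:
            return False
        for pixel in row:
            if len(pixel) != 3 or any(not (0 <= c <= 255) for c in pixel):
                return False
    return True


def top_class(confidences):
    """Return (class_index, confidence) for a (1, P) output."""
    scores = confidences[0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Hypothetical 2x2 RGB image and 3-class output, for illustration only
image = [[[[0, 128, 255], [10, 20, 30]],
          [[255, 255, 255], [0, 0, 0]]]]
output = [[0.1, 0.7, 0.2]]

valid = has_input_shape(image, 2, 2)
index, confidence = top_class(output)
print(valid, index, confidence)
```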
Recommended Platforms
| Platform | Supported | Optimized |
| ----- | ----- | ----- |
| STM32L0 | [] | [] |
| STM32L4 | [x] | [] |
| STM32U5 | [x] | [] |
| STM32H7 | [x] | [x] |
| STM32MP1 | [x] | [x]* |
| STM32MP2 | [x] | [] |
| STM32N6 | [x] | [] |

(*) Only for Cifar 100 models
Performances
Metrics
Measurements are made with the default STM32Cube.AI configuration, with the input / output allocated option enabled.
tfs stands for "training from scratch", meaning that the model weights were randomly initialized before training.
tl stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, then only the last layer was unfrozen during the training.
fft stands for "full fine-tuning", meaning that the full model weights were initialized from a transfer learning pre-trained model, and all the layers were unfrozen during the training.
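The three training modes differ in how the weights are initialized and in which layers are trainable. The sketch below summarizes this in plain Python; `training_setup` is a hypothetical helper written for illustration and is not part of any training script.

```python
def training_setup(mode, num_layers):
    """Return the weight initialization and per-layer trainable flags for a mode."""
    if mode == "tfs":
        # Training from scratch: random initialization, every layer trained
        return {"init": "random", "trainable": [True] * num_layers}
    if mode == "tl":
        # Transfer learning: pre-trained backbone, only the last layer trained
        flags = [False] * (num_layers - 1) + [True]
        return {"init": "pre-trained backbone", "trainable": flags}
    if mode == "fft":
        # Full fine-tuning: start from the transfer-learning model, train all layers
        return {"init": "transfer learning model", "trainable": [True] * num_layers}
    raise ValueError(f"unknown training mode: {mode}")


for mode in ("tfs", "tl", "fft"):
    print(mode, training_setup(mode, 4))
```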
Reference MCU memory footprint based on Cifar 10 dataset (see Accuracy for details on dataset)
[2] J, ARUN PANDIAN; GOPAL, GEETHARAMANI (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network", Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1
[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests", European Conference on Computer Vision, 2014.