Change docs for action recognition #1940

Merged
Refer to our tutorial for more information on how to train, validate, and optimize.
Models
******

Currently OpenVINO™ Training Extensions supports `X3D <https://arxiv.org/abs/2004.04730>`_ and `MoViNet <https://arxiv.org/pdf/2103.11511.pdf>`_ for action classification. X3D is a deep learning model proposed in the paper "X3D: Expanding Architectures for Efficient Video Recognition" by Christoph Feichtenhofer. The model extends popular 2D convolutional neural network (CNN) architectures to the 3D domain, allowing it to efficiently process spatiotemporal information in videos.

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
+========================================================================================================================================================================================+=========+=====================+=========================+
| `Custom_Action_Classification_X3D <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/x3d/template.yaml>`_ | X3D | 2.49 | 3.79 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
| `Custom_Action_Classification_MoViNet <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/movinet/template.yaml>`_ | MoViNet | 2.71 | 3.10 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+


In the table below, the **top-1 accuracy** on some academic datasets is presented. Each model was trained with a single NVIDIA GeForce RTX 3090.

+-----------------------+------------+-----------------+
| Model name | HMDB51 | UCF101 |
+=======================+============+=================+
| X3D | 67.19 | 87.89 |
+-----------------------+------------+-----------------+
| MoViNet | 62.74 | 81.32 |
+-----------------------+------------+-----------------+
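For reference, top-1 accuracy counts a clip as correct when the class with the highest predicted score matches the ground-truth label. A minimal illustration of the metric (this helper is our own sketch, not the project's metric code):

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class matches the label.

    scores: list of per-class score lists, one per sample.
    labels: list of ground-truth class indices.
    """
    correct = sum(
        max(range(len(s)), key=s.__getitem__) == label
        for s, label in zip(scores, labels)
    )
    return 100.0 * correct / len(labels)

print(top1_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))  # -> 50.0
```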
According to the `documentation <https://mmaction2.readthedocs.io/en/latest/supp
│ │ │ │ ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0
│ │ │ │ ├── ...
│ │ │ │ ├── winKen_wave_u_cm_np1_ri_bad_1

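Before copying the data, it can help to sanity-check the folder layout with a short script. This is a sketch assuming the mmaction2 convention of ``data/hmdb51/rawframes/<action>/<clip>``; the helper name is ours:

```python
from pathlib import Path

def list_clip_dirs(data_root):
    """Collect (action, clip) directory pairs under <data_root>/hmdb51/rawframes."""
    rawframes = Path(data_root) / "hmdb51" / "rawframes"
    pairs = []
    for action_dir in sorted(p for p in rawframes.iterdir() if p.is_dir()):
        for clip_dir in sorted(c for c in action_dir.iterdir() if c.is_dir()):
            pairs.append((action_dir.name, clip_dir.name))
    return pairs

# Example: list_clip_dirs("mmaction2/data") should include pairs such as
# ("wave", "winKen_wave_u_cm_np1_ri_bad_1") if the layout matches the tree above.
```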
Once you have the dataset structured properly, copy ``mmaction2/data`` folder, which contains hmdb51 dataset, to ``training_extensions/data``.
Then you can convert it to the `CVAT <https://www.cvat.ai/>`_ format using the following command:
To see the list of supported templates, run the following command:

.. note::

OpenVINO™ Training Extensions supports the X3D and MoViNet templates now; other architectures will be supported in the future.

.. code-block::

(otx) ...$ otx find --task action_classification

+-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
| TASK | ID | NAME | BASE PATH |
+-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
| ACTION_CLASSIFICATION | Custom_Action_Classification_X3D | X3D | ../otx/algorithms/action/configs/classification/x3d/template.yaml |
| ACTION_CLASSIFICATION | Custom_Action_Classification_MoViNet | MoViNet | ../otx/algorithms/action/configs/classification/movinet/template.yaml |
+-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+

All commands will be run on the X3D model. It's a light model that achieves competitive accuracy while keeping inference fast.

*************
Optimization
*************

1. You can further optimize the model with ``otx optimize``.
Currently, quantization with POT is supported for the X3D template. MoViNet will be supported in the near future.
Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.

2. Example command for optimizing
Keep in mind that POT will take some time to optimize the model (generally less than NNCF optimization) and does not produce logs while running.

3. Now you have a fully trained, optimized, and exported efficient model representation: a ready-to-use action classification model.

The following tutorials provide further steps on how to :doc:`deploy <../deploy>` and use your model in the :doc:`demonstration mode <../demo>` and visualize results.
The examples are provided with an object detection model, but it is easy to apply them to action classification by substituting the object detection model with a classification one.
We will get a validation output similar to this after some validation time (abou


*********
Export
*********

1. ``otx export`` exports a trained PyTorch ``.pth`` model to the OpenVINO™ Intermediate Representation (IR) format.
It allows running the model on Intel hardware much more efficiently, especially on the CPU. The resulting IR model is also required to run POT optimization. An IR model consists of two files: ``openvino.xml`` for the network topology and ``openvino.bin`` for the weights.

2. Run the command line below to export the trained model
and save the exported model to the ``openvino_models`` folder.

.. code-block::

(otx) ...$ otx export

2023-03-24 15:03:35,993 - mmdeploy - INFO - Export PyTorch model to ONNX: /tmp/OTX-task-ffw8llin/openvino.onnx.
2023-03-24 15:03:44,450 - mmdeploy - INFO - Args for Model Optimizer: mo --input_model="/tmp/OTX-task-ffw8llin/openvino.onnx" --output_dir="/tmp/OTX-task-ffw8llin/" --output="bboxes,labels" --input="input" --input_shape="[1, 3, 32, 256, 256]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --source_layout=bctwh
2023-03-24 15:03:46,707 - mmdeploy - INFO - [ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/OTX-task-ffw8llin/openvino.xml
[ SUCCESS ] BIN file: /tmp/OTX-task-ffw8llin/openvino.bin

2023-03-24 15:03:46,707 - mmdeploy - INFO - Successfully exported OpenVINO model: /tmp/OTX-task-ffw8llin/openvino.xml
2023-03-24 15:03:46,756 - mmaction - INFO - Exporting completed


3. Check the accuracy of the IR model and the consistency between the exported model and the PyTorch model,
using ``otx eval`` and passing the IR model path to the ``--load-weights`` parameter.

.. code-block::

(otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
--load-weights model-exported/openvino.xml \
--save-performance model-exported/performance.json

...

Performance(score: 0.0, dashboard: (3 metric groups))

.. note::

Unfortunately, OpenVINO™ has trouble exporting from an ONNX file produced by torch 1.13.
You can get a proper OpenVINO™ IR by downgrading torch to 1.12.1 before exporting.
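As a side note, the ``--mean_values`` and ``--scale_values`` arguments visible in the export log above bake input normalization into the IR, so each input channel is transformed as ``(x - mean) / scale`` at inference time. A minimal sketch of that arithmetic (the helper name is ours):

```python
MEAN = [123.675, 116.28, 103.53]   # --mean_values from the export log
SCALE = [58.395, 57.12, 57.375]    # --scale_values from the export log

def normalize_pixel(rgb):
    """Per-channel normalization that Model Optimizer embeds in the IR."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, SCALE)]

print(normalize_pixel([123.675, 116.28, 103.53]))  # -> [0.0, 0.0, 0.0]
```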


*************
Optimization
*************

1. You can further optimize the model with ``otx optimize``.
Currently, only POT is supported for action detection; NNCF will be supported in the near future.
Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.

2. Example command for optimizing
OpenVINO™ model (.xml) with OpenVINO™ POT.

.. code-block::

(otx) ...$ otx optimize --load-weights openvino_models/openvino.xml \
--save-model-to pot_model

...

Performance(score: 0.0, dashboard: (3 metric groups))

Keep in mind that POT will take some time to optimize the model (generally less than NNCF optimization) and does not produce logs while running.

3. Now you have a fully trained, optimized, and exported efficient model representation: a ready-to-use action detection model.
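Since POT quantizes FP32 weights to INT8, the quantized ``.bin`` file is typically around four times smaller than the original. A quick stdlib check of the reduction (file paths and helper names here are illustrative, not produced by ``otx``):

```python
import os

def size_mb(path):
    """File size in mebibytes."""
    return os.path.getsize(path) / (1024 * 1024)

def compression_ratio(original_bytes, quantized_bytes):
    """How many times smaller the quantized weight file is."""
    return original_bytes / quantized_bytes

# Usage sketch after optimization (paths are illustrative):
#   compression_ratio(os.path.getsize("openvino_models/openvino.bin"),
#                     os.path.getsize("pot_model/openvino.bin"))
```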