Commit: Gesture recognition demo
Maxim Pashchenkov committed Nov 22, 2021
1 parent 8441861 commit 1aefc46
Showing 17 changed files with 1,706 additions and 2 deletions.
5 changes: 3 additions & 2 deletions demos/README.md
@@ -23,8 +23,9 @@ The Open Model Zoo includes the following demos:
- [Face Recognition Python\* Demo](./face_recognition_demo/python/README.md) - The interactive face recognition demo.
- [Formula Recognition Python\* Demo](./formula_recognition_demo/python/README.md) - The demo demonstrates how to run Im2latex formula recognition models and recognize latex formulas.
- [Gaze Estimation C++ Demo](./gaze_estimation_demo/cpp/README.md) - Face detection followed by gaze estimation, head pose estimation and facial landmarks regression.
- [Gaze Estimation C++ G-API Demo](./gaze_estimation_demo/cpp_gapi/README.md) - Face detection followed by gaze estimation, head pose estimation and facial landmarks regression. G-API version.
- [Gaze Estimation C++ G-API\* Demo](./gaze_estimation_demo/cpp_gapi/README.md) - Face detection followed by gaze estimation, head pose estimation and facial landmarks regression. G-API version.
- [Gesture Recognition Python\* Demo](./gesture_recognition_demo/python/README.md) - Demo application for Gesture Recognition algorithm (e.g. American Sign Language gestures), which classifies gesture actions that are being performed on input video.
- [Gesture Recognition C++ G-API\* Demo](./gesture_recognition_demo/cpp_gapi/README.md) - Demo application for Gesture Recognition algorithm (e.g. American Sign Language gestures), which classifies gesture actions that are being performed on input video. G-API version.
- [GPT-2 Text Prediction Python\* Demo](./gpt2_text_prediction_demo/python/README.md) - GPT-2 text prediction demo.
- [Handwritten Text Recognition Python\* Demo](./handwritten_text_recognition_demo/python/README.md) - The demo demonstrates how to run Handwritten Japanese Recognition models and Handwritten Simplified Chinese Recognition models.
- [Human Pose Estimation C++ Demo](./human_pose_estimation_demo/cpp/README.md) - Human pose estimation demo.
@@ -37,7 +38,7 @@ The Open Model Zoo includes the following demos:
- [Image Translation Python\* Demo](./image_translation_demo/python/README.md) - Demo application to synthesize a photo-realistic image based on exemplar image.
- [Instance Segmentation Python\* Demo](./instance_segmentation_demo/python/README.md) - Inference of instance segmentation networks trained in `Detectron` or `maskrcnn-benchmark`.
- [Interactive Face Detection C++ Demo](./interactive_face_detection_demo/cpp/README.md) - Face Detection coupled with Age/Gender, Head-Pose, Emotion, and Facial Landmarks detectors. Supports video and camera inputs.
- [Interactive Face Detection G-API Demo](./interactive_face_detection_demo/cpp_gapi/README.md) - G-API based Face Detection coupled with Age/Gender, Head-Pose, Emotion, and Facial Landmarks detectors. Supports video and camera inputs.
- [Interactive Face Detection G-API\* Demo](./interactive_face_detection_demo/cpp_gapi/README.md) - G-API based Face Detection coupled with Age/Gender, Head-Pose, Emotion, and Facial Landmarks detectors. Supports video and camera inputs.
- [Machine Translation Python\* Demo](./machine_translation_demo/python/README.md) - The demo demonstrates how to run non-autoregressive machine translation models.
- [Mask R-CNN C++ Demo for TensorFlow\* Object Detection API](./mask_rcnn_demo/cpp/README.md) - Inference of instance segmentation networks created with TensorFlow\* Object Detection API.
- [Monodepth Python\* Demo](./monodepth_demo/python/README.md) - The demo demonstrates how to run monocular depth estimation models.
12 changes: 12 additions & 0 deletions demos/gesture_recognition_demo/cpp_gapi/CMakeLists.txt
@@ -0,0 +1,12 @@
# Copyright (C) 2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

file(GLOB_RECURSE SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
file(GLOB_RECURSE HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)

add_demo(NAME gesture_recognition_demo_gapi
SOURCES ${SOURCES}
HEADERS ${HEADERS}
INCLUDE_DIRECTORIES "${CMAKE_CURRENT_SOURCE_DIR}/include"
DEPENDENCIES monitors)
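
The `add_demo` call registers the demo with the common Open Model Zoo demos build system. As a rough sketch only — assuming the OpenVINO environment is already set up and that `add_demo(NAME ...)` creates a CMake target of the same name, which is an assumption rather than something stated in this diff — the demo could be built from the top-level `demos` project like this:

```sh
# Hedged example: the officially documented way is the build_demos script in <omz_dir>/demos.
mkdir build && cd build
cmake <omz_dir>/demos
cmake --build . --target gesture_recognition_demo_gapi
```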
125 changes: 125 additions & 0 deletions demos/gesture_recognition_demo/cpp_gapi/README.md
@@ -0,0 +1,125 @@
# G-API Gesture Recognition Demo

This demo demonstrates how to run gesture recognition models (for example, for American Sign Language (ASL) gestures) using the OpenVINO™ toolkit.

## How It Works

The demo application expects gesture recognition and person detection models in the Intermediate Representation (IR) format.

As input, the demo application takes:

* a path to a video file or a device node of a webcam, specified with the `-i` command-line argument
* a path to a JSON file with gesture class names, specified with the `-c` argument

The demo workflow is the following:

1. The demo application reads video frames one by one, runs a person detector that extracts the ROI, and tracks the ROI of the very first detected person. An additional process prepares batches of frames at a constant frame rate.
2. The batch of frames and the extracted ROI are passed to a neural network that predicts the gesture (a schematic G-API sketch of this pipeline is shown after the list).
3. The application visualizes the results in a graphical window, where the following objects are shown:
- Input frame with detected ROI.
- Last recognized gesture.
- Performance characteristics.
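
The pipeline is expressed with OpenCV G-API. The snippet below is only a structural sketch of how the custom operations declared in `include/custom_kernels.hpp` (added by this commit) could be wired into a graph; it is not the demo's actual source, the inference steps are replaced by placeholder graph inputs, and the `cv::Scalar`/`cv::Size` arguments are dummies whose real meaning is defined by the demo's kernel implementations.

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/gcomputation.hpp>

#include "custom_kernels.hpp"  // declares the custom:: operations and TrackedObject

// Hypothetical helper; the demo's real graph also contains cv::gapi::infer<> calls
// for the person detector and the gesture recognition network, omitted here.
static cv::GComputation buildSketchGraph(const cv::Size& frame_size, float threshold) {
    cv::GArray<cv::GMat> batch;          // frames gathered at a constant frame rate
    cv::GMat detector_out;               // raw person-detector output (inference omitted)
    cv::GArray<cv::GMat> recognizer_out; // raw gesture-recognition output (inference omitted)

    // Most recent frame of the batch, used for detection and visualization.
    cv::GMat frame = custom::GetFastFrame::on(batch, frame_size);

    // Turn raw detector output into person ROIs and track the selected person.
    auto boxes   = custom::ExtractBoundingBox::on(detector_out, frame, cv::Scalar());
    auto tracked = custom::TrackPerson::on(frame, boxes);

    // Build the clip around the tracked ROI; in the real demo it feeds the
    // gesture recognition network, whose output goes to the post-processing op.
    cv::GArray<cv::GMat> clip =
        custom::ConstructClip::on(batch, tracked, cv::Scalar(), cv::Size());
    cv::GOpaque<int> gesture_id =
        custom::GestureRecognitionPostprocessing::on(recognizer_out, threshold);

    return cv::GComputation(cv::GIn(batch, detector_out, recognizer_out),
                            cv::GOut(gesture_id, frame, clip));
}
```

Such a graph would be compiled with `cv::compile_args(custom::kernels())` so that the CPU implementations of these operations are found.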

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with the `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model.html#general-conversion-parameters).

## Creating a Gallery for the Gestures Window

To show gestures in an additional window, the demo needs a gallery of reference videos. Each video file must be named after the gesture it contains. You can create the gallery from an arbitrary set of videos:

1. Put the videos containing gestures into a separate empty folder. Each video must contain only one gesture.
2. Run the `python3 <omz_dir>/demos/gesture_recognition_demo/cpp_gapi/create_list.py --classes_map <path_to_a_file_with_gesture_classes> --gesture_storage <path_to_directory_with_gesture_videos>` command, which creates a `gesture_gallery.json` file that maps gesture names to the paths of the corresponding videos (a sample of the resulting file is shown below).
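
The generated `gesture_gallery.json` maps each gesture label (taken from the video file name) to a single-element list with the absolute path of the video. With hypothetical gesture names and paths, it looks roughly like this:

```json
{
    "hello": ["/home/user/gesture_videos/hello.mp4"],
    "thank_you": ["/home/user/gesture_videos/thank_you.avi"]
}
```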

## Preparing to Run

For demo input image or video files, refer to the section **Media Files Available for Demos** in the [Open Model Zoo Demos Overview](../../README.md).
The list of models supported by the demo is in `<omz_dir>/demos/gesture_recognition_demo/cpp_gapi/models.lst` file.
This file can be used as a parameter for [Model Downloader](../../../tools/model_tools/README.md) and Converter to download and, if necessary, convert models to OpenVINO Inference Engine format (\*.xml + \*.bin).

An example of using the Model Downloader:

```sh
omz_downloader --list models.lst
```

An example of using the Model Converter:

```sh
omz_converter --list models.lst
```

### Supported Models

* asl-recognition-0004
* common-sign-language-0001
* common-sign-language-0002
* person-detection-asl-0001

> **NOTE**: Refer to the tables [Intel's Pre-Trained Models Device Support](../../../models/intel/device_support.md) and [Public Pre-Trained Models Device Support](../../../models/public/device_support.md) for the details on models inference support at different devices.

## Running

Running the application with the `-h` option yields the following usage message:

```
InferenceEngine:
API version ............ <version>
Build .................. <number>
gesture_recognition_demo_gapi [OPTION]
Options:
-h Show this help message and exit.
-m_a Required. Path to an .xml file with a trained gesture recognition model.
-m_d Required. Path to an .xml file with a trained person detector model.
-i Required. Path to a video file or a device node of a webcam.
-o Optional. Name of the output file(s) to save.
-limit Optional. Number of frames to store in output. If -1 is set, all frames are stored.
-c Required. Path to a file with gesture classes.
-s Optional. Path to a directory with video samples of gestures.
-t Optional. Threshold for the predicted score of an action.
-d_d "<device>" Optional. Target device for Person Detection network (the list of available devices is shown below).
-d_a "<device>" Optional. Target device for Gesture Recognition (the list of available devices is shown below).
-no_show Optional. Don't show output.
-u Optional. List of monitors to show initially.
```

Running the application with an empty list of options yields an error message.

For example, to do inference on a CPU, run the following command:

```sh
./gesture_recognition_demo_gapi -m_a <path_to_model>/asl-recognition-0004.xml \
-m_d <path_to_model>/person-detection-asl-0001.xml \
-i 0 \
-c <omz_dir>/data/dataset_classes/msasl100.json
```

### Run-Time Control Keys

The demo starts in person tracking mode. To switch it to action recognition mode, press a `0-9` key corresponding to the desired detection ID (the number in the top-left corner of each bounding box). If the frame contains only one person, that person is selected automatically. To switch back to tracking mode, press the space bar.

Example files with class names can be found in the OMZ directory (see the illustrative sample after this list):

* MS-ASL-100: `<omz_dir>/data/dataset_classes/msasl100.json`
* Jester-27: `<omz_dir>/data/dataset_classes/jester27.json`
* Common-Sign-Language-12: `<omz_dir>/data/dataset_classes/common_sign_language12.json`
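
The `create_list.py` helper in `<omz_dir>/demos/gesture_recognition_demo/cpp_gapi` simply iterates over the top-level JSON value of the class map and treats each entry as a gesture name, so structurally a class map can be as simple as a flat list of names. The entries below are made up for illustration and are not the real MS-ASL classes:

```json
["hello", "thank_you", "teacher"]
```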

**NOTE**: To run the demo application with video examples of gestures, specify the `-s` key with a valid path to the directory with video samples. The name of each video sample should be a valid gesture name from the `<omz_dir>/data/dataset_classes/msasl100.json` file. To navigate between samples, use the 'f' and 'b' keys to switch to the next and previous video sample, respectively.

You can save processed results to a Motion JPEG AVI file or separate JPEG or PNG files using the `-o` option:

* To save processed results in an AVI file, specify the name of the output file with `avi` extension, for example: `-o output.avi`.
* To save processed results as images, specify the template name of the output image file with `jpg` or `png` extension, for example: `-o output_%03d.jpg`. The actual file names are constructed from the template at runtime by replacing regular expression `%03d` with the frame number, resulting in the following: `output_000.jpg`, `output_001.jpg`, and so on.

To avoid disk space overrun in the case of a continuous input stream, such as a camera, you can limit the amount of data stored in the output file(s) with the `-limit` option. The default value is 1000. To change it, apply the `-limit N` option, where `N` is the number of frames to store.
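
For example, a run that reads from a webcam, hides the preview window, and stores at most 300 frames into an AVI file could look like this (model paths are placeholders):

```sh
./gesture_recognition_demo_gapi -m_a <path_to_model>/asl-recognition-0004.xml \
    -m_d <path_to_model>/person-detection-asl-0001.xml \
    -i 0 \
    -c <omz_dir>/data/dataset_classes/msasl100.json \
    -no_show -o output.avi -limit 300
```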

>**NOTE**: Windows* systems may not have the Motion JPEG codec installed by default. If this is the case, OpenCV FFMPEG backend can be downloaded by the PowerShell script provided with the OpenVINO install package and located at `<INSTALL_DIR>/opencv/ffmpeg-download.ps1`. Run the script with Administrative privileges. Alternatively, you can save results as images.

## Demo Output

The application uses OpenCV to display the gesture recognition results and the current inference performance.

## See Also

* [Open Model Zoo Demos](../../README.md)
* [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/model_tools/README.md)
57 changes: 57 additions & 0 deletions demos/gesture_recognition_demo/cpp_gapi/create_list.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python3
'''
Copyright (C) 2021 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

import glob
import os
import json
import argparse

parser = argparse.ArgumentParser(
    description='Create a gesture_gallery.json file that maps gesture class '
                'names to video samples found in the given directory.')

parser.add_argument('--gesture_storage', required=True,
                    help='Path to the directory with gesture videos')

parser.add_argument('--classes_map', required=True,
                    help='Path to the JSON file with gesture class names')

args = parser.parse_args()

with open(args.classes_map) as json_file:
    class_names = json.load(json_file)

# For every class name, pick the first .mp4 or .avi file with a matching name.
gesture_dir = args.gesture_storage
files_list = []
for name in class_names:
    matches = glob.glob(os.path.join(gesture_dir, name + '.mp4')) \
            + glob.glob(os.path.join(gesture_dir, name + '.avi'))
    if matches:
        files_list.append(matches[0])

labels = []
objects = {}

for file in files_list:
    # The gesture label is the video file name without its extension.
    label = file.rpartition(os.sep)[2].rpartition('.')[0]
    path = os.path.abspath(file)

    if label in labels:
        raise Exception('An item with the label {} already exists in the gallery!'.format(label))
    labels.append(label)
    objects[label] = [path]

with open('gesture_gallery.json', 'w') as outfile:
    json.dump(objects, outfile, indent=4)
@@ -0,0 +1,64 @@
// Copyright (C) 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once

#include <gflags/gflags.h>
#include <utils/default_flags.hpp>

DEFINE_INPUT_FLAGS
DEFINE_OUTPUT_FLAGS

static const char help_message[] = "Print a usage message.";
static const char camera_resolution_message[] = "Optional. Set camera resolution in format WxH.";
static const char person_detection_model_message[] = "Required. Path to an .xml file with a trained person detector model.";
static const char action_recognition_model_message[] = "Required. Path to an .xml file with a trained gesture recognition model.";
static const char target_device_message_d[] = "Optional. Target device for Person Detection network. "
"The demo will look for a suitable plugin for a specified device. Default value is \"CPU\".";
static const char target_device_message_a[] = "Optional. Target device for Action Recognition network. "
"The demo will look for a suitable plugin for a specified device. Default value is \"CPU\".";
static const char thresh_output_message[] = "Optional. Threshold for the predicted score of an action. The default value is 0.8.";
static const char class_map_message[] = "Required. Path to a file with gesture classes.";
static const char samples_dir_message[] = "Optional. Path to a .json file that contains paths to samples of gestures.";
static const char no_show_message[] = "Optional. Don't show output.";
static const char utilization_monitors_message[] = "Optional. List of monitors to show initially.";

DEFINE_bool(h, false, help_message);
DEFINE_string(res, "1280x720", camera_resolution_message);
DEFINE_string(m_a, "", action_recognition_model_message);
DEFINE_string(m_d, "", person_detection_model_message);
DEFINE_string(d_a, "CPU", target_device_message_a);
DEFINE_string(d_d, "CPU", target_device_message_d);
DEFINE_string(c, "", class_map_message);
DEFINE_string(s, "", samples_dir_message);
DEFINE_double(t, 0.8, thresh_output_message);
DEFINE_bool(no_show, false, no_show_message);
DEFINE_string(u, "", utilization_monitors_message);

/**
* \brief This function shows a help message
*/

static void showUsage() {
std::cout << std::endl;
std::cout << "gesture_recognition_demo_gapi [OPTION]" << std::endl;
std::cout << "Options:" << std::endl;
std::cout << std::endl;
std::cout << " -h " << help_message << std::endl;
std::cout << " -i " << input_message << std::endl;
std::cout << " -loop " << loop_message << std::endl;
std::cout << " -o \"<path>\" " << output_message << std::endl;
std::cout << " -limit \"<num>\" " << limit_message << std::endl;
std::cout << " -res \"<WxH>\" " << camera_resolution_message << std::endl;
std::cout << " -m_d \"<path>\" " << person_detection_model_message << std::endl;
std::cout << " -m_a \"<path>\" " << action_recognition_model_message << std::endl;
std::cout << " -d_d \"<device>\" " << target_device_message_d << std::endl;
std::cout << " -d_a \"<device>\" " << target_device_message_a << std::endl;
std::cout << " -no_show " << no_show_message << std::endl;
std::cout << " -c " << class_map_message << std::endl;
std::cout << " -s " << samples_dir_message << std::endl;
std::cout << " -t " << thresh_output_message << std::endl;
std::cout << " -u " << utilization_monitors_message << std::endl;
}
72 changes: 72 additions & 0 deletions demos/gesture_recognition_demo/cpp_gapi/include/custom_kernels.hpp
@@ -0,0 +1,72 @@
// Copyright (C) 2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <opencv2/gapi.hpp>
#include <opencv2/gapi/cpu/gcpukernel.hpp>
#include <opencv2/gapi/gkernel.hpp>
#include <atomic>

#include "tracker.hpp"

static std::atomic<size_t> current_person_id{ 0 };

namespace custom {
G_API_OP(GetFastFrame,
<cv::GMat(cv::GArray<cv::GMat>, cv::Size)>, "custom.get_fast_frame") {
static cv::GMatDesc outMeta(const cv::GArrayDesc &in,
const cv::Size& frame_size) {
return cv::GMatDesc{CV_8U, 3, frame_size};
}
};

G_API_OP(ExtractBoundingBox,
<cv::GArray<TrackedObject>(cv::GMat,
cv::GMat,
cv::Scalar)>,
"custom.bb_extract") {
static cv::GArrayDesc outMeta(const cv::GMatDesc &in,
const cv::GMatDesc&,
const cv::Scalar) {
return cv::empty_array_desc();
}
};

G_API_OP(TrackPerson,
<cv::GArray<TrackedObject>(cv::GMat,
cv::GArray<TrackedObject>)>,
"custom.track") {
static cv::GArrayDesc outMeta(const cv::GMatDesc &in,
const cv::GArrayDesc&) {
return cv::empty_array_desc();
}
};

G_API_OP(ConstructClip,
<cv::GArray<cv::GMat>(const cv::GArray<cv::GMat>,
const cv::GArray<TrackedObject>,
const cv::Scalar,
const cv::Size)>,
"custom.construct_clip") {
static cv::GArrayDesc outMeta(const cv::GArrayDesc&,
const cv::GArrayDesc&,
const cv::Scalar&,
const cv::Size&) {
return cv::empty_array_desc();
}
};

G_API_OP(GestureRecognitionPostprocessing,
<cv::GOpaque<int>(cv::GArray<cv::GMat>,
float)>,
"custom.ar_postproc") {
static cv::GOpaqueDesc outMeta(const cv::GArrayDesc&,
const float) {
return cv::empty_gopaque_desc();
}
};

cv::gapi::GKernelPackage kernels();
} // namespace custom
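
The header above only declares the operations and the `kernels()` factory; their CPU implementations live in the demo's source files, which are not part of the hunks shown here. As a hedged illustration of how such an implementation typically looks with the G-API OpenCV (CPU) backend, a kernel for `GetFastFrame` might be written as follows — the kernel name and body are guesses for illustration, not the demo's actual logic:

```cpp
#include <opencv2/gapi/cpu/gcpukernel.hpp>
#include <opencv2/imgproc.hpp>

#include "custom_kernels.hpp"

namespace {
// Hypothetical CPU-backend kernel for custom::GetFastFrame: take the newest
// frame of the batch and bring it to the requested size.
GAPI_OCV_KERNEL(OCVGetFastFrame, custom::GetFastFrame) {
    static void run(const std::vector<cv::Mat>& batch,
                    const cv::Size& frame_size,
                    cv::Mat& out) {
        cv::resize(batch.back(), out, frame_size);
    }
};
}  // anonymous namespace

// The demo's real kernels() would register CPU kernels for all five operations.
cv::gapi::GKernelPackage custom::kernels() {
    return cv::gapi::kernels<OCVGetFastFrame /*, OCVExtractBoundingBox, ... */>();
}
```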