Added NetVLAD model #2021

Merged 6 commits on Feb 9, 2021
1 change: 1 addition & 0 deletions ci/prepare-documentation.py
@@ -70,6 +70,7 @@
'machine_translation',
'monocular_depth_estimation',
'optical_character_recognition',
'place_recognition',
'question_answering',
'semantic_segmentation',
'sound_classification',
7 changes: 7 additions & 0 deletions data/dataset_definitions.yml
@@ -1151,3 +1151,10 @@ datasets:
data_source: Cityscapes/data
annotation: cityscapes.pickle
dataset_meta: cityscapes.json

- name: pitts30k_val
data_source: pitts250k
annotation_conversion:
converter: place_recognition
split_file: pitts250k/datasets/pitts30k_val.mat
annotation: pitts30k_val.pickle
93 changes: 93 additions & 0 deletions demos/place_recognition_demo/python/README.md
@@ -0,0 +1,93 @@
# Place Recognition Python\* Demo

This demo shows how to run place recognition models using OpenVINO™.

> **NOTE**: Only batch size of 1 is supported.

## How It Works

The demo application expects a place recognition model in the Intermediate Representation (IR) format.

As input, the demo application takes one of the following:
* a path to an image
* a path to a folder with images
* a path to a video file or a device node of a web camera

The demo workflow is the following:

1. The demo application reads input frames.
2. Each input frame is passed to an artificial neural network that computes an embedding vector.
3. The demo application then searches for the computed embedding among the gallery embeddings to determine which gallery image is most similar to the current frame (see the sketch after this list).
4. The app visualizes the results of its work in a graphical window, where the following objects are shown:
   - Input frame.
   - Top-10 most similar images from the gallery.
   - Performance characteristics.
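
The search step is a plain nearest-neighbor lookup over embedding vectors. A minimal sketch of the idea, assuming a precomputed `(N, D)` matrix of gallery embeddings and a `(1, D)` probe embedding (the names here are illustrative, not the demo's API):

```python
import numpy as np

def top_matches(probe_embedding, gallery_embeddings, top_k=10):
    # L2 distance between the probe embedding and every gallery embedding.
    distances = np.linalg.norm(probe_embedding - gallery_embeddings, axis=1)
    # Indices of the top_k closest gallery images, nearest first.
    return np.argsort(distances)[:top_k]
```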

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html).
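
If reconverting the model is not convenient, the channel order can also be rearranged at runtime; a one-line sketch, assuming an OpenCV-style BGR frame:

```python
# OpenCV decodes images as BGR; reversing the last axis yields RGB
# for models trained on RGB input.
rgb_frame = bgr_frame[:, :, ::-1]
```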

## Running

Run the application with the `-h` option to see the following usage message:

```
usage: place_recognition_demo.py [-h] -m MODEL -i INPUT -gf GALLERY_FOLDER
[--gallery_size GALLERY_SIZE] [--loop]
[-o OUTPUT] [-limit OUTPUT_LIMIT] [-d DEVICE]
[-l CPU_EXTENSION] [--no_show]
[-u UTILIZATION_MONITORS]

Options:
-h, --help Show this help message and exit.
-m MODEL, --model MODEL
Required. Path to an .xml file with a trained model.
-i INPUT, --input INPUT
Required. An input to process. The input must be a
single image, a folder of images, video file or camera
id.
-gf GALLERY_FOLDER, --gallery_folder GALLERY_FOLDER
Required. Path to a folder with images in the gallery.
--gallery_size GALLERY_SIZE
Optional. Number of images from the gallery used for
processing
--loop Optional. Enable reading the input in a loop.
-o OUTPUT, --output OUTPUT
Optional. Name of output to save.
-limit OUTPUT_LIMIT, --output_limit OUTPUT_LIMIT
Optional. Number of frames to store in output. If -1
is set, all frames are stored.
-d DEVICE, --device DEVICE
Optional. Specify the target device to infer on: CPU,
GPU, FPGA, HDDL or MYRIAD. The demo will look for a
suitable plugin for device specified (by default, it
is CPU).
-l CPU_EXTENSION, --cpu_extension CPU_EXTENSION
Optional. Required for CPU custom layers. Absolute
path to a shared library with the kernels
implementations.
--no_show Optional. Do not visualize inference results.
-u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
Optional. List of monitors to show initially.
```

Running the application with an empty list of options yields the short version of the usage message and an error message.

To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](../../../tools/downloader/README.md). The list of models supported by the demo is in [models.lst](./models.lst).

> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (`*.xml` + `*.bin`) using the [Model Optimizer tool](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html).

To run the demo, provide paths to the model in the IR format, to a directory with gallery images, and to an input video, image, or folder with images:
```bash
python place_recognition_demo.py \
-m /home/user/netvlad-tf.xml \
-i /home/user/image.jpg \
-gf /home/user/gallery_folder
```

## Demo Output

The application uses OpenCV to display the gallery search results and current inference performance.

## See Also
* [Using Open Model Zoo demos](../../README.md)
* [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/downloader/README.md)
2 changes: 2 additions & 0 deletions demos/place_recognition_demo/python/models.lst
@@ -0,0 +1,2 @@
# This file can be used with the --list option of the model downloader.
netvlad-tf
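
For example, assuming the tool layout referenced in the demo README and that the command is run from the repository root, downloading the listed model might look like:

```bash
python3 tools/downloader/downloader.py --list demos/place_recognition_demo/python/models.lst
```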
138 changes: 138 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo.py
@@ -0,0 +1,138 @@
#!/usr/bin/env python3
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import logging as log
from pathlib import Path
import sys
import time
from argparse import ArgumentParser, SUPPRESS

import cv2
import numpy as np

from place_recognition_demo.place_recognition import PlaceRecognition
from place_recognition_demo.visualizer import visualize

sys.path.append(str(Path(__file__).resolve().parents[2] / 'common/python'))

import monitors
from images_capture import open_images_capture


def build_argparser():
""" Returns argument parser. """

parser = ArgumentParser(add_help=False)
args = parser.add_argument_group('Options')
args.add_argument('-h', '--help', action='help', default=SUPPRESS,
help='Show this help message and exit.')
args.add_argument('-m', '--model',
help='Required. Path to an .xml file with a trained model.',
required=True, type=Path)
args.add_argument('-i', '--input', required=True,
help='Required. An input to process. The input must be a single image, '
'a folder of images, video file or camera id.')
args.add_argument('-gf', '--gallery_folder',
help='Required. Path to a folder with images in the gallery.',
required=True, type=Path)
args.add_argument('--gallery_size', required=False, type=int,
help='Optional. Number of images from the gallery used for processing')
args.add_argument('--loop', default=False, action='store_true',
help='Optional. Enable reading the input in a loop.')
args.add_argument('-o', '--output', required=False,
help='Optional. Name of output to save.')
args.add_argument('-limit', '--output_limit', required=False, default=1000, type=int,
help='Optional. Number of frames to store in output. '
'If -1 is set, all frames are stored.')
args.add_argument('-d', '--device',
help='Optional. Specify the target device to infer on: CPU, GPU, FPGA, HDDL '
'or MYRIAD. The demo will look for a suitable plugin for device '
'specified (by default, it is CPU).',
default='CPU', type=str)
args.add_argument("-l", "--cpu_extension",
help="Optional. Required for CPU custom layers. Absolute path to "
"a shared library with the kernels implementations.", type=str,
default=None)
args.add_argument('--no_show', action='store_true',
help='Optional. Do not visualize inference results.')
args.add_argument('-u', '--utilization_monitors', default='', type=str,
help='Optional. List of monitors to show initially.')
return parser


def time_elapsed(func, *args):
""" Auxiliary function that helps to measure elapsed time. """

start_time = time.perf_counter()
res = func(*args)
elapsed = time.perf_counter() - start_time
return elapsed, res


def main():
""" Main function. """

log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)
args = build_argparser().parse_args()

place_recognition = PlaceRecognition(args.model, args.device, args.gallery_folder, args.cpu_extension,
args.gallery_size)

cap = open_images_capture(args.input, args.loop)

compute_embeddings_times = []
search_in_gallery_times = []

frames_processed = 0
presenter = monitors.Presenter(args.utilization_monitors, 0)
video_writer = cv2.VideoWriter()

while True:
frame = cap.read()

if frame is None:
if frames_processed == 0:
raise ValueError("Can't read an image from the input")
break

elapsed, probe_embedding = time_elapsed(place_recognition.compute_embedding, frame)
compute_embeddings_times.append(elapsed)

elapsed, (sorted_indexes, distances) = time_elapsed(place_recognition.search_in_gallery, probe_embedding)
search_in_gallery_times.append(elapsed)

image, key = visualize(frame, [str(place_recognition.impaths[i]) for i in sorted_indexes],
distances[sorted_indexes], place_recognition.input_size,
np.mean(compute_embeddings_times), np.mean(search_in_gallery_times),
imshow_delay=3, presenter=presenter, no_show=args.no_show)

# Open the writer once, on the first processed frame; re-opening it
# on every iteration would truncate the output file each time.
if frames_processed == 0 and args.output and not video_writer.open(
        args.output, cv2.VideoWriter_fourcc(*'MJPG'), cap.fps(),
        (image.shape[1], image.shape[0])):
    raise RuntimeError("Can't open video writer")

frames_processed += 1
if video_writer.isOpened() and (args.output_limit <= 0 or frames_processed <= args.output_limit):
    video_writer.write(image)

if key == 27:
break

print(presenter.reportMeans())


if __name__ == '__main__':
sys.exit(main() or 0)
Empty file added: demos/place_recognition_demo/python/place_recognition_demo/__init__.py
40 changes: 40 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo/common.py
@@ -0,0 +1,40 @@
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np
import cv2


def max_central_square_crop(image):
''' Makes a max-sized central square crop. '''

height, width = image.shape[:2]

if width > height:
image = image[:, (width - height) // 2:(width - height) // 2 + height]
else:
image = image[(height - width) // 2:(height - width) // 2 + width, :]

return image


def crop_resize(image, input_size):
''' Makes a max-sized central square crop and resizes it to input_size. '''

image = max_central_square_crop(image)
image = cv2.resize(image, (input_size[1], input_size[0]))
image = np.expand_dims(image, axis=0)
return image
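
A quick usage sketch for these helpers (the file name and input size below are placeholders):

```python
import cv2
from place_recognition_demo.common import crop_resize

image = cv2.imread('photo.jpg')  # placeholder input image (BGR)
input_size = (200, 300)          # placeholder (height, width) of the model input
batch = crop_resize(image, input_size)
print(batch.shape)               # (1, 200, 300, 3): square crop, resized, batch axis added
```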
89 changes: 89 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo/place_recognition.py
@@ -0,0 +1,89 @@
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np

import cv2
from tqdm import tqdm

from place_recognition_demo.common import crop_resize

from openvino.inference_engine import IECore # pylint: disable=no-name-in-module


class IEModel: # pylint: disable=too-few-public-methods
""" Class that allows working with Inference Engine model. """

def __init__(self, model_path, device, cpu_extension):
ie = IECore()
if cpu_extension and device == 'CPU':
ie.add_extension(cpu_extension, 'CPU')

self.net = ie.read_network(model_path, model_path.with_suffix('.bin'))
self.input_name = next(iter(self.net.input_info))
self.output_name = next(iter(self.net.outputs))
self.input_size = self.net.input_info[self.input_name].input_data.shape
self.exec_net = ie.load_network(network=self.net, device_name=device)

def predict(self, image):
''' Takes input image and returns L2-normalized embedding vector. '''

assert len(image.shape) == 4
image = np.transpose(image, (0, 3, 1, 2))
out = self.exec_net.infer(inputs={self.input_name: image})[self.output_name]
return out


class PlaceRecognition:
""" Class representing Place Recognition algorithm. """

def __init__(self, model_path, device, gallery_path, cpu_extension, gallery_size):
self.impaths = (list(gallery_path.rglob("*.jpg")))[:gallery_size or None]
self.model = IEModel(model_path, device, cpu_extension)
self.input_size = self.model.input_size[2:]
self.embeddings = self.compute_gallery_embeddings()

def compute_embedding(self, image):
''' Takes input image and computes embedding vector. '''

image = crop_resize(image, self.input_size)
embedding = self.model.predict(image)
return embedding

def search_in_gallery(self, embedding):
''' Takes input embedding vector and searches it in the gallery. '''

distances = np.linalg.norm(embedding - self.embeddings, axis=1, ord=2)
sorted_indexes = np.argsort(distances)
return sorted_indexes, distances

def compute_gallery_embeddings(self):
''' Computes embedding vectors for the gallery images. '''

images = []

for full_path in tqdm(self.impaths, desc='Reading gallery images.'):
image = cv2.imread(str(full_path))
if image is None:
print("ERROR: cannot process image, full_path =", str(full_path))
continue
image = crop_resize(image, self.input_size)
images.append(image)

embeddings = np.vstack([self.model.predict(image) for image in tqdm(
images, desc='Computing embeddings of gallery images.')])

return embeddings
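
Putting the pieces together, a hedged end-to-end sketch (all paths below are placeholders):

```python
from pathlib import Path

import cv2

from place_recognition_demo.place_recognition import PlaceRecognition

# Placeholder model, device, and gallery; gallery_size=None uses the whole gallery.
pr = PlaceRecognition(Path('netvlad-tf.xml'), 'CPU', Path('gallery'), None, None)

frame = cv2.imread('query.jpg')
embedding = pr.compute_embedding(frame)
sorted_indexes, distances = pr.search_in_gallery(embedding)
# Closest gallery image and its distance to the query.
print(pr.impaths[sorted_indexes[0]], distances[sorted_indexes[0]])
```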