Added NetVLAD model #2021

Merged 6 commits on Feb 9, 2021
1 change: 1 addition & 0 deletions ci/prepare-documentation.py
@@ -70,6 +70,7 @@
'machine_translation',
'monocular_depth_estimation',
'optical_character_recognition',
'place_recognition',
'question_answering',
'semantic_segmentation',
'sound_classification',
7 changes: 7 additions & 0 deletions data/dataset_definitions.yml
@@ -1151,3 +1151,10 @@ datasets:
data_source: Cityscapes/data
annotation: cityscapes.pickle
dataset_meta: cityscapes.json

- name: pitts30k_val
data_source: pitts250k
annotation_conversion:
converter: place_recognition
split_file: pitts250k/datasets/pitts30k_val.mat
annotation: pitts30k_val.pickle
93 changes: 93 additions & 0 deletions demos/place_recognition_demo/python/README.md
@@ -0,0 +1,93 @@
# Place Recognition Python\* Demo

This demo shows how to run place recognition models using OpenVINO™.

> **NOTE**: Only batch size of 1 is supported.

## How It Works

The demo application expects a place recognition model in the Intermediate Representation (IR) format.

As input, the demo application takes one of the following:
* a path to an image
* a path to a folder with images
* a path to a video file or a device node of a web camera

The demo workflow is the following:

1. The demo application reads input frames.
2. Each input frame is passed to an artificial neural network that computes an embedding vector.
3. The demo application then searches for the computed embedding among the gallery embeddings to determine which gallery image is most similar to the current frame (see the sketch after this list).
4. The app visualizes the results of its work in a graphical window, where the following objects are shown:
   - Input frame.
   - Top-10 most similar images from the gallery.
   - Performance characteristics.
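
The search step is a plain nearest-neighbor lookup over embedding vectors. A minimal sketch of the idea, assuming a precomputed `(N, D)` matrix of gallery embeddings and a `(1, D)` probe embedding (the names here are illustrative, not the demo's API):

```python
import numpy as np

def top_matches(probe_embedding, gallery_embeddings, top_k=10):
    # L2 distance between the probe embedding and every gallery embedding.
    distances = np.linalg.norm(probe_embedding - gallery_embeddings, axis=1)
    # Indices of the top_k closest gallery images, nearest first.
    return np.argsort(distances)[:top_k]
```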

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html).
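
If reconverting the model is not convenient, the channel order can also be rearranged at runtime; a one-line sketch, assuming an OpenCV-style BGR frame:

```python
# OpenCV decodes images as BGR; reversing the last axis yields RGB
# for models trained on RGB input.
rgb_frame = bgr_frame[:, :, ::-1]
```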

## Running

Run the application with the `-h` option to see the following usage message:

```
usage: place_recognition_demo.py [-h] -m MODEL -i INPUT -gf GALLERY_FOLDER
[--gallery_size GALLERY_SIZE] [--loop]
[-o OUTPUT] [-limit OUTPUT_LIMIT] [-d DEVICE]
[-l CPU_EXTENSION] [--no_show]
[-u UTILIZATION_MONITORS]

Options:
-h, --help Show this help message and exit.
-m MODEL, --model MODEL
Required. Path to an .xml file with a trained model.
-i INPUT, --input INPUT
Required. An input to process. The input must be a
single image, a folder of images, video file or camera
id.
-gf GALLERY_FOLDER, --gallery_folder GALLERY_FOLDER
Required. Path to a folder with images in the gallery.
--gallery_size GALLERY_SIZE
Optional. Number of images from the gallery used for
processing
--loop Optional. Enable reading the input in a loop.
-o OUTPUT, --output OUTPUT
Optional. Name of output to save.
-limit OUTPUT_LIMIT, --output_limit OUTPUT_LIMIT
Optional. Number of frames to store in output. If -1
is set, all frames are stored.
-d DEVICE, --device DEVICE
Optional. Specify the target device to infer on: CPU,
GPU, FPGA, HDDL or MYRIAD. The demo will look for a
suitable plugin for device specified (by default, it
is CPU).
-l CPU_EXTENSION, --cpu_extension CPU_EXTENSION
Optional. Required for CPU custom layers. Absolute
path to a shared library with the kernels
implementations.
--no_show Optional. Do not visualize inference results.
-u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
Optional. List of monitors to show initially.
```

Running the application with an empty list of options yields the short version of the usage message and an error message.

To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](../../../tools/downloader/README.md). The list of models supported by the demo is in [models.lst](./models.lst).

> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (`*.xml` + `*.bin`) using the [Model Optimizer tool](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html).

To run the demo, provide paths to the model in the IR format, to a directory with gallery images, and to an input video, image, or folder with images:
```bash
python place_recognition_demo.py \
-m /home/user/netvlad-tf.xml \
-i /home/user/image.jpg \
-gf /home/user/gallery_folder
```

## Demo Output

The application uses OpenCV to display the gallery search results and current inference performance.

## See Also
* [Using Open Model Zoo demos](../../README.md)
* [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/downloader/README.md)
2 changes: 2 additions & 0 deletions demos/place_recognition_demo/python/models.lst
@@ -0,0 +1,2 @@
# This file can be used with the --list option of the model downloader.
netvlad-tf
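
For example, assuming the tool layout referenced in the demo README and that the command is run from the repository root, downloading the listed model might look like:

```bash
python3 tools/downloader/downloader.py --list demos/place_recognition_demo/python/models.lst
```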
138 changes: 138 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo.py
@@ -0,0 +1,138 @@
#!/usr/bin/env python3
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import logging as log
from pathlib import Path
import sys
import time
from argparse import ArgumentParser, SUPPRESS

import cv2
import numpy as np

from place_recognition_demo.place_recognition import PlaceRecognition
from place_recognition_demo.visualizer import visualize

sys.path.append(str(Path(__file__).resolve().parents[2] / 'common/python'))

import monitors
from images_capture import open_images_capture


def build_argparser():
""" Returns argument parser. """

parser = ArgumentParser(add_help=False)
args = parser.add_argument_group('Options')
args.add_argument('-h', '--help', action='help', default=SUPPRESS,
help='Show this help message and exit.')
args.add_argument('-m', '--model',
help='Required. Path to an .xml file with a trained model.',
required=True, type=Path)
args.add_argument('-i', '--input', required=True,
help='Required. An input to process. The input must be a single image, '
'a folder of images, video file or camera id.')
args.add_argument('-gf', '--gallery_folder',
help='Required. Path to a folder with images in the gallery.',
required=True, type=Path)
args.add_argument('--gallery_size', required=False, type=int,
help='Optional. Number of images from the gallery used for processing')
args.add_argument('--loop', default=False, action='store_true',
help='Optional. Enable reading the input in a loop.')
args.add_argument('-o', '--output', required=False,
help='Optional. Name of output to save.')
args.add_argument('-limit', '--output_limit', required=False, default=1000, type=int,
help='Optional. Number of frames to store in output. '
'If -1 is set, all frames are stored.')
args.add_argument('-d', '--device',
help='Optional. Specify the target device to infer on: CPU, GPU, FPGA, HDDL '
'or MYRIAD. The demo will look for a suitable plugin for device '
'specified (by default, it is CPU).',
default='CPU', type=str)
args.add_argument("-l", "--cpu_extension",
help="Optional. Required for CPU custom layers. Absolute path to "
"a shared library with the kernels implementations.", type=str,
default=None)
args.add_argument('--no_show', action='store_true',
help='Optional. Do not visualize inference results.')
args.add_argument('-u', '--utilization_monitors', default='', type=str,
help='Optional. List of monitors to show initially.')
return parser


def time_elapsed(func, *args):
""" Auxiliary function that helps to measure elapsed time. """

start_time = time.perf_counter()
res = func(*args)
elapsed = time.perf_counter() - start_time
return elapsed, res


def main():
""" Main function. """

log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)
args = build_argparser().parse_args()

place_recognition = PlaceRecognition(args.model, args.device, args.gallery_folder, args.cpu_extension,
args.gallery_size)

cap = open_images_capture(args.input, args.loop)

compute_embeddings_times = []
search_in_gallery_times = []

frames_processed = 0
presenter = monitors.Presenter(args.utilization_monitors, 0)
video_writer = cv2.VideoWriter()

while True:
frame = cap.read()

if frame is None:
if frames_processed == 0:
raise ValueError("Can't read an image from the input")
break

elapsed, probe_embedding = time_elapsed(place_recognition.compute_embedding, frame)
compute_embeddings_times.append(elapsed)

elapsed, (sorted_indexes, distances) = time_elapsed(place_recognition.search_in_gallery, probe_embedding)
search_in_gallery_times.append(elapsed)

image, key = visualize(frame, [str(place_recognition.impaths[i]) for i in sorted_indexes],
distances[sorted_indexes], place_recognition.input_size,
np.mean(compute_embeddings_times), np.mean(search_in_gallery_times),
imshow_delay=3, presenter=presenter, no_show=args.no_show)

# Open the writer once, on the first processed frame; re-opening it
# on every iteration would truncate the output file each time.
if frames_processed == 0 and args.output and not video_writer.open(
        args.output, cv2.VideoWriter_fourcc(*'MJPG'), cap.fps(),
        (image.shape[1], image.shape[0])):
    raise RuntimeError("Can't open video writer")

frames_processed += 1
if video_writer.isOpened() and (args.output_limit <= 0 or frames_processed <= args.output_limit):
    video_writer.write(image)

if key == 27:
break

print(presenter.reportMeans())


if __name__ == '__main__':
sys.exit(main() or 0)
Empty file added: demos/place_recognition_demo/python/place_recognition_demo/__init__.py
40 changes: 40 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo/common.py
@@ -0,0 +1,40 @@
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np
import cv2


def max_central_square_crop(image):
''' Makes a max-sized central square crop. '''

height, width = image.shape[:2]

if width > height:
image = image[:, (width - height) // 2:(width - height) // 2 + height]
else:
image = image[(height - width) // 2:(height - width) // 2 + width, :]

return image


def crop_resize(image, input_size):
''' Makes a max-sized central square crop and resizes it to input_size. '''

image = max_central_square_crop(image)
image = cv2.resize(image, (input_size[1], input_size[0]))
image = np.expand_dims(image, axis=0)
return image
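
A quick usage sketch for these helpers (the file name and input size below are placeholders):

```python
import cv2
from place_recognition_demo.common import crop_resize

image = cv2.imread('photo.jpg')  # placeholder input image (BGR)
input_size = (200, 300)          # placeholder (height, width) of the model input
batch = crop_resize(image, input_size)
print(batch.shape)               # (1, 200, 300, 3): square crop, resized, batch axis added
```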
89 changes: 89 additions & 0 deletions demos/place_recognition_demo/python/place_recognition_demo/place_recognition.py
@@ -0,0 +1,89 @@
"""
Copyright (c) 2021 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import numpy as np

import cv2
from tqdm import tqdm

from place_recognition_demo.common import crop_resize

from openvino.inference_engine import IECore # pylint: disable=no-name-in-module


class IEModel: # pylint: disable=too-few-public-methods
""" Class that allows working with Inference Engine model. """

def __init__(self, model_path, device, cpu_extension):
ie = IECore()
if cpu_extension and device == 'CPU':
ie.add_extension(cpu_extension, 'CPU')

self.net = ie.read_network(model_path, model_path.with_suffix('.bin'))
self.input_name = next(iter(self.net.input_info))
self.output_name = next(iter(self.net.outputs))
self.input_size = self.net.input_info[self.input_name].input_data.shape
self.exec_net = ie.load_network(network=self.net, device_name=device)

def predict(self, image):
''' Takes input image and returns L2-normalized embedding vector. '''

assert len(image.shape) == 4
image = np.transpose(image, (0, 3, 1, 2))
out = self.exec_net.infer(inputs={self.input_name: image})[self.output_name]
return out


class PlaceRecognition:
""" Class representing Place Recognition algorithm. """

def __init__(self, model_path, device, gallery_path, cpu_extension, gallery_size):
self.impaths = (list(gallery_path.rglob("*.jpg")))[:gallery_size or None]
self.model = IEModel(model_path, device, cpu_extension)
self.input_size = self.model.input_size[2:]
self.embeddings = self.compute_gallery_embeddings()

def compute_embedding(self, image):
''' Takes input image and computes embedding vector. '''

image = crop_resize(image, self.input_size)
embedding = self.model.predict(image)
return embedding

def search_in_gallery(self, embedding):
''' Takes input embedding vector and searches it in the gallery. '''

distances = np.linalg.norm(embedding - self.embeddings, axis=1, ord=2)
sorted_indexes = np.argsort(distances)
return sorted_indexes, distances

def compute_gallery_embeddings(self):
''' Computes embedding vectors for the gallery images. '''

images = []

for full_path in tqdm(self.impaths, desc='Reading gallery images.'):
image = cv2.imread(str(full_path))
if image is None:
print("ERROR: cannot process image, full_path =", str(full_path))
continue
image = crop_resize(image, self.input_size)
images.append(image)

embeddings = np.vstack([self.model.predict(image) for image in tqdm(
images, desc='Computing embeddings of gallery images.')])

return embeddings
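
Putting the pieces together, a hedged end-to-end sketch (all paths below are placeholders):

```python
from pathlib import Path

import cv2

from place_recognition_demo.place_recognition import PlaceRecognition

# Placeholder model, device, and gallery; gallery_size=None uses the whole gallery.
pr = PlaceRecognition(Path('netvlad-tf.xml'), 'CPU', Path('gallery'), None, None)

frame = cv2.imread('query.jpg')
embedding = pr.compute_embedding(frame)
sorted_indexes, distances = pr.search_in_gallery(embedding)
# Closest gallery image and its distance to the query.
print(pr.impaths[sorted_indexes[0]], distances[sorted_indexes[0]])
```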